David J. Brown · Richard Boulderstone
The Impact of Electronic Publishing: The Future for Publishers and Librarians
K · G · Saur München 2008
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.

Printed on acid-free paper

© 2008 K. G. Saur Verlag München
An Imprint of Walter de Gruyter GmbH & Co. KG
All Rights Strictly Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without permission in writing from the publisher.

Typesetting by ptp graphics, Berlin
Printed and bound by Strauss GmbH, Mörlenbach, Germany

ISBN 978-3-598-11515-8
Contents
Abstract   xv
Acknowledgements   xix

Phase One

Chapter 1  Background   1
1.1  Introduction   1
1.2  The Book   1
1.3  Purpose of the Book   2
1.4  'Change management'   4
1.5  Target Audiences   6
1.6  Definitions   6
1.7  Objectives   7
1.8  Approach adopted   7

Chapter 2  Industry Evolution   11
2.1  Tragedy of the Commons   11
2.2  Frustration Gap   12
2.3  Publisher and Library dissentions   13
2.4  Big Deals   15
2.5  The Tipping Point   16
2.6  Open Access   17
2.7  The Long Tail   19
2.8  Disenfranchised Researchers   20
2.9  Knowledge Workers and the Thinkforce   23
2.10  Emergence of Search Engines   24
2.11  Something is Good Enough   25
2.12  The new market for research material   26
2.13  Overall Trends   26

Chapter 3  End User Behaviour   29
3.1  Change in User Behaviour   29
3.2  Who are the users?   30
3.3  Typology of Users   31
3.4  Information Overload   32
3.5  Research Studies   33
3.5.1  Industry wide Studies   33
    Tenopir/King research   33
    Collection development   34
3.5.2  Library sourced initiatives   35
    The eJUSt report on E-Journal Users   35
    Faculty Attitudes at Univ California   37
3.5.3  Publisher commissioned studies   37
    CIBER Studies   37
    Elsevier/Mabe research   39
3.6  Author versus Reader   43
3.7  Digital Natives and the Millennium generation   43
3.8  Forecasts   44
3.8.1  The Outsell View   44
3.9  User perceptions of Value   45

Chapter 4  Measuring the Value of Information   47
4.1  Background   47
4.1.1  Peer Evaluation   47
4.1.2  Citation Analysis   47
4.1.3  Document Downloads   49
    COUNTER   50
    SUSHI   51
    Effect of Robots   52
    Case Study – The Mesur Project   52
4.1.4  Focus Groups and investigating individual usage patterns   53
4.1.5  Document Delivery statistics   54
4.1.6  Questionnaires   55
4.1.7  Triangulation   55
4.1.8  Scientometrics   55
4.2  Research Assessment Exercises   56
4.2.1  The United Kingdom's RAE   56
4.2.2  Criticisms of UK's RAE   57
4.2.3  UUK report looks at the use of bibliometrics   58
4.2.4  Australia's research assessment exercise (RQF)   60
4.3  The Future of BiblioMetrics   60

Phase Two

Chapter 5  Electronic Information Industry Structure   65
5.1  How much Information?   65
5.2  The Information Industry   66
5.3  Corporate Size   66
5.4  The Scientific, Technical and Medical Information sector   67
5.5  Challenges facing the information industry   70

Chapter 6  The Key Players   73
6.1  Industry Overview   73
6.1.1  Overall Scholarly Trends   73
6.2  Structure of the Journal Publishing System   73
6.3  Market Estimates   75
6.3.1  STM Publishers   76
6.4  Key Stakeholders   78
6.4.1  Publishers and Information Providers   78
    Leading Publishers   78
    Learned Society Publishers   78
    University Presses   79
6.4.2  Life Cycle of scholarly communication   80
    The Future of the Big Deal   82
    Trend towards Open Access   82
    Versioning   83
    Other Challenges facing Publishing   86
    Hybrid Journals and document delivery   86
    Refereeing   87
      Peer Review in scholarly journals   88
      Alternative review procedures   89
    Publishers and the 'Valley of Death'   89
    The Prisoner's Dilemma   90
    Future of Publishing   91
6.5  Research Libraries   92
    ARL statistics   93
    UK university library expenditure   93
    Librarian relationship to their customers (users)   94
6.5.1  The European Digital Libraries   95
6.5.2  The Future of the Librarian   96
6.5.3  Understanding the new user   98
6.6  Other Stakeholders   98
6.6.1  Collaboratories   98
6.6.2  Funding Agencies   99
6.7  Government involvement   100
6.7.1  Case Study   100
    Joint Information Services Committee   100
6.7.2  National Priorities on toll free or toll paid   101
6.7.3  Emerging Competition   102

Chapter 7  Publication Formats   103
7.1  Journals and e-Journals   103
7.1.1  Multi author articles   106
7.1.2  The evolution of the electronic journal   107
    Article Readership   109
    Concerns about Journals   110
    Electronic journal use   110
    Purpose of Reading   111
    The Value of reading Journals   112
    e-Journals in industry   113
7.1.3  Future of the Journal   114
7.2  Books and e-Books   115
7.2.1  The e-Book phenomenon   115
    Ebrary results   117
    Oxford Scholarship Online   117
    e-Books on other platforms   118
    A Strategy for Book Digitisation   118
7.3  Document Delivery   120
7.3.1  The market for article supply   120
    InterLibrary Loans   122
    Subito analysis of document delivery   124

Chapter 8  Legal Developments   127
8.1  Legal Initiatives   127
8.1.1  Creative Commons   127
8.1.2  Science Commons   127
8.1.3  JISC and SURF's Licence to Publish   128
8.1.4  Future of Copyright   129
    Orphan Works   129

Chapter 9  Geographical Trends   131
9.1  Globalisation of Research   131
9.2  Movement of global funds for research   132
9.2.1  Regional variations   132
    Industrial R&D   133
    Academic R&D   133
9.3  Implications on scholarly publishing   134
9.3.1  Worldwide Trends in Article Output   135
9.3.2  Trends in Three Major Publishing Regions   135
    Europe   136
      United Kingdom   136
    Asia   136
      China   137
    Australia   138

Chapter 10  Research Disciplines   141
10.1  Sources of Funds for Research   141
10.2  Research Trends   142
10.3  The changing R&D process in large corporations   143
10.3.1  Social Collaboration   143
10.4  Behavioural Trends   144
10.5  Specific Disciplines   147
10.5.1  Physics and Mathematics   147
10.5.2  Astronomy   148
10.5.3  BioSciences and Medicine   149
    United Kingdom   149
10.5.4  General Considerations for Biomedicine   150
10.6  Case Study   151
10.6.1  Case Study – Information Search (Biology)   151
10.7  Arts and Humanities   152
10.8  Other Subjects   153
10.8.1  Chemical Biology   153
10.8.2  Geosciences   153
10.9  Summary   154

Phase Three  Drivers for Change   157
    Change can be complex   157

Financial and Administrative Drivers

Chapter 11  Business Models as Driver for Change   161
11.1  Opening up the Market?   161
11.1.1  Open Access Initiatives   161
11.1.2  Open Access Journals – the Gold Route to open access   163
    Gold OA Journals – The Current Situation   163
11.1.3  Author self-depositing articles – the Green Route to open access   165
    Subject-based E-Print services   165
    Institutional-based repositories   165
      The DRIVER project   167
      Author Participation rates   168
      Voluntary or Mandatory?   168
11.1.4  Harvesting the open access material   170
    Usage of OA   170
11.1.5  Open Access projects   171
    BioMed Central   171
    Hindawi Publishing   172
    SCOAP3 – OA publishing of physics journals   172
    Case Study – The US IR scene   174
11.1.6  Economics supporting open access   176
11.1.7  Impact of OA on Publishers   177
11.1.8  Trends favouring Open Access   178
11.1.9  Implications for Authors   181
11.1.10  Implications for Publishers   181
11.2  Online Advertising as a new business model   182
11.2.1  Online Advertising   182
11.2.2  Advertising in the scholarly area   182
11.3  Summary   184

Chapter 12  Funding Research as a Driver for Change   185
12.1  Political developments   185
12.2  Open Access Initiatives   186
12.3  Ranking countries by research output   187
12.4  National and International government initiatives   187
12.4.1  A model for a new electronic publishing paradigm   187
12.4.2  European Commission FP7 e-infrastructures   188
12.4.3  EU Study of Scientific Publishing (2006)   190
12.4.4  EU open access developments   193
12.4.5  European Research Council   195
12.5  Publisher Initiatives   195
12.5.1  US publishers' PR campaign   195
12.5.2  PRISM – Advocacy programme from the publishers   197
12.5.3  The European PEER Project   198
12.5.4  Publishers' White paper on academic use of journal content   199
12.6  Library Initiatives   200
12.6.1  SPARC   200
12.6.2  SHERPA/OpenDOAR   201
12.7  Global Research Trends   201
12.8  Research funding as a driver for change   203
12.8.1  Public funded R&D in the UK   203
12.8.2  Structure of Research Funding in the UK   204
12.9  Research assessment   204
12.9.1  A Dangerous Economy (RCUK)   204
12.9.2  The Death of Peer Review (RAE)   205
12.9.3  The 2008 RAE   206
12.10  Other funding agencies   207
12.10.1  JISC in UK   207
12.10.2  MPS and DFG in Germany   207
12.10.3  Charities   208
12.11  Summary   208

Technological Drivers

Chapter 13  Efficiency Improvements as a Driver for Change   211
13.1  Industry Collaboration to achieve improved efficiency in EP   211
13.1.1  Trade Associations   211
    Publisher Trade Associations   211
    Library Trade Associations   212
13.1.2  Research Information Network (RIN)   213
13.1.3  Publishers Research Consortium   214
13.1.4  Publishing Cooperatives   215
13.2  Changes in Format   217
13.2.1  Markup Languages   217
13.2.2  Metadata   218
13.3  Structural Efficiencies   219
13.3.1  Mergers and Acquisitions   219
13.3.2  Economies of Scale   220
13.3.3  Why is market consolidation taking place in scholarly publishing?   221
13.3.4  Why are the larger publishers able to succeed where small publishers find it difficult?   222
13.4  Standards and Protocols   223
13.4.1  ONIX for Publisher Licences   223
13.4.2  ACAP (Automated Content Access Protocol)   224
13.4.3  Refereeing   225
    Nature's open peer review   225
13.5  New Technical Offerings   225
13.5.1  Current Awareness and Alerting   225
13.5.2  Publishing a semantic journal   226
13.5.3  Publishing in virtual reality   226
13.6  Summary   228

Chapter 14  Technology as a Driver for Change   229
14.1  Background   229
14.2  Past impact of Technology   229
14.3  The technological infrastructure   230
14.3.1  Digital Resource Management (DRM)   231
14.3.2  Athens   231
14.3.3  Shibboleth   232
14.3.4  UK Access Management Federation   233
14.3.5  OpenID   233
14.4  Technology and Standards   234
14.4.1  Digital Object Identifier (DOI)   234
    Concerns   234
14.4.2  CrossRef   235
14.4.3  Other Identifiers   236
14.5  New Products and Services   236
14.6  Other Technical applications   237
14.6.1  The "Cloud"   238
14.7  Three predictions on scholarly communication technology   239

Chapter 15  Data and Datasets as a Driver for Change   243
15.1  Background   243
15.2  Main data centres   243
15.3  The Data Challenge   244
15.4  Standards and Procedures   246
15.4.1  Data Management Systems   246
15.4.2  Metadata of Data   246
15.4.3  Data Webs   247
15.5  The Researcher's wishes   247
15.6  Reproducible Results   249
15.7  Integration between Data and Text   249
15.8  Business Model for Data   249
15.8.1  NSF funds for data compilations   250
15.8.2  Google and Datasets   251
15.9  Impact of data on Libraries   251
15.9.1  Data Curation   252
15.10  Impact of data on Publishers   252
15.11  Impact on other institutions   253
15.11.1  Research Councils   253
15.12  Summary   254

Chapter 16  Mining of Text and Data   255
16.1  Background   255
16.2  Implications   256
16.3  The mechanism of Text Mining   256
    Information Retrieval (IR)   256
    Natural Language Processing (NLP)   256
    Information Extraction (IE)   257
    Data Mining (DM)   257
    Visualisation   257
16.4  Recent History   257
16.5  Challenges facing Text and Data mining   258
16.5.1  Structure of database   258
16.5.2  Legal and licensing framework   259
16.5.3  Legal status of text mining   259
16.5.4  Computation by machines   260
16.6  Practical Examples   260
16.7  Implications in applying text mining   261
16.8  The Future   261
16.9  Impact on Libraries   262
16.10  Impact on Publishers   262

Chapter 17  E-science and Cyberinfrastructure as Drivers for Change   263
17.1  Background   263
17.2  The e-Science Challenge   263
17.3  Visions for e-Science   264
17.4  Overall context of e-Science   264
17.4.1  Public Engagement   265
17.5  Future role of e-Science   265

Chapter 18  Workflow Processes and Virtual Research Environments   267
18.1  Integration into Work Flow Process   267
18.2  The research process   268
18.3  Examples of a work bench approach   268
18.3.1  Virtual Research Environments (VRE)   269
18.4  Summary   270

Chapter 19  The Semantic Web as a Driver for Change   271
19.1  The Challenge of the Semantic Web   271
19.2  Critiques   271
19.3  Web Science Research Initiative   272
19.4  Examples of Semantic Web in scholarly publishing   272
19.4.1  Knowlets   273
19.5  Implications of Semantic Web for EP   274

Chapter 20  Mobile Devices as Driver for Change   275
20.1  Background   275
20.2  The wireless economy   275
20.3  Intelligent spectacles?   276
20.4  Amazon's 'Kindle'   276
20.5  Google's Open Handset Alliance   277
20.6  Future Developments   278

Chapter 21  Archiving and Preservation as Drivers for Change   279
21.1  The Challenge of Archiving and Preservation   279
21.2  Preservation and Access   279
21.3  Archive requirements   281
21.4  International Collaboration   282
21.4.1  US-based Task Force on sustainable digital preservation and access   282
21.4.2  The European Alliance for Permanent Access   283

Social Drivers
    The Google Generation   287

Chapter 22  Findability as a Driver for Change   289
22.1  The rise of Search Engines   289
22.2  Resource Discovery and Navigation   289
22.3  How users find information   290
22.4  The Findability Challenge   292
22.5  Case Study: The Google mantra   293
22.6  Other search engines   296
22.7  Impact of search engines on publishers   297
22.8  Book digitisation and the Copyright issue   298
22.9  What of the Future?   299

Chapter 23  Web 2.0 and Social Collaboration as Drivers for Change   301
23.1  Wisdom of the crowds   301
23.2  The Challenge of Web 2.0   302
23.3  Critiques of the Web 2.0 movement   303
23.4  Case Study – O'Reilly   305
23.5  The Web 2.0 business model   307
23.5.1  Blogs and Wikis   309
23.5.2  Mash-ups   309
23.6  Drive towards Consumer-based Collaborative systems   309
23.7  Communication   311
23.8  Case Study – Wikipedia and online encyclopedias   311
23.9  Wikinomics   313
23.10  Case Study: InnoCentive   315
23.11  Summary   317

Chapter 24  Trust   319
24.1  Trust   319
24.2  Fraud and Plagiarism   321

Chapter 25  Timeline – Emergence of Electronic Publishing   323
25.1  Where we come from   323
25.2  Users of scholarly communication   324
25.3  The Industry Structure   324
25.4  Drivers for Change   325
25.5  Separating the Drivers   325
25.6  Summary   327

Chapter 26  Summary and Recommendations   329
26.1  Planning for Change   329
26.2  A vision for Scholarly Communications   331
26.2.1  User Behaviour   333
26.2.2  Effect of government intervention   334
26.2.3  New information service requirements   334
26.2.4  Market Trends   335
26.2.5  The Information Process   335
26.2.6  Business Models   336
26.2.7  New Products and Services   336
26.2.8  Stakeholders   337
26.2.9  Legal issues   337
26.3  The future role of the Publisher   337
26.4  The future role of Libraries   338
26.5  The future role of Intermediaries   340

References   341
Figures and Tables   343
Index   345
Abstract
The traditional publishing model is being challenged as it adapts from a print-only to a hybrid print and digital mode of communication, and in some instances to a fully electronic publication system. There is no single issue driving this adaptation – it comes in a variety of forms with different drivers, though technology-related stimuli, emerging social trends and business model changes are the most prominent. This book reviews these various drivers for change and assesses their impact on an industry sector in transformation.

An underlying theme is the effect which such changes are having on user behaviour – whether the emerging structure is being led by changes in the information habits of the user population, or whether end users are being buffeted and driven into new ways of doing things by developments and trends outside their control. The jury is still out on the crucial question of whether the power of users and their behaviour patterns will be the ultimate determinant of how electronic publishing emerges, and what this means for existing stakeholders such as publishers and librarians. It is a jury without solid evidence, however. The data which could be used to assess the current social aspects of scholarly communication have in the main been under-analysed and under-reported. We know more, but not much more, about user behaviour in a print-dominated information system. We know much less about the way the various user communities have acclimatised themselves to information as it embraces the digital age. The Google Generation in particular remains something of an enigma.

However, a new environment is opening up for those involved in electronic publishing. In the past, in the print paradigm, a cottage industry mentality prevailed, as many of the costs – notably typesetting and printing – were outsourced. In the electronic publishing world, to be a key player, content providers need to invest in platforms, in retro-digitisation and in acquiring ICT skills, all of which have implications for economies of scale. Large conglomerates are now emerging, fuelled in some cases by venture capital money, which have been active in acquiring other publishers to achieve the new economies of scale. Other publishers have sought protection through aggregation and by sharing some of their operating functions. New large players have entered this business sector, notably search engines, which have become sizeable elephants in the emerging EP room. The culture of publishing is changing.

Combined with the change in industry structure there are the Drivers, twelve of which have been identified and are described later in this book. Some of the key features of these drivers are that 'openness', or freedom to access and use the results of the publication process, has become high on the industry agenda.
In addition, a collaborative, community-based approach is emerging in some areas where 'correspondence', 'communication' and 'computation' – the three 'C's – have become the by-words, as a counter to traditional, formalised science, technology and medical publishing. This is interacting with another major force: alternative information sources and formats beyond the publication of record, which are emerging as highly desirable aspects of the new communication system. Raw data and datasets, the ability to mine these to identify new relationships, and the incorporation of data and publications into an integrated workflow system which closely matches the way researchers work, have become key topics. These are new information services which now sit on top of traditional content and are changing the nature of electronic publishing into a service- rather than product-orientated business. In addition there is a growing demand for 'reproducible results': if the original results described in a published article cannot be reproduced at the reader's own desktop, there is, in effect, no result. Supporting all these broad trends is public investment in the networks which allow the new information to flow between author and reader, networks which are huge in their technical capacity and are part of national and global e-Science and cyberinfrastructure investments.

Electronic publishing is changing faster now than it was ten years ago. In the next ten years the changes are likely to escalate rather than diminish. There are, however, many issues which remain unresolved. Is open access as a business model viable and sustainable, and does it open up publicly funded research results to a traditionally disenfranchised community? Will the so-called 'data deluge' change the way scholarly research is conducted and reported? What role will e-Science and the Grid take in communicating the results of research effort, and what impact will grassroots social collaborative tools, often referred to as Web 2.0 developments, have on the creation and assessment of research? This is a confusing mixture of trends, drivers and developments, with no clear indication of how each will determine the shape of the industry in years to come. An attempt will be made later to approximate a timeline for some of these drivers for change.

There is currently uncertainty among the main stakeholders in the industry. Given the changes, will 'publishers' survive with their alleged fixation over ownership of content, and fail to provide new value-added services in a digital world? Will 'librarians', worried about the storage of material in physical buildings and the dwindling number of visits to access their collections, be bypassed in future by new information gatekeepers with new skill sets? Will the existing intermediaries – booksellers and subscription agents, who are losing ground to direct transactions between publishers and end users – be totally disintermediated? And if these traditional key players lose ground, which new agencies will emerge to take over the role of communicators of information in a truly digital world?

These are some of the key issues which will be raised in the following chapters. Given the volatile nature of the industry at present, it is by no means clear that anyone has all the answers. This is a snapshot description of an electronic publishing system in 2007/8, a system which is on the move, and the move can follow a number of routes.
An audit of available information and opinion relating to this emerging situation is therefore attempted, drawing on a wide variety of sources.
Keywords

Electronic publishing, journal publishing, scholarly and STM publishers, research libraries, intermediaries, science policy, technology, user behaviour, authors, collaboratories, open access, institutional repositories, e-Science, e-Research, Grids, data and text mining, e-books, mobile devices, social networking, social collaboration, blogs, wikis, podcasts, Web 2.0, semantic web, folksonomies, ontologies.
Acknowledgements
This book rests heavily on the shoulders of giants. Without access to some of the legendary work undertaken by experts such as Professors Carol Tenopir and Donald King, from the universities of Tennessee and North Carolina respectively, the first phase of this book, which explores some of the background and genesis of the electronic publishing industry, would have been all the poorer. The second phase, that of assessing the current structures and drivers, rests on the shoulders of other giants such as Professor David Nicholas and Dr Ian Rowlands from University College London, whose work through UCL's Centre for Publishing and CIBER has been both pioneering and illuminating. The many research studies commissioned and undertaken by Michael Mabe, whilst at Elsevier and more recently as chief executive of the STM Association, have also been an important source of industry insight as change takes place. Many of the scenarios painted in the third phase, the assessment of future trends and drivers, have relied on the thoughts and vision of David Worlock, senior research fellow at Outsell Inc, amongst others. Research commissioned in the UK by agencies such as the Research Information Network (RIN) and the Joint Information Systems Committee (JISC) has also figured prominently in compiling this book. The support and help from colleagues at the British Library has been exceptional, and grateful thanks are extended to all concerned.

This book is truly a collective effort, pulling together developments, ideas and strands from a diverse group in an attempt to describe the current confusion that is electronic publishing. Notwithstanding the eminence of the sources used, any omissions, mistakes or false interpretations are entirely those of the authors. After all, this is a diverse and diffuse industry segment undergoing dramatic change, which is itself a breeding ground for inaccurate subjective assessments and misrepresentations. And this is the by-word of the book – that Change is dominating the scene, and all players involved in electronic publishing need to come to terms with Change in its many guises.

To make the assessment more manageable, the scope has been carefully circumscribed – electronic publishing as it affects libraries and publishers is a broad church. This book is focused on the one part which is at the forefront of adaptation to the changing environment: the scholarly information area dealing with research, particularly in the hard sciences. It is contended that work being done in these areas will spill over into other publishing areas in due course. But even this restriction to science, technology and medicine does not guarantee that mistakes and misinterpretations will not be made.

Electronic publishing is an important topic. It is a function that lubricates the wheels of the scientific research effort in particular, providing essential
information to ensure that new research findings are part of a consistent corpus of activity, and not undertaken in isolation. Over the decades and centuries, mechanisms have been established by the academic and industrial research communities to ensure that project findings are credible and that recognition is given to those who merit approbation from their peers. It is this mechanism that the new electronic publishing system is testing, and this book builds on the suggestions, advice and information supplied by many in the industry in looking at how robust the current mechanism is.
Chapter 1
Background
1.1 Introduction

This book is about the mechanisms which facilitate the exchange of ideas and information within those sectors of the modern economy which advance the cause of innovation, new ideas and cultural exchange. Electronic publishing is that mechanism. The book describes the changes which are taking place in the way individuals and organisations communicate, and how they adapt to the new information systems which electronic publishing is creating.

Electronic publishing is often the hidden process at the heart of academic and corporate research, of professional and learned society communication, and of collaboration between research communities. It fuels the research effort, both nationally and internationally. It is a harmonising activity, and in its most effective form has a democratising role, as it facilitates dissemination both widely and rapidly. However, it is in a state of flux. As indicated in the Abstract, there are many drivers for change impacting on electronic publishing. They come from different quarters, and their confluence is producing turmoil. As will be emphasised throughout the book, there is concern that the traditional stakeholders in the information sector are at risk: their survival is threatened. Talk of dinosaurs and 'heads in the sand' has become frequent at meetings at which one stakeholder takes pot shots at another. As a consequence, what is at stake is the health of the very social and industrial processes which are at the heart of a vibrant and growing economy.

As much as possible, evidence will be substituted for emotion. Emotion has tended to enter into much of the dialogue between the separate parties involved in the communication process, and this unfortunately often obscures the real facts and emerging trends. Where evidence and hard facts exist these will be highlighted, though their paucity is frequently lamented. A critical conclusion of this book is that more attention should be given to extracting facts from the current situation in scholarly communication, and to isolating these facts from the hype and rhetoric which unfortunately dominate the listservs, blogs and wikis that have emerged to provide sometimes unhelpful commentary on what has become the 'scholarly communication problem'.
1.2 The Book

In 1996, Bowker Saur published a book entitled "Electronic Publishing and Libraries – Planning for the impact and growth to 2003", authored by David J. Brown
of DJB Associates on behalf of the British Library. It painted a scenario in which the Internet was just beginning to grapple with the scholarly communications process, and CD-ROMs were felt to be the most likely digital message carrier of the future. As has become evident, the past ten years have seen a dramatic change in the way electronic publishing has emerged and transformed traditional print-based publishing systems. These electronic publishing systems have also affected the relative importance of different electronic media in communicating scholarly information. The importance of CD-ROM technology as a support mechanism was exaggerated in the original book, whereas the effect of Internet and Web developments was greatly underestimated. New business opportunities have been awakened by open access developments, e-Science trends and Web 2.0-based social collaborative tools, all of which were barely evident in 1996. As such, the publisher Saur and its parent company de Gruyter have commissioned a follow-up report to address these unforeseen and unforeseeable changes and to assess their impact on the current publishing and library sectors in particular.

The approach adopted in this book, as in 1996, is to take a snapshot of electronic publishing developments and make an assessment of them based on currently available knowledge and evidence. It offers no 'final solution' for how the electronic publishing industry will actually emerge from the buffeting it is taking from many external events. However, it will attempt to put the many changes which are taking place into some semblance of order and describe their potential impact on the overall scholarly communications process. The journey is still underway – given the experience of Google, for example, we can expect a rocky ride, and new challenges and opportunities are likely to appear, in some cases from entirely unforeseen quarters. For this reason, this update of the first attempt at investigating 'Electronic Publishing and Libraries' (1996) will not follow the earlier practice of forecasting the future in a precise and quantitative way: change is currently too volatile and the risks of getting it wrong are too great. Instead the book offers a more qualitative assessment of trends and developments, underpinned by quantitative data where available, in the hope of enabling the reader to come to his or her own judgement about the current and future state of electronic publishing.
1.3 Purpose of the Book

So why write a book about electronic publishing now, when little is settled and change, as we claim, is so rampant? It is evident that the rules of engagement in electronic publishing (EP) are altering dramatically, from a situation where print publication and dissemination dominated, as recently as two decades ago, to a point where digital publication and dissemination play an increasingly important role. This raises questions about the very direction of, and the key players in, the scholarly communications sector. Is this critical communications system stable, or will it fragment into many separate pieces?

With a few specialised exceptions, we are not yet in a situation where digital information resources totally dominate the information-seeking behaviour of researchers. We are still in a hybrid system, though there is undoubtedly a rapid migration from print to digital as the IT infrastructure becomes more reliable,
ubiquitous and secure. This book identifies some of the key issues which have arisen as this hybrid model envelops the scholarly communication process. As such it is intended to become a platform from which the debate about a digital communication system can take place.

This book aims to bring together in one place the important features of the changes that are currently in evidence. It also describes the tensions that currently exist between the main players in the digital publishing sector. Some of these tensions are inherited from the past, and often revolve around the legal ownership of content and its use. But new tensions are also evident, driven by the enabling power of technology. Some of these arise from the ease with which content, in a digital world, can be disseminated and replicated without recourse to the content owners. This is a significant feature, and debates about the future development of the research sector's information requirements have quickly become polarised. There is nonetheless a need for mutual cooperation and collaboration to ensure that the transition to the new digital information order is smooth, effective, efficient and viable. There is a risk that we focus on short-term issues at the expense of moving towards a long-term vision about which there is currently limited consensus. This is where emotion has taken hold, and conflict arises as each part of the information sector tries to protect its own short-term patch without any real understanding of the other sectors' positions and how the whole fits together. There is a lack of a consistent, holistic vision of the future which all stakeholders can buy into.

A fundamental issue which this book has to face is whether publishers and librarians as we know them have outlived their purposes in the new digital environment. Are they hanging on to functions that have a diminished place, or no place whatsoever, in the new Millennium? Are they dinosaurs, with digital trends having the same consequence for publishing and librarianship as meteors did for dinosaurs? Are they clinging to outdated concepts, particularly of ownership, when openness is increasingly the name of the game? Are they showing signs of failure to adapt to new circumstances and environmental change?

Changes can be evolutionary – a gradual change over decades – or they can be revolutionary, making substantial inroads into the way communication takes place. Both types of change are seen within scholarly communications. In the traditional aspects of communication, the way people work with information resources, the changes have been evolutionary in nature, as most people have slowly adapted their habits to the new (digital) information formats and technical infrastructure. More recently, the rapid development of the Internet and associated technologies has had a major impact on the way in which information can be obtained, consumed and created. This in turn has opened up new avenues of opportunity for the users of these information resources, as long as they are prepared to adapt to technological advances. The younger generation, and scholars operating in high-technology areas, have been particularly influenced by such developments. As such there is a mix of both evolution and revolution in the emergence of scholarly communication as we see it. One striking aspect of this confluence is that it has produced a new social phenomenon: the 'digital native', or the millennium generation.
These are individuals, usually young, who have been brought up on a diet of video games, interactive online services and technological gizmos, which have dampened their enthusiasm for gaining information and knowledge from the highly structured printed publication system, whether on paper or in its digital version. Instead they expect to
find their information through comprehensive and powerful search engines such as Google or Yahoo, or, increasingly, to have that information delivered to them, in anticipation of their demand for it, by new specialised and targeted information services. Some are moving into a social networking form of mass collaboration, potentially eschewing the traditional, highly structured refereeing systems that have been the bulwark of traditional publishing. The increased participation in services such as MySpace, Facebook and Second Life, now with millions of adherents, points the way to a new form of collaboration in content creation. Intelligent information systems are also offering enticing but as yet ill-defined opportunities for the emerging digital natives.

This book will look at the background to these Changes: where they originated, what their implications are, and how things may develop in the future. The central purpose is to highlight the complexity of the current situation and the need to be aware of the holistic nature of the scholarly communication process in coming up with solutions which meet the external conditions and criteria.
1.4 'Change management'

Fundamentally, there has been a growing disconnect between the supply and demand forces for published information supporting research in recent decades. These forces have been rooted in old systems which served print-based book and journal publishing reasonably well: authors submitted works to publishers, who created products which they sold, through intermediaries, to libraries, who paid for them from budgets allotted by their institution or company. All this was in anticipation of a demand for information and knowledge being met by the library for its constituency.

However, in a major part of the digital publishing system – in scholarly and research publishing – there has been a growing tension caused by the imbalance between the forces (R&D funding) which fund the creation of the traditional product (books and journals) and those forces which are responsible for buying and using the product (libraries within their respective institutions and companies). The two are financed from separate agencies and budgets, with little or no structural interconnectivity. This lack of balance has created the so-called 'frustration gap' between supply and demand. A practical illustration of this can be seen in the different growth rates of ARL research library budgets in the US and the funds identified by the NSF for undertaking research (see Chapter Two). During the 1990s in particular there was little correlation between the supply and demand curves for research-focused information.

In addition, any assessment of the scholarly communication process should be mindful of the changes which are occurring as a result of the impact which the Internet, the web, the semantic web, cyberinfrastructural developments and social publishing are having on user behaviour. Changes in the interpretation by some sectors of the role of copyright, intellectual property rights, legal deposit, interlibrary lending, etc, are also adding confusion. Some of the many specific areas of 'change' include:

• Changes in Technology. We are in transition between a print and a digital publishing market. The increased power of desktop computers and speed of communication is having a powerful influence on the form in which scholarly information is delivered. In this respect the e-Science and e-Research movements are becoming significant.
• Changes in the Business Model. Coinciding with the technology change, there are also pressures on developing new 'free' ways to access information in line with social imperatives. Open access takes many forms, most of which are still unproven in a commercial sense, but it nonetheless dominates the debate about the current publishing landscape and is central to any realistic discussion about Change.
• Changes in the Product/Service concept. Instead of a total reliance on formally published, mainly textual research articles, there is a growing requirement to provide access to supplementary material, grey literature, datasets, manipulation software, video and sound clips etc, which are increasingly part of the research output but have yet to establish widely accepted mechanisms for certification, dissemination and preservation.
• Changes in User Behaviour. Partly as a consequence of the available cheap and ubiquitous technology, authors and readers are changing their patterns of accessing and assimilating information and adopting digital and often informal communication methods.
• Changes in Scientific Disciplines. It is no longer possible to consider scholarly or research information services as 'one size fits all' – each research area has its own culture, tradition, scale and format needs. Even more apparent is that they are each moving in different directions in their adoption of digital media; some are embracing the new information technologies much faster than others.
• Changes in Funding Agencies. The approach that research funders are adopting to evaluate the results of their fund allocations (for example, through research assessment exercises) is resulting in a more metric-focused and evidence-based evaluation of their grant programmes.
• Changes in Copyright. Whilst academic institutions are demanding new rights over work undertaken within their campuses, the use of deposit mandates to ensure broad access to research outputs has become a growing trend. This has been fuelled by agencies such as Creative Commons and Science Commons, which make it easier to transfer such rights away from publishers, the traditional holders of published copyright.
• Changes in Demography. The rapid rise of social collaboration tools and user-generated media is breaking down the traditional formal scholarly publication systems, producing an increasingly hybrid environment.
The combination and interaction of these changes creates an urgent need to understand and tackle the problems facing the scholarly communication industry. The danger of ignoring them is an unstable and unreliable system for disseminating research outputs. Changes need to be made in an incremental way that is viable, sustainable and optimal in terms of providing users with the information they need. Revolutionary changes may destabilise this communication system, causing loss of information, insight, time and money.
1.5 Target Audiences This book is therefore aimed at all the current and future key stakeholders in the scholarly communication process, but particularly focused on publishers and libraries. It endeavours to highlight the underlying, and in some instances the unreported, trends that are creating Change, and the significance of these trends. The book speculates on the impact that these changes will have on the existing stakeholders. Whilst it will not be dogmatic in describing such impacts, it will draw on best evidence to produce scenarios within which existing players can assess their own strengths or weaknesses. A basic fear – the thing that keeps many pundits awake at night – is that the simultaneous adoption of all that is claimed to be good for electronic publishing in principle could in effect throw the baby out with the bathwater and leave a future electronic publishing (EP) structure with no content. A viable mechanism must be found to ensure that authorship is rewarded, and reward in this instance may take a variety of financial or social forms. Ignoring this central tenet of the scholarly communications process would mean that the main purpose of electronic publishing – to keep people informed in a timely, efficient and cost-effective way – would not be achieved. There has to be value and purpose which attracts authors to disseminate the results of their ‘sweat of the brow’. This book is addressed to all those sectors of the scholarly communication industry that are instrumental in bringing a new EP system into place. Its aims are to create a common understanding of the issues among these partners; to ensure that sound bites do not dominate the debate; to use evidence and metrics, where possible, to inform decisions; and to foster an appreciation of the needs of the other links in the information chain.
1.6 Definitions Before delving into the substance of the challenges facing electronic publishing, a few brief definitions of intent need to be made. This is a book primarily about the scholarly information sector, the area that is in support of the scientific, technical, engineering and mathematical (STM, or STEM) research activities. This sector has particular needs that are often at the cutting edge of commercial, social and technological change. Innovation, leading to improved industrial products and services of benefit to society, is often at the basis of such needs, though basic research to understand the world we live in is also a key driver in academia. Such information is needed to push back the boundaries of knowledge, each research result building on others, which has led to the concept that advances are made by ‘standing on the shoulders of giants’ (a quote inaccurately attributed to Sir Isaac Newton; it predates him by several centuries, originating ostensibly with Bernard of Chartres in the 12th century). In addition, the book is focused on research, not education, nor entertainment, as the main stimulus for information demand. The tradition over the past four centuries has been for the slow evolution of the journal to become the leading format for delivering scholarly information. For reasons that lie within the current budgetary mechanisms, such serials as journals have gradually replaced books as the main format in the more popular STM
sectors. The so-called ‘article economy’ was touted as a more recent way to unpick the relevant items from a journal and deliver just what is needed on demand, but the inroads which such article delivery has made have been at best spotty. In effect, for centuries, the printed book and journal formed the mainstay of scholarly communication. During the past twenty years this has shown signs of change. For example, a significant and recent development has been the emergence of data and datasets as the primary research resource that some specialised researchers need to conduct their work. The data itself is not encumbered with interpretations by other researchers about how it can be applied in other situations for other purposes. In most cases the data is neutral and as such a valuable new asset for the STM research community. This was rarely available in the pre-digital era as a scholarly resource. So in terms of product format for electronic publishing the net is being cast wide to include such new developments.
1.7 Objectives This book attempts to pull all these strands together to allow the reader a broader perspective of all aspects of the electronic publishing industry into which we are being driven by events over which we do not necessarily have control. Whilst we may not have total control, it behoves those of us who are active in the industry to understand all the issues facing the parties involved in the sector and exert influence. This book comes with a plea – that we remove the blinkers from our respective sectoral eyes and understand that in the volatile information world into which we are moving we all need to stand together, to work together, through a common understanding of each other’s needs. To use the perhaps over-worked and inappropriate recommendation from Benjamin Franklin at the signing of the US Declaration of Independence in 1776, “We must all hang together, or assuredly we shall all hang separately”. More importantly, to ignore legitimate roles for libraries, intermediaries, data providers, publishers and researchers for the sake of protecting a particular narrow vested interest could hasten the demise of the current publishing system and lead to something totally new, something which could be alien to the best interests of the industry and its users. This is not a plea for a Luddite approach or support for a no-change scenario. There are undercurrents that already exist driving a different pattern of relationships between the present links in the information chain. What this book intends to show is that there are consequences arising from our respective responses to change, and that these responses should be as much positive as defensive, in harmony rather than self-serving. Ultimately the market will decide. The next chapter of this book looks at some of the issues that will determine this emerging market.
1.8 Approach adopted This report deals with three main phases in the way electronic publishing has impacted on publishers and libraries.
Phase 1 – This is the period up until the early 1990s, when publishers and libraries were faced with a largely print-based information system. Though electronic media existed, it was nevertheless early days and dominance lay with publishing and curating the printed page. It was the period covered by the earlier book on “Electronic Publishing and Libraries – planning for the impact and growth to 2003”. It was a simple period in media terms, with the main problems encountered being economic rather than technical or social.
Phase 2 – From the mid 1990s to the early 2000s was a period of confusion. The arrival of the Internet created a new dimension to the information industry, bringing with it a whole set of new legal, business and technical challenges, many of which remain unresolved.
Phase 3 – From the early 2000s there has been a strong electronic publishing drive, with digital versions of information outselling and outperforming their analogue equivalents. Adjusting to this new all-digital environment has created new strains, with mergers and acquisitions dominating the publishing scene, massive investments in infrastructure being put in place by policy makers, and a period of ‘openness’ overtaking the traditional ‘toll-based’ access to information as a business model. In this third phase the dozen or so main drivers for change will be identified and described.
Throughout these separate phases the needs of users have tended to be ignored, under-researched or marginalised. Nevertheless, whilst it may be somewhat of a cliché to claim that it is through an understanding of users that one achieves insight into the new electronic publishing system, this will be the undercurrent which runs through this book. Understanding the information habits of those who are being provided with information is fundamental. Cliché or not, we will begin this tour by identifying some of the macro trends which have impacted on the social trends within the industry, and do this by drawing on parallels from other industries and situations. Examples of those which relate to the traditional print-based publication system, Phase One, will be used as a starting point. Such examples paint a clearer picture of where we have come from.
Phase One
Chapter 2
Industry Evolution
We can look back in history to see whether there are lessons to be learnt about how to assess and cope with the current situation facing the publication system. A number of conceptual models have been proposed in the past which have some relevance to our ability to understand aspects of the current position. Whilst it is fair to say that we have never before trodden the path we are now following, some conceptual models from history may help us place things in perspective. These conceptual models are abstractions. They provide a framework within which we can measure the significance of past events and place them in relation to one another. The first of these concepts addresses the key departure point for many aspects of electronic publishing – the financial stresses which lay within the printed publication system and why these were important in setting in train a series of developments which culminated in discussions over the appropriateness of prevailing business models.
2.1 Tragedy of the Commons The concept of the Tragedy of the Commons involves a conflict over resources between individual interests and the common good. It comments on the relationship between free access to, and unrestricted demand for, a finite resource. The term derived originally from a comparison identified by William Forster Lloyd in his 1833 book on population, and was then popularised and extended by Garrett Hardin in his classic 1968 essay published in Science entitled “The Tragedy of the Commons.” Under the Tragedy of the Commons concept, the common land of medieval times would be overgrazed until one extra beast tipped the scales and made the common land, or common good, totally useless for all. It would reach a stage where nothing could survive on the common land. All the herdsmen would suffer, not just the last one. This collapse would happen quickly and was irreversible. What is its relevance to the scholarly communication process? The often unwritten assumption by critics of the scholarly communication process as it existed up until the 1980s was that scholarly publishing was headed in this same direction – that at some stage the collective library budget, the source for purchasing most of scholarly communications, would be insufficient to cope with the ever-expanding individually produced research output. The system would self-destruct dramatically and quickly under the strain. As new media and new versions of existing publications emerged the stresses would be ever greater and the collapse of the
system more imminent and catastrophic. The Tragedy conceptualises, in a way unintended by Hardin, the problem facing the pre-digital publication system in which publishers produced books and journals on an uncoordinated basis. This took no account of the fixed resource – the library budget – available from which to purchase their output. As such the extra book or journal subscription could become an unbearable burden for the library budget and new ways of serving their clients would have to be found by the librarian to justify their existence. The economics of publishing is driven by the research output produced from an expanding R&D effort by society. This bears little relationship to the budgets being allocated by individual institutions to their libraries. The Tragedy of the Commons did not happen. The switch from a print-based publication system to a hybrid and increasingly digital one has produced solutions which have given flexibility to the buying system, and enabled more information to be absorbed without causing the budgetary system to collapse. But it does indicate that there was something inherently flawed with the traditional mainly serial-based publishing system. There was a disaster waiting to happen. Any suggestion that there were halcyon days of scholarly publishing is pure myth.
2.2 Frustration Gap Another way of describing this imbalance between supply and demand forces is to look at the expenditure by a country in its national research and development budget as compared with the expenditure on research libraries during the same period. For the United States such a comparison is possible and is illustrated below. The growing gap between the R&D infrastructure and the information support system is evident.
Figure 2.1 US Academic R&D Expenditure and ARL Library Budgets, 1976–2003, 1982 constant dollars (index, 1976 = 100). Sources: ARL and NSF
The frustration gap is the difference between the supply of publications arising from the R&D spend, compared with the ability of the library to buy these publications (the demand) as reflected in their overall budget increases. Another significant metric showing how librarians have had difficulty coping with the inexorable growth in publishing output during this period is the falling share of the library budget, in comparison with the overall institutional budget, during the past decade. Between 1994 and 1999 the percentage share of the institutional budget of the leading higher education (university) establishments in the UK was approximately 3.1 %. Since then there has been a gradual decline each year to 2.7 % in 2005. Whether this reflects increased competition for the institutional budget, or whether it indicates the declining relevance being attached to the library and its role in the institution, is difficult to say. But there is a real concern that libraries cannot use the same metrics to advance their financial cause as can other departments within the same institution. It may be easier for a research centre to demonstrate tangible financial payback from additional investment in terms of additional students, extra postdocs, more research grants, etc. Payback which comes from an expansion in the library has less direct and immediate financial consequences. It is an intangible infrastructural item – in many cases a necessary rather than desirable entity. The following chart illustrates more vividly the decline in the library share of all types of higher education establishments in the UK.
Figure 2.2 Library as % of total institutional spend, 1992–2005, UK Higher Educational establishments (old universities, new universities and HE colleges)
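The divergence behind Figures 2.1 and 2.2 is essentially a story of compound growth at different rates. The following minimal sketch uses assumed, purely illustrative growth rates rather than the actual ARL or NSF figures, and simply shows how a modest difference in annual growth compounds into a wide gap over a quarter of a century.

```python
# Illustrative sketch of the 'frustration gap': two spending series indexed to
# 100 in a base year and compounding at different annual rates. The growth
# rates below are assumptions for illustration, not the actual ARL/NSF data.

def indexed_series(annual_growth: float, years: int, base: float = 100.0) -> list[float]:
    """Index compounding at `annual_growth` per year, starting from `base`."""
    return [base * (1 + annual_growth) ** y for y in range(years + 1)]

YEARS = 27                                    # 1976-2003
rnd = indexed_series(0.050, YEARS)            # assumed real growth in academic R&D
library = indexed_series(0.025, YEARS)        # assumed real growth in library budgets

for y in (0, 10, 20, YEARS):
    print(f"{1976 + y}: R&D index {rnd[y]:6.1f}  "
          f"library index {library[y]:6.1f}  gap {rnd[y] - library[y]:6.1f}")
```

With these assumed rates the R&D index ends the period at roughly 373 against about 195 for library spending – the same general shape, though not the same numbers, as the ARL/NSF curves shown in Figure 2.1.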
2.3 Publisher and Library dissentions Some prominent librarians have complained that publishers, notably the large commercial publishers who have become the dominant players in this sector, were sucking out the lifeblood of Science by not only overcharging for its publications, but also by restricting access to most of the corpus of relevant published information. Furthermore, the surpluses, the profits, were taken from the public science
effort and used to line the pockets of the rich capitalists who owned shares in the major European commercial publishers. There was no loop back into investment by and within the publicly-funded scientific effort. As a consequence, it has also been claimed, libraries have been compelled to cut back on acquisitions and other operating costs on a regular and sustained basis. Cancellation has become a keyword, and it has become more difficult to justify expenditure on new information products. In the face of what may be perceived to be an erosion of the library as a core function or service within research institutions, some library advocates have recommended a much more radical approach to adapting the publication paradigm in favour of one which takes a higher moral ground, one which is more favourable to society’s overall needs. This has involved questioning the dominant position which the publishers have in the flow of information from creator or author to user. This is particularly the case in science, where it is pointed out that only 1–2 % of the total research effort is spent on the communication process, yet it is the control which publishers have over this stage which is stultifying the whole information dissemination system. On the other hand, publishers have claimed that the so-called ‘Serials Crisis’ was largely of the making of librarians – that they had failed in their public relations and organisational politicking to convince their financial paymasters of the need for additional funds to buy essential publications. The importance of the library as the centre for knowledge exchange within an institution was being weakened and librarians had failed to come up with satisfactory arguments to protect themselves and their budgets. This antagonistic atmosphere has existed for many years, and was enhanced more recently by bitter wrangling over specific areas of disagreement such as:
• Document Delivery, particularly in the 1980s and early 1990s
• Interlibrary Loans, particularly in the 2000s
• Open Access as a business model (ongoing)
• Support for Institutional Repositories (IRs) (just beginning)
• Derivative works as an acceptable research procedure by third parties (under discussion)
It has furthermore spilled over into questions about the validity of legally sanctioned activities (in some countries) such as e-Legal Deposit. It also now threatens to undermine efforts to apply new technology and new business models effectively and appropriately, to adopt new paradigms, and generally to ensure that the scholarly communication process in an electronic/digital environment is made efficient. There is a tradition of mistrust between some sectors. Within this ongoing debate, the sociology of scholarly communication is emerging as a discipline in its own right. It embraces a number of motivational changes that are impacting the way individuals, kindred groups, and disciplines come to terms with the many external forces affecting them as a result of the change from a print to a print-plus-digital paradigm in the scholarly sector. But because these changes are mainly recent, their full impact on the scholarly communication process is difficult to measure.
2.4 Big Deals Towards the end of the 1990s a new business model emerged which toned down some of the conflicts which were threatening to engulf publisher/library relations. It arose out of a proposal first put forward in 1995 by Academic Press, now part of Reed-Elsevier, that libraries could be offered access to a complete package of Academic Press journals for a small additional charge over their current subscriptions with the publisher. IDEAL (International Digital Electronic Access Library) and APPEAL (AP Print and Electronic Access Licence) were born out of a new way to offer a consortial licence. It was a business-model-driven programme, not one which relied particularly on either end user feedback or technological development for its conception. A key point was reached when the Higher Education Funding Councils in the UK supported the idea of Big Deals and financed an experiment with a small group of significant journal publishers to see how the principle would work in practice within all UK higher education institutions. This included APPEAL as its basic template. It was evident that libraries rarely subscribed to all journals from any one publisher, so in many cases the small additional charge which the publisher would levy would increase the holdings of the library dramatically – albeit that many of these added titles were not always central to the needs of the users of the library in question. Nevertheless, from 1998/9 Big Deals took off as a new business relationship. The net effect has been that the average cost per title purchased fell whilst those publishers able to offer Big Deals have increased their revenues through the marginal increase in the price for accessing their total package. Publishers have been able to show that the price index for journals has fallen and that there are now more titles being subscribed to by libraries in comparison with the pre-Big Deal period. This process was made even more visible as these so-called Big Deals were extended beyond a deal done between a library and a publisher to a consortium of libraries and a publisher, and in some cases consortia of publishers. There are unforeseen consequences as a result of adopting Big Deals which will be explored in more depth later. In brief, the main adverse impacts have been:
• Selection of materials for the library collection is being taken out of the hands of the library collection development staff and shared with publishers
• It has been contended that there is a large amount of unwanted, unused material being put into the library domain
• Only those (large) publishers that have an extensive journal publication list stand to benefit, whereas the smaller, often specialised, learned society publishers lose out.
Whilst Big Deals have been seen as a godsend to the library community at a time when the serials crisis was about to bite hard, some five years later we are seeing a return to a similar set of concerns facing the librarian. It is recognised that the library budget is finite; that those who are able to offer Big Deals have made their offers; and that there is a growing impression that Big Deals have distorted the publishing system in favour of the large publishers’ lists irrespective of the quality of individual titles. The Big Deal itself is now coming under closer scrutiny.
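How the average cost per title can fall while the publisher's revenue rises is easiest to see with a small worked example. The figures below are invented purely for illustration and do not come from any actual Big Deal licence.

```python
# Hypothetical Big Deal arithmetic; every figure here is an assumption made up
# for illustration, not data from an actual licence negotiation.

titles_published = 400        # titles on the publisher's full list
titles_subscribed = 100       # titles the library previously subscribed to
avg_price = 500.0             # assumed average subscription price per title ($)
premium = 0.10                # assumed surcharge for access to the whole list

before = titles_subscribed * avg_price
big_deal = before * (1 + premium)

print(f"Before:   ${before:>9,.0f} for {titles_subscribed} titles "
      f"= ${before / titles_subscribed:,.0f} per accessible title")
print(f"Big Deal: ${big_deal:>9,.0f} for {titles_published} titles "
      f"= ${big_deal / titles_published:,.0f} per accessible title")
print(f"Publisher revenue from this library: +{premium:.0%}")
```

On these assumed numbers the cost per accessible title falls from $500 to about $138, even though the library's outlay, and the publisher's income, both rise by 10 % – which is why the same deal can be presented as both a bargain and a distortion.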
2.5 The Tipping Point But at this stage we need to consider the mechanisms that lead to a Change occurring in the adoption of electronic material within the library and society in general. Will other new media formats have to follow a particular route in order to become established? Can adoption of more efficient means of scholarly communication be brought about quickly and effectively, reducing the cost to society of perpetuating old, inefficient systems? The concept of the ‘tipping point’ may help here. According to Malcolm Gladwell, in his book entitled “The Tipping Point – How little things can make a big difference” (2000), new innovations do not necessarily occur or take hold for logical or apparent reasons. Changes or tipping points can be stimulated by a number of events. Some of these are informal, subjective factors which defy the norms of efficiency. In fact, according to Gladwell, successful ideas, products and messages behave as epidemics and viruses. They become contagious. Even the smallest of factors can have a profound effect on creating a virus, and this can happen quickly. After the tipping point is reached, subsequent progress as a result of the virus taking hold occurs geometrically. Gladwell claims there are three rules which set off an epidemic.
1. The first is the Law of the Few – which means that a few key individuals can have a significant influence on creating change. They are connectors, mavens and salesmen. Connectors know lots of people – the average personal contact network is claimed to be twenty-one people, but connectors tend to know far more than this. They are creators and sustainers of a wide network. They are on first name terms with movers and shakers in the industry. Mavens, however, are people who are very well informed and share this knowledge willingly. A maven is someone who accumulates knowledge. They keep the marketplace honest. They are not persuaders – they are teachers. Salesmen have the skills of persuasion. They tend to be subtle. They cannot be resisted. These different ‘people types’ can have a profound influence in effecting a tipping point, in making a new product or service successful. Some combination of the three ‘types’ is required to get things moving, to effect a change or an innovation which succeeds spectacularly. There are some obvious candidates whom we see as connectors, mavens and salesmen in the current controversies over aspects of scholarly communication – notably in the area of open access (OA) adoption.
2. The second rule of the epidemic is to create ‘stickiness’. There has to be quality in the message. The elements of stickiness in the message may be very small. There is usually a simple way to package information that can make it irresistible. For electronic publishing this can be tied to inherent technological progress within society leading to a ‘better’ information service – a key stickiness factor.
3. The final epidemic creator is the power of context. Epidemics are sensitive to the conditions prevailing at the time. Getting an epidemic going involves a different set of human profiles – innovators, early adopters, early majority, late majority and finally the laggards. The first two are visionaries and risk takers, whereas the early majority avoid risks and are pragmatists. There is a chasm between the two. This is where the connectors, mavens and salesmen have a role in generating the epidemic. They translate the message from the first group to the second.
The point is that there are a variety of social mechanisms behind a change in acceptances and attitudes within society. This is as relevant in electronic publishing
as elsewhere. It means that technological efficiency by itself is not enough. Some of the Tipping Point ingredients, the social mechanisms, are necessary. Have the more significant aspects of electronic publishing achieved a tipping point? Some have, some still have some way to go. The next sections of this book will look at the evidence to see whether the above has made a substantive impact on the way researchers and academics communicate, or whether the preference remains with the printed word. Several studies have been undertaken which suggest there is indeed a radical change in user behaviour some ten years after the commencement of the electronic publishing revolution. But progress is not evenly distributed. Some parts of electronic publishing still need the connectors, mavens and salesmen to be more active – for example, many of the recent author studies show that, despite the claimed advantages for authors in having their articles published in open access format, as many as 90 % of authors are still unconvinced. ‘Tipping point’ issues have not yet taken hold in the research author community. A key feature of the scholarly communication process, and its adoption of electronic publishing, is that the numbers of individuals involved are large. It is also very diffuse, with huge differences between subject areas in how they adapt to EP. There are strong geographical differences, with the western world separating itself from the developing world in its ability to adopt and leap-frog innovative communication systems. There is no easy compartmentalisation of the scholarly communication process – it is a spectrum of interests, motives, drives and ambitions. Each has its own tipping point.
2.6 Open Access The tipping point concept is frequently applied to a new business model which has arisen in electronic publishing. It is suggested that this model, the ‘open access’ model, has failed to make significant inroads into the traditional subscription-based publication system because the key elements of a tipping point have not yet been reached. In effect the tipping point concept has to compete with a well-established process for publication of research results, honed to a level of acceptability by the community over centuries. Despite the publisher/librarian tensions referred to above, there is a clear set of activities undertaken by journal publishers, who produced more and more articles (growing at 3.5 % to 4 % per annum) in response to the greater investment in scientific research since World War II. This has been balanced by librarians who bought annual subscriptions to these journals largely on a title-by-title basis, though with a growing number of cancellations as their largely static real materials budgets bit. Superimposed on this structure, which showed signs of some dysfunction, a new movement appeared on the scene. This ‘open access’ model has as its dominant feature the provision of ‘free’ published information to all those within the community who might want to see and make use of the research results. Supporting this concept of ‘freedom’ is the much bigger open source movement which has been sweeping through parts of the computer, IT and telecommunications industries. This has begun to spill over into scholarly communication. Many experts questioned why publishers have in the past increased their prices way over comparable price indices – publishers replied that the output of articles
was growing inexorably and that, due to cancellations of journal subscriptions, they were forced to increase prices to compensate. Was this an indication of a market mechanism which was unstable? Both the Science and Technology Committee of the House of Commons in the UK and the European Commission have investigated the issue, with the former coming down strongly in favour of changing the market mechanism to correct its inadequacies (and the Commission still ruminating over the issue). There is a sense that instability is rife and growing. Some point the finger at publishers as reactionaries, protective of their profit margins and potential dinosaurs as the market mechanisms become more efficient. The alternative market mechanism being proposed is a switch from the library bearing the brunt of journal subscription pricing to either the author paying to have his article refereed and copy-edited by the publisher, or the institution loading the author’s final article (before publication) onto their server. This is the so-called Gold Route to open access (author pays) or the Green Route (whereby articles are hosted on the local or subject-based institutional repository). There are other variants around this theme, including the important source whereby authors make their articles available through their own personal web site. In all cases the articles are then accessible for free. Anyone – fellow researchers, peers both home and abroad, professionals, amateur scientists or laymen – can access and download the article, in theory. The problem comes with copyright. Copyright is what protects the author from seeing his or her work misused and misappropriated. The publisher has traditionally taken over the responsibility for ensuring that such works are protected by inviting authors to sign a copyright transfer form on acceptance of the article for publication, and then policing this to see that no infringements occur. This has led to complaints that publishers have restricted access to the fruits of society’s investment in R&D, and that such a restriction on the final stage of the research process by organisations that had no role in the rest of the research cycle was unacceptable. In response, alternative copyright licences have been made available online through Creative Commons, a Stanford University-based operation headed by Professor Lawrence Lessig. This gave power to the author to decide what aspects of copyright should be exerted over access to their article. Crucial to the future structure of STM is how much of a stranglehold the copyright and intellectual property rights (IPR) issue will continue to have over published output. There are signs that this is weakening as Creative Commons licences, and licences to publish from authors, gain pace, but as long as STM publishers prevent the reuse of their final published version (the Record of Science) many of the new STM information services will be stymied. Nevertheless, the legal changes being experienced in this area are causing concerns among traditional journal publishers who see their business models being challenged and their power base eroded. The larger publishers in particular are looking at redefining their business and moving from a content focus to a service orientation. More of this later.
The whole edifice in favour of the open access school of thought is built around the theory that there are more people who stand to benefit from access to the published research results than those who have bought access through a subscription or licence. In this respect the concept of the Long Tail is intrinsic to establishing whether there is a bigger ‘latent’ market from the one being reached traditionally.
The Long Tail is central to the conflict which is being waged between the Open Access advocates and the Traditional Publishers.
2.7 The Long Tail Chris Anderson, Editor in Chief of Wired magazine, unleashed a global debate with an article on “The Long Tail”, the huge portion of content that is thought to be of only residual value to companies catering to mass audiences. He claimed that this residual portion of a demand curve (see below) is both powerful and profitable. It opens up a whole new audience which needs to be considered in assessing business models for scholarly communication. Companies such as Google and Amazon prove how important the ‘long tail’ of users really is and how profitable the aggregation of the tail can become. ‘The long tail’ refers to the hundreds of thousands of products that are not number one bestsellers. It is all those products that form a tail to an organisation’s sales activities. However, in the digital and online world, these products are booming precisely because they are not constrained by the demands of a physical retail space. What once had to be stored in and accessed from physical buildings and shelving now resides on a computer and can be retrieved using online systems. The term ‘the long tail’ has since caught on in technology and media circles worldwide. Anderson says that in an era of almost limitless choice, many consumers will gravitate toward the most popular mass-market items, but just as many will move toward items that only a few, niche-market people want. For example, with music, buyers want the hot new releases, but just as many buy music by lesser-known artists or older music – songs that record stores never would be able to carry but that can now be offered online. All that small-market, niche music makes for, in aggregate, a substantial demand. Until the past few years, mass-market entertainment ruled the industry. In this new digital era, the long tail is the new and powerful force. Some open access advocates claim that scholarly and research information displays many aspects of The Long Tail principle. There are a handful of large commercial and society publishers but these are complemented by thousands of smaller publishers which together make up half the market. Users of published information are mainly in university and corporate research centres worldwide, but they are equally matched (and exceeded in number) by trained and educated professionals in wider society. However, it is in established areas of online buying of books, such as through Amazon, where the real effects of The Long Tail are noticeable. A large proportion of all books bought through Amazon are not available on the shelves of local bookstores. The Long Tail of book purchasing reflects the same sales profile which exists in many other online catalogue and sales services. Anderson claims there is still demand for big ‘cultural buckets’ (subscription-based journals for academia), but this is no longer the only market. The hits now compete with an infinite number of niche markets. The niches are not ‘misses’ – they are ‘everything else’. The mass of niches has always existed, but as the cost of reaching it falls – consumers finding niche products and niche products finding consumers – it suddenly becomes a cultural and economic force to be reckoned with. Keeping the niches alive is a much easier task in the digital world than it ever was in the print-based publication era. Traditionally we have lived with the 80:20
rule. However, current online systems mean that 98 % of items are used. The main elements of this use are:
• the tail of availability is far longer than we realise
• it is now within reach economically
• all the niches, when aggregated, can make a significant market
(Chris Anderson’s book, which builds on the article, was published by Random House Business Books as “The Long Tail”, ISBN 184413850X.)
Figure 2.3 Theory of the Long Tail
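The aggregation argument can also be sketched numerically. The toy model below assumes a Zipf-like demand curve (item k in the catalogue sells in proportion to 1/k); the catalogue and shelf sizes are invented for illustration and are not Anderson's figures.

```python
# Illustrative Zipf-style demand curve: item k sells in proportion to 1/k**s.
# The exponent and the catalogue/shelf sizes are assumptions for illustration.

def demand(rank: int, s: float = 1.0) -> float:
    return 1.0 / rank ** s

catalogue = 1_000_000          # titles available online (assumed)
shelf = 10_000                 # titles a physical store could stock (assumed)

total = sum(demand(k) for k in range(1, catalogue + 1))
head = sum(demand(k) for k in range(1, shelf + 1))
tail = total - head

print(f"head (top {shelf:,} titles): {head / total:.1%} of demand")
print(f"tail (remaining {catalogue - shelf:,} titles): {tail / total:.1%} of demand")
```

With these assumptions the items beyond the notional shelf limit account for roughly a third of total demand, even though each individual title in the tail sells very little – which is the essence of the claim that the niches, in aggregate, form a market worth serving.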
2.8 Disenfranchised Researchers As indicated by the Long Tail, there is in theory a large number of niche users who might benefit from access to published material – the key plank in the open access manifesto. Is it possible to identify a market sector where the current publishing system brushes over a niche that ‘open access’ could capture? The advocates for open access claim this does exist but, in the true tradition of the messianic, choose not to quantify or describe where it exists. It might be possible to find this new market sector among the ‘disenfranchised’. The term ‘disenfranchisement’ is not popular within the scholarly publishing community. It carries overtones of artificial barriers being placed in the way of researchers in accessing published information. Whilst the term itself may be emotive, the fundamental principle is that the publication process carries costs with it. These costs relate to the quality control mechanism which is in place to prevent the scholarly communication process from being overwhelmed by too much information being published. This quality control apparatus, the so-called refereeing system, has to be established, organised and maintained. Together with the desk editorial process, which publishers have created to provide additional quality to the published articles, there is a considerable investment in the infrastructure and in the amount of post-research work which has to be undertaken, all of which needs to be financed.
The traditional business model which has evolved is based around the existence of libraries which have become the central purchasing point for research-level books and journals. This has meant that there is a buffer between the supply and demand sectors for scholarly publications, and this buffer has meant that buying procedures have evolved which relate more to the needs of library administration than to the ultimate consumer (researcher). Libraries need to be assured of comprehensive collections of material, that it is archived in a professional way, and that the library’s end users can easily gain access to the material for which publishers charge. The journal subscription has therefore become a key element in the scholarly communication process. Research articles are in many instances shoe-horned within journal issues which contain a package of related and non-related articles. The packaging of such articles is a result of the time-flow in the quality control process (refereeing) as much as of a planned approach to ensure consistency and similarity of content. But for publishers this mechanism has been a godsend. Libraries subscribe to these journals on an annual basis, paying upfront, usually before the beginning of each calendar year. Publishers therefore know in advance how much money will be received prior to their investing in the production of the journal. But to ensure that these costs are not borne for the benefit of those individuals or institutions which have not subscribed to the journal, copyright law is enforced. Publishers obtain copyright transfer of the published articles and as such are able to determine who is able to read the articles and who is not. Those who are not able to look at the full articles are the disenfranchised – those who are on their own, or are attached to institutions which do not have subscriptions to the journals in question. They are locked out of the main communication system for that particular journal or package of research results. This has become a leading driver for those who claim that the current scholarly communication process is inequitable and dysfunctional. In many cases society (the tax payer) pays for research to be conducted but only a relatively few are the real beneficiaries. An estimate of the extent of this inequity has been given by Forrester Research (in 1999/2000) in the USA. It was then estimated that the total number of world ‘knowledge workers’ amounted to 170 million. Outsell, another US-based research consultancy, put the figure at 125 million. These consist of professionals working in offices, engineers working on remote sites, distance learners working at home, those moving between professions, amateur scientists, etc. However, the number of ‘enfranchised’ – those who are members of institutions which have libraries, such as corporate R&D departments, universities and colleges, etc – amounted to 7.2 million (academic-based R&D personnel based on the Unesco Statistical Handbook 2000). Indications of the number of disenfranchised include:

                                      UK           USA
Engineers (working in the field)      436,000      1,158,370
Accountants                           805,000      2,432,730
Distance learners                     211,470      3,077,000
Entrepreneurs and SMEs                3,162,944    17,646,062
Managers                              2,463,000    5,654,800

SMEs = small and medium-sized enterprises
(Data based on private research, August 2004)
Figure 2.4 The “Disenfranchised” information sector – including city and financial analysts, professionals, engineers in remote sites, entrepreneurs and innovators, researchers in academia, distance learners, researchers in industry, interested laypersons and amateur scientists, and patients and healthcare workers
The above exclude the ‘amateur hobbyists’, those who trained in a particular discipline but moved elsewhere yet still have a residual interest in their former subject. Other types of ‘disenfranchised’ include:
• Virtual learning groups and collaboratories
• Organisations operating at the fringes of the research effort (equipment manufacturers, etc.)
• Researchers at institutions which are focused on topics not related to the title’s main subject area
• The general public (particularly for medical information)
At a Science and Technology Committee of the UK House of Commons public meeting in 2004, one representative from the publishing industry claimed that the number of people who would benefit from the output of research, and who were not currently covered by the subscription-based system, was small, partly because they would not understand the specialised, high-level content of some of these publications. This caused an outcry from supporters of the open access movement, who pointed to the huge interest among the lay public in research articles which related to medical complaints, for example, or environmental issues, global warming, food technology, etc. The numbers would be considerable, particularly when patients and those afflicted strove to learn more about their complaints than the general medical profession itself. Microsoft, in promoting its new Microsoft Live Academic Search service, revealed statistics which indicated that 80 % of online users have searched for health information, that 87 % of online users have searched for scientific information, and that 25 % of an individual’s time is spent in searching for information. This supports the contention that the new users of online scientific databases are not confined to the
university or corporate-based research sector. The Web, collaborative processes and open access are all interacting to produce a more aware community, one which has hitherto been kept in the dark.
2.9 Knowledge Workers and the Thinkforce This is another related market dynamic. Traditionally, those who graduated but stayed in the university loop to do research and teaching remained ‘enfranchised’ by the nature of the subscription-based publishing model adopted by the publishing sector. Each year some 10.5 million graduates emerge from universities worldwide, and this generates ever more knowledge workers, as only a fraction stay on within the university or college to continue in research (and benefit from the university’s licensing arrangements with electronic publishers). The rest swell the ranks of ‘knowledge workers’. Those moving away from academia are increasingly using the same search tools to get answers for their work and home. They are all highly qualified, and distinguish themselves from the broader category of the ‘disenfranchised’ by their graduate and postgraduate education and qualifications.
Figure 2.5 Thinkforce employees in OECD countries per thousand, 2004 Source: OECD in Figures 2007, OECD Observer 2007/Supplement 1
The OECD (Organisation for Economic Cooperation and Development) issues statistics on the Thinkforce numbers within member countries. The number of professionals engaged solely in what it defines as ‘the conception and creation of new knowledge spanning all industrial and academic sectors’ shows that there are nearly 4 million R&D professionals in the OECD area, of which about two-thirds are in the business sector. This gives about 7 researchers per thousand employed in the OECD area, compared with 5.8 in 1992. Finland, Japan, New Zealand and Sweden have the highest numbers of knowledge-based researchers per thousand persons employed. Outside the OECD, China has also seen dramatic growth, but at 1.2 per thousand employees in 2004 it still remains low on the scale. This ‘professional market’ is not necessarily as ‘intense’ a market as that offered by the researchers in universities or corporate organisations. It is scattered and fragmented. It belongs to the ‘long tail’ of the scientific publications market. Nevertheless, knowledge workers could become a new, viable if ‘long-tailed’ market for sci/tech information services. However, whilst knowledge workers may have the training and discipline of their chosen subject, they rarely knew what was available and what could be of use to them. So they lived in blissful ignorance of information which would be potentially valuable to them in their work. Now things have changed with the arrival of powerful search engines.
2.10 Emergence of Search Engines Besides the creation of search engines there are also listservs, blogs and wikis which let knowledge workers know just how much does exist and in many cases allow them to get access to the information they need. This is particularly so as science-focused search engines such as OAIster, Elsevier’s Scirus and WorldWideScience.org trawl for free publications as well as those hidden behind toll-access barriers. One particular development which has revolutionised the way the ‘disenfranchised’ and the knowledge workers are informed about relevant publications is Google. This powerful search engine crawls the Internet searching for terms which it builds into and incorporates within its central indexing system. With its unique way of prioritising the results of a search (through the proprietary PageRank algorithm), the new users of the web are able to be alerted to content from a wide spectrum of information sources. These include scholarly communication output. Given the changes to current needs and the changing behaviour patterns, what does this mean for people trying to find information?
• 87 % start with Google – and say it is excellent
• Search engines are used:
  ◦ 20 % for navigation purposes
  ◦ 30 % for transactional purposes
  ◦ 48 % as a source for information
  ◦ 2 % for other reasons
This process partially ‘enfranchises’ those people who had remained outside the traditional subscription-based publication system. Information, usually in the form
of short synopses or metadata, is brought to their attention. The problem comes when the metadata leads the user to an online service which denies the user access to the full text. This is where current authentication services stand in the way of users accessing material, some of which they feel they have – as taxpayers – a legitimate right to access. However, the open access movement – see above – has created an alternative ‘free’ version of the published article, some of which is deposited in institutional repositories and is hoovered up by Google and its competitors.
2.11 Something is Good Enough It has been suggested that an unintended consequence of the adoption of search engines is that end users come to rely on Google etc. as their one and only source for external information. This can be dangerous as it instils a false sense of confidence in what they get as a ‘hit list’ from a search. The belief may be that the search engine identifies everything that is available. Search engines, despite their powerful coverage and vast aggregation of index terms, are generic tools. They cover the universe, whereas scholars and researchers are often focused on a highly specialised and small niche. Search engines are not necessarily attuned to the nuances of such specialist requirements. There are cultural approaches which require differing information needs to be met. The structure of one research area may be unsuitable for a single generic approach. Multimedia needs are often not included; software manipulation tools are sometimes necessary to enable information to be interrogated; grey literature, not included in the crawling of sites by search engines, may often be required in order to achieve a complete picture. Despite this there is a general feeling that the search engines do identify sufficient information to provide a basis, if not comprehensive coverage, of the literature in a specific area. As such there is no need to look further. Something is Good Enough often rules. After all, it is only recently that an information system has been able to bring such a lot of material to one’s attention, so quickly and easily. This can delude the academic or researcher or knowledge worker into thinking that that is all there is, and that to delve further into specialist online portals is not necessary. The issue is made even more problematic if undue reliance is placed on the prioritisation given to the output of results by the search engine. Each search engine has its own unique way of presenting the results; in the case of Google, as has been referred to, this is the PageRank system. PageRank analyses the pattern of links between items to build, in the results list, a prioritisation which projects relevance for that particular search term or terms. So powerful have these automated prioritisation techniques become that end users frequently just look at the first page of the results, the first ten or a dozen items found, and ignore the rest. This is where the system breaks down. Intelligent information retrieval systems can compete with the large generic search engines by giving a more comprehensive and total view of the relevant material available. Whilst generic search engines are useful to give a first cut at a selection of possible information items of relevance, in many cases other services may also be required to ensure total relevancy. More generally, Search has to be magical, and the results need to be trusted. This has led to the need for more underlying features to search engine technology
which would make them useful and relevant – the application of ‘visualisation’ techniques, text-mining and multimedia searching, ‘same as’ search tools, and federated searching underpinned by dynamic ontologies all have to be considered. As long as search tools for scholarly information remained a small market niche ($ 400 million) the full power of technology companies such as Autonomy/Verity and FAST would remain at best marginal in this area. In general, more investment needs to be made in improving ‘search’ technologies and making them specifically relevant for the scholarly communication sectors. But for the time being they are helpful in giving the ‘disenfranchised’ and knowledge workers the same potential access to the world’s research literature as is available to researchers operating in academia and the corporate world within the licensed walls.
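To make the link-based prioritisation described above more concrete, here is a minimal sketch of PageRank in its basic power-iteration form. The toy link graph, damping factor and iteration count are assumptions chosen for illustration; Google's production ranking combines PageRank with many other signals.

```python
# Minimal PageRank sketch: rank items by the pattern of links between them.
# The toy graph and the damping factor (0.85) are assumptions for illustration;
# this is not Google's production implementation.

def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                      # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# Toy web of four items in which C is linked to by everything else.
toy_links = {"A": ["C"], "B": ["A", "C"], "C": ["A"], "D": ["B", "C"]}
for page, score in sorted(pagerank(toy_links).items(), key=lambda x: -x[1]):
    print(f"{page}: {score:.3f}")
```

In the toy graph, item C is linked to by every other item and therefore ends up with the highest score – the same intuition that puts heavily linked (or heavily cited) material at the top of a results page.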
2.12 The new market for research material To return to a main theme of this chapter, is there evidence that open access will reach a much bigger market than the traditional subscription/licensing system? Will society benefit from switching the business model and allowing unrestricted access to the highly specialised research output? There is no clear evidence either way. The assumption is that within certain disciplines and subject areas there may well be a ‘new, currently latent market’ composed of the highly educated, which could be reached as a result of the activities of the search engines. Health care is one such area in particular. Other areas are so specialised and esoteric that the bulk of the potential audience already has access to the research articles. More data is required to substantiate the thesis that open access leads to greater readership. Anecdotal evidence and small experiments are not sufficient.
2.13 Overall Trends The traditional print-based publication system has produced products and services which have withstood the test of time and become essential features of the scholarly communication process. These are part of the infrastructure of the research process, accepted by authors, researchers, publishers and librarians alike. There is a strong commitment to them – they have a comfort factor, they are popular. But they were based on a technology and approach which is being tested by the Internet and its derivatives. Based on the above overview of concepts which have been put forward by strategic thinkers, a new set of developments can be identified which apply specifically to the electronic publishing sector. These new concepts are gradually building up support of their own and sit side-by-side with the traditional ones, creating the current hybrid publication system. Some of these new concepts are summarised in Table 2.1 below.

Table 2.1 Developments in the Information Culture
Now              Leading to                                  Leading further to
Cataloguing      – folksonomies – ontologies                 – semantic web
Articles         – interlibrary loans – document delivery    – pay-per-view
Journals         – big deals – data/datasets                 – open access – e-science
Serials crisis   – tragedy of commons – tipping point        – big deals – new EP projects
Wisdom crowds    – search engines (Google)                   – ambient findability
The long tail    – open access – search engines              – knowledge workers – disenfranchised
Web 2.0          – wikinomics                                – semantic web

The inference is that the cosy world of electronic publishing will be transformed by the impact of a number of significant changes. The academy has been typified by stable institutions but simultaneously faced with increasing costs. However, there are indications that the ‘networked economy’ is coming into play. Hotmail accounts increased by 12 million within 18 months; Google, Linux and Wikipedia
came of age, FaceBook has 42 million users and is growing at 200,000 per day, all heralding the arrival of new services, some of which are based on mass social collaboration within the evolving Web 2.0. There have been failures along this path of transformation. This includes AOL which is forfeiting $ 2 billion in its move from subscription revenues as a business model towards advertising. Libraries also have suffered from a change in the search and retrieval process – Google came along and demonstrated that it is all about speed and comprehensivity. Such major changes indicate what has to be done – bold steps are needed. Three main forces have emerged. • The first is the Network Effect which stipulates that any service becomes more
valuable the more people use it. This is the so-called ‘fax phenomenon’ – a fax machine only becomes useful once more than one person has one (see later). In fact there is a geometric (rather than arithmetic) progression as the network takes hold (a simple illustration is given at the end of this section). Growth can be quick as ‘viral marketing’ occurs.
• The second force is the two-sided market: stakeholders and users recognise that there are several ways in which revenues can be earned – not just one. Online advertising is one which is currently being looked at.
• Finally, there is the ‘wisdom of the crowd’, which postulates a diverse, decentralised approach to getting an answer. Examples include the dramatic success of Wikipedia as a new way of publishing using the collective wisdom of the community.

The sociological analysis of trends impacting on electronic publishing for the scholarly research community in particular illustrates how deep-seated some of these trends are. Calling on concepts which summarise group behaviour at the community level, it can be seen that these have relevance in reaching an understanding of
the particular challenges facing the specific academic/research sector. These behavioural trends have as much relevance in dictating the shape of the future market for electronic publishing products as does the technology which underpins them. The Long Tail and the Wisdom of the Crowd are increasingly becoming relevant as the scholarly information system migrates from being a relatively small, library-budget-determined industry sector to one where open access, social networking and collaborative publishing mix with freely accessible datasets and e-Science developments. The conditions are truly set for a ‘perfect storm’. Evidence on how end users in the specific areas of scholarly communication are currently being affected by this confluence will be looked at in the next chapter.
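As a simple illustration of the Network Effect’s geometric progression (an editorial illustration assuming Metcalfe-style pairwise connectivity, not a calculation taken from the sources discussed above), the number of potential connections among n participants grows with roughly the square of n:

```latex
% Potential pairwise connections in a network of n participants (illustrative only)
C(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
C(100) = 4\,950, \quad C(200) = 19\,900 \approx 4 \times C(100)
```

On this assumption, doubling the number of participants roughly quadruples the number of possible connections – which is why growth can be so rapid once ‘viral marketing’ takes hold.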
Chapter 3
End User Behaviour
3.1 Change in User Behaviour

To quote from Charles Dickens: “It was the best of times, it was the worst of times” – a phrase which rings true of the information industry today, and which is repeated in the EPIC2015 service (www.epic2015.com) created by Robin Sloan and Matt Thompson. This freely accessible video starts out with the above quotation and traces the revolution which could take place by 2015 in the global newspaper industry as a result of the emergence of Googlezon, a hypothetical merger between Google and Amazon. The good times are because everyone is on the Web, enabling rapid response to user needs; it is the worst of times because everyone is on the Web and their needs are changing constantly, with new players and services emerging overnight to satisfy these needs. This is causing anxiety and concern within the traditional information system. Some of the specific issues which are underlined by this change include:
• Instant gratification with regard to information supply
• The trend towards going beyond published books and journals and reaching for the primary source material
• Integrating information into the workflow of users
• Free at the point of use becomes the norm
• The need for access to the library is no longer a priority
This has resulted in a change in user behaviour. Content creators today are faced with a range of options to deliver their messages. By the same token, users also have a wide range of services they can use to access the information they want. Since value, like beauty, is typically in the eye of the beholder, authors and publishers need to explore the full range of options to disseminate their content if they are to reach all potential audiences. Users will vary in their choice of which medium or service best suits their own unique purpose. This is particularly true of the latent users whose needs have not been considered in the past by the traditional scholarly publication process. Access to information has become decentralised, and the individual use of premium content in enterprises exceeds use of centralised resources. According to a recent Outsell report, the top criteria for information use are quality and relevance, update frequency and ease of use. Users are spending 58 % of their time gathering information, up from 44 % a year ago, and they are demanding that information be integrated in their workflow.
This has meant that services will no longer be in the business of offering “searching for information” but rather of “finding answers” on behalf of their clients. By aggregating authoritative sources – traditional publications, blogs, wikis and a number of new channels – services such as Answers.com are providing a different, richer and more relevant information package. Providing answers rather than just information is becoming the key. Other recent findings from the Outsell study include:
• 74 % of US students prefer using online information to consulting printed textbooks
• There is the emergence of personal information portals
• The online advertising market is maturing (with a shift to targeted online advertising such as Google’s)

This is a disparate set of trends, but all emphasise the centrality of online information systems for current information users.
3.2 Who are the users? What determines the viability of the present and future systems is the willingness of the research or scholarly community to adapt themselves to the formatting options being made available by publishers and librarians. Users (or ‘consumers’) of STM information are mainly researchers. They are postgraduates, working at the frontiers of knowledge doing exploratory research. They are rarely students, and even more rarely people in the professions. These latter categories have only sporadic interest in research articles, and these are usually reviews rather than the results of primary research. (But it is a market which, through the provision of better navigational tools, could be pulled back into the mainstream of buying articles more regularly on demand when the item meets specific needs and tasks). The university sector is the prime sector for core scholarly information ‘users’ – hence the importance of university library budgets in supporting the system. But there is a substantial demand based in corporations where R&D is important to maintain competitive advantage. This does not include all industry sectors – there are half a dozen key ones such as pharmaceuticals, aerospace and IT, which dominate the corporate research sector. Both corporate and university researchers are active in similar areas and projects – in fact there is an increasing cross-fertilisation between the research and funding in these areas. Other research-based areas include government funded research councils which perform similarly to the university sector. The big difference between users in universities and corporations is whether the research is funded by the public purse. There is a Scientific Ethic of free disclosure of the results to the world in the academic system – in corporations the users are more closed about what they are researching into (and publishing) for fear of giving away competitive commercial advantage. In this instance the Commercial Ethic of secrecy holds sway. Despite this very little is known about the behaviour patterns of researchers in general. There is no ‘typical’ scholarly user. There is a fundamental difference created by the culture of the discipline itself. Some end users are highly data oriented (for example, radio astronomers), others rely on manipulating compounds
on screen (such as chemists), others have traditional allegiance to preprints (physicists), others to refereed articles (biomedics), others focus on genomes and protein structures (bioinformatics). The limited studies done to date have pointed to marked differences in use of electronic material between these different disciplines. No one size fits all. Nor is there any one type of information appropriate to one individual at any one point in time. As was highlighted by Lorcan Dempsey (formerly of UKOLN at Bath University, now a vice-president at OCLC, Ohio, USA), the individual is a complex mix of attitudes and emotions. At one point he/she may be a researcher; later that day a teacher or lecturer; in the evening an editor of a journal, and a referee of a submitted article; at several points they may be a mediator or contributor to a forum, a coordinator of a wiki, a publisher of a blog, a sports fanatic, a father, etc. All require different types of ‘information’, and participation with that information in different ways. All in the same day. They will be faced with numerous sources of information and varying amounts of relevant information from each source.
3.3 Typology of Users

There have been a number of early studies on user behaviour, though usually of limited extent. In 1991 and 1992, before the Internet really took hold, there was already an element of ‘switching off’, as identified by some Faxon Institute studies. Using diaries maintained by a sample of researchers, a psychology research unit established some profiles of typical researchers in that period. The results were:

Table 3.1 Breakdown of profiles of researchers (1991/92)

‘Information Zealots’    24 %
Classic Scientists       25 %
Young Technologists      12 %
‘Information Anxious’    16 %
Older Teachers           10 %
Product Researchers      13 %
Source: Faxon Institute 1992
The interesting point is not the percentage share achieved against each profile but rather the fact that it exposes different categories of how researchers behave in their approach to information collection.
The ‘Information Anxious’ category was similar in its approach to a category identified by the then Mercury Enterprises in their collaboration with the British Library Document Supply Centre in the mid 1990’s – they pictured a world in which fully 50 % of the researchers in the UK could be identified as being ‘Out-in-the-Cold’ (or OINCs) as far as the available information services were concerned. They had either found the information overload problem too stressful and switched themselves off, or were switched off by the system as they fell between institutional purchasing schemes. Either way they were seen as ripe candidates for a new document delivery system which never transpired.
Returning to the Faxon Institute studies on information overload: the psychologists also measured self-perceived information competency among 600 researchers. One third of the respondents felt they read less than 20 % of what they needed
to in order to do their job well. Only 27 % felt they read more than half of what was required. And this was fifteen years ago, when the frustration gap was not as wide as it has since become. (See: Electronic Publishing and Libraries – Planning for the impact and growth to 2003, David J Brown, 1996.)
Though more recent evidence is limited, a gross approximation of users by type has been undertaken within the US’s Pew Internet and American Life Project (May 2007). The typology identified there includes:
1. Omnivores. These represent about 8 % of the US population. They enthusiastically use everything related to mobile communications technology.
2. Connectors. These represent 7 % of the population. They tend to be older females and tend to focus on the communication aspects of the new technologies.
3. Lackluster Veterans. These are 8 % of the population. They tend to use the Internet frequently but are less avid about cell phones.
4. Productivity Enhancers. These are also about 8 % of the population. They have strongly positive views about how technology helps them increase their productivity at work and at home.
5. Mobile Centrics. With 10 % of the population fitting into this category, they fully embrace their cell phones but make little use of the Internet.
6. Connected but Hassled. These are 10 % of the US nation. They find all connectivity intrusive and information something of a burden. They often experience information overload (see below).
7. Inexperienced Experimenters. These are about 8 % of the US population. These casual users occasionally take advantage of the interactivity on offer.
8. Light But Satisfied. These are 15 % of the country. They have some technology but it does not play a major role in their lives. They love the TV and radio.
9. Indifferent. These 11 % of users proudly proclaim that they do not like this technology, but they begrudgingly use it a little.
10. Off The Network. These are 15 % of the population. They have neither a cell phone nor an Internet connection. Older females dominate this group.
The enthusiastic and prolific users of the new technologies represent less than half of the US inhabitants. There is still a great deal of potential to increase the Internet and mobile communications user base in a well-developed economy such as the US. In the rest of the world the potential new demand could be even greater. It reinforces the concept that there are new markets – not just the ‘disenfranchised’ – out there which could become serious users of electronically-delivered scholarly publications.
3.4 Information Overload

As indicated earlier, there is a ‘serials crisis’ created by too much published literature chasing budgets that are too small. By the same token there is also ‘information overload’, as the increasing amount of published information has to be absorbed by an individual researcher within the constraints of a limited time span available for reading such material. As will be discussed later, there are indications that over the decades researchers have had to increase the amount of time they make available to read published research results, but this is not in step with the
continued, inexorable 3.5 % to 4 % annual growth in scholarly information output (a growth rate whose cumulative effect is illustrated below). Researchers adjust to this challenge in a number of ways. Some do increase their time spent reading research publications. Others rely on other people, such as ‘information gatekeepers’, to keep them up-to-date with new published material. Yet others ignore the problem and continue their research oblivious of external developments. And others rely on search engines, portals and alerting services to keep them informed. Everyone copes with information overload in the sciences in their own way.
One much-commented-on result of ‘information overload’ has been the adoption of simple but comprehensive search engines such as Google. The scholarly communication industry has to adapt to the emergence of such powerful search engines as a major finding tool, one with which it has had little involvement. What these engines do is help resolve, in the user’s own eyes at least, the information overload problem. This is in many respects only a partial solution, with the adverse consequence that it can lull the online searcher into a false sense of comfort that he or she has cracked the ‘information overload’ problem.
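To put that growth rate into perspective, a simple compounding calculation (an editorial illustration rather than a figure taken from the studies cited here) shows why the overload problem steadily worsens:

```latex
% Years for annual scholarly output to double at a constant growth rate g
t_{\mathrm{double}} = \frac{\ln 2}{\ln(1+g)}
\approx 20 \text{ years at } g = 3.5\,\%,
\qquad
\approx 18 \text{ years at } g = 4\,\%
```

On these assumptions, a researcher whose reading time stays fixed over a forty-year career would, by its end, face roughly four times the annual volume of literature they started with.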
3.5 Research Studies

Despite the above, over the years there have been a number of unsystematic and uncoordinated approaches taken to feeling the pulse of the research user community. Individual studies have been conducted, often with a partial need or objective in mind. Libraries and library associations have commissioned studies which look at user requirements from the point of view of the library and its operating needs. A few studies have also been commissioned by publishers, looking at the situation from the viewpoint of a content supplier and developer of an editorial programme. Few have adopted a truly holistic approach with the overall and balanced needs of the research community in mind. Even fewer have speculated on how the many trends impacting on the researcher will create a new paradigm either in the short or long-term future. This is the particular space being approached by this book.

3.5.1 Industry wide Studies

Tenopir/King research

A groundbreaking exposé of the changing nature of scholarly publishing was made by professors Carol Tenopir and Donald King in their book published in 2000 entitled ‘Towards Electronic Journals’. This brought together many studies undertaken by the authors over a thirty-year period, primarily focused on the motivational aspects of researchers in various disciplines and how they adapted to publication formats. The focus of the book was on the participation of scientists, libraries and publishers in the journal communication process, ending with a large section on Electronic Publishing as it then existed (pre 2000). The commercial underpinning of the journal publishing sector was described without passion or emotion. This book represents an important step forward in understanding the whole industry sector, one which has so far not been repeated. Frequent reference will be made in the following pages to some of the specific findings made by Tenopir and King.
However, the problem in using the Tenopir/King book to explain current usage behaviour is that the studies were undertaken largely in an era when print publications still reigned supreme. It was before the full impact of large search engines, the advent of global collaboration, data-rich research output and user-generated media – these and many others have transformed the nature and process of scholarly communication since 2000, in some areas almost beyond recognition.
Many pundits in this area focus on technology as the key driver leading to the changing face of scholarly communication. It is our contention that it is the users and authors, who together adapt to a whole range of external developments including technology, who are determining the current evolution of scholarly communication. Administrative systems, changes in attitudes of funding agencies, new business models, and the absorption of practices originating in the entertainment sector (online games, music) are all as much responsible for changing users’ approach to information collection as technology itself.

Collection development

In the old days of print only, we knew very little about how researchers used published information in their daily routine. Librarians, who acted as custodians of the core scientific, technical and medical journal titles and books, often used basic methods to monitor usage – such as auditing publications left out for filing at the end of the day. But the main mechanism for selecting material for purchase by the librarians was feedback of recommendations from the faculty. It was often the case that decisions were made on collection development by librarians, as a result of such dialogue with faculty staff, on the principle of ‘he who shouts loudest gets the spoils’, rather than on the basis of sophisticated, professionally endorsed metrics, efficacy and models.
In the hybrid, and increasingly digital, world there are new means to check on usage patterns. Everyone going online leaves a ‘digital footprint’. These are the routes which people take in searching for and retrieving information from online bibliographic databases. Because every click on the computer’s mouse represents a decision point, these can be aggregated in their millions to form a picture of what actual use is being made of online publications within any particular institution or library. There are several centres worldwide which have developed sophisticated skills to count and analyse such traffic. One is the CIBER centre at University College London; another is the MESUR programme being run from Los Alamos National Laboratory in New Mexico, USA. Both have agreements with large database hosts, be they publishers or libraries, to enable them to follow the digital footprints left by users in conducting their research. The numbers involved – millions of instances – overcome the more anecdotal approach which has bedevilled monitoring of the print publication system. To help individual libraries understand the data supplied by publishers and intermediaries, the industry has created a body – COUNTER – which is trying to achieve standardisation in the analysis of digital footprints.
But published articles online are only one source of material for users in the new digital era. In the molecular life sciences researchers need access to the DNA, protein structures, macromolecular structures, pathogenetic mutations, etc.
At present the latest data on these are all accessible online, and free, from centres in the USA (NCBI), from Europe (EBI) and from Japan. They have 7–10 core databases, with up to 160 other databases and 2,000 specialised resources. Increasingly researchers in bioinformatics rely on access to such raw data – the scale is
immense and growing. 200,000 users per month are accessing the EBI service and 1.3 million web hits per day are being achieved. Researchers in these areas are entirely dependent on electronic information. Deposition of data in the public domain is not only common practice but is absolutely essential for researchers in astronomy, in physics, in parts of chemistry, crystallography, the biosciences, etc.
So we are entering a new world, where traditional methods of assessing usage in a print-based environment now have to be adapted to cope with new online challenges. And the old methods are no longer applicable. There is a need to look at the users of electronic publications in new ways, ways which are more in tune with how they are coping with the new media and formats. The Tenopir and King results need to be looked at as a valuable starting point, but not as a basis for extrapolation into what is in essence a totally new information world in this new millennium.

3.5.2 Library sourced initiatives

For many years, libraries have shared their operational data. Through aggregation of data about how much usage is made of library collections, various trends have been identified. In the UK this sharing of data has been facilitated by the centralised funding structure within UK higher education. A key point was the Follett Review, which was undertaken in 1993 by a committee chaired by Sir Brian Follett, then vice-chancellor of Warwick University. As part of this study the shortfall in library budget expenditure was explicitly highlighted. This shortfall can be summarised as follows:

Table 3.2 Shortfall in UK Higher Education Library Budget Expenditure

Item                   1986/87    1991/92
Textbooks              −6 %       −11 %
Research monographs    −13 %      −18 %
Journals               −11 %      −15 %
CD-ROMs                0 %        −21 %
One key conclusion was that the government should invest £ 20 million over three years in a programme which would have as its focus the application of information technology towards the resolution of university libraries’ operating problems. However, though this was a landmark study some fifteen years ago, its focus on the needs of users was largely absent. But there have been other centres which have developed our knowledge of user needs.

The eJUSt report on E-Journal Users

Regular studies prepared for the Stanford University Libraries by the Institute for the Future have been published. They focus on trying to understand the needs for e-journals, mainly of people operating in the life science areas. Over a two-year period data was collected using three quantitative user surveys, a Web log data mining analysis, and an ethnographic study of e-journal usage. This resulted in a number of outputs, including a white paper on e-journal features and their use, and a further white paper on e-journals and branding.
Some of the key conclusions reached from these US-based studies are that it appears that online scholarly activity is shifting action beyond the journal article to the supporting matrix of data and alternative media. It was also established that users will pay for online access to selected journal titles but will not pay for individual articles (ie, document delivery). For successful transfer to an online information system, to achieve ‘tipping point’, the researchers at the Institute of the Future suggested that the following are required: • Users want deep and comprehensive archives of online material • Scholars use features which tie in more closely with the culture of their core
discipline rather than being generic in origin • Scholars need better mapping of the online scholarly landscape • Users want more choice in subscription and membership packages
Recommendations were given in the report for publishers to adapt to the new e-user needs during the next decade. These changes included:
• New publishing structures and economies of scale should be adopted
• There should be closer relationships by service providers with the user communities
• There should be new methods of measuring impact (not just citation analysis)
• There should be more diverse peer-review models
• There should be alternatives for reviewing articles other than reliance on journal editorial boards

The recommendations for librarians include that they should ensure that their limited budgets are not locked into a single publisher (through Big Deals), and that it should be recognised that the university or host institution is a content source of considerable value (leading to the creation of institutional repositories). The authors of the report also felt that the electronic journal itself should change. Before that, the publisher should understand the range of community digital information needs and wants. From this a sustainable business model needs to be developed. There is a particular need to rethink the meaning of society membership. In making the e-journal available, consideration should be given to partnering with expert delivery providers. In addition the authors felt that greater activity should be directed toward disaggregating electronic content, and also to designing content for more flexible future options. But the underlying message is the importance of creating a critical mass of online content.
In using e-journals both scholars and clinicians follow a consistent high-level pattern. The majority of users start with a multi-journal specialist search engine such as PubMed rather than the web sites of individual journals. Aggregation of large amounts of online published material is important. At a deeper level, however, the eJUSt studies found that scholars use e-journals in idiosyncratic and personalised ways. These ways are often determined by the culture of the subdiscipline within which they operate, but also by whether they are searching for information, reading online or acting as authors. Each of these activity domains has its own individual and unique usage characteristics.
Faculty Attitudes at Univ California The University of California has also been active in driving through innovative information policies, as for example with its open access policy, its digital library (CDL) and the eScholarship repository within it. Again with the help of external consultants (Greenhouse Associates) the University of California’s Office of Scholarly Communication (OSC) has investigated faculty understanding of scholarly publishing issues. Over 1,000 University of California faculty (13 % of the total) participated in a questionnaire survey during 2006, and the results indicated that faculty were generally conventional in their behaviour patterns. The current tenure and promotion system impeded much change in faculty behaviour. They tend to see scholarly communication problems affecting others rather than themselves. Preservation of their current publishing outlet is an area of great concern to the faculty. Any suggestion of a centrally mandated policy on open access is likely, according to the OSC, to stir up an intense debate. Any centrally mandated change could undermine the quality of scholarship. On specific issues such as copyright, there is an acute disconnect between attitude and behaviour. Senior faculty are the ones who may be the most fertile targets for innovation in scholarly communication. (See ‘Faculty Attitudes and Behaviors Regarding Scholarly Communication: Survey Findings from the University of California’, August 2007. http://osc.universityofcalifornia.edu/responses/activities.html) What is intriguing is that the nearby Berkeley Center for Studies in Higher Education (CSHE) has also produced reports on the transition to digital scholarly communication and open access, and it appears that the conclusions made by these two centres are contradictory. While CSHE interprets its findings (based on in-depth interviews) as indicating that academic values stand in the way of progress, the OSC results show that institutional policies are the primary obstacle. (See the CSHE report: Scholarly Communication: Academic Values and Sustainable Models. C. Judson King, Diane Harley, Sarah Earl-Novell, Jennifer Arter, Shannon Lawrence, and Irene Perciali. Center for Studies in Higher Education, University of California, Berkeley. (July 27, 2006)) http://cshe.berkeley.edu/publications/publications.php?t=3
The main impression from comparing such reports is that there is little consistency in their interpretation of findings, even from organisations which are physically in close proximity and in the same time frame. Furthermore, it is also perhaps inevitable that such interpretations of user studies often come with an agenda. It may be necessary to disentangle the agenda from the interpretations. It would be disingenuous to promote the case for information policy change based solely on partial standpoints. But this is unfortunately often the case in the current state of electronic publishing. 3.5.3 Publisher commissioned studies CIBER Studies CIBER has its roots in deep-log analysis undertaken at City University, London, in the early 2000’s, and in subsequent years migrated their experience and techniques to the medical information market and more recently to scholarly communications. They also migrated physically from City University to University College London.
CIBER undertakes contract work on behalf of clients, in many cases this has been the individual publishers trying to get a clearer picture of the dynamic of their online users, but also publisher trade associations have commissioned research from CIBER. Millions of access records are scrutinised and matched against other sources to produce a comprehensive picture of the usage pattern of an online service. Two leading publisher trade associations, the International STM Association, together with the UK Publishers Association, commissioned an update to a 2004 study undertaken by CIBER of scholarly authors worldwide. This latter was also undertaken by CIBER and provided evidence-based results on what authors actually thought about the existing publication process. The updated survey achieved a 7.2 % response to the 76,800 targeted names of published research authors who received emailed questionnaires. There is a slight emphasis on the senior and more experienced end of the author spectrum as reflected in their appearance as principal investigators in the ISI source list of journal article publications from core research journals. Some of the key findings from the analysis by CIBER include: • Authors do not attach much importance to retaining copyright • High importance, however, is attached to peer review being undertaken • Authors believe counting downloads to be a slightly better measure of the
importance of an article than citations from other articles • Authors are now, as compared with the earlier 2004 survey, more aware of open
access, and are publishing more in open access journals (29 % in 2005) • However, a majority of authors believe that mass migration to open access
would undermine the scholarly publishing process • The split personality of the author community, as authors and as readers, is
reflected in their showing little enthusiasm for paying page charges, and the unwillingness of a large section to deposit their articles in institutional repositories.
There is a hard core of ‘OA enthusiasts’ who make up 8 % of the author population. “This group is characterised by its youth, its geographical composition (with a strong representation from Asia, Africa and Eastern Europe) and a tendency towards more applied and clinical ends of the research spectrum” (CIBER). The other 92 % of authors remain to be convinced.
This survey is based on completed returns from 5,513 respondents. It complements the earlier survey by CIBER of authors (March 2004), which obtained 3,787 responses, and therefore offers a database of almost 10,000 returns from the author community. It is this hard evidence, rather than anecdotes, which will give a picture of what the author community really wants, and which should be recognised by policy makers in framing their future information systems.
Meanwhile, the CIBER individual studies have identified that there are far more people ‘browsing’ the online systems than engaging in actual downloading of full-text material for subsequent detailed analysis. These ‘browsers’ are referred to by CIBER as ‘promiscuous users’, flitting between sites and services in a non-targeted way. They represent some 60 % of the users entering the Emerald publishing site, for example. What motivates them, and how can this activity be harnessed and made efficient? A wide range of questions can now be posited as a result of having
tangible, quantifiable evidence on the way scholarly communications are being used in the new Millennium. Other research undertaken by CIBER on Elsevier’s ScienceDirect users, is still in its infancy, but has shown that people spend more time reading shorter (4–10 page) articles online than longer (21+ pages) ones. Neither of the times, respectively 42 and 32 seconds, suggest anyone was really doing anything more than scanning online but clearly people were spending more time on shorter articles. Do shorter articles have a better chance of being read? The assumption, of course, is that people are downloading and actually reading (especially the long articles) off line, but what evidence do we have that this in fact occurs in the case of all downloads? “Maybe the behaviour constitutes a form of digital osmosis, with users believing that if they have downloaded it they possess the knowledge contained, some magical process occurs by which the contents are uploaded to their brains the next time they log in?” (Professor David Nicholas, CIBER) Add this to other pieces of evidence that CIBER has collected and we might question the assumption that everything downloaded is actually read or used. For example, in answer to a question ‘Do you always read the full paper before you cite it in your work’, 49 % of the 1,000 nucleic acids researchers responding to a CIBER questionnaire for Oxford University Press said it depended, and a further 9 % no. “These were perhaps the honest ones”. Elsevier/Mabe research In recent years Elsevier has displayed a more open and collegiate attitude towards the scholarly communication community by sharing some of its commissioned research findings about users. One of the key investigations was undertaken for Elsevier by NOP and CIBER in 2005, and was a follow-up of a survey undertaken by the late professor Bryan Coles from Imperial College London. The Royal Society published the latter study in 1993. It provided the baseline report against which Elsevier compared attitudes of authors in 2005. The reporting and analysis was
Figure 3.1 Changes in motivations of authors over a 10 year period
performed by Michael Mabe who, as current chief executive of the International STM Association, is a leading figure in the analysis of the electronic publishing scene. Professor Coles’ 1993 study, published before the World Wide Web was a dominant force in journal publishing, assessed UK researchers’ motivations for publishing. Knowing that people often say what they think their colleagues want to hear, Coles also looked at secondary motivations. Dissemination was found to be the most important of the primary motivations. However, the responses given as secondary motivations reveal some ‘less noble’ aspirations, including research funding, career advancement, recognition, establishing of precedence; all are consistent with the original Oldenburgian functions of the journal. Henry Oldenburg is hailed as the father of the scientific journals, the first being published by him through the Royal Society in 1665. How were these motivations affected by the advent of the Internet? The subsequent study by Elsevier undertaken in 2005 showed that they were not. The results on ‘motivations for publication’ in almost every category are similar, not only in terms of the primary/secondary split but also in terms of how secondary motivations are distributed. However, there are some key differences on how several motivations have shifted, particularly with respect to recognition and establishing precedence, areas that one could argue “have become more important in the electronic babble of the World Wide Web”. A key chart showing how authors changed in their motivations to publish is shown overleaf. Specifically, Mabe summarises some of the other findings from the Elsevier study as follows: • Concerning attitudes towards funding agencies, researchers are essentially am-
bivalent – 68 % thought that funding bodies have too much power over the research conducted and 50 % felt some pressure to publish in high impact journals.
• With respect to the number of articles published, quality was overwhelmingly felt to be more important than quantity: 70 % disagreed with the statement “it is better to publish a large number of papers than a smaller number of quality papers”. A paper recently published by Michael Mabe showed that, contrary to popular opinion, productivity rates of papers published per year, per unique author, are static or declining, while the number of unique authors, and co-authorship in particular, is increasing.
• In assessing whether prestige or niche journals are preferred from the author’s point of view, there is a slight preference for prestige over niche. A significant minority of readers, however, believe that an article’s quality is not determined by the journal.
• Peer review is overwhelmingly supported by virtually all the respondents, 88 % of whom agreed on the need for refereed journals. (However, a statistically significant minority of physicists did not agree.) The majority feel that peer review improves an article, although there were some sceptics in engineering.
• 85 % were willing to review a reasonable amount of their peers’ research (from two to 30 papers per year), but 40 % felt that time constraints prevented thorough refereeing. Other constraints included being sent irrelevant papers, being asked to review poor quality articles, not wanting to review twice for the same journal, and receiving papers from lesser-known journals.

Figure 3.2 Differentials in motivation change over ten years

• 42 % thought continuous review was important, but 32 % had real reservations regarding continuity over time of a published source, consistency, and the amount of time required to revise based on continuous comments, some of which may lack relevancy.
• With regard to the role of the publisher, 60 % believed the publisher added value. Among the 17 % who disagreed there were significant variations: 26 % in computer science felt that the publisher did not add value. The same was true for 22 % of the respondents in mathematics. Researchers who had served on funding panels were also more sceptical. The fact that 17 % overall questioned whether publishers had a role raises some interesting questions.
• Informal sources of information (conferences, bulletin boards, emails, etc.) were still regarded as being important (although 21 % of the sample disagreed). Informal sources earned greater favour among computer scientists and physicists and less from chemistry and the life sciences.
• On the whole, collaboration seems to have increased, although this is less the case in physics, perhaps because large-scale collaboration has always been the norm.
• Reading behaviour has clearly undergone change since the pre-Internet studies. A significant minority preferred to do their browsing from home, but e-versions have not yet taken over. Most disagreed with the statement that “an article will only be read if available electronically” (54 % in the computer sciences, a large minority in the life sciences).
• The study revealed strong agreement that all supplementary data should be published. Notably, 75 % of respondents want access to others’ data, while only 52 % were willing to share their own – a hesitation based on fears about misinterpretation and/or misuse of data. The most likely primary motivation here is competition, however, with the main aim being to mine as much information from one’s own data as is possible before it is made public.
• 26 % of respondents always search authors’ websites for the full article, although
figures were somewhat higher for computer science, mathematics, economics, and for graduate students. About half of those in economics agreed that they would place a full version of the article on their website, whereas only a third in computer science and less than a third in mathematics would do the same.
The advent of e-publication has replaced what was once a clear demarcation between published and unpublished material with greater obscurity and confusion between the stages of publication. We now have a non-peer-reviewed draft, pre-publication draft, author’s manuscript, uncorrected proof, final corrected proof (the ‘in-press article’), and final published article in paper and electronic, fully-linked format. According to Mabe, the final article is not only the most used but also the most important (the ‘Record of Science’ or the ‘Minutes of Science’). There are significant differences with earlier versions of the article and the extent of their use. There is concern that the increasing use of the author’s final manuscript after acceptance but before formal publication could create doubt around the value of the final paginated article. This is an issue pointed out in a study by Chris Beckett and Simon Inger (Scholarly Information Systems) which looked at whether librarians will cancel their subscriptions to the final published version as earlier versions become available. The results indicated that this transition or tipping point could be close at hand, making it a major strategic issue facing the publishing world.
Strong interest in the permanent archived record, especially with respect to articles published over ten years ago, was demonstrated in economics, social and earth sciences, mathematics, physics and astronomy. In these disciplines, a few older articles are considered classics in their field, provide an overview, avoid the repetition of research, provide longitudinal data, and show that ideas have not changed significantly over time in some specific areas.
A topical issue is that knowledge and awareness of institutional and subject-based repositories appears to be low among the respondents, with only 5 % knowing a lot about IRs and 28 % knowing a little, whereas for subject repositories 9 % know a lot and 29 % know a little. Repositories have not yet entered researchers’ consciousness on a truly global scale. Attitudes to repositories among those who claim to know about them were varied and indicated concern regarding what the purpose of a repository should be, author attribution, repository funding and quality control.
Mabe has claimed that behaviour is changing in relation to e-communication issues. Researchers are making greater use of technology and are using social networking software. New challenges apply to electronic peer review and global collaboration. That said, most of the fundamental drivers appear to remain unchanged and the four Oldenburgian functions (registration, certification, dissemination, archiving) remain present in journals. New tools are being used, but they are being used for old purposes. According to Mabe, and based on the results of this questionnaire approach, we are unlikely to see these fundamentals change unless the drivers of researcher motivations change radically.
3.6 Author versus Reader

Another feature of the rich social tapestry of scholarly communication is that there is an inherent dichotomy within the key group of participants. On the one hand there are authors of publications, who have distinct needs. On the other there are the users of the published information, who often have different needs. Yet in many respects they are the same people.
This dichotomy was explored during a survey undertaken by Elsevier Science. From the results of that enquiry it was concluded that there was a conflict between the two roles in the area of journal publishing. The very same person has a completely different requirement when they are an author from when they are a reader. For example, no author would publish his own laboratory book notes, but the same person would want to see other people’s lab books. Also, all scientists as authors want to publish more, while the same people as readers say they want to read less – there is too much being published. This dichotomy is at the heart of much of the debate relating to key issues currently facing scholarly communication. For example, it has reverberations within the highly emotive ‘open access’ discussions.
3.7 Digital Natives and the Millennium generation

Much has been written about the different approaches to searching for and using information adopted by those who have been born since the early 1980’s. Unlike the older generation, the digital natives, or the Millennium generation, have been brought up in a different socio-technical environment. They have adopted desktop access to electronic systems as easily as the older generation used the pen or the typewriter; they are used to screen-based interfaces; multi-tasking comes as second nature to them. They are registered users of Facebook, Myspace, Connotea, etc. and are ready customers for virtual reality programmes such as SecondLife.
What has this meant for the electronic publishing industry? Is it operating on a market base totally different from that of several decades ago? What was once viewed as a large general researchers’ market, distinguished by subject, location or age differences, has now seen an additional categorisation added. Due to the various changes referred to in the next chapters, there is a distinction to be made according to ‘generation’. This has loosely been defined as the ‘X’ generation being those born before the 1980’s; the ‘Y’ generation those born before the 1990’s; and the ‘Google Generation’ those born since 1993. This typology highlights the fact that major behavioural changes have potentially taken place as a result of researchers and users adapting to the many changes described earlier.
In a study funded in 2007 by JISC and the British Library, and undertaken by UCL Consultants/CIBER, the notion of the ‘Google Generation’ was explored based on a combination of evaluating the relevant literature and assessing the results of log data from online users. The study was mainly focused on the impact of a generational change in user needs on the research library, but the assessments have broader implications.
According to the UCL Consultants study there has been the emergence of promiscuous search behaviour, flitting between various information resources,
bouncing, and horizontal in nature, spending little time on each resource but downloading extensively and squirreling away the results – a ‘power browsing’ mentality has taken hold. However, despite being exposed to new forms of visual and interactive entertainment at an early age, the Google Generation is not any better or worse in its information literacy than earlier generations. What it does do is rely on comprehensive ‘brands’ such as Google to be its first, and in some cases only, source for information gathering. GoogleGen makes little use of advanced search facilities, assuming that search engines ‘understand’ their queries. The study explores some of the myths about the technological excellence of the new generation of users but in general finds some of the myths spurious. Whilst they may be more technologically competent, this does not translate into adopting more complex applications. A separate study by Synovate (2007) suggests that only 27 % of UK teenagers could be described as having a deep interest in IT. The overall results from the UCL study suggest there has been an overestimate of the impact of ICT on the young (and an underestimate of its effect on the older generations). Information skills need to be developed, and there is a role for the library in doing this.
Despite this study’s destruction of many preconceived notions of the differences between the old and new generations, it nevertheless still indicates a market which is changing – perhaps not as radically as was thought. It still raises questions about whether there are new ways of accessing scientific information, and the proposition is not dispelled that the journal as we know it may increasingly be confined in its functionality to becoming the Record or Minutes of Science, and less a serious communication mechanism. For scientific communication new modes of behaviour may be in evidence.
3.8 Forecasts Given the UCL study, what new scientific communication patterns can we expect? There are a number of studies which attempt forecasts – which try to bring available evidence together – and give a ‘vision’ of where the electronic publishing industry is migrating to. 3.8.1 The Outsell View The current scholarly information industry is not hampered by considerations of scale – with an annual revenue of $ 16.5 billion according to David Worlock (senior research fellow at Outsell), this is smaller than pornography (and in the UK less than is spent on the consumption of chocolate!) But elements of it are growing rapidly, and those elements are not the traditional products and services which dominated the scene in the last century. The publication of data has grown at a rate of 41 % per annum, whereas journals have grown by only 7 % pa. In the Top Ten of scholarly information providers, still dominated by Elsevier with a 17.5 % market share, new names which trade in data (eg. WesternGeco/Schlumberger, IHS, Petroleum Geo-Services, TGS NOPEC Geophysical Company ASA and Veritas DGC) have emerged in the past few years to usurp organisations such as Springer and Informa from the top ten rankings.
How will such changes work their way through to 2020? Although open access is limited at present, by 2020 it will be ubiquitous. Copyright is an issue now, but by 2020 it will disappear within a broader and more participative licensing environment. As will librarians, whose role will change from maintaining storehouses of printed material (a museum of the Book?) to being information support professionals. Publishers as we know them will also disappear, becoming instead providers of scientific support services. Only a few, perhaps three of the current players, will still exist as consolidation and market changes introduce new players to the industry. Data providers have been highlighted by Worlock as a key new sector. Other changes will be that vertical searching will emerge and the semantic web will become a reality by 2020. Much of the information creation will be done on a ‘community’ basis, using social networking. Journal articles will not entirely disappear but will, arguably, become fewer in number as a new multimedia item appears which has text as a hub and with links to a variety of related information resources. The main focus will be on workflow, as opposed to publication, as the main scholarly communication driver. This will be helped by the need for improved productivity in the research process, a decision-making mentality and the requirement for compliance. Compliance will become a new role for the library community to undertake.
3.9 User perceptions of Value

Meanwhile, the net effect is that the electronic publishing industry is a cauldron of activity, and this is having a substantial effect on how users adapt the way they identify, retrieve, assimilate and store their required digital information. In order to adapt and change their ways to optimise the new electronic publishing environment, end users need to feel that such changes are warranted. They need to have some way of comparing the net benefits from making a change in their traditional habits. In this context the concept of the value of information arises. If there is sufficient quantifiable evidence which leads the user to believe that new and emerging electronic publishing systems offer better or more effective services over those that they have used in the past, then the scope for user adaptation becomes greater. And if enough people believe it, the scope for ‘tipping point’ to be reached becomes stronger. The next chapter explores some of the relevant metrics currently available to the information industry that underpin the assessment of use of information by researchers.
Chapter 4
Measuring the Value of Information
4.1 Background

As indicated earlier, until the emergence of online information systems there was a dearth of quantified data to support decisions being made in libraries, by publishers or even by intermediaries about their separate services and operations. Fingerspitzengefühl (fingertip feeling) was usually the main procedure used for making major strategic decisions in publishing, particularly by the many small and medium sized publishers. As library budgets tightened, so the need for better quantification of return on investment was demanded of publishers, and librarians sought better allocation of their budgets based on proven user demand. There are a number of methods currently in use for measuring the output of scientific research.

4.1.1 Peer Evaluation

Most funding agencies and national centres have relied on experts in the subject area to give their views on research proposals by researchers. The experts will have the experience, expertise and subject knowledge to assess whether a proposal meets the objectives set by the funding agency in disbursing its funds. This may be considered an expensive and in some instances a flawed system, but it is the best which funding agencies can rely on to assess ‘value’. It is a collegiate system, transparent in its process, and accepted by the academy. But its metrics are based on the subjective assessments of a small group of experts, not on a widespread quantification of use.
Practical examples of where citation analysis has created some problems have been with “The Bell Curve,” which received a tremendous amount of attention, most of it quite negative, or articles touting “cold fusion,” an equally controversial topic, or “intelligent design.” Peer review can also be slow and is often expensive to administer. Nevertheless it has been the basis for research fund allocations over the years. Only now is it being challenged (see research assessment exercises below).

4.1.2 Citation Analysis

Traditionally the methodology surrounding the quantification of science and scholarly information was an output-focused procedure, with the emphasis on assessing how the results of research improved society. Societies at large, and funders
of research, are more interested in the tangible benefits which accrue from R&D investment, whether in the short, medium or long term. There was little interest in auditing the ongoing process. This was evident in the lack of concern about duplication of research effort – one study in the 1970’s put the amount of duplicated research in physics at 7 %, but this was three decades ago and no similar large scale evaluation of duplication in global research funding has since been made.

Measuring the effectiveness of the scholarly publication process has been left to private enterprise to pursue. This analysis was primarily performed through the citation impact factor as developed in the 1960’s by Dr Eugene Garfield, one of the leading bibliometricians of our time. He built a company, the Institute for Scientific Information (ISI), on his belief that the more a research article was cited by others (in their references) the more likely the cited article was to have greater ‘quality’ and impact than an article which had fewer citations. This crude assumption was refined to allow for a number of specific distorting factors, including how recent the citations were, and also taking into account differences in traditional practices between disciplines, etc. These adjustments were incorporated into what has become the Journal Citation Reports. Based on the references contained in the articles from some 8–9,000 pre-selected journals, a huge database of references or citations was amassed each year by ISI and this has become the basis for the JCR. It has become the basic comparative indicator between similar journals and is used by librarians as part of their selection criteria for acquisitions, and by publishers in particular to stress how their journals are better than the competition.

Until recently, therefore, the only metric available to make value judgements about a research project has been citation data, culled from the references in related journals. In effect citation aggregation was so full of conflicting subjective drives that the value of the assessments made on the basis of citation counts, impact factors, frequency indices, etc, was at best questionable. But it was all that was available, and Research Assessment Exercises and other national mechanisms for judging a scholar’s or researcher’s scientific worth were predicated on it.

Change, however, is sweeping through the bibliometric system. For many years there has been concern voiced by some in the scholarly information fraternity that citations were not a good measure of the ‘quality’ of journals. The basic tenet was questioned – citations are chosen when a research article is written, well before it is published and even longer before it is picked up in the ISI database. The citation pattern therefore reflects a partial view of the author’s selections from several years earlier. Also, not all of the sources actually used were listed in the article’s references, and there was the so-called ‘halo effect’ of giving preference to one’s own or colleagues’ works rather than being impartial. An early critique of citations was made by Michael and Barbara MacRoberts in JASIS in 1989 (40 (5):342–349) in an article entitled ‘Problems of Citation Analysis: A Critical Review’. They list the following as reasons why citations are not a perfect metric:
• Formal influences are not always cited. References to those external sources which are critical to the study are sometimes omitted, albeit often as an oversight, but such omissions disturb the real citation matrix.
• There is biased citing.
• Informal influences are not cited. Only formal literature is referenced even though discussions with peer groups are often a more powerful influence on the research.
• Self-citing occurs. Some 10 to 30 % of all citations are by the author to his/her own earlier work.
• There are different types of citation. Some are affirmative, some are negative.
• Variations in citation practice depend on the type of publication, nationality, time period, and the size and type of speciality. For example, engineering has low references per article; biomedicine very high; Russian papers carry 75 % more references than western papers, etc.
• Multiple authorship. ISI only enters the first author, yet in many countries the first author is not always the main researcher.
• Different naming variations that refer to the same person also corrupt the database. There are also clerical errors to contend with.

All of this suggests that in order to make citation analysis a realistic mirror of the pattern of relationships between articles, each article and each citation or reference needs to be analysed separately for its completeness and relevance. The sheer weight of citation numbers precludes this from taking place. Even the current weight of numbers within the key ISI databases does not necessarily iron out the many inconsistencies.

To give due credit to Dr Garfield, he felt that many of the applications currently being made of the journal citation impact factor were not what he intended when the measure was first developed. However, it filled a void in the metrics of article comparison and has since been used injudiciously and inappropriately on many occasions. Also, Thomson Science (the current owner of the ISI database services) claims that the death of the journal impact factor has been greatly exaggerated. In particular, it offers longitudinal consistency as well as having done much to raise the quality of journals. The fundamental issue is that there is a crisis in our understanding of value, and this is reflected in the available choice of a metric system. Citation analysis has been adopted almost out of necessity, because nothing else was available, rather than as a preferred choice because it fulfilled all requirements.

4.1.3 Document Downloads

In the early years of the millennium we have seen the arrival of tools which bring us into phase two in the evolution of electronic publishing in the scholarly sector. With the change from a print-only paradigm to a hybrid print and electronic information system, new measures of the efficiency and effectiveness of published research results are possible. One particular example is the emergence of counting downloads of articles in digital form from a host server. This has the advantage that it is based on actual, current activity around the published articles. Whilst it is not possible to gauge the subsequent use being made of the downloaded item, it nevertheless offers an additional perspective on what the community thinks of the particular research study.

Measurement of downloads from document servers is at an early stage in development. As was mentioned in the previous chapter, the two centres which are attempting to harness this technique and make it a credible metric tool are CIBER, based at University College London, and MESUR at the Los Alamos National
Laboratory in New Mexico, USA. Both organisations are developing standards and practices, but the greatest challenge facing both is to obtain access to download data, whether generated under subscription or free access. This data must be as comprehensive as possible to make the interpretations realistic. It can only work if the ‘wisdom of the crowds’ stratagem is adopted.

A study presented by University of Southampton staff (March 2005) found that early download counts predict later citation counts. Whilst it is understood that download counts do not mean the same thing as citation counts, they are easier to collect, they correlate with subsequent citation counts, and they are boosted by the growing open access movement.

One problem with the measurement of downloads, or with broader web log analysis, is that there are differing standards and definitions of what constitutes a download or an online usage. A Publisher and Library/Learning Solutions (PALS) project now known as COUNTER is addressing this problem. The project is trying to standardise the procedures and terminology for these measurements.

COUNTER

The COUNTER project was an industry initiative begun in 2002 that saw the development of two codes of practice to improve the quality, comparability and credibility of vendor-based user statistics. The main elements of the codes of practice include:
• Agreement on terms used among publishers. COUNTER has established a clear set of definitions.
• Establishment of specifications of the usage reports which vendors deliver to libraries (which includes appearance, timing of delivery, etc). A simplified sketch of such a report follows this list.
• Set data processing guidelines (e.g. date/time filters to address multiple requests).
• Implementation of an independent audit of reports and processes.
• Creation of a compliance procedure.
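As a rough illustration only (this is not the COUNTER schema itself), the sketch below aggregates raw full-text download events into a report of monthly counts per journal, applying a simple double-click filter of the kind the data-processing guidelines describe. The field names, the sample events and the 30-second window are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Illustrative raw log events: (journal title, session id, timestamp of a full-text request).
# The journal names, sessions and the 30-second double-click window are invented for this sketch.
events = [
    ("Journal of Examples", "sess-1", datetime(2007, 3, 1, 10, 15, 0)),
    ("Journal of Examples", "sess-1", datetime(2007, 3, 1, 10, 15, 12)),  # repeat click: ignored
    ("Journal of Examples", "sess-2", datetime(2007, 3, 2, 9, 0, 0)),
    ("Acta Exemplaria",     "sess-1", datetime(2007, 4, 5, 14, 30, 0)),
]

DOUBLE_CLICK_WINDOW = timedelta(seconds=30)

def monthly_fulltext_requests(events):
    """Count full-text requests per journal per month, discarding repeat
    requests from the same session that fall within the filter window."""
    last_seen = {}                # (journal, session) -> timestamp of last counted request
    counts = defaultdict(int)     # (journal, "YYYY-MM") -> count
    for journal, session, ts in sorted(events, key=lambda e: e[2]):
        key = (journal, session)
        if key in last_seen and ts - last_seen[key] <= DOUBLE_CLICK_WINDOW:
            continue              # treat as a double-click and do not count it
        last_seen[key] = ts
        counts[(journal, ts.strftime("%Y-%m"))] += 1
    return dict(counts)

if __name__ == "__main__":
    for (journal, month), n in sorted(monthly_fulltext_requests(events).items()):
        print(f"{journal}\t{month}\t{n}")
```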
The first Code of Practice, for online journals and databases, was released in January 2003 and an improved and validated version became available in January 2006. It has been widely adopted and 60 % of Science Citation Index (SCI) articles are covered by the COUNTER code of practice. In 2006, the second code of practice was extended to cover online books and reference works. This proved more challenging to develop as it was harder to define the most meaningful content unit, and the compliance rate is still currently low. General Report 1 (‘Full text article requests by journal’) has led to the creation of two types of metrics: local and global. Local metrics are useful for individual libraries and consortia and can be consolidated at the individual journal, collection or publisher level. COUNTER is also considering the more global measures that could be of use to authors, funding agencies, libraries and publishers. In 2005, JISC funded a national review of online journal usage which led to the creation of a metric (the basis of which was the number of full text article downloads) applicable to all academic institutions in the UK. COUNTER looked at data for 17 participating institutions that were analysed in terms of usage, range of journals (high, medium, low), price bands and subject categories. Among the metrics derived, the most important was the cost of full text article requests and cost
per user. Among the other conclusions was that the cost of downloads varied greatly across institutions. (A summary of the report is available on the JISC website.)

The UK Serials Group (UKSG) then undertook a phased study on the use of metrics within the UK library system. It determined that downloads were useful as a counterweight to impact factors, particularly in fields which were not included extensively in the ISI database. Small, specialised areas with highly-targeted journals had not been included in the citation system. One of the main recommendations was that the metric system needs to be simple and easily understood and requires better definitions on usage, the usage period (for example, at least one year), etc. The result was that around 70 % of web respondents agreed that usage data was a useful metric. The biomedical information sector was very supportive, with social scientists less so. The next step is to adopt the recommendations made by UKSG. A project working group has been formed. Dr Johan Bollen (MESUR/LANL) has been approached to look at issues relating to fraud and the versioning issue, as applied to institutional repositories. A web site has been created where the latest updates can be found (http://www.uksg.org/usagefactors). Over 60 publishers and host services are now involved in COUNTER and the latest development is to introduce an auditing service to COUNTER to ensure credibility.

SUSHI

As indicated above, usage statistics are widely available from online content providers and Project COUNTER has provided useful guidelines for counting and reporting usage. However, whilst the standards issue is being addressed, the statistics are not yet available in a consistent data container and the administrative cost of individual provider-by-provider downloads is high. SUSHI stands for Standardized Usage Statistics Harvesting Initiative. It is a protocol (and a proposed standard) that can be used by electronic resource management (and other) systems to automate the transport of COUNTER formatted usage statistics. The SUSHI protocol is based on the standard web service protocol SOAP (Simple Object Access Protocol) that is used for exchanging messages between two networked computers. The SOAP specification is maintained by the World Wide Web Consortium (W3C).

Electronic Resource Management (ERM) software has been developed for the specific purpose of managing a library’s electronic resource collections and subscriptions. ERM systems, which can be standalone or directly tied to a library system vendor’s other modules, usually track the lifecycle of an electronic resource. With the emergence of ERM, the ability to store usage statistics has accelerated the demand from libraries to create management information based on these statistics. A number of non-ERM vendors are also interested in developing business models for the consolidation of statistics for library customers. All of the interested parties are seeking a standard model for machine to machine automation of statistics harvesting.

The primary benefit of SUSHI is that it automates a tedious and repetitive process. Current practice for statistics retrieval calls for library staff to go to each individual publisher’s website and retrieve statistical data. In some cases, this data is in COUNTER format, but sometimes it is the publisher’s own internal format. Occasionally it is available only through a web screen and cannot be downloaded, only
printed. The SUSHI protocol automates the process but also, by default, causes the publisher to put usage data into a standard format (COUNTER XML). Therefore the retrieval is not only automatic but far easier to use. A cross-industry group of solution-seekers has emerged to apply their skills to building this model. Participants from libraries, ILS vendors and online content providers have collaborated to develop a model that includes an automated request and response methodology for usage statistics. The request and response mechanisms have been designed within a web services framework. All this work indicates that usage data has taken centre stage in assessing the quality and value of scholarly publications.

Effect of Robots

There is some concern that usage statistics (downloads) are inflated by Internet robots and crawlers. These automated services can artificially push up the usage statistics without providing a true indication that the additional traffic results in relevant use of the site or title. As Dr Peter Shepherd, Director of COUNTER, has commented on some listservs, there are two aspects to the robot/crawler issue. One is that usage is controlled within a library Intranet which the robots cannot access, and therefore the download statistics which COUNTER focuses on are a fair reflection of use on that site. The second concerns usage data collected in an open access environment, or on other platforms without access control. In such cases the effect of Internet robots should be taken into account, as there are likely to be platforms that have no filters in place to protect against the inflationary effect which such robots have on usage statistics. (It may, however, be justifiable to consider the sweeps made by Google, Yahoo and other scholarly focused search engines as legitimate additions to the usage data.) COUNTER is investigating possible solutions to mitigate the effect of the more generic and less scholarly-focused Internet robots on open access sites. This becomes increasingly important as the prospect of COUNTER usage statistics being reported in open access and global situations grows.

Case Study – The MESUR Project

The Andrew W. Mellon Foundation in the US is currently funding a two-year project at the Los Alamos National Laboratory (LANL) to develop metrics derived from scholarly usage data. The Digital Library Research & Prototyping Team of the LANL Research Library will carry out the project. Dr Johan Bollen is the principal investigator and Dr Herbert van de Sompel serves as an architectural consultant. The project’s major objective is to enrich the toolkit used for the assessment of the impact of scholarly communication items, and hence of scholars, with metrics that derive from usage data. The project will start with the creation of a semantic model of scholarly communication, and an associated large-scale semantic store that includes a range of scholarly bibliographic, citation and usage data obtained from a variety of sources. Next, an investigation into the definition and validation of usage-based metrics will be conducted on the basis of this collection. Finally, the defined metrics will be cross-validated, resulting in the formulation of guidelines and recommendations for future applications of metrics derived from scholarly usage data.
Project results will be made public on the project’s website http://www.mesur.org/.
The metrics being investigated include citation and impact factors (with recognition that these can be some 18 months in the past), and usage data. The former data comes from Thomson Scientific, the latter from the University of California usage returns from 23 campuses. Some 70 million ‘usage events’ have been collected. MESUR attempts to create a representative metric system, complete with an ontology to enable the various data and data sources to be organised and made comparable. The de-duplicated data is granulated into subject, predicate and object, which then allows the data elements to be interrogated and networked, a prerequisite for the semantic web approach to data analysis. The ultimate aim is to create a system that results in a representative metric.

One of the anomalies of this approach is that each data supplier adopts a different way of providing and counting events. Some have inbuilt duplication detection and elimination whereas other suppliers do not provide raw information. To create a consistent statistic all suppliers of usage data should adopt the same procedures and methodology. There is now a mechanism that may enable this to happen.

4.1.4 Focus Groups and Investigating Individual Usage Patterns

‘There are lies, damned lies and statistics’ – some claim that the use of statistics in this area is deceptive. What do people do when they come to a site? Is an article on medical science intelligible to someone who is not trained in the field? Unqualified numbers being bandied about from small slices of download activity must be treated with care. “Access means nothing. Understanding is everything. We have no measurement for that.” (Joseph Esposito, liblicence listserv, July 2007)

One way to try to get closer to the ‘understanding’ of what is being accessed and used by individual researchers is through a critical incidence approach. ‘Critical incidence’ involves asking readers about the last articles they read. In one such survey conducted by Professor Carol Tenopir the readers/users were also asked what the principal purpose of reading the article was, and the responses were:
• For research purposes – 50.7 %
• For teaching purposes – 20.6 %
• For current awareness – 8.9 %
• For writing – 10.5 %
• Other purposes – 9.3 %
Their opinions of the value in reading journal articles varied – they included ‘inspiring new ideas’ – 33 %, ‘improving results of existing work’ – 25 %, and ‘changing the focus of the research effort’ – 17 %. The number of articles read varied between disciplines, with medical researchers spending an average of 168 hours reading journals. Those researchers who were high achievers were found to read more than others. Articles still remain the primary source for an investigator – they currently tend not to rely on blogs, wikis and listservs, but use these as secondary resources, according to professor Tenopir. Older articles are judged to be more valuable, particularly by students. See: http://www.utk.edu/∼tenopir/uresearch/survey instruments.html.
These opinions were gleaned by inviting users to give feedback based on the last work they had read. Individual opinions can be misleading, but biases can be reduced by involving as many users in the survey as possible. However, collecting evidence or data with this method can be time-consuming and expensive. It also relies on sampling techniques, which reduces the potential reliability of the results. Nevertheless, it offers an assessment based on looking at what happens at an individual level.

4.1.5 Document Delivery Statistics

This metric has largely been overtaken by time. During the 1980’s and 1990’s it was a potentially valuable, though sorely under-used, source for identifying user behaviour. Centres such as the British Library Document Supply Centre processed some 4 million requests per annum by the late 1990’s, and the demand patterns for certain types of journals, in certain disciplines, gave an indication of the ‘popularity’ of the title in question. The Canadian and French scientific and technical national libraries (CISTI and INIST respectively) each processed an additional one million requests per annum. The results gave a large data resource to mine for trends and for understanding user behaviour and perceptions of value. However, there are three main reasons why such a metric no longer appears as a standalone crucial source for usage data analysis.
• Firstly, none of the three main national libraries/centres, nor any of the other document delivery agencies (see later), were set up to provide document delivery statistics in a way which was meaningful to a broader usage group, and particularly not for strategic analysis. Their operations were focused elsewhere – on the daily task of turning round individual document requests as quickly as possible, preferably within 24 hours of receipt of the request. Collection of traffic data was of secondary concern.
• Secondly, as was reported in the previous chapter, the Big Deals which came into being in 1998/99 have tended to wipe out document delivery as a growth industry.
• Finally, and also related to the previous point, there are dynamics within document delivery supply which run counter to the overall market trends which the industry is seeking to identify. For example, in the 1980’s the most ‘popular’ publisher in terms of its titles listed on the BLDSC request pattern was Elsevier – with almost half of the Top Ten requested titles being from Elsevier Science. Now Elsevier titles barely appear in the top 50, as consortia deals, changes in library acquisition practices, etc, have raised specialised, esoteric and small journals to the most requested titles listing.

This indicates that there has been a change in the role of document delivery during the past decade. Initially document delivery was an essential part of the research library’s collection policies – to fill gaps caused by the inability of libraries to buy the large, expensive and core titles. Now document delivery as a process is more likely to be used to fill gaps in requests for esoteric articles, often highlighted from a search on Google or other industry-specific search engines. As such, document delivery data can no longer be counted on as being as important as it once was as an indicator of usage behaviour.
4.1.6 Questionnaires

As reported in the previous chapter, one of the more detailed questionnaire-based investigations into user behaviour and the value of the scholarly publication system was undertaken by Elsevier among its authors in 2005. In the past questionnaires were printed documents, usually mailed unsolicited to a mailing list which was more or less relevant, and response rates of 1–3 % were generally considered satisfactory for large scale surveys. It was an expensive process and the respondents were not always an accurate sample of the target audience. More recent questionnaires, including the Elsevier project, rely on email to achieve their reach. This cuts down the distribution costs of managing the study, and with the purchase of extensive email address lists from organisations such as ISI, which collect them as part of their database records, a wide net of potential respondents can be approached.

The objective of the Elsevier-funded study was to understand how the motivation and behaviour of researchers has been affected as the Internet reaches early maturity. It was the largest research project of its kind, with more than 6,000 respondents (representing all age groups in all subject areas and regions of the world) who replied to the online questionnaire. This was followed up with 70 in-depth phone interviews. According to Mabe, and based on the results of this questionnaire approach, we are unlikely to see fundamental change unless and until the drivers of change have a radical impact on researcher motivations. Also, as the survey results showed, questionnaires have to phrase their questions carefully, since respondents may answer according to what they feel the questioner wants to hear rather than what they themselves believe.
4.1.7 Triangulation

Given the above, the most we can do to measure understanding – rather than just counting citations, downloads, or questionnaire and interview responses – is to take the best of each bibliometric measurement system and combine them in a logical way to give a more likely level of ‘understanding’ of information usage. We are a long way from this, and a true understanding of the value of information to the user community will remain a distant goal as long as we rely on only one bibliometric measurement tool. We need to combine all available methods in creative and relevant ways to achieve nirvana.

4.1.8 Scientometrics

According to Professor Stevan Harnad (universities of Quebec at Montreal and Southampton), scientometric predictors of research performance need to be validated by showing that they have a high correlation with the external criterion they are trying to predict. The UK Research Assessment Exercise (RAE) – see below – together with the growing movement toward making the full texts of research articles freely available on the web, offers a unique opportunity to test and validate a wealth of old and new scientometric predictors through multiple regression analysis.
“This includes measuring such features as publications, journal impact factors, citations, co-citations, citation chronometrics (age, growth, latency to peak, decay rate), hub/authority scores, h-index, prior funding, student counts, co-authorship scores, endogamy/exogamy, textual proximity, download/co-downloads and their chronometrics, etc. All these can be tested and validated jointly, discipline by discipline, against their RAE panel rankings in the forthcoming parallel panel-based and metric RAE in 2008”. According to Harnad, open access scientometrics will provide powerful new means of navigating, evaluating, predicting and analyzing the growing open access database.
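A minimal sketch of the kind of validation described here might look as follows: a handful of candidate metrics are regressed jointly against panel rankings for a sample of departments, and the fit is inspected. The numbers, the choice of three metrics and the use of an ordinary least-squares fit are all assumptions made for illustration, not Harnad’s or MESUR’s actual procedure.

```python
import numpy as np

# Invented example data: one row per department; columns are candidate metrics
# (citations per staff member, downloads per paper, mean h-index). The panel
# scores are equally fictitious; the point is only to show the validation step.
metrics = np.array([
    [120.0, 340.0, 14.0],
    [ 80.0, 210.0,  9.0],
    [200.0, 510.0, 22.0],
    [ 60.0, 150.0,  7.0],
    [150.0, 400.0, 17.0],
])
panel_scores = np.array([4.2, 3.1, 5.0, 2.8, 4.5])

# Ordinary least squares: panel_score ~ b0 + b1*citations + b2*downloads + b3*h
X = np.column_stack([np.ones(len(metrics)), metrics])
coeffs, *_ = np.linalg.lstsq(X, panel_scores, rcond=None)

predicted = X @ coeffs
ss_res = np.sum((panel_scores - predicted) ** 2)
ss_tot = np.sum((panel_scores - panel_scores.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot   # how much of the panel ranking the metrics jointly 'predict'

print("regression coefficients:", np.round(coeffs, 3))
print("R^2 against panel scores:", round(r_squared, 3))
```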
4.2 Research Assessment Exercises

4.2.1 The United Kingdom’s RAE

Most of the underlying interest in metrics revolves around the need to convince the grant allocating authorities that the funds they disburse will be in good hands and will be used effectively, and the main indication of this is to show that past research has been undertaken efficiently and successfully by the potential grant recipient. In the UK, the leading grant allocating agencies for research conducted in centres of higher education are the Higher Education Funding Councils of England, Wales, Scotland and Northern Ireland. They have relied on the Research Assessment Exercise (RAE) to help them make decisions about grant allocations between departments at universities. The research assessment exercise is a peer review process that supports selective funding and provides an indicator of comparative (and absolute) quality. In addition, the RAE provides information with international currency and recognition, and provides management information used within universities.

The Research Assessment Exercise has been operating in the UK since 1986 and the next one is due in 2008. Seventy different subject areas are involved, with ratings from 1–5* (with 5* being the highest) given to institutions and departments by the assessment panels. The panels look at four submissions per staff member in making their judgments. The amount of reading is considerable – one typical panel member will be expected to read and assess 1,000 documents by August 2008. The RAE has involved a continuing shift to serials (journals) as a form of output measurement – as much as 96 % in some subjects. Even in engineering, where conference proceedings loom large, the proportion of articles has increased from 57 % to 78 %, and the humanities have seen an increase from 33 % to 36 % in the serial component.

However, in early 2006, the UK Government announced changes to the RAE process, a reflection of its perception that the RAE refereeing mechanism is both expensive and burdensome. The UK Government does not appear to trust peer review, and as such wants a more ‘metric’ approach. This means that the 2008 RAE will be the last in its current form. The proposal is to scrap the current system based mainly on peer review – to scrap the panels that do the assessments – and to base future funding on quantitative measures.
4.2.2 Criticisms of UK’s RAE

There are indeed problems with the current system of research assessment:
• Not all journals are included in the ISI database for citation compilation (8,500 out of over 23,000 published, refereed, active scholarly journals), and 75 % of these are in the sciences.
• The structure of citations differs between fields, and normalisation of such data between subjects is always likely to be problematic.
• STM accounts for only six of the main subject areas (out of 17) for 2008.
• The ‘fear factors’ have to be considered, and how these affect citation rankings.
• Comparisons between metric evaluations and peer review are not always feasible.
• Peer reviews have more built into them than just citation-based counts.
• Researchers are being forced to publish in high citation impact journals (generally with low circulation or communication).
• Self-citation needs to be weeded out of the statistics, but this raises the question of who checks on this.
It also became evident that there are different types of articles or papers, which have varying life spans. For example, ‘Mayflies’ (papers that peak early and die quickly), ‘Sleeping Beauties’, ‘Negative Citations’, and other editorial influences make the citation metric a real problem. Another leading concern about the RAE process is that long-term research would be penalised by the need to produce outputs that need to be judged or assessed. This will give rise to ‘indicator chasing’ which in turn will affect staffing decisions. As a department is judged by the quality of its output, it will focus on recruiting quality people (like a transfer market in football) and will discriminate against younger staff with shorter track records. So what sort of metric should be used for RAE purposes? In the government’s annual Budget statement of 2006 the then UK Chancellor of the Exchequer cited many of the same concerns, including cost and discouragement of interdisciplinary work. One new point he offered was that it discourages user-focused research. But why did these concerns lead to a demand for a change in traditional practice? Dr Bahram Bekhradnia, Director of the Higher Education Policy Institute and former Policy Director at HEFCE, has identified two arguments. The first is what the government claims to be the prohibitive cost. The government’s cost figure for the RAE – including compliance and opportunity costs in the institution – is £ 45 million. (An unpublished HEFCE internal audit report puts it as high as £ 100 million.) However, even taking the latter, higher figure, this is still less than 1 % of the total grants to be allocated (of over £ 1 billion) each year. Bekhradnia knows of no other resource allocation system in the world (including the research council system in the UK) where the cost to administer is less than 1 %. Thus the cost argument has not been thought through. And it is counterproductive to suggest that one can have an assessment process that does not affect behaviour. The other argument for moving to metrics is based on the high correlation between Funding Council grants and the separate allocation of research grants by the Research Councils. But that correlation is a function of size (big universities get more research council grants). For the government to argue the need to eliminate
the RAE and move toward metrics based on this strong correlation is both bogus and fatuous, according to Bekhradnia.

Dr Bekhradnia is concerned with three issues: scope, process, and substance. With respect to scope he has noted that the purpose of the RAE is to fund research selectively and to provide an indicator of comparative quality, information with international currency, and management information. The trouble is that it will only provide a basis for funding, not an assessment of quality. It tells you little about a department/university/individual other than the amount of money raised or the number of publications it has produced. With respect to process, the proposal constitutes “one of the poorest pieces of policy making in thirty years”. The consultation paper failed to ask any important questions; it simply offered a list of alternative models. Concerns regarding substance are particularly worrying, claimed Bekhradnia. Quality judgements will be replaced with something quite different. Foreseeable behavioural consequences will include a rush for grants and staffing and equal opportunity issues. If this proposal goes through, the rate of failure for applications in the alternative Research Council grant process (currently 80 %) will increase even more, as will the cost. In the current system, quality is based on the whole submission (multiplied by the number of people in a department, in which everyone counts to some extent). In a system where all that matters is the money brought in, those who bring in the money will be the ‘stars’ and the rest will go out of the window. It will require researchers and universities to offer cut-price research to industry. If all research money is dependent on getting grants and contracts, it will kill curiosity-driven and unpopular research. The future will not necessarily be better than the past.

What are the alternatives to the current situation? If we want a process that produces judgments of comparative quality, peer review must be at its heart. Metrics can indeed play a part and can serve as a trigger for peer review, but should not dominate. Bekhradnia predicts that citations will prove unworkable as the sole mechanism to distribute funds.

The UK research evaluation system is therefore still unclear. CWTS (Leiden) is being used to prepare background data for HEFCE, and a report is awaiting publication. The new system would need to take into account:
• Different patterns of publication, and how they are changing.
• That researchers will still need access to published information.
• That the system has to be dynamic and flexible.
In addition, Universities UK (UUK) commissioned Evidence Ltd to investigate the whole issue of bibliometrics in the run-up to the changes to the research assessment exercise.

4.2.3 UUK report looks at the use of bibliometrics

The UUK report, published in early November 2007, outlined a range of factors that the higher education sector should consider when formulating the new framework for assessing and funding university research. The report discussed the proposed use of bibliometric indicators, such as counting journal articles and their citations, as a basis for assessing research. It assessed the use of bibliometrics in both STEM
(science, technology, engineering and mathematics) and non-STEM subjects, and the differences in citation behaviour among subject disciplines. Professor Eric Thomas, Chair of Universities UK’s Research Policy Committee, said, “It is widely anticipated that bibliometrics will be central to the new system, but we need to ensure it is technically correct and able to inspire confidence among the research community. This report doesn’t set out a preferred approach, but does identify a number of issues for consideration. It is important that the sector fully engages with the changes to the research assessment process and we hope this report provides those involved with a basis for discussion.” Some of the points considered in the report include:
• Bibliometrics are probably the most useful of a number of variables that could feasibly be used to measure research performance.
• There is evidence that bibliometric indices do correlate with other, quasi-independent measures of research quality – such as RAE grades – across a range of fields in science and engineering.
• There is a range of bibliometric variables that could be used as possible quality indicators. There are strong arguments against the sole use of (i) output volume, (ii) citation volume, (iii) journal impact and (iv) frequency of uncited papers.
• ‘Citations per paper’ is a widely accepted index in international evaluation (a simple numerical sketch follows this list). Highly-cited papers are recognised as identifying exceptional research activity. Accuracy and appropriateness of citation counts are a critical factor.
• There are significant differences in citation behaviour among STEM and non-STEM subjects as well as different subject disciplines.
• Metrics do not take into account contextual information about individuals, which may be relevant. They also do not always take into account research from across a number of disciplines. The definition of the broad subject groups and the assignment of staff and activity to them will need careful consideration.
• Bibliometric indicators will need to be linked to other metrics on research funding and on research postgraduate training.
• There are potential behavioural effects of using bibliometrics that may not be picked up for some years.
• There are data limitations where researchers’ outputs are not comprehensively catalogued in bibliometric databases.
The report, The Use of Bibliometrics to Measure Research Quality in UK Higher Education, is available at http://bookshop.universitiesuk.ac.uk/latest/ Also, a prototype is being tested out in a London University to see how the allocation of funds measures up against research output. A number of elements will be removed from the judgmental process, including output from junior researchers, researchers undergoing a career break, and researchers operating at the interdisciplinary level. All these would have a negative impact on a truly metric-based system. The effectiveness of a metric-based system will depend on the nature of the metric selected and the feedback behaviour of researchers and scholars in the specific subject area.
4.2.4 Australia’s research assessment exercise (RQF)

Although the Australian government had released a description of Australia’s Research Quality Framework (RQF) that is similar to the UK’s RAE, this all changed as a result of the elections held in November 2007. The first RQF assessment would have been based on submissions from the 38 Australian universities by 30 April 2008. Funding based on the assessment was due to follow in calendar year 2009. The next assessment would then take place six years later in 2014. However, as a result of the change to a Labor government it is now understood that the original assessment scheme will be scrapped. It is likely to be replaced by a cheaper and more metrics-based assessment, possibly a year or two later. The Labor Senator for Victoria and Shadow Minister for Industry, Innovation, Science and Research has claimed that the new Labor government “will abolish the Howard Government’s flawed Research Quality Framework, and replace it with a new, streamlined, transparent, internationally verifiable system of research quality assessment, based on quality measures appropriate to each discipline. These measures will be developed in close consultation with the research community”. Labor will also address the inadequacies in current and proposed models of research citation. The Howard Government had allocated $ 87 million for the implementation of the RQF and Labor will seek to redirect the residual funds to encourage genuine industry collaboration in research.

Every university will have to have an institutional repository (IR) to hold the full text of research outputs. Around half of them already do. A request has been made to the Australian government for funds to establish repositories, though it is unclear how this will be received by the new administration. There is likely to be a mad scramble in the smaller universities, with outsourcing and hosted repository solutions being very attractive. Under the old scheme all research output generated by all research groups would have had to be in the IRs for the RQF assessment. This could have amounted to 50 % of the university research production over six years.
4.3 The Future of BiblioMetrics

As indicated above, there are a number of emerging metrics which are still to be evaluated. Dr Johan Bollen (Los Alamos National Laboratory, New Mexico) has researched usage and impact factors at University of California campuses and claims that they are actually different. This difference in factors may pose problems if triangulation is adopted without careful consideration. However, there are even more new approaches to metrics being investigated:
• The y-factor (being developed by Bollen and van de Sompel, LANL), based on a combination of citations and PageRank algorithms (a generic sketch of the PageRank idea follows this list).
• Box-Plot.
• Cited half-life.
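The general idea behind PageRank-style journal metrics can be sketched as a power iteration on a citation matrix. This is a generic illustration, not the published algorithm of the y-factor or of the Eigenfactor discussed below; the toy citation matrix, the journal names and the 0.85 damping factor are assumptions.

```python
import numpy as np

# Toy citation matrix: C[i, j] = citations from journal j to journal i (invented numbers).
journals = ["J. Alpha", "J. Beta", "J. Gamma"]
C = np.array([
    [ 0.0, 30.0, 10.0],
    [20.0,  0.0, 40.0],
    [ 5.0, 15.0,  0.0],
])

# Column-normalise so each journal distributes its outgoing citations as probabilities.
P = C / C.sum(axis=0)

damping = 0.85                      # assumed damping factor, as in generic PageRank
n = len(journals)
rank = np.full(n, 1.0 / n)

# Power iteration: repeatedly 'follow citations', with occasional random jumps.
for _ in range(100):
    rank = (1 - damping) / n + damping * (P @ rank)

for name, score in sorted(zip(journals, rank), key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")
```

A journal ends up with a high score not simply because it is cited often, but because it is cited by journals which are themselves highly ranked, which is the intuition these measures borrow from web search.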
Early in 2007, Carl Bergstrom, an associate professor in the University of Washington Department of Biology, reported on an iterative ranking scheme that he and his colleagues developed. Called the Eigenfactor, this new metric ranks “journals much as Google ranks Web pages.” At the Web site, Bergstrom and his team pro-
vide data for the nearly 8,000 titles in the Journal Citation Reports along with about 110,000 other publications referenced by the JCR-covered serials. Bergstrom says they “use citations in the academic literature as tallied by JCR. By this approach, we aim to identify the most ‘influential’ journals, where a journal is considered to be influential if it is cited often by other influential journals”. The Eigenfactor algorithm simulates how researchers follow citations as they move from journal to journal through references, based on the frequency with which each journal is visited. (“Eigenfactor: Measuring the Value and Prestige of Scholarly Journals,” College & Research Library News, May 2007, Vol. 68, No. 5).

Jorge E. Hirsch, a physics professor at the University of California, San Diego, also devised “an alternative that appears to be a simpler and more reliable way to rank scientific output within a discipline than any now in use” (“Physicist proposes new way to rank scientific output,” Physorg.com). Hirsch’s h-index – the “h” stands for “high citations” – may have some shortcomings. For instance, it is more applicable to researchers in the life and physical sciences as compared with the social sciences, but it can be used as another measure to evaluate people’s contributions throughout their careers. Hirsch’s original paper, published in the Proceedings of the National Academy of Sciences in 2005, outlines his arguments and algorithms, which, if applied more widely, may be used to evaluate groups of researchers as well. He concludes that his index estimates “the importance, significance, and broad impact of a scientist’s cumulative research contributions.” For a discussion of the h-index, see Wolfgang Glanzel, “On the Opportunities and Limitations of the H-Index,” Science Focus, 2006, Vol. 1, No. 1, pp. 10–11.

Despite the above there remain a number of questions about the ‘what’, ‘why’ and ‘how’ of the measurement of scientific effort. For example, what are we measuring – the articles, the issue, the journal? Although the focus seems to be on the article, some authors may want to consider parts of articles as the relevant standalone research entity. Authors need to think about what they want. Is this being looked at through the eyes of a librarian, or as a benefit available to society as a whole? It also has to be considered that the value derived from an article is a moveable feast over time – it may be more valuable in future years as and when it interfaces directly with specific research needs and other research results. The ‘why’ is that metrics are needed for a number of possible reasons – for example, to make informed decisions, to demonstrate a return on investment in a library’s collections, or to improve the level of service. There needs to be a focus on readers (users) and not on authors. It is also necessary to provide operational data for librarians, publishers, and researchers that is descriptive (not prescriptive). The ‘how’ is to use metrics derived from interviews, focus groups, feedback etc. The questions should not necessarily be generic – they should invite participants to describe the last item they read, and provide real solid information based on this. Time will tell if either of these alternative metrics will someday replace the ISI Impact Factor as the primary means of determining the quality of scholarly output and, consequently, the competitive ranking of scholarly journals in the marketplace.
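Hirsch’s index has a compact definition – a researcher has index h if h of their papers have each been cited at least h times – and it is straightforward to compute from a list of per-paper citation counts, as the sketch below shows. The citation counts used in the example are invented.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each
    (Hirsch's definition), computed from a plain list of per-paper citation counts."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Invented example: ten papers with these citation counts give an h-index of 5.
print(h_index([45, 22, 17, 9, 6, 4, 3, 2, 1, 0]))
```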
But their existence is clear evidence that some believe more reliable methods are needed. In 2006 a UK-based consulting group then called Electronic Publishing Services Ltd (now part of Outsell Inc) carried out a research project to evaluate what infor-
mation and research existed in six key areas of the scholarly communication arena. Their aim was to provide an audit of what was available, and to highlight the gaps that existed in our awareness of the elements of the information chain. They found very poor data about buyers of scholarly journals and noted that existing market studies lacked comprehensive coverage. They also saw that more survey data provided directly by the publishers themselves (anonymised) would help to fill the gaps. They looked at a wide range of estimates on the supply side and the evidence here was also poor. EPS looked at journal usage, where they found consensus only with respect to the incompleteness of the picture. Assessments here are based on user surveys, which can be misleading as survey answers are not always honest. EPS found transaction logs to be a valuable form of analysis, but hard to obtain. Evidence regarding the value of citations and impact factors was inconsistent. The current marketplace tends to favour spot surveys over long time frames (longitudinal surveys), which makes it difficult to obtain real time series. There is positive evidence of the cost impact of alternative methods.

One major gap was around user behaviour. In most web-penetrated marketplaces we are returning to the idea that the user has had their behaviour moderated by the nature of the web. What we do not know with respect to electronic publishing of scholarly material is how researchers now work. We assume that researchers work in the way they always did, that research methodologies have not been affected, and that the web just exists. Whilst this is probably not so now, it will certainly not be so in future. The nature of research is changing. How can we begin to design the service without knowing about behaviour modification? How can we know whether the publishing value models that we are playing with fit the evolving behavioural model of how research is accomplished? What are we doing? We are going into the past to look at the usage patterns of the past, at the way in which things have traditionally been done. We need to track the changing behaviour of a networked research world going into the future. We need to move from phase one through to phase three in the emergence of electronic publishing. Only if we have our fingertips on this in our libraries and publishing houses will we actually be able to respond to those pressures.
Phase Two
Chapter 5
Electronic Information Industry Structure
5.1 How much Information?

‘How much Information?’ is the partial title of a report released in October 2003 by the Regents of the University of California which attempted a rough quantification of the amount of information available in the world in 2002, and in some cases compared this with an earlier study undertaken by the same group which was published in 2000. The result is that the amount of newly created information each year is approximately 5 exabytes. Five exabytes is “equivalent in size to the information contained in half a million new libraries the size of the (holdings) of the Library of Congress”. In fact the study claims that magnetic media, mostly hard discs, are responsible for 92 % of this amount, with paper representing a mere 0.01 %.

In 2002 it was estimated that the World Wide Web contained about 170 terabytes of information, or 17 times the size of the Library of Congress print collection. Email on its own generated 400,000 terabytes of new information. The sheer growth of this information explosion is indicated by the claim that the amount of information stored on paper, film, magnetic and optical media has roughly doubled in the past three years.

“Published studies on media use say that the average American adult uses the telephone 16 to 17 hours per month, listens to the radio 90 hours per month, and watches TV 131 hours a month”. At that time, some five years ago, the US population used the Internet 25 hours and 25 minutes a month at home and 74 hours and 26 minutes a month at work – “about 13 % of their time”.

In terms of the scanned equivalents of the world’s books published in 2002 there were 39 terabytes, whereas journals accounted for 6 terabytes. This compares with the scanned output of office documents, mainly from computer printers, amounting to 1,398 terabytes in the same year.

Though these figures are large, they are only a snapshot of the true extent of information generated at the time, and a gross estimation to boot. They set the context rather than offering a prescriptive solution to the problems or challenges facing the adoption of Electronic Publishing in scholarly communication per se. They demonstrate that there is a huge highway of development going on, within which the specific STM situation is a mere pathway.
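The report’s comparisons are simple unit arithmetic. Assuming, as the study’s own comparisons imply, a Library of Congress print collection of roughly 10 terabytes (that baseline is the report’s implicit figure, not an independent measurement), the headline numbers can be checked in a few lines.

```python
TB = 10 ** 12                 # bytes in a terabyte (decimal convention)
EB = 10 ** 18                 # bytes in an exabyte

LIBRARY_OF_CONGRESS_PRINT = 10 * TB   # baseline implied by the report's comparisons

new_information_2002 = 5 * EB
web_2002 = 170 * TB

print(new_information_2002 / LIBRARY_OF_CONGRESS_PRINT)  # ~500,000 'Libraries of Congress'
print(web_2002 / LIBRARY_OF_CONGRESS_PRINT)              # ~17 times the print collection
```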
5.2 The Information Industry

The main information industry sectors can be summarised in the following figure.
[Figure: bar chart of professional information market sizes by segment, with estimated future growth rates for 2006–08. Segments shown include News; Search, Aggregation & Syndication; STM; LTR; B2B; Market Research; Education & Training; Credit, Risk, Financial & Company Information; and Professionals & Consumers.]

Figure 5.1 Professional Information Markets
The latest preliminary results for 2007, issued by Outsell, indicate that the market size for the information industry was $ 381 billion in 2007. The industry continued to grow, but ever more slowly: industry revenues grew 5.3 % to $ 381 billion, compared with 6.0 % growth in 2006 and 6.3 % in 2005. The sector which showed the slowest growth was the News providers, which witnessed a decline in overall revenues (of 2.5 %) but still represent in total one third of the overall information industry. The biggest gains were made by the Search and Aggregation services, notably Google, Yahoo, MSN etc, whose growth in 2007 was 25.5 % over the previous year. Google in particular achieved 57.7 % revenue growth in 2007, still significant but less than the dramatic 72.8 % growth achieved in 2006 over 2005.
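The growth figures also imply the size of the 2006 base. A back-of-the-envelope calculation, assuming the 5.3 % figure is applied to the 2006 total, is sketched below.

```python
revenue_2007 = 381.0          # $ billion, Outsell preliminary figure for 2007
growth_2007 = 0.053           # 5.3 % growth over 2006

revenue_2006 = revenue_2007 / (1 + growth_2007)
print(f"implied 2006 industry revenue: ${revenue_2006:.0f} billion")   # roughly $362 billion
```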
5.3 Corporate Size

The Top 10 information companies include representation from the Search, News and STM areas. According to Outsell’s preliminary results, the Top 10 companies and their 2007 size and growth were:

Information Industry Companies – Preliminary Worldwide Revenues 2007

Organisation          | 2007 Revenues (preliminary) | 2007 % growth over 2006
Google Inc            | $ 16,700 mil                | 57.5 %
Reed Elsevier         | $ 10,621 mil                | 6.9 %
Pearson plc           | $ 7,116 mil                 | 8.0 %
Yahoo Inc             | $ 7,050 mil                 | 9.7 %
Thomson Corporation   | $ 7,037 mil                 | 11.0 %
Gannett Co. Inc       | $ 6,914 mil                 | 3.6 %
McGraw-Hill           | $ 6,891 mil                 | 11.6 %
Bloomberg             | $ 5,170 mil                 | 10.0 %
Reuters Group         | $ 5,135 mil                 | 8.6 %
Wolters Kluwer        | $ 4,920 mil                 | 6.1 %
To quote from Woody Allen – “Some drink deeply from the fountain of knowledge; others merely gargle”. The question is whether publishers are deep or shallow drinkers from the emerging trough of informatics. There is certainly a significant amount of gargling going on, experimentation is rife. In 2006 the main publishers and their size are shown below. With headquarters in 14 countries and operations that span the globe, the world’s 45 largest book publishers generated revenue of approximately $ 73 billion in 2006, according to a ranking commissioned by the Publishers Weekly and other trade magazines in several countries, including France’s Livres Hebdo. Since most of the largest publishers mix books with other media properties, such as journals, databases and various forms of digital products, the listing includes sales from all of these activities, in addition to books. Magazines are excluded from the figures, as are units that are not connected to publishing, such as Reed Elsevier’s Business division. The largest publisher in this ranking is Reed Elsevier, with revenue of $ 7.60 billion, followed by Pearson, which had sales of $ 7.30 billion. Overall, the top 10 publishers generated approximately $ 48 billion, about two-thirds of the $ 73 billion in sales recorded by the 45 publishers included in the ranking. The companies that dominate the top of the list were built largely through acquisitions, and this process has continued into 2007, with several major transactions. Among the biggest is Reed’s agreement to sell its Harcourt Education group, a move that will drop Reed from the top spot in 2007 while boosting the ranking of Pearson (which acquired Harcourt’s testing and international operations) and Houghton Mifflin Riverdeep (which bought the U.S. education group, including the trade and supplemental divisions). Despite the challenge facing publishers from open access, the e-Science movement, the rapid growth of digital datasets as sources for ‘information’, and social collaboration and networking of informal communications, traditional publishers should not be written off too lightly.
5.4 The Scientific, Technical and Medical Information sector

The STM industry has begun to split into two segments. There are the traditional print-based STM publishers whose overall growth in 2007 amounted to 5.6 %. Then there is a new breed of data-centric publishers which currently are focused on the geophysical sector. The top five companies in this area have exploded on the scene in recent years and achieved a revenue growth in 2007 of 56.3 %.
Table 5.1 Publishers by Size (in US$ millions)

Rank | Publishing Company (Group or Division) | Parent Company | Parent Country | 2006 Revenues | 2005 Revenues
1  | Reed Elsevier | Reed Elsevier | UK/NL | 7,606.30 | 7,217.60
2  | Pearson | Pearson plc | UK | 7,301.00 | 6,807.00
3  | Thomson | Thomson Corp. | Canada | 6,641.00 | 6,173.00
4  | Bertelsmann | Bertelsmann AG | Germany | 5,995.60 | 5,475.60
5  | Wolters Kluwer | Wolters Kluwer | NL | 4,800.90 | 4,386.20
6  | Hachette Livre | Lagardère | France | 2,567.50 | 2,137.20
7  | McGraw-Hill Education | The McGraw-Hill Cos. | US | 2,524.00 | 2,672.00
8  | Reader’s Digest | Reader’s Digest | US | 2,386.00 | 2,390.00
9  | Scholastic Corp. | Scholastic | US | 2,283.80 | 2,079.90
10 | De Agostini Editore | Gruppo De Agostini | Italy | N/A | 2,089.10
11 | Holtzbrinck | Verlagsgruppe Georg von Holtzbrinck | Germany | N/A | 1,594.84
12 | Grupo Planeta | Grupo Planeta | Spain | 1,319.50 | N/A
13 | HarperCollins | News Corporation | US | 1,312.00 | 1,327.00
14 | Houghton Mifflin | Houghton Mifflin Riverdeep | Ireland | 1,054.73 (1) | 1,282.10
15 | Informa | Informa plc | UK | 1,271.14 | N/A
16 | Springer Science and Business Media | Cinven and Candover | UK/Germany/Italy/France | 1,201.20 | 1,088.10
17 | Kodansha | Kodansha | Japan | 1,180.92 | 1,253.85
18 | Shogakukan | Shogakukan | Japan | N/A | 1,176.63
19 | Shueisha | Shueisha | Japan | N/A | 1,093.95
20 | John Wiley & Sons | John Wiley & Sons | US | 1,044.19 | 974.00
21 | Editis | Wendel Investissement | France | 981.50 | 1,008.96
22 | RCS Libri | RCS Media Group | Italy | 937.82 | 921.18
23 | Oxford Univ. Press | Oxford University | UK | 786.11 | 858.65
24 | Kadokawa Publishing | Kadokawa Holdings Inc. | Japan | 808.60 | 809.90
25 | Simon & Schuster | CBS | US | 807.00 | 763.00
26 | Bonnier | The Bonnier Group | Sweden | 769.56 | N/A
27 | Gakken | Gakken Co. Ltd. | Japan | 682.89 | 756.99
28 | Grupo Santillana | PRISA | Spain | 635.44 | 545.22
29 | Messagerie Italiane | Messagerie Italiane | Italy | 629.20 | N/A
30 | Mondadori (book division) | The Mondadori Group | Italy | 571.35 | 552.50
31 | Klett | Klett Gruppe | Germany | 520.00 | 458.12
32 | Cornelsen | Cornelsen | Germany | 451.10 | 450.97
33 | Harlequin | Torstar Corp. | Canada | 407.03 | 449.54
34 | WSOY Publishing and Educational Publishing | Sanoma WSOY | Finland | 401.70 | N/A
35 | Médias Participations | Media Participations | Belgium | 381.16 | 391.56
36 | Les Editions Lefebvre-Sarrut | Frojal | France | 342.29 | 293.80
37 | Langenscheidt | Langenscheidt | Germany | 338.00 | N/A
38 | Weka | Weka Firmengruppe | Germany | 327.47 | 333.84
39 | Groupe Gallimard | Madrigall | France | 309.40 | 330.14
40 | Westermann Verlagsgruppe | Medien Union (Rheinland-Pfalz Gruppe) | Germany | 303.94 | 294.84
41 | Kyowon | Kyowon | Korea | N/A | 303.68
42 | Weltbild | Verlagsgruppe Weltbild GmbH | Germany | 299.78 | 291.46
43 | La Martinière Groupe | La Martinière Groupe | France | 296.40 | 334.10
44 | Higher Education Press | Higher Education Press | China (PR) | N/A | 266.50
45 | Egmont (book division) | Egmont International Holding A/S | Denmark | 260.00 | 232.70

N/A = Not Available. (1) = For first nine months of 2006. Note: Figures are based on sales generated in calendar 2006 or – in cases with a fiscal year – from fiscal 2006. Data is from publicly available sources, in most cases annual reports. No attempts have been made to estimate sales in 2006 for companies that have not yet released updated figures. The listing was compiled by international publishing consultant Rudiger Wischenbart. Source: Reed Business Information and Livres Hebdo
As will be outlined later, the geophysical publishers have taken over market share from the traditional print publishers. The top publishers in each sector are as follows:

Top Companies in STM Information

(a) Traditional STM Publishers
Company                               | Revenue (2007) | % Market Growth
Elsevier                              | $ 2,915 mil    | 4.0 %
Wolters Kluwer Health                 | $ 1,049 mil    | 1.5 %
Thomson Scientific and Healthcare     | $ 1,020 mil    | 18.6 %
John Wiley and Sons                   | $ 931 mil      | 17.5 %
Springer Science and Business Media   | $ 818 mil      | 10.3 %
Informa plc                           | $ 577 mil      | 6.0 %
WebMD Health Corp                     | $ 428 mil      | 68.4 %

(b) Geophysical Data Publishers
Company                               | Revenue (2007) | % Market Growth
Compagnie Générale de Géophysique     | $ 2,230 mil    | 123.0 %
WesternGeco                           | $ 947 mil      | 23.0 %
IHS Inc                               | $ 677 mil      | 23.0 %
TGS-Nopec Geophysical Co.             | $ 495 mil      | 21.1 %
Petroleum Geo-Services                | $ 354 mil      | 21.1 %
The classic STM sector’s revenues grew 5.6 %, slightly outperforming the overall information industry’s 5.3 %. However, Outsell claims that the increase could rise to 6.8 % in 2008 for the STM sector. This is based on the rapid changes taking place in the industry, as described in the rest of this book, which will have a delayed effect before they begin to erode the revenues and margins of STM publishers. The library buying budget is large, conservative and slow-moving, and not very responsive to immediate change.
5.5 Challenges facing the information industry

Nevertheless, changes are taking place. Using Porter's analysis of competitive forces, the following figure indicates some of the challenges that the information industry faces.
Figure 5.2 Porter's Competitive Forces
Disruptive technologies can upset the smooth transition for information providers adapting to a volatile electronic publishing environment. What makes them so disruptive is that they are not anticipated: they arise from nowhere and are carried along by circumstances that no one foresaw. In such an environment it can be fatal to stand still. Experimentation is necessary, but with experimentation comes trial and error. Conventional management theory holds that once a failure is identified, the project should be killed quickly and effort moved on to experiments in other areas. But failure is not always easy to define, particularly if it might be no more than a temporary blip, or an innovation launched too soon.
However, it is not only disruptive technologies that pose a challenge. Disruptive competitors are an equal threat. Small, entrepreneurial organisations can move in and focus on the soft underbelly of an established market, draining away revenues
and profitability. Flexible organisations can often last the course better than large, structured, inflexible traditional organisations when there is a major change in the market. However, the power of 'the brand' should be recognised, and the existing reward system for authors still supports the traditional system over the new. Entering a market where brand is viewed as important does create a barrier to entry that new, flexible organisations may be unable to overcome. Personal endorsement and peer group recognition (using established 'brands') vie with more effective and efficient delivery systems (offered by the new flexible organisations using Internet technology), and the outcome is as yet unclear.
The current STM information market is facing profound challenges. How these challenges affect user and buyer behaviour in the market remains unclear. Self-archiving is a threat to the mainstream subscription-based publication system, and with strong advocacy behind open access (OA) during the past five years this alternative business model has captured some 15 % of the STM article output. The two main approaches to open access – the author paying for an article to be published and disseminated freely, and authors self-archiving their articles in their local institutional repositories – each have questionable long-term sustainability. They do, however, answer some of the critics' concerns over the exploitation of public science by the commercial publishing community.
There are other challenges facing the traditional publishing industry. Threats from new organisations – such as Google – that are developing products using e-Science and social collaborative tools to reach current and future researchers cannot be discounted. First, however, we need to look at what we have inherited as an industry structure for scholarly communication before assessing the extent of the change that might happen.
Chapter 6
The Key Players
6.1 Industry Overview

6.1.1 Overall Scholarly Trends

Change has exhibited itself in the current scholarly communications sector in a number of ways. These can be summarised as follows:

In the Past                                    Present and Future
'Snail mail'                                   Internet-enabled distribution
Printing and copying the main technology       Multiple media and formats
Limited distribution                           Broad distribution
'Article of Record' important                  Iterative versions are produced
Archival emphasis                              Communication emphasis
Limited media interest                         Heightened public awareness
These trends from the past to the present have led to a market structure that is in many ways distorted and unstable. For example, the multiple parts of the information chain from author to reader, and the diverse nature of the players within each segment, have, within the past decade, allowed one organisation on its own to become larger than the rest of the scholarly communications industry put together. The rise of Google has been an exemplar of the sort of change which could completely transform the electronic publishing industry, riding on the back of a slick technological base with innovative applications, in this case search and online advertising. And this transformation happened within a decade.
6.2 Structure of the Journal Publishing System

So what does the current scholarly journal publishing system look like? Where does it come from, and to what degree is it being challenged by disruptive influences and disruptive competition? A project looking at the primary sources of information on scholarly communication was undertaken by Electronic Publishing Services Ltd on behalf of the UK-based Research Information Network (RIN), the Department of Trade and Industry (DTI) and Research Councils UK (RCUK). It was published as a 100-page report in November 2006. The areas tackled included journal market value and volume; journal supply-side economics; journal usage; citation impacts;
disciplinary differences; and the cost and impact of alternative formal dissemination models.
• Journal Market value: 5.5 million researchers worldwide were identified, and 20–25,000 peer-reviewed, active, scholarly journals were included in the assessment. 60 % of the journals were published with an online version, and 10 % had some form of open access. Publisher revenues of about $ 5 billion in STM journals in 2004 were reported. However, existing studies were not comprehensive in this area; there was often poor data, and many different definitions are in use.
• Supply-side Economics: The range in costs given for producing an article varied from $ 250 to $ 2,000. Again, differences in definitions became apparent when breaking down costs. Overheads ranged from 11 % to 55 %, manufacturing costs varied from 8 % to 40 %, and distribution from 3 % to 17 % of total costs. Each supplier operated its own, often very different, accounting practices.
• Journal Usage and Disciplinary Differences: Every user-focused market study emphasises that there is value in the publisher brand. Authors select journals based on intellectual property issues and peer review. However, the study mentioned that 50 % of researchers have journal access problems (particularly where interdisciplinary access is concerned). The study also established that journals were important in science, technology and medicine, whereas books figure more prominently in the softer sciences.
• Career enhancements: All authors claim career advancement and peer-to-peer communication are important. There is a need for more transaction logs, but the real underlying question is how researchers actually do their work.
• Citation and Impact factors: It is difficult to compare OA and non-OA journals using citation data; there is no like-for-like comparison. It is suspected that the more highly cited authors deliver later works through open access channels, including institutional repositories. Most of the citation studies are point rather than longitudinal, so trends have not been identified.
• Cost/impact of alternative models: The interesting point made in the audit is that the jury is still out on whether open access is a viable system. There is still a paucity of evidence on whether the proposed new open access models are commercially sustainable.
The EPS study highlighted the many gaps that still exist. Nevertheless, it was recognised that scholarly journals were an important industry sector and that the UK was significant in global terms in this area. It was unclear how efficiencies in the publication system could translate into social benefits, though some figures have been produced (by John Houghton amongst others) suggesting that these would be considerable. Scholarly journals sit in a complex industry sector, particularly as roles are still evolving. Equally, the study pointed out the gaps in our knowledge about the market value of the industry, our lack of knowledge about users, the assumed significance of citations and downloads as metrics, etc. Not only do we not know enough about the buyers in the market, but we also know little about the finances of the suppliers.
The inadequacy of our knowledge about users and their behaviour has become a recurrent theme in this book. Decisions are being made by all the stakeholders in the scholarly journal chain on the basis of limited market data. One of the tragedies in looking into such areas is that there are so few researchers who are sufficiently motivated or interested to attend meetings, become involved, or enter into a dialogue about some of the key issues facing the future of scholarly communication. With a few exceptions, their voices are not heard. The result is that systems and procedures are built to meet the needs of a supply-driven market rather than a user-driven one.
Some of the other conclusions suggested in the EPS report include:
• Although there is some knowledge about journal usage, there is much less evidence about usage at the article level; further research here would be particularly useful. This could be interesting in the light of moves towards metrics-based approaches to research evaluation (the Research Assessment Exercise in the UK).
• The study pointed to evidence about the difficulties that many researchers experience in accessing material. There is therefore a case for a detailed analysis of why researchers face such difficulties and, as a possible corollary, an examination of whether and how researchers alter their behaviour patterns because of access problems.
• There may be merit in such further research, but it is important to set up an approach to define an agenda and prioritise the work that is most useful. Inevitably, this implies a dialogue and a collaborative approach between all stakeholders. Open sharing of data is also important. Such an approach could be extended to cover the whole of the scholarly communications system.
• There is a case for updating the baseline report periodically, so as to chart the development of the evidence base. In this respect, identifying the views of researchers themselves poses a big challenge. Who speaks for them? To what extent are they developing new publishing models of their own? And are they interested in the sort of issues raised by this particular study?
Despite the above gap analysis, and evidence that data is not always available on user metrics, some gross estimates can be made of the size and nature of the electronic publishing sector in the scholarly area.
6.3 Market Estimates

Outsell Inc, the US-based market research and consultancy service, estimated in 2006 that information overall, across all sectors, was a $ 384 billion market (excluding government databases), and that its growth appears to have been slowing over the past five years or so. In their analysis of the scientific, technical and medical information market in September 2007, one of their senior analysts, Dan Penny, reconfirmed the organisation's impression that the global information industry will continue to see a declining growth rate, and that the STM segment will also slow down to 2010. Currently, science, technology and medicine represents only 5 % of the information 'pie', or $ 16.1 billion in 2006. Outsell projected that the total information industry will grow by 5.5 % to 5.7 % over the next few years, led by growth in search services. STM information sector growth of 5.8 % was predicted. This is
much larger than the estimates given by Electronic Publishing Services Ltd in their EPS Market Monitor (for June 2006). The latter put the current STM market size at half that level, or $ 9.6 billion. The EPS estimates suggest that the STM market for all forms of published information will increase from current levels to $ 10.8 billion in 2008, an annual growth of 4 %. The differences between these market estimates can partly be explained by the different approaches adopted: Outsell went for a top-down assessment, whereas EPS used a bottom-up approach. It also depends on the definition of what is included in the STM market, with Outsell spreading their coverage much wider than EPS to embrace more of the peripheral sectors, aggregators and search services. (The differences should be ironed out in future as Outsell absorbs EPS within its own organisation as a result of acquiring EPS.) Simba (part of the US-based ProQuest/CSA/Bowker group of companies), also a market research and information company specialising in the information sector, puts the size of the scholarly journal business alone at $ 6 billion.

6.3.1 STM Publishers

Publicly traded STM publishers demonstrate their dominance as revenue generators in this sector: the five largest STM publishers (Reed Elsevier, Thomson, Wolters Kluwer, Springer and Wiley) are responsible for 53 % of EPS's total world market estimate. This compares with 30 % for the next 15 largest commercial suppliers. The twenty largest commercial publishers therefore represented 83 % of the overall market, though there has been some suggestion that even this is an under-estimate, as EPS deducted the learned society revenues from publishers such as Blackwell Publishing to avoid double-counting. There has been, and still is, a remarkable amount of merger and acquisition activity in this sector, based primarily on the commercial value perceived (by City investors) in the lucrative subscription/licence-based business model that STM publishers have adopted in the past.
Publicly traded STM publishers generated $ 5.4 billion in revenues in 2005, with 8.6 % growth. According to the EPS Market Monitor, Thomson posted the strongest increase in profits with an annual increase of 20.5 %, outperforming its competitors and the market average of 17.7 %. Elsevier achieved the strongest organic growth: 5 % and 6 % in its Science & Technology (S&T) and Health Sciences divisions, respectively. "Taken as a whole, these findings show that STM publishers are doubling down their investments, as the sector continues to record robust performance despite the spread of open access journal publishing," claimed the director of the EPS Market Monitor programme. "The leading publishers are concentrating development efforts on expanding their content portfolios, and continuing to make significant investments in digitisation and value-added navigational and workflow solutions."
As has been suggested, the growth rate in STM output (articles) has been remarkably stable since the early 1950s and is highly correlated with the number of R&D staff. Combined R&D spend by the USA and the 15 European Union members reached $ 500 billion in 2003, having grown at a compounded rate of 5.8 % since 1994. Recent studies reflect the importance of STM information as a fuel for economic growth. The output of scientific articles – the main focus of STM activity – grew by 29 % between 1993 and 2003.
Approximately 45–50 % of articles published in the US and EU (15) are in the fields of clinical medicine and biomedical research. It is
claimed that medical information will be an engine for growth in the foreseeable future, buoyed by an expanding market for drug information and point-of-care solutions, growth in the number of medical practitioners and a rebound in medical school enrolments. Other areas of STM will see slower growth, to $ 15.7 billion in 2008, a compounded growth rate of 2.6 % (compared with 4–7 % for medical information). However, this hard science sector will witness a strengthening of secondary information systems over primary publications, with significant growth in A&I platforms over the next 3–5 years. Services such as ScienceDirect (Elsevier) and Web of Science (Thomson) are paving the way.
In their report, Scientific, Technical & Medical Information: 2007 Market Forecast and Trends Report, Outsell analyses the size, growth rates, financial performance and market shares of the top ten companies in the scientific & technical and medical information and geophysical sectors. The top eight and their estimated 2006 revenues in $ millions were:

Table 6.1 Top Eight STM Performers in 2006 ($ millions)

Company                            2006 Revenues (in $ mil)   Market share (2006)
1. Elsevier                        2,803                      17.4 %
2. Wolters Kluwer                  1,034                      6.4 %
3. Thomson Science/Health          860                        6.2 %
4. John Wiley (inc Blackwell)      792                        5.5 %
5. Springer S&BM                   742                        4.6 %
5. WesternGeco                     721                        4.5 %
6. IHS Inc                         551                        3.4 %
7. American Chem Society           429                        2.7 %
8. TGS NOPEC Geophysical           396                        2.5 %
The largest annual growth was achieved by TGS NOPEC Geophysical Company ASA, with 65 % growth compared with the STM average of 7.2 % (and the information industry average overall of 6.0 %). Of the 'traditional' publishers, Wolters Kluwer grew the fastest at 26.5 %, with Wiley (excluding Blackwell) achieving the slowest growth rate (0.5 % per annum). As a result of the recent acquisition of Blackwell Publishing by Wiley, the two companies together operate in fourth position with a 5.5 % market share, just slightly larger than Springer.
In addition, there is a group of about fifty or so 'medium-sized' publishers that are split fairly evenly between commercial, learned society and public/government publishers. They are also spread equally between Europe and the US, though there is recent evidence that China and India are both focusing on developing indigenous scholarly publishing activities.
According to an earlier report, the top 25 companies, particularly in the geophysical and energy segment, will be the engine behind STM growth in the next year. It states that the 'medical part' of STM will continue to expand, with more interesting partnerships and acquisitions as non-traditional players enter the market. Within the United States alone, the scientific, technical and medical part of the professional publishing market represents $ 6.64 billion of the $ 13.58 billion professional sector. Excluded from the STM sector are the legal and business professional
publication sectors, social sciences and the humanities. STM within the US has achieved 6.4 % pa compound growth in recent years (Simba Information) or over 8 % worldwide (Morgan Stanley).
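The compound-growth arithmetic behind such figures is easily checked. The sketch below uses only numbers quoted in this section; the choice of base years and the five-year horizon in the second calculation are assumptions made purely for illustration, not forecasts from EPS, Simba or Outsell.

```python
def project(value, annual_growth, years):
    """Compound a value forward at a constant annual growth rate."""
    return value * (1 + annual_growth) ** years

# EPS: an STM market of $9.6 bn growing at 4 % a year.
# Assuming a 2005 base year, three years of growth takes it to 2008.
print(round(project(9.6, 0.04, 3), 1))    # 10.8 -> matches the $10.8 bn quoted for 2008

# Simba: US STM at $6.64 bn compounding at 6.4 % a year for, say, five more years.
print(round(project(6.64, 0.064, 5), 2))  # roughly 9.05
```

The first calculation reconciles the EPS figures of $ 9.6 billion and $ 10.8 billion quoted earlier; the second merely shows how quickly a 6.4 % compound rate accumulates.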
6.4 Key Stakeholders

6.4.1 Publishers and Information Providers

Leading Publishers
Publishers have shown their capacity for innovation in the past. The STM journal publishers re-invented themselves when faced with complaints over the 'serials crisis' facing library budgets. They moved from a title-by-title sales and marketing approach towards a site licensing business model. Someone buying 300 Elsevier print journals in the past may now have access to a further 1,200 e-journals from that stable for little additional expenditure, with unlimited and easy use through Elsevier's ScienceDirect. In taking up such offers, publishers were able to upsell and gain more revenues from the same libraries.
There is considerable fluctuation in the electronic component of the total revenues of each of the main publishers/information providers in STM. Elsevier received 37 % of its 2006 revenues in electronic format, whereas Thomson Scientific and Health Care received 69 % of its revenues from electronic services. The three non-traditional players (WesternGeco, IHS and TGS NOPEC) had even higher proportions, at 100 %, 83 % and 97 % respectively. Another point of similarity between these three, and a distinction from the traditional publishers, is that they obtain most of their revenues from the corporate sector (80 %, 90 % and 100 % respectively), whereas traditional STM publishers on average received under 30 % from this sector.

Learned Society Publishers
In a study undertaken by Mary Waltham, an independent consultant, for JISC ("Learned Society Open Access Business Models", June 2005), some 33 learned society publishers, covering 4,123 journals, were analysed. Revenue for print-only subscriptions overall was down 13 %, and subscriber numbers were down nearly 3 %. This is in contrast to the increase in revenues posted by the main commercial publishers, as indicated above (Elsevier up 4 %, John Wiley up 7 % and Kluwer up 9 %). However, small society publishers need to feel part of a large aggregation service, and instances of society publishers clubbing together are also in evidence (e.g. the ALPSP Licence and the recently announced Scitopia in the US).
Is there a raison d'être other than journal publishing on which learned societies could rely? An earlier ALPSP study conducted by Christine Baldwin investigated this, and some of the reasons given include 'to promote understanding and interaction and to use the resulting knowledge for the common good.' The vibrancy of publishing is a metaphor for the society itself, claimed Waltham. Professional societies, which perform certification, have a broader range of services than learned societies that aim solely to promote their academic discipline. In all cases, the actual and perceived benefits of membership exceed the cost.
Related to this is the question 'Why do researchers join societies?' In a small sample of 340 readers surveyed by The Scientist (17 (17):9), 83 % of scientists belonged to a society, and 21 % belonged to four or more. The three top reasons for joining were
• to attend meetings and conferences, • to have association with fellow scientists and • to receive discounted subscriptions to the society journals.
However, smaller societies are seeing a 3 % decline, hence their decision in many cases to subcontract their journal publishing to the larger commercial and professional publishers. Some of the larger societies, such as the American Physical Society (APS), are growing in membership and are no longer dependent (as many are) on journal revenues to sustain their wider educational and advocacy programmes. In addition, the APS has focused on attracting students and graduates into the society. Some 14 % of APS members elect to have a print subscription to the society's journals, out of a total membership of 8,000. One of the consequences of the greater focus by the APS on lobbying and educational activities is that it has created a closer relationship with its members.
The New York Academy of Sciences offers another example. This society has made a radical change by concentrating on reaching its audience through e-Briefings, multimedia and live events in New York. A key added value for members is the organisation of scientific meetings with multiple speakers. The number of members has grown accordingly, from 23,000 in 2005 to 26,000 in 2006.
There is a difference between US and UK societies in the services offered to their members. US societies tend to offer online access to journals for no extra charge, whereas in the UK all but one society required an extra fee. Inevitably the largest societies (in the US, with over 30,000 members) offer the widest range of services and are the most pioneering. This leaves the smaller societies out in the cold. Again, even within the learned society publishing area itself there are indications that diseconomies of scale are substantial.
For learned societies to survive in the publishing domain, a more strategic approach has to be taken towards what each society has to offer, including the potential participation of individual Asian countries where science output is on the rise (particularly in biosciences, plant and animal sciences and geosciences). The society would also need to consider the emergence of 'new fields' and the impact of technological changes. Creating 'portals' of information across a range of products and services, all targeted at the needs of the society, is also potentially valuable. The conclusion is that, in order to retain membership, societies should do for members what the members are unable to do for themselves.

University Presses
"University Publishing In A Digital Age" is the title of a report from Ithaka on a study which began as a review of US university presses and their role in scholarly publishing. It evolved into a broader assessment of the importance of publishing to universities. The authors state that universities do not treat the publishing function as an important mission-critical endeavour. They argue that a renewed commitment to publishing in its broadest sense can enable universities to more fully realise the potential global impact of their academic programmes, enhance the reputation of their specific institutions, maintain a strong voice in determining what constitutes important scholarship and which scholars deserve recognition, and in some cases reduce costs.
They state that, for a variety of reasons, university presses have become less integrated with the core activities and missions of their home campuses over the years – a gap that threatens to widen as information technology transforms the landscape of scholarly publishing. The responsibility for disseminating digital scholarship is migrating instead in two directions – towards large commercial publishing platforms and towards information channels operated elsewhere on the campus, mostly libraries, computing centres, academic departments and cross institutional research centres. Press directors and librarians need to work together to create the intellectual products of the future which increasingly will be generated and distributed in electronic media. An objective behind the study was to gauge the community’s interest in a possible collective investment in a technology platform to support innovation in university-based, mission-driven publishing. See: http://www.ithaka.org/strategic-services/university-publishing
6.4.2 Life Cycle of scholarly communication

Publishers are part of a broad spectrum of activity which has as its end product a published article or book. There are a number of phases and iterations this has to go through before a publication suitable for being classed as a 'Record of Science' emerges. The following table lists some of these activities.

Table 6.2 The Life Cycle of Scholarly Communication

R&D funding: Evaluate Prior Research Applicants; Evaluate the Publications themselves; Input the metadata; Produce citation statistics; Rank scientific journals; Evaluate Research proposals; Make Funding Decisions.

Perform the Research: Study existing Scientific Knowledge; Collect data from existing Repositories; Do Experiments and make Observations; Analyse and draw conclusions.

Communicate the Results: Communicate the Results Informally; Communicate the Results through Observation; Share the Data; Communicate the Results through Publications.

Publish the results: Write manuscript; Choose where to submit publication; Produce publication; Publish as a manuscript; Publish as a conference paper; Publish as a scholarly article; Do publisher's general activities; Do journal-specific activities; Market journal; Negotiate and manage subscriptions; Plan and manage issues; Process article; Do peer review; Manage the review process; Review manuscript; Revise manuscript; Negotiate copyright; Pay article charge; Do technical phases of publishing; Copyedit manuscript; Queue for publishing; Embed in issue; Duplicate and distribute article; Print paper issue; Distribute to subscribers; Control access to e-version; Publish e-version.

Facilitate dissemination and retrieval: Facilitate retrieval globally; Bundle publication from different sources into e-service; Make copy available on the web; Post on personal web pages; Post in Institutional Repository; Post in Subject-Specific Repository; Integrate metadata into search service; Index in bibliographic index; Index in web harvester for scientific content; Index in general web search engine; Facilitate retrieval locally; Negotiate subscriptions and licences; Make paper copy available inside organisation; Make electronic publication available inside organisation; Preserve publication.

Study the publication: Find out about the publication; Search for interesting publications; Use dedicated search services for scientific publications; Use general web search engine (Google); Browse shelves in library; Be alerted to a specific publication; Receive recommendation from colleague; Notice reference in another publication; Remember existence of previously read publication; Browse journal issue or e-mailed TOC; Consider buying the publication; Retrieve the publication; Retrieve paper publication; Retrieve electronic publication; Read and process the publication; Photocopy or print out for easier reading; Read the publication; Read for research purposes; Read as part of university education; Read for company R&D purposes; Read to keep up with progress; Read for public policy setting; Read to increase one's knowledge; Self-archive for future reference; Publish secondary accounts of the results; Cite in one's own publication; Report on results in review articles; Include results in university textbooks; Report in popular media.

Apply the Knowledge: Educate professionals; Produce teaching material; Teach students; Teach practitioners; Regulate industry and society; Define standards; Grant patents; Legislate and define public policy; Do industrial development; Apply in practice; Treatment of patients; Company practice; Lifestyle developments.
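Where the life cycle needs to be handled programmatically, for example to tag activities in a repository workflow or a cost audit, the top-level phases of Table 6.2 can be encoded as an ordered structure. The sketch below is purely illustrative: the phase names are taken from the table, while the enum, the helper function and their use are assumptions of this example.

```python
from enum import IntEnum

class LifeCyclePhase(IntEnum):
    """Top-level phases of the scholarly communication life cycle (Table 6.2)."""
    RND_FUNDING = 1
    PERFORM_THE_RESEARCH = 2
    COMMUNICATE_THE_RESULTS = 3
    PUBLISH_THE_RESULTS = 4
    FACILITATE_DISSEMINATION_AND_RETRIEVAL = 5
    STUDY_THE_PUBLICATION = 6
    APPLY_THE_KNOWLEDGE = 7

def phases_remaining(current: LifeCyclePhase):
    """Phases still ahead of a piece of work that has reached `current`."""
    return [phase.name for phase in LifeCyclePhase if phase > current]

print(phases_remaining(LifeCyclePhase.PUBLISH_THE_RESULTS))
# ['FACILITATE_DISSEMINATION_AND_RETRIEVAL', 'STUDY_THE_PUBLICATION', 'APPLY_THE_KNOWLEDGE']
```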
The Future of the Big Deal
One of the conditions described above is changing, however. Big Deals remain a significant feature of research library operations, but there has been growing disquiet over what they mean in practice. One of the key functions of the librarian is to reflect the needs of the library's patrons and to acquire or collect publications accordingly, whether in print or electronic format. What the Big Deal does is take many of the selection decisions out of the hands of library staff and put them into the hands of the publishers who decide the content included in the bundled packages. It means that control over the acquisitions budget is losing its relevance to the target audience in favour of financial expediency.
Some libraries have in the past few years reacted against the forced acquisition of a publisher's full list of titles, many of which may be of little or no relevance, and against the reduction in budgets available to buy high quality material from small learned society publishers. Instead they want sub-packages to be created by the large publishers, each sub-package having more specific relevance to their needs, rather than having 'all or nothing'. This in effect breaks down the Big Deal as a highly profitable and dominating feature of the scholarly communications landscape.

Trend towards Open Access
There are some 'unintended consequences' of the profit motive. Some critics feel that there is not only an ongoing rise in subscription prices but, more significantly, that commercial publishers are invariably more expensive than non-commercial publishers. Professor Ted Bergstrom, a US-based economist, has produced an analysis that shows that commercial publishers are five times more expensive per page
than not-for-profit publishers. This has become almost folklore, but it has equally been vehemently challenged by the STM publishers' association. Following on from the Bergstrom analysis, why has there been, in this particular industry, so little correlation between price and quality? Some of the reasons lie with the library community itself, which has traditionally maintained subscriptions as an act of faith and felt that some subscriptions are a 'must have'. Such inelastic demand gives publishers market power and in many cases a 25–40 % profit margin for commercial publishers. However, there have been consequences arising from this: there has been a decline in journal subscription sales among the smaller, specialised publishers; university presses have been in decline; and monograph sales have suffered to the extent that, over the decades, books which would traditionally have sold 5,000 copies are now selling 500. Whilst the Big Deal licences have provided some good, they have also resulted in lock-in to the few larger publishers.
This has given rise to a new countervailing force, whose claim is that the 'open access' genie is now out of the bottle. The key drivers for this are that traditional publishing creates barriers to access ('disenfranchisement'), and that the new e-infrastructure supports an open movement.

Versioning
Version control is the management of multiple revisions of the same unit of information, in this instance the journal article. As a technique it exists in many engineering and software applications, but in scholarly publishing it has come to the fore as new business models have emerged which give a new utility, functionality and status to earlier versions of an article. In work undertaken by ALPSP some years ago, a dozen different stages in the development of a research article were identified, each 'version' having different inputs from different stages of the publication process. In essence, at one extreme is the rough, raw document produced by the author, often in draft as a discussion item. At the other is the final published article available as a PDF on the publisher's web site. The process in between can take months, in some cases years, depending on the efficiencies built into the various production stages of the article. Some of the interim article publication stages are:
1. Author's first draft manuscript.
2. A modified version is produced for comment by members of the research team and the peer group.
3. After circulation a 'cleansed' version is created and submitted to the editor of the journal of choice and, in certain disciplines and topics, to a subject-based archive (as a so-called Preprint).
4. In some instances the editor will make an arbitrary decision that the article is not acceptable, for example from a quick scan of the contents.
5. More typically, the editor will select from a network of referees which he/she and the publisher have developed over the years. Usually at least two referees' comments are required before an informed decision is made about the quality of the article, whether it meets the journal's selection criteria, the research methodology adopted, etc. The manuscript is entered into the editorial tracking system used either by the journal editor or the publisher.
6. The external (blind) referees may make suggestions for additional changes before the manuscript is finally accepted. This process may take time as it is dependent on the schedules of the (unpaid) referees.
7. The author makes the appropriate changes to the manuscript.
8. At some stage the author deposits (or casts aside) the raw background data which has been collected as part of the research study. This comes in many different forms: data models, datasets, audio/video clips, etc. In some cases these can be, and are, deposited in the local Institutional Repository.
9. The document is returned to the editor, who sends it on to the publishing house.
10. In-house (or external) desk editors convert the document into the publisher's house style, making changes to layout, grammar, language and comprehensibility in the process. The article is logged into the publisher's production tracking system at this stage.
11. The document is returned to the author as a Postprint for his/her final agreement (with a copyright transfer form and/or open choice request for publication).
12. Any final changes made by the author after this stage are often chargeable costs.
13. Page and issue numbering are applied by the publisher.
14. The manuscript is sent to the printer for typesetting as part of a journal issue.
15. Printer's proofs are checked by the publisher's desk editorial team.
16. DOIs are applied for those publishers who are part of the CrossRef consortium.
17. The article is printed in the journal and disseminated; an electronic copy is mounted on the publisher's web site (or hosting service).
18. The article becomes part of the publisher's digital archive (and subject to ownership change).
Traditional journal publishers claim the final printed/online version is the definitive one. However, earlier versions (particularly the Postprint, stage 11 above) are often deposited by authors on their personal web sites (or in institutional/subject repositories where available). Supplementary material (datasets, video clips, audio, software, tables, derived data, models, etc.) may be included in these repositories, as publishers are ill-equipped to handle such material. Much of this data, potentially ending up in huge datasets, is increasingly where much of the scientific effort is being focused, and most of this data is open access.
In some areas of physics, mathematics and economics there has been a tradition of circulating e-prints (electronic versions of Preprints, stage 3 above). The archive of electronic preprints in high energy physics has been in existence since 1991; all physicists in this area check the arXiv e-print service for their information needs, which means they are aware of developments months before the work finally appears in print. However, in many areas (medicine in particular) careful refereeing (stage 5) is essential in order to avoid mistakes being made in the treatment of patients. In this case the final, published version is all that counts. Disciplinary differences count significantly in determining what 'version' of an article can be used with any degree of confidence.
Table 6.3 Stages in the Emergence of a published article

1. Author's Draft                                  1st DRAFT
2. Modified by peer group                          E-PRINTS
3. Submission to editor
4. Initial editor decision
5. Refereeing
6. Referee comments
7. Author changes
8. Datasets                                        DATASETS
9. Article to Publisher
10. Desk editorial process
11. Postprint to authors                           e-POSTPRINT
12. Final additions
13. Page layout                                    CITATIONS
14. To Printer
15. Checked by publisher
16. DOIs applied
17. Printed and online version disseminated        DEFINITIVE VERSION
18. Archived
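The correspondence between these numbered stages and the named versions discussed above (the Preprint at stage 3, the Postprint at stage 11, the definitive version at stage 17) can be captured in a small lookup. This is an illustrative sketch only: the stage numbers and version labels come from the text and Table 6.3, while the dictionary and helper function are assumptions of this example.

```python
# Version labels attached to particular stages of the numbered list / Table 6.3 above.
VERSION_AT_STAGE = {
    1: "author's first draft",
    3: "preprint (submitted manuscript, possibly in a subject archive)",
    11: "postprint (accepted and copy-edited, returned to the author)",
    17: "definitive published version (print and online)",
}

def latest_version(stage_reached: int) -> str:
    """Most recent named version available once an article has reached a given stage."""
    eligible = [stage for stage in VERSION_AT_STAGE if stage <= stage_reached]
    return VERSION_AT_STAGE[max(eligible)] if eligible else "no circulated version yet"

print(latest_version(8))   # preprint (submitted manuscript, possibly in a subject archive)
print(latest_version(18))  # definitive published version (print and online)
```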
As highlighted in an article in Learned Publishing (Vol. 20, No. 2, April 2007), Robert Campbell and Edward Wates of Wiley-Blackwell point out that there is a major activity which publishers engage in to make the final article a better Record of Science than the pre-edited versions. From an analysis of some 189 articles from 23 journals, and careful scrutiny of pre- and post-publication results, the authors were able to establish that the copy-editing carried out by the publisher is a significant activity. Copy-editing is particularly important for the online version.

Other Challenges facing Publishing
Publishers therefore need to face change in a number of areas. For example, a recent change is that the individual research article has been transformed from a standalone item into part of a network. It has 'links' to similar articles and to bibliographic databases, and links to images, maps and structures. Issues such as retraction are often difficult as the information is spread so widely throughout the system. The network effect puts pressure on the traditional publication system. In addition, some of the specific issues which publishers are having to face include:
* Editorship
  The role of editors varies considerably
  There is little that journal editors can do to impose demands on referees
  Conflicts of interest among selected referees are not being identified and acted upon
  Commitments by the industry to the journal vary in many cases
* Privacy and confidentiality
  There are tensions if authors are expected to reveal their raw data
* Publishing and Editorial Issues
  Negative results need to be published
  Corrections, retractions and 'expressions of concern' are not handled consistently
  Copyright transfer and warranties are unclear
* Overlapping publications
  Duplicate submission can cause a problem
  Competing manuscripts may be submitted for the same study
  Salami publishing (cutting a research project into ever smaller slices to achieve multiple publication outlets) still prevails
  There is an issue about published correspondence
  There are many electronic publishing issues (with links and with archiving)
  Advertising policies differ
Much needs to be done by publishers to ensure the existing house remains in order. Otherwise their credibility, their 'brand', may become tarnished.

Hybrid Journals and document delivery
Many larger publishers now offer some form of open access publishing for their authors. One emerging issue with these articles is the lack of identification when the
author pays for the open access privilege, particularly where these articles are part of a hybrid journal that contains both author-paid open access articles and toll-based articles. At present there is no mechanism for the agencies that provide document delivery services to distinguish which articles in a hybrid journal are free and which require a royalty fee. In both cases a service fee would be charged by the intermediary agency. There could be instances where libraries and end users seek to purchase from a document delivery service an article for which the author has already paid for free access. Currently there is no method available for the document delivery (docdel) agency to identify that the particular article in question does not carry a royalty charge: royalty charges are set at journal level, not at article level. Because of the lack of correct charging systems, a publisher could end up being paid both by the author (under the author-pays option) and through a document delivery royalty to which the publisher is no longer entitled. Under these circumstances there could be legal complaints, even a class action, by those authors who have paid for their article to be made freely available and for whom this is not being done. So far there is no indication that all publishers are prepared to treat this issue as a legitimate concern.

Refereeing
Since 1665, and the Philosophical Transactions, the two-person review of an article has been the lynchpin of scholarly publishing. However, one can now ask what peer review is and what function it fulfils in scholarly dissemination, given that we are in a period of substantive change. The traditional answer is that
Refereeing assesses the novelty of the research It ensures consistency in the methodology adopted by the author Refereeing supports the effort to ensure reputable recent work is referenced It prevents undue claims being made Potentially it identifies plagiarism (though does not necessarily detect fraud)
Whilst it is not a primary function of peer review to improve the written English, language correction is often carried out as a secondary aspect. Over time the reasons for having reviews undertaken have changed, and the quality of the online review process has now become important, to make life easy for the reviewers. Fraud has become a greater issue, with 19 % of respondents to an Elsevier survey believing there is too much fraud. To cope with new market and technical conditions, many types of review are emerging. In the past publishers have relied on anonymous refereeing. There are now examples of open refereeing, as well as double-blind and continuous reviews. There are also some changes taking place with regard to technology, such as citation linking and plagiarism detection. These newer ways of conducting reviews have not yet achieved widespread popularity. The Nature Publishing Group introduced a six-month trial of open peer review, but terminated it for lack of support. Continuous reviews also suffer from versioning problems. Seeking the esteem of others is crucial to peer review; this is of enduring value, irrespective of prevailing conditions. These issues have been investigated in some depth.
Peer Review in scholarly journals
At the same time as a meeting organised by Academic Publishing in Europe and focused on 'Quality and Publishing' was being held in Berlin in mid January 2008, a report was being issued by the Publishing Research Consortium entitled 'Peer Review in scholarly journals: Perspective of the scholarly community – an international study' (see http://www.publishingresearch.net/). The two events were mutually reinforcing, and both made the case that there remains powerful support for a professionally-managed peer review system for the scholarly community.
Mark Ware, an independent consultant with a history in scientific publishing, mainly at Institute of Physics Publishing, was the author of the PRC report. With email questionnaire returns from some 3,040 academic authors (a respectable 8 % response rate) and some 55 questions with over 120 data points being answered, the body of original and new information used by Ware was large and statistically significant.
The main conclusion drawn from the study was that peer review is widely supported by the author community. It is claimed that it improves the quality of published papers in terms of language, presentation and the correction of scientific errors. Though there was division on whether the current system is the best possible, there was a similar division on whether the refereeing system required a complete overhaul. This was despite concerns that the system was slow and that highly active referees were overloaded. If a new system were to be instituted, a double-blind system was the preference, and there was active discouragement of open peer review. Nevertheless, double-blind review had its detractors: it is not foolproof, as it is often possible to identify author and referee. Metrics attracted very little allegiance as a mechanism for quality control. There was limited support for payment of reviewers, on the basis that this would have a knock-on effect of making the costs of publishing too expensive (at a time when the prices of scholarly communication are under scrutiny), and it was felt that refereeing was a duty for academics and researchers who wish to remain within the global scientific academy. 90 % of authors were also reviewers.
Reviewing supplementary data to an article drew a mixed reaction: a very small majority claimed that they would be prepared to review data, though this conflicted somewhat with a separate concern expressed by respondents that they were in general overloaded. An average of 8 papers was reviewed by each referee during the past year, compared with the maximum of 9 that they claimed they would be prepared to review. However, the really active referees did an average of 14 reviews, which is well over the desired maximum. According to Ware, 44 % of all reviewers undertook 79 % of all reviews. In terms of logistics, the average review takes 5 hours and is completed in 3–4 weeks (or 24 days), and the average acceptance rate reported by those who submitted returns was 50 %. Some 20 % of papers are rejected prior to review. Online submission systems have taken hold, particularly in scientific journals as compared with the humanities and social sciences.
Mark Ware suggests that the results of this study "paints a picture of academics committed to peer review, with the vast majority believing that it helps scientific communication and in particular that it improves the quality of published papers".
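The headline numbers in the Ware study can be recombined with a little arithmetic. The sketch below uses only the figures quoted above; the inference about the size of the original mailing and the conversion of reviews into hours are this illustration's own, not findings of the report.

```python
returns = 3040          # questionnaire returns quoted by Ware
response_rate = 0.08    # the 8 % response rate quoted above
print(round(returns / response_rate))            # ~38,000 questionnaires implied to have been sent

hours_per_review = 5
average_reviews_per_referee = 8
most_active_reviews_per_referee = 14
print(average_reviews_per_referee * hours_per_review)      # ~40 hours of refereeing a year on average
print(most_active_reviews_per_referee * hours_per_review)  # ~70 hours a year for the most active referees
```

Seen this way, the 'overload' complaint translates into roughly two working weeks a year of unpaid effort for the busiest referees.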
The sources of discontent, not to trivialise the strong support for the overall review system, lie in the area of possible alternative systems of peer review. However, there was no significant consensus on what might be a viable alternative.
It does seem that the scholarly community believes in the fundamental need for its peers to undertake the quality assessment of an article before it becomes part of the record of science, and that, new challenges from technology, funding changes, the user behaviour of the Google generation, etc., notwithstanding, the existing system works best. It is not perfect, and the overloading of the more active referees, whose prime motivation is the intangible drive to remain good members of the academy, is a potential stress point, one which could become more acute as datasets become part of the refereeing procedure of the future. But the fundamentals of the existing system remain strong among the scientific author community; it is something which can be built on rather than drastically overhauled or scrapped.

Alternative review procedures
Nevertheless, the industry should not stand still on the refereeing process. In some respects collaborative review could make publishing better. Closed refereeing is often seen as being divorced from identifying what is valuable. Some valuable comments from reviewers are never disclosed. Sufficient time is not always provided to write detailed comments, and as speed of turnaround has become a key driver, the quality of review may suffer as a consequence. Interactive open access reviews are an alternative: the classical review mechanism is kept in place but is supplemented by opening the article up to public commentary. The reviews become transparent and more democratic. However, it may not be necessary to invent something totally new; added features to the classical system could be considered.
So who will build the new tools that will bring information to the researcher in future? There is an opportunity for any of the current stakeholders in this sector, and the organisational experience of publishers with the current review mechanism puts them in a strong position. However, if the current stakeholders fail to experiment with new methods, they risk seeing much of the new systems and procedures for establishing 'quality control' in scholarly communication taken over by totally new players. Meanwhile, ALPSP has commissioned Dr John Regazzi, Dean of the Scholarly Communications Laboratory at Long Island University in the US and formerly a Vice President at Elsevier Science, to undertake a study on 'author-perceived quality characteristics of STM journals'. This will provide a further yardstick by which publishers can assess how important the sifting process is.

Publishers and the 'Valley of Death'
The Valley of Death appears in many guises during the course of this book. Essentially it captures the notion that there is a point of transference from one paradigm to another, and that at that stage there is a chance that the communication system could break down. The original application of the valley of death to scholarly communications was the argument that the publishers which depended on publishing the printed page would not be the same organisations as the 'information providers' that would take on the delivery of information through screen-based systems in the digital age. The transference, at the bottom of the valley, was where the change from one type of content provider to another would take place. This is also where the old school would die out and a new breed of stakeholders emerge. The reasoning is that there is a technological difference between printed-page publishing and the skills and investment required in the information technology that
Figure 6.1 The Valley of Death (schema: the commercial margins of the declining printed versions and of emerging viable electronic publishing, plotted over the years 1990 to 2005)
underpins the screen-based system. There is a wide gap between the two skill sets and investment requirements; the width of the valley floor represents the extent of the transformation required. But this did not happen to the extent some pundits envisaged, largely because the costs of making the transition from print to digital fell, enabling even the smallest publishers to make some moves towards electronic publishing. Nevertheless, the drive to bring IT solutions into publishing was led by the large and growing commercial publishers, who used their economies of scale to full advantage. In addition, a change in the market mechanism took place that further reduced the impact of the 'valley of death' in publishing: the emergence of Big Deals.

The Prisoner's Dilemma
A similarly bleak concept has also been applied to the current electronic publishing sector: the Prisoner's Dilemma. In game theory, the prisoner's dilemma is a type of non-zero-sum game in which two players may each 'cooperate' with or 'defect' (i.e. betray) the other player. In this game, the only concern of each individual player ('prisoner') is to maximise their own benefit, without any concern for the other player's payoff. In the classic form of the game, cooperating is dominated by defecting, so that the only possible equilibrium is for both players to defect. In simpler terms, no matter what the other player does, a player will always gain a greater payoff by playing defect. The unique equilibrium for this game is a Pareto-suboptimal solution: rational choice leads the two players to both play defect even though each player's individual reward would be greater if they both played cooperate. In equilibrium, each prisoner chooses to defect even though both would be better off cooperating, hence the dilemma.
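The logic can be made concrete with a standard payoff matrix. The numbers below are the conventional textbook values rather than anything specific to publishing; the sketch simply confirms that 'defect' is the dominant strategy for both players, even though mutual cooperation would pay each of them more.

```python
# Payoffs (row player, column player) using conventional textbook values,
# with temptation > reward > punishment > sucker's payoff (5 > 3 > 1 > 0).
PAYOFF = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_reply(opponent_move):
    """The row player's payoff-maximising move against a fixed opponent move."""
    return max(("cooperate", "defect"), key=lambda move: PAYOFF[(move, opponent_move)][0])

for opponent in ("cooperate", "defect"):
    print(f"If the other player plays {opponent}, the best reply is to {best_reply(opponent)}")
# Both lines print 'defect': defection dominates, so (defect, defect) with payoffs (1, 1)
# is the equilibrium, even though (cooperate, cooperate) would pay both players (3, 3).
```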
Journal publishers face two Prisoner's Dilemmas. The first concerns whether to continue business as usual, in the face of mounting criticism from parts of the academic community and the tax-paying public, or to convert directly to Gold OA now, at the risk that article-based income may not generate sufficient revenues to cover costs. The second Prisoner's Dilemma facing publishers is that, as an alternative strategy, they could counter the critics by converting to Green open access now. As access-provision and archiving (and their costs) would then be performed by the distributed network of mandated Green OA Institutional Repositories, the revenues (and expenses) of journal publishing may then be reduced from what they are now. Publishers would lose some of their revenues.
According to Professor Stevan Harnad (in a September 2007 listserv posting), the scenario of converting to Gold OA does not work if it is not universal. In particular, it cannot emerge gradually, either journal by journal or institution by institution. Harnad claims there are three reasons for this: (1) the real costs of Gold OA publishing are currently not known; (2) there is no money available to pay for them; and (3) publishers would be unwilling to downsize to reduce costs of their own accord. Only the cancellation pressure from universal Green open access, together with the distributed infrastructure provided by institutional repositories, would result in a rapid conversion to an open access business model – of the Green (institutional repository) variety. This would allow the functions (and costs) of access-provision and archiving to be offloaded from journal publishers and libraries onto the distributed network of Institutional Repositories. It is claimed by those in favour of open access that this will suffice to force both the downsizing and the transition, while at the same time freeing the funds to pay for it.
The one thing that just might encourage publishers to make the full transition to Gold OA voluntarily, however, is the worry that if they wait to make the transition under the pressure of Green OA self-archiving and self-archiving mandates at the article level, then the transition may indeed come with forced downsizing and loss of income. If, however, they convert voluntarily, at the journal level, then they might hope to 'lock in' current prices for a while longer yet. This is the second Prisoner's Dilemma. We are seeing the Prisoner's Dilemma being played out in all its splendour at the moment, with everyone waiting to see who makes the next significant move (mandating by funding agencies? a switch from subscription to author-pays by publishers? cancellations of subscriptions in favour of open access 'membership clubs' by librarians?).

Future of Publishing
A fundamental challenge to the existing scholarly publishing system may come from 'alternative' publishing systems that not only address the price and time issues but also answer the outcry from academia that the current STM publishing system is iniquitous and dysfunctional. One major factor keeping the subscription model alive is researcher indifference to open access. Researchers may change their information-gathering habits, and the network publishing models are now in place to help them. But they will only
change when it suits their interests (career advancement) to do so. Clearly we are not yet there in most disciplines, even if the physicists seem to have cracked it. The leading publishers are also doing interesting things in other parts of the value chain. Those less able to adapt could suffer a Darwinian decline and become victims of the Valley of Death, and consolidation will be the result. The burgeoning dataset management challenge, meanwhile, has been discussed in consultancy circles for many years but so far with little commitment from publishers. There is a growing impression that the subscription model is not sustainable in the longer term, and some have thrown more weight behind pay-per-view. Elsevier has recently launched an advertising-based business model for some of its oncology publications. But such experimentation with business models is still largely confined to the larger, more innovative publishers. For the rest, it could be inferred that the venture capitalists in the City are aware of this, and relish the idea that there will be fortunes to be made in consolidating this marketplace once the business model migration begins seriously to de-stabilise things. Some think that margins will also be made in OA publishing, though the auguries are not as good as they were for the subscription model which it potentially replaces.
So, in terms of alternative electronic publishing processes, it would appear that there may be two to three very large software-based players that undertake the systems business of managing content self-published by academics within communities, and who run the community infrastructures for academe as well as the workflow and process environments that help to track and prioritise the reporting of research and its subsequent evaluation. The objective will be due diligence and compliance for the researcher, along with the removal of the requirement for academics to read everything in full text. The most popular systems will be the ones where the reading requirement is least. Inside these large groups, brands (Elsevier, Nature, Cell, ISI, ScienceDirect, etc.) will still be used, but with less and less conviction. Some of these groups may in fact manage library networks as an outsourced operation. Given such a radical vision, the current publishers face some challenging questions. A strategic approach, one which integrates all the issues raised in this book and beyond, will be essential if the same players are to be around in ten years.
6.5 Research Libraries

The library system, and its materials budget for acquisitions, represents the funding cornerstone of the current scholarly communication system. The STM market in particular operates in the business-to-business (B2B) sector – publishers sell packages primarily to libraries, not to individual researchers. Approximately 90 % of the funding for STM purchases flows through research libraries, both academic and corporate. Little B2P activity (i.e. sales directly to the researchers) is in evidence, though there are hopes that the individual will become the target for the more innovative products and services being developed in future.

Reinforcement of the power of the library has come from the adoption of a consortial approach to the site licensing of packages of journals – that is, a group of related libraries unify their purchasing power to claim even greater benefits from publishers offering Big Deals or site licences.
It does mean more time being spent on negotiating terms with each (large) publisher, but the benefits can be considerable. OhioLink, a statewide consortium, has offered evidence that those titles not acquired individually received almost as many hits as those traditionally bought by the library.

ARL statistics

In the recent (2004–05) ARL Statistics published by the Association of Research Libraries (ARL), the authors describe the collections, staffing, expenditures and service activities of ARL's 123 member libraries. Of these member libraries, 113 are university libraries (14 in Canada and 99 in the US); the remaining 10 are public, governmental and private research libraries (2 in Canada and 8 in the US). ARL libraries are a relatively small subset of the research libraries in North America, but they account for a large portion of academic library resources in terms of assets, budgets and the number of users they serve. The total library expenditure of all 123 member libraries in 2004–05 was more than $ 3.5 bn; of that total, more than $ 2.6 bn was spent by the 113 university libraries and $ 900 m by the 10 non-university libraries.

Libraries continue to spend more on serials each year, with the average annual percentage increase still above 7 %. Serials expenditures for the median ARL library were close to $ 5 million last year, and the statistics show that about half the money spent on serials ($ 2.8 m) was used to purchase electronic serials. The statistics also report a decline in the unit cost per serial since 2000, the year when electronic subscriptions were officially included in the serials purchased figures, and an increase in the number of titles acquired as a result. In 2004–05 the serial unit cost was $ 239, close to the 1996–97 level, while between 2001 and 2005 purchased serial subscriptions increased by 64 %. One of the reasons for this decline in unit cost is likely to be the licensing of access to many more of a publisher's titles for an extra fee – i.e. the 'Big Deal'.

UK university library expenditure

Each year the UK Publishers Association (PA) commissions a similar report summarising the trends in university library spending, both in the UK and internationally, derived from the latest statistics available from LISU, SCONUL, ARL, etc. and through the personal contacts of the author, Peter Sowden. He concluded that in the five years from 1999–2000 to 2004–05 (the latest year for which full data are available), total serial subscriptions in UK higher education libraries grew from 666,000 to 1,200,000 (80 %), while the overall cost of these subscriptions grew from £ 69.2 m to £ 96.1 m (39 %). In the five years prior to 1999–2000, total subscriptions grew by 25 % and the overall cost of those subscriptions grew by 28 %. "So the value delivered to UK HE by the journals publishers has improved markedly over the last five years."

The statistics on the books side are not so rosy. Acquisitions spend as a proportion of overall library spend remains stubbornly flat at around 33 %, while the libraries' share of HE funding is less than it was ten years ago. With expenditure on serials and 'electronic resources' continuing to rise, spending on books continues to lag behind inflation. The following chart, drawn from the spending patterns of UK higher education institutions, shows that books in particular have suffered from cutbacks as journal investment continued to grow.
[Chart: percentage of library expenditure by category – books, periodicals (including electronic), other electronic resources, inter-library loans and binding, and other – for the years 1986–87 to 2004–05; individual data values not reproduced here.]
Figure 6.2 Proportion of library expenditure within the ‘old’ universities in the UK (Source: LISU, Loughborough University)
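As an illustrative aside (not part of the original PA report, and using only the Sowden figures quoted above), the improvement in 'value delivered' can be made concrete by computing the average cost per subscription:

\[
\frac{\pounds 69.2\,\text{m}}{666{,}000} \approx \pounds 104 \quad (1999\text{–}2000), \qquad \frac{\pounds 96.1\,\text{m}}{1{,}200{,}000} \approx \pounds 80 \quad (2004\text{–}05),
\]

a fall of roughly a quarter in the average price paid per title subscribed, even though total journal expenditure rose by 39 % over the same period.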
But electronic information systems are only at the early stages of adoption, and when they are adopted they may further change the relationships with books and journals.

Librarian relationship to their customers (users)

This is an important moment in the relationship between researchers and their libraries. As a recent study on the relationship between research libraries and researchers in the UK demonstrates, the foundations of that relationship are being tested by shifts in the way that researchers work. The study provided a forward-looking view of how researchers interact with academic libraries in the UK. Reviewing empirical data and qualitative responses from over 2,250 researchers and 300 librarians, RIN and CURL – the two funders of the research – hoped that the results would be useful in informing the debate about the future development of academic libraries and the services they provide to researchers. The study was undertaken by Key Perspectives Ltd in 2006/7. But as Dr Michael Jubb, director of the Research Information Network, has pointed out, there are some 100,000 researchers in the UK, so even Key Perspectives' sample of 2,250 might not represent a statistically valid picture.

Nevertheless, according to the study, the majority of researchers think that their institutions' libraries are doing an effective job in providing the information they need for their work, but it is time to consider the future roles and responsibilities of all those involved in the research cycle – researchers, research institutions and national bodies, as well as libraries – in meeting future challenges. Included in the findings was the dichotomy between the importance researchers attach to increased access to refereed digital information and the inability (because of lack of funding) of the library to meet that need through purchasing.
The consequence is that there has been a pronounced decline over the past five years in the number of science researchers (as opposed to those in the arts and social sciences) who visit the library. Researchers are working in new ways, and though they may value the custodian functions of the library, they remain to be convinced that librarians can provide useful information support services. There still seem to be, according to Key Perspectives, considerable differences between researchers and librarians in attitudes, perceptions and awareness of important issues (such as open access). The authors claim that "the successful research library of the future needs to forge a stronger brand identity within the institution".

The foundations of the relationship between librarians and researchers are being tested by shifts in the way that researchers work. The rise of e-research, interdisciplinary work, cross-institution collaborations, and the expectation of massive increases in the quantity of research output in digital form all pose new challenges. These include consideration of how libraries should serve the needs of researchers as users of information sources of many different kinds, and also of how libraries should deal with the information outputs that researchers are creating. The consequence is that there has been a divergence between the one-to-one services librarians provide to individual researchers and the collective service they perform for science-based researchers in general.

KPL has suggested that librarians should recognise that there is about to be a major swing in favour of 'openness' and that they have a powerful role in administering the organisation's institutional repository. Additional research work which Key Perspectives is undertaking explores ways to 'embed' the output of a research centre easily into the IR, and KPL is also working with Southampton University on potentially circumscribing the contentious 'versioning' problem by automatically tracking differences between the different versions of the same article.

There was an underlying fear expressed in the report that the library community is unsuccessful in connecting with the research community, particularly the science-based sectors. The report indicated the need for further discussions to find a role for the library profession within the emerging, highly digitised scientific information systems. But there were no immediate solutions on how the library will mutate within the new millennium into a service that meets the new needs of scientific researchers. See Researchers' Use of Academic Libraries and their Services – a report commissioned by the Research Information Network and the Consortium of Research Libraries, April 2007.

6.5.1 The European Digital Libraries

The European Commission is developing the i2010 digital library initiative. On 28 April 2005 a letter was circulated to the European heads of state committing the EU to creating a major European library, in reaction to some extent to US developments (Google, etc.). In June 2005 the EC then published the i2010 study, a technology-driven assessment of digital libraries. It has since become the flagship initiative for Commissioner Reding, and has gained huge momentum.
There are three main strands to i2010:
• Digitisation, covering issues such as copyright, orphan works, etc.
• Enabling online accessibility, helping to create a multilingual access point to Europe's distributed digital cultural heritage.
• Digital preservation, covering e-legal deposit, etc. There are huge problems facing digital archiving, and member states are expected to do more.

As a further indication of how seriously Europe treats the role of the digital library, The European Library (TEL) has been established. By the end of 2006 the TEL infrastructure, headquartered in The Hague, consisted of 25 national libraries and will hold some 2 million digital objects.

6.5.2 The Future of the Librarian

In assessing the future role of librarians it is important to make a distinction between libraries and librarians. The library is basically inert – it doesn't do anything; the librarian does things. So what is the future essential role of each? For libraries it involves buildings and managing physical collections, tied up with physical space. For librarians it could be managing the knowledge base, which gets past the notion of space and physical buildings.

The future of research library roles is constantly an item for debate – will they, as allegedly happened with subscription agents, become disintermediated (i.e., will publishers go directly to the researcher and bypass the librarian)? The jury is still out, with some claiming a reallocation of functional priorities away from the traditional role of collecting 'stuff' towards navigating to 'stuff' not necessarily acquired by the library. Some also suggest, perhaps hopefully, that the individual researcher will become the key decision maker or buying point in future.

Librarians themselves are also searching for a new role, one which builds on their traditional professional experiences and core competences but also takes into account a new set of informatic market needs. Will they be able to find a role in the migration towards Web 2.0 and the semantic web, in the development of ontologies and the creation of quality metadata to enable targeted access to the world's STM information resource? Will the body of professional training cope with this change in approach?

New roles are being identified. Though folksonomies, with their grassroots appeal, are by definition not in the mainstream of librarianship, there is a role in supporting the creation of quality metadata and the development of appropriate ontologies, perhaps in association with learned societies and relevant communities. There is also a need for compliance to be included in the librarian's future role – to see that IPR and copyright restrictions, where applied, are observed – though this policing activity may not be the most popular item on the list of things to do.

The information network and community has become complex, and managing this has become critical for all librarians. Within the STM world, notably within biomedicine, the gold standard has been the individual, peer-reviewed article. It is sometimes claimed that the peer-reviewed article is no longer so dominant as a communication device. The article has been transformed: it has become part of a network. The research article is often the gateway or portal into a world of simulations, data analyses, modelling, etc.
Though the article has become richer in its evolution, it has become less essential. There are many other information sources that now compete for the attention of the researcher. In the biomedical area there is GenBank, whose online community contributes data and findings directly into the datasets; for many parts of that community this has become the primary means of research communication.

Others suggest that the future of the librarian is tied up with the fate of the institutional repository. Librarians could become the stewards of the local IR. Some librarians (as well as others in the information community) have been concerned that the rate of voluntary deposit by authors of their works in IRs has been low (without mandates), and they therefore see IRs as a failure. Not so, according to Scott Plutchak, a leading US librarian. IRs are good places for all the digital 'grey literature'. Applying metadata to this extensive range of literature – not just article administration – could offer a new role for librarians, and would allow the metadata to be captured by the search engines. This grey literature could then offer additional competition to the research article.

A further challenge is the role that librarians have with regard to blogs, wikis and podcasts. These will have some impact on scholarly communication, the extent of which is currently unclear. It would, however, be naïve to assume that they will have no impact, particularly for those currently aged under twenty who are attuned to such services in the worlds of entertainment and education as part of their daily lives.

So what will the librarian's role be, given the challenge to the research article edifice? They will become:
• Stewards of the institution's information needs. This will no longer be just buying or licensing information products; the traditional library funds are being used in other ways.
• Guides through the information morass.
• Partners with faculty and students, involved with authors and faculty in a proactive way.

In serving their clients' needs it appears from research that librarians have a different set of skills and resources they can call on, as compared with the researchers themselves. According to Outsell, librarians use RSS feeds for 40 % of their alerts, and blogs for 9 % – the former for themselves, the latter on behalf of end users. It appears that forward thinking on the role of libraries in the information supply chain is being undertaken, in the USA at least, more by public libraries than by corporate libraries. Some social tagging is being undertaken at the University of Pennsylvania (using tools such as del.icio.us); Denver Public Library has an imaginative next-generation service using MySpace. Then there are 'libraries in the Mall'. The recommendation made by Outsell is that librarians should look to non-traditional users in seeking a future role, as well as serving the emerging needs of the younger generation.
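The earlier point about institutional repository metadata being 'captured by the search engines' is, in practice, largely a matter of machine harvesting. As a purely illustrative sketch – it is not drawn from this book, and the repository address used is a hypothetical placeholder – the following Python fragment shows how the Dublin Core records that most repository platforms expose via the OAI-PMH protocol can be pulled and listed:

```python
# Illustrative sketch only: harvest titles from an institutional repository's
# OAI-PMH interface (verb=ListRecords, Dublin Core metadata).
# The endpoint used below is a placeholder, not a real repository.
from itertools import islice
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_titles(base_url, max_records=20):
    """Yield (title, identifier) pairs from the first page of ListRecords."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    with urlopen(base_url + "?" + query) as response:
        tree = ET.parse(response)
    for record in islice(tree.iter(OAI + "record"), max_records):
        title = record.findtext(".//" + DC + "title", default="(untitled)")
        identifier = record.findtext(".//" + DC + "identifier", default="")
        yield title, identifier

if __name__ == "__main__":
    # Hypothetical endpoint; a real deployment would substitute its own OAI base URL.
    for title, identifier in harvest_titles("https://repository.example.edu/oai"):
        print(title, "->", identifier)
```

Whether this kind of routine metadata exposure, rather than traditional cataloguing, becomes part of the librarian's stewardship role is exactly the question raised above.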
6.5.3 Understanding the new user

Given the above conceptual background, some suggestions for market-based information provision include:
• Organisations should make information available when and where it is needed. Tools such as RSS should be used.
• Organisations should look for a wider, disenfranchised audience as much as serving the easily reached, targeted core market.
• Organisations need to provide direct access to the object, not just the metadata.
• OpenURLs will enable digital objects to be exposed (a minimal illustration of an OpenURL link is sketched at the end of this section).
• Services such as the 'people who like this also like that' recommendations used by Amazon should be included.

Information resources should be seamlessly integrated, whether digital or print, for ease of navigation and to ensure format-agnostic discovery. The sector should avoid organising content too much by containers (journals, books, standards, patents). The future strategy should also embrace non-textual information and the power of pictures, moving images, sight and sound. Federated search may allow some of this to be achieved, but we still seem to build things the way we did in the past, with specific repositories containing single-format information.

There is also the need to deliver information irrespective of the device. In future this can be anything from laptops to smart phones, from PDAs to e-books to iPods, and there is the need to fit things onto ever smaller devices. Future nanophones are expected to be multipurpose, and e-ink and e-paper are also likely to have an impact on display requirements, as is already apparent with Amazon's new Kindle service.

Interoperability between targeted and broad-based federated search using OpenURL is also becoming important. This will align services to the context within which the user operates; the user should drive the formatting, not the supplier. All players in the scholarly communication sector should reduce the hurdles to connecting people with information through technology. This introduces the need for flexible licensing options that librarians can administer. Also, to cater for the new alternative digital formatting options, non-traditional indexing, cataloguing and classification should be developed, including user-driven tagging. There is a great deal for the library community to work on in developing the longer-term set of strategies which will ensure that they have a role in the future of scholarly communications.
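By way of illustration only (this sketch is not from the book, and the resolver address and citation details are hypothetical placeholders), an OpenURL is essentially a URL that carries a citation as standard key–value pairs, which an institution's link resolver then redirects to an appropriate copy – print holding, e-journal platform or repository:

```python
# Illustrative sketch: build an OpenURL 1.0 (KEV format) link for a journal article.
# The resolver base URL and the citation details are placeholders.
from urllib.parse import urlencode

def make_openurl(resolver_base, journal, year, volume, start_page, issn):
    """Return an OpenURL that an institutional link resolver can act on."""
    params = {
        "ctx_ver": "Z39.88-2004",                       # OpenURL 1.0 context version
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # journal article metadata format
        "rft.jtitle": journal,
        "rft.date": year,
        "rft.volume": volume,
        "rft.spage": start_page,
        "rft.issn": issn,
    }
    return resolver_base + "?" + urlencode(params)

# Hypothetical resolver and placeholder citation details:
print(make_openurl("https://resolver.example.ac.uk/openurl",
                   "Philosophical Transactions", "1665", "1", "1", "0000-0000"))
```

The relevance to the format-agnostic argument above is that the same citation keys work regardless of which copy the resolver ultimately points at.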
6.6 Other Stakeholders

6.6.1 Collaboratories

A 'collaboratory' has been defined as a "center without walls, in which the nation's researchers can perform their research without regard to physical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, [and] accessing information in digital libraries" (Wulf, 1989). A simpler definition would describe the collaboratory as an environment where participants make use of computing and communication technologies to access shared instruments and data, as well as to communicate with each other.
At present, research is still largely conducted in small teams whereas, in future, some research areas are likely to be dominated by a very few large collaborative groups. For example, in signalling gateways there are at most 15 centres worldwide – eight in the US and seven in the rest of the world – and these centres share their information. There are many other examples of such collaboratories, with high energy physics being a leader in this field. The collaboration between Los Alamos National Laboratory and CERN is another leading example.

It is claimed that future research in many fields will require the collaboration of globally distributed groups of specialist researchers, each group needing access to distributed computing, data resources and support for remote access to expensive, multi-national specialised facilities. Problems of geographic separation are especially present in large research projects. The time and cost of travelling, the difficulty of keeping in contact with other scientists, the control of experimental apparatus, the distribution of information, and the large number of participants in a research project are just a few of the issues scientists are faced with.

Whilst the need for massive-scale collaboration has grown in recent years, the development and implementation of collaboratories has proved expensive in many cases. From 1992 to 2000, budgets for the research and development of scientific collaboratories ranged from US$ 447,000 to US$ 10,890,000, and total use ranged from 17 to 215 users per collaboratory (Sonnenwald, 2003). The main initial goal was to provide tools for shared access and manipulation of specific software systems or scientific instruments. Such an emphasis on tools was necessary in the early development years of scientific collaboratories owing to the lack of basic collaboration tools to support rudimentary levels of communication and interaction (such as videoconferencing). Nowadays such communication facilities are more ubiquitous, and the design of collaboratories may move beyond developing general communication mechanisms to evaluating and supporting the very nature of collaboration in the scientific context.

As such, collaboratories are increasingly becoming a feature of the research communication process, with specific tools being adopted that marry the geographical and multidisciplinary needs of the research group. It is a process that is almost unique to each discipline, lacking the generic and broad approaches typified by scholarly publishing in the past.
6.6.2 Funding Agencies

The sheer power of funding by agencies such as the NIH, RCUK and the Wellcome Trust, and their increasing willingness to exploit this power base, suggests that there are new kids on the block who are driving the scholarly communication agenda. The National Institutes of Health (NIH), for example, spends $ 28 billion per annum, which generates 25 % of US scholarly papers (and 8 % of the world's output of papers). The Research Councils in the UK (RCUK) spend £ 3.5 billion on research, and the Wellcome Trust £ 378 million, mainly on biomedical investigations. This funding buys a lot of allegiance. The pragmatic approach to open access visible in the UK has, in particular, generated some spurious findings – for example, Stevan Harnad's assertion that the UK economy 'loses' £ 1.5 billion per annum by not throwing its full support behind self-archiving (open access) by authors.
The new masters – the politicians – are concerned about economic growth (as indicated by Gordon Brown whilst Chancellor of the Exchequer). The funding bodies are more concerned about control and accountability, whereas for the people operating at the coal face of research communications – notably the open access advocates – it is all about belief and values.
6.7 Government involvement

Considerable attention is being given by government agencies to making their information available for free (particularly in the USA, where a number of bills are due for consideration which, if passed onto the statute books, would open the floodgates to open access publishing systems). This would see the output of all US government-funded research ($ 45 billion) placed in the public domain and not subject to publisher copyright. A recent OECD report has furthermore come out in favour of recommending that the raw data underpinning the research article also be made freely available if the funding which produced the data and publication came partially from the public purse.

Governments have therefore become important players in determining the structure of publishing, though their impact is still in its early days in most countries. However, in the UK a powerful influence has been exerted by the Joint Information Systems Committee (JISC), part of the higher education funding council structure and as such a government body. There are concerns that in many areas JISC is rewriting the rules and is coming into direct and no-holds-barred conflict with the publishing sector. Whilst HEFCE and JISC operate within the newly created Department for Innovation, Universities and Skills (DIUS) with the aim of improving research efficiency, the equally new department of BERR (which derived from the Department of Trade and Industry) is more concerned with ensuring that the UK economy is healthy and expands. The fact that 8 % of the gross national product in the UK has intellectual property aspects, and that the UK publishing community is a significant tax provider and stimulus for exports, means that BERR is less likely to support some of the measures that JISC in particular would like to see implemented. There is therefore a touch of schizophrenia within UK government departments over the issue of electronic publishing.

6.7.1 Case Study: Joint Information Systems Committee (JISC)

JISC updated its strategy for the provision of an ICT infrastructure within the UK, as published in its JISC Strategy 2007–2009. Whilst reaffirming its commitment to supporting institutions in realising their goals in the digital age, for the first time it has specified support for UK institutions' activities to engage with business and the community. Innovation, integration, sustainability and working in partnership continue to be the cornerstones of JISC's strategy, part of a longer-term ambition to support the enhancement of the UK's competitiveness and the contribution of further and higher education to society and the economy.
The strategy specifies a range of priority activities that JISC will be undertaking with partners in the coming years. These include:
• the continued development of the UK's e-infrastructure;
• the development of a freely available content layer of academic and scholarly resources;
• the support of the research community through e-science, the semantic grid and virtual research environments;
• encouraging the further use and take-up of e-resources;
• the support of the management, business and administrative processes of institutions;
• further exploration of the benefits of e-learning and its contribution to widening participation; and
• an investigation into the ICT skills and expertise capacity within institutions.
To read the JISC Strategy (2007–09) see http://www.jisc.ac.uk/strategy0709

6.7.2 National Priorities on toll free or toll paid

In effect the clash between the moralists, who support an open publication system in which price and cost are not to be used as a deterrent to accessing publicly-funded research results, and the pragmatists, who point out that there is no such thing as a 'free lunch', is also being played out within government departments. The education and research ministries tend to stand in opposition to the trade- and business-focused departments on this issue. Each country will seek to establish its own balance between these often conflicting goals.

Within Europe this raises a particularly interesting spectre. As the rotating Presidency of the European Union passes from one country to another, the incumbent presidency could have a significant impact on the business model supported by the EC during a given period. For example, if the country has no significant publishing industry and has a strong need to establish a democratised education system, it will favour an open access model over the traditional business models applied by scholarly journal publishers. This may be true of the Portuguese presidency, which holds office at the time of writing, and the Slovenian presidency which follows it. However, in countries such as the UK, where the scholarly publishing sector provides substantial tax receipts for the government, and where its international reach is such that overseas revenues become a significant input into the national income, a greater element of protection of the existing publication model can be expected.

How significant will this be? It is difficult to establish what tax receipts from the publishing industry go into the government's national coffers in the UK. These could be considerable, based on the summation of the revenues from UK-based publishing companies and the application of the corporate tax percentage against them. It would be of interest to know what this figure is in the case of the UK, and to compare the tax payments against the expenditure by organisations such as JISC on creating a new business model which could have a significant negative impact on publishers' future business operations. In effect it could be claimed that the UK publishing industry is being expected, by the government, to fund its own demise.
6.7.3 Emerging Competition

New stakeholders are also emerging that are transforming the electronic publishing scene. These include:
• Powerful search engines offering first-stop, ease-of-use access to material (Google, Yahoo).
• New multimedia services which combine elements of STM information needs within one interface (ScienceDirect, Web of Knowledge).
• Web 2.0, which has spawned a whole industry of social bookmarking and social publishing operations focused on information being created and freely distributed at the grassroots level. These include Connotea, MySpace, flickr, etc.

Meanwhile the existing stakeholders are facing the challenges that these innovations have generated in a number of ways. Publishers are essentially retrenching, emphasising their claimed rights over copyright (rights which are being weakened by Creative Commons licences) to prevent widespread distribution of the works they publish. Libraries are searching for a new role, supportive of the migration towards the semantic web and the development of ontologies and quality metadata to enable targeted access to the world's STM information resources; in this respect they see competition from the alternative, grassroots school of Web 2.0 developments, in which 'folksonomies' (an unstructured approach to tagging information entities) take over. Aggregators and intermediaries have had fluctuating fortunes over the years, having essentially faced disintermediation as publishers sought to bypass them and gain direct access to users; in the new Internet environment they are reinventing themselves with services such as aggregated mashups (mixing the APIs from different services), social bookmarks, signalling gateways, etc.

In effect there is a boiling cauldron within which current and future stakeholders in the electronic publishing sector are trying to make their mark. One area where they are attempting to establish their credentials is in the creation of information formatted to the precise needs of users. The next chapter will look at how some of the established formats are coming under stress as new digitally-based forms gain acceptance.
Chapter 7
Publication Formats
Books and journals have been the traditional means whereby researchers have communicated their results to a national and international audience of peers. This has been the case for over three centuries, with the results of research efforts being selected for relevance and veracity by the same peer community that benefits from the publications. It has been a closed circle, but one that has served the original purpose of keeping knowledgeable people informed about developments in their field. It has achieved buy-in from the research community at large, with the services provided by publishers, librarians and intermediaries being used and supported by authors and readers of research material. However, the arrival of electronic publishing techniques, and the change in habits of part of the research community as it adapts to the new opportunities which technology presents, are creating a paradigm shift – one that has the potential to change the whole nature of the scholarly communication process within the next ten years. Already we are seeing the journal and the book come under the microscope.
7.1 Journals and e-Journals

Learned journals began in the 17th century when Henry Oldenburg (1619–1677) created the world's first scientific journal, Philosophical Transactions. This appeared in March 1665 as part of Oldenburg's role as first Joint Secretary of the newly created Royal Society of London. Since then the journal has become the main means by which researchers communicate their research findings to their peer group.

There are now approximately 23,000 journals which are at the same time 'active', 'peer-reviewed' and 'scholarly'. Together they publish about 1.4 million articles each year from one million authors, and there are some 10–15 million readers of these articles worldwide, located in 10,000 institutions. It has become a large, dispersed and stable publication system catering for the specific and unique needs of scholars and scientists worldwide. Furthermore, scholarly journals continue to grow in number and size: the number of journals has been growing at about 3.5 % per annum on a sustained basis, and the number of articles at about 3 % per annum.

The functions of the journal, as originally espoused by Henry Oldenburg in 1667, have been in four main areas:
• Registration of the author's results
• Certification, or peer review, which gives the article credibility
• Dissemination, enabling the results to be seen by those entitled
• Archive, to keep the record of science

Some have added a further function to the above:
• Navigation, enabling access to the final version
Much has been written about the history of scientific journals. There is general agreement that scientific journals were first published in 1665 in France and England, and that the first US scientific journal was published in the first half of the 1800s. It is clear that scientific journals soon caught on and began to play an important, if not essential, role in scientific communication. Early articles often appeared in more than one journal and would frequently also be published in a monograph – a practice not too dissimilar from current experience.

Why did the journal achieve such prominence? The key can be found in one of the four functions espoused by Oldenburg: the need for certification, or a peer review system. This gives solidity to the article – it has been sifted by those who know, and is given recognition that it contains material worthy of dissemination. The article becomes part of the 'minutes of Science'.

In his classic 1963 work Little Science, Big Science, Derek de Solla Price showed that there has been consistent growth in research output from the early days of the journal to the current day. Two examples showing how consistent this growth is can be found in the following charts.
Figure 7.1 Journal Growth 1665–2006 Source: Mabe, STM Association
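As a quick aside added here for clarity (using only the growth rates quoted above), a steady annual growth rate $r$ implies a doubling time of

\[
t_{\mathrm{double}} = \frac{\ln 2}{\ln(1+r)},
\]

so 3.5 % per annum for journal titles corresponds to a doubling roughly every $\ln 2 / \ln 1.035 \approx 20$ years, while the 'doubling about every 15 years' quoted below would require a sustained rate of about $2^{1/15} - 1 \approx 4.7\,\%$ per annum.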
According to Ulrich's International Periodicals Directory there are currently about 250,000 periodicals published worldwide. About 23,000 of them are, by Ulrich's definition, still active, peer-reviewed, academic scholarly periodicals, and of those nearly 10,200 were available online in 2007 – a number increasing rapidly as the Long Tail embraces electronic publishing. According to another definition and count, Meadows and Singleton (1995) estimated the number of scholarly journals worldwide to be about 70,000 to 80,000 some ten years ago.
Figure 7.2 Article Growth 1981–2005 Source: Michael Mabe, STM Association
In 1995 Tenopir and King estimated there to be about 6,800 US scholarly scientific journals (Tenopir & King, 2000); in 2001, six years later, they put the number at about 7,800. The growth of traditional (printed) scientific journals in the US has been rapid, but it has lagged behind worldwide growth since the 1930s. The number of scientific journals doubled about every 15 years until recently, but the rate of increase has tailed off during the last five years.

A better gauge of scholarly output is the number of articles published. Over a 15-year period (1960 to 1975) the number of scientific articles published in US journals increased from an estimated 208,300 to 353,700, an average increase of 3.4 % per year (King, et al., 1981). By 1995 the number of scientific articles published in the US was estimated to be 832,800 (Tenopir & King, 2000), and in 2001 the estimate was nearly one million articles. This growth reflects a corresponding increase in the number of actively employed scientists. There were an estimated 2.64 million such scientists in 1975 and 5.74 million in 1995 – or 7.5 scientists per article published in 1975 and 6.9 scientists per article in 1995 – which suggests a possible small increase in publishing by US scientists. It should be noted that authorship (i.e., the number of scientists who write) has increased dramatically because the number of co-authors per article has risen steadily over time.

Other important journal indicators are the number of articles, the number of article pages and the total pages published per journal title per year. All of these indicators are shown in Table 7.1 below. While the size of journals continues to increase in terms of pages and number of articles, the average size of articles has also increased, from an average of 7.4 pages per article in 1975 to 11.7 pages in 1995, before decreasing again to 9.2 pages in 2001. This result is important because it has a significant bearing on journal publishing costs and, therefore, on the price of journals.

Many of the criticisms levelled against journal publishers in recent years revolve around the 'serials crisis' and who is responsible for the resulting frustration gap.
Table 7.1 Average number of articles, article pages and total pages per journal title per year: 1975, 1995, 2001

Year    No. of articles    No. of article pages    Total pages
1975    85                 630                     820
1995    123                1,439                   1,730
2001    132                1,215                   1,520

Source: Tenopir & King (1975, 1995, 2000)
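The per-article page counts quoted in the text follow directly from Table 7.1; as a quick check,

\[
\frac{630}{85} \approx 7.4, \qquad \frac{1{,}439}{123} \approx 11.7, \qquad \frac{1{,}215}{132} \approx 9.2
\]

pages per article for 1975, 1995 and 2001 respectively.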
Publishers have not been particularly successful in demonstrating that there has been a greater output per product – that the journal itself has been growing in size – and that this has been a significant contributory factor in the price inflation of journals. Underlying this has been a particularly successful business policy adopted by the leading commercial journal publishers, mainly based in Europe, in the post-World War II period. Elsevier in particular grew in size and market share during this period by allowing its more successful journals almost unrestricted growth in output. Journals which started life with one volume per year were allowed to double or treble in size as the manuscript flow came in and libraries became locked in as subscribers. Much of the resulting wealth for Elsevier was generated in the past few decades by a handful of very popular titles which in some cases ran to between 20 and 30 volumes of material each year – Brain Research, Biochimica et Biophysica Acta, Nuclear Physics, etc.

As long as libraries continued automatic subscription renewal the publisher's revenues were protected, and the upfront payment provided the wherewithal to continue editorial expansion, acquisitions, marketing reach and the profits returned to shareholders. This editorial policy was largely hidden behind the sterile arguments about journal price inflation exceeding the general retail price index. It was not a like-for-like comparison, given that the unit of measurement, the journal, was increasing in size as much as in price over the years. This is what gave the journal publishers such clout by the end of the last millennium: a war chest built on a very lucrative and profitable journal publishing programme which – in some respects – had become out of control, certainly in respect of the available library budgets with which to pay for it.

7.1.1 Multi-author articles

Another structural change taking place during this period related to the authorship pattern of journal articles. According to an article in Science Watch, which used data from the ISI database of journal articles, the numbers of scientific papers published with more than 50, 100, 200 and 500 authors reached a plateau from 2000 to 2003, then experienced a sharp increase in 2005. That year, each group reached its all-time highest level. More than 750 papers with 50 or more authors were published in 2005, compared with a little more than 500 the previous year. Papers with more than 100 authors grew by more than 50 %: from 200, to just over 300 in 2003, and to 475 in 2005.
Papers with 500 or more authors increased from 40 in 2003 to 131 in 2005. This group saw the largest jump of all – a 200 % increase – admittedly from a low base. However, whilst 2005 saw great leaps in the total number of multi-author papers, most of the totals for the assorted groupings of multi-author papers showed a decline during 2006, the most recent year of available data.

In order to assess the general make-up of recent multi-author papers, Thomson Scientific divided the papers with more than 100 authors into two main groupings: physical sciences and biomedicine. The physical science group increased its volume by 144 in 2005 to total 393 (the majority of these papers were in the field of physics). Meanwhile, the number of biomedicine papers with 100 or more authors declined from 41 in 2004 to 19 in 2006. As such there is a key discipline feature to authorship patterns – there is no one-size-fits-all in authorship among the sciences as a whole.

The study also highlighted the individual papers with the highest number of authors. In 1987 a paper with 200 authors took the prize, and each year since has seen the winning number increase. In 2000 the most multi-authored paper had 918 contributors; the paper taking top honours in 2006 had 2,512 authors. This ties in with the earlier commentary on the growth of collaboratories and their significance in the electronic publishing scene.

7.1.2 The evolution of the electronic journal

The above are largely structural features affecting the traditional printed journal system – but in the latter half of the 1990s a new phenomenon appeared. Electronic publishing systems produced the e-journal, either as a derivative of the printed journal or, more recently, as a born-digital product in its own right with no companion print version. The emergence of such electronic publishing systems has been a necessary requirement for Big Deals and pay-per-view from publishers.

During the 1990s there was an evolution of e-journals in parallel with the printed journal, building on these electronic publishing systems. There were concerns expressed at this time that publishers might be unable or unwilling to adapt to the changing technological environment. The concept of the 'valley of death' was coined at this time to suggest that those publishers sliding down the printed slope of the valley would not necessarily be the same agencies that would adapt and be able to climb up the digital or screen-based slope on the other side.

There was a belief by many in the USA that electronic journals would resolve all of the earlier flaws in scientific journals. In particular, the National Science Foundation funded a large number of studies in the 1960s and 1970s that were designed to facilitate specific progress towards electronic publishing and, more generally, towards enhanced scientific communications (see Tenopir & King, 2000). This included major investment by the NSF in pioneering editorial processing centers (EPCs). By the late 1980s all the components for producing electronic journals were in place – authors could input articles through magnetic tapes or cards; publishers used computerised photocomposition or computer-driven typesetting; libraries were automating; bibliographic databases began to be searched online; scientists had access to computers and modems; and the genesis of the internet had been established.
Through the 1980s many predicted the imminent emergence of electronic journals and several journal studies were launched as early as the mid-1970s to further
the progress. However, there were a number of barriers to e-journal adoption. One early electronic journal project by Professor John Sender led him to summarise the results of his work in the early 1980s with the claim "I have seen the future . . . and it doesn't work!" Such was the extent of the barriers still arrayed against online journals at that time. Over subsequent years, however, effective and often inexpensive technology and a commitment by publishers to adopt IT have enabled this transition by and large to occur – to the extent that some 60 % of the core 23,000 active, refereed scholarly journals are now available in digital format. The migration from print to electronic journals takes into account many separate developments in technology, in marketing and in administrative and, notably, legal areas. What factors determine whether things change? Is there a 'tipping point' beyond which the migration from print to electronic escalates?

As an indication of the rise of electronic publishing within a particular community we can look at the situation in the UK. Spending in the UK higher education sector on electronic resources, excluding e-journal subscriptions, rose by 11.6 % in 2004/05 compared with 2003/04, and rose as a proportion of total acquisitions spending, to 15.8 %. The balance between electronic and print-only subscriptions for journals is shifting in favour of electronic. In 2004/05, of the total of £ 96 million spent on journals in UK higher education, the breakdown was as follows:

Table 7.2 Spending on journals by the UK Higher Education sector (2004/05)

Print subscriptions               £ 40 m (41.6 % of total journal spend)
Electronic-only subscriptions     £ 27 m (28.1 %)
Joint subscriptions               £ 29 m (30.2 %)
Total journal spend               £ 96 m (100 %)
Total acquisitions expenditure    £ 173 m
In the UK, total journal subscriptions increased in 2004/05 by 2.8 % to a new record of 1,200,000 subscriptions. There were well over twice as many subscriptions – 126 % more – in 2004–05 than there had been ten years previously. However, over the same ten-year period the ratio of books to journals expenditure changed from 45:55 to 34:66, while total spending on journals rose by 77.9 %. Libraries see value in long runs of serials and therefore get locked into maintaining a journal subscription, whereas book buying is a series of one-off decisions.

In May 2007 the UK Publishers Association produced a survey which gave detailed information about the subject areas and formats of journal and book expenditure in UK universities, comparing these against earlier years and also against the situation in Europe, the US, Canada, Australia, New Zealand and Japan (see UK University Library Spending on Books, Journals and e-resources – 2007 Update, The Publishers Association, www.publishers.org.uk). From such updates it is clear that electronic journals are now a firmly established feature of the information scene. In North America, within the Association of Research Libraries (ARL), approximately 50.2 % of total journals expenditure was on electronic subscriptions in 2004–05.
The total expenditure by the 113 ARL libraries on both books and journals in 2004–05 was 2.9 times greater than the total spend by all UK higher education libraries.

Returning to the barriers to e-journal adoption mentioned earlier: one was that the economic cost of the components, while not prohibitive, was still too high; there were no standard codes for text input; the processes for handling mathematical equations, chemical compounds and certain images were still inadequate; and the computer capabilities of individual scientists were extensive but not yet all-pervasive. These components continued to develop, became much less expensive and their application more commonplace through the 1980s and 1990s.

Article Readership

During the latter part of the progress towards electronic publishing, a series of readership surveys was undertaken by King Research, the University of Tennessee and Drexel University between 1977 and 2002. These showed that the amount of reading by scientists increased from about 105 readings per scientist in 1977 to over 125 in 2000 to 2002. Readings were defined as "going beyond the table of contents, the title and the abstract to the body of the article." The observations in 1977 and 1984 were from national statistical surveys of scientists (under National Science Foundation contracts); all other observations were from "self-selected" universities, government agencies and laboratories, companies or publishers that asked King Research or the University of Tennessee to perform a readership survey. Thus there could be bias in those estimates, though not sufficient to discredit the overall findings. Over 15,000 scientists have responded to these surveys over the years.
Figure 7.3 Amount of readings per Scientist
Other studies, dating back to 1948, indicate that scientists have always read a great deal. Projecting the period in which the reading observation was made (e.g., a week or a month) to a full year, Bernal in 1948 estimated that medical researchers read over 300 articles a year and engineers about 70 to 80. Most articles are authored by university scientists (about 75 %), and individually they tend to read more than non-university scientists. However, over 70 % of readings are by scientists located outside universities – in government laboratories and agencies, and in companies and non-profit institutions. This apparent anomaly arises because most scientists are employed outside universities (Tenopir & King, 2000). Other studies confirm this conclusion. For example, early studies indicate that ACM (computing) journals are written by experts for other experts (mostly university-based), yet these experts constitute less than 20 % of the readership.
These findings in part support the earlier contention that there is a vast, disenfranchised knowledge-worker market beyond the traditional market at which journals have been targeted. Electronic journals potentially expose this latent demand more readily than printed journals.

There was a period during the 1960s and 1970s in which many believed that scientific journal articles were rarely read, and concluded that traditional journals were a huge waste of paper. Much of this assumption came from the Garvey & Griffith surveys of the 1960s and early 1970s, whose sample average of 17 readings per article has often been cited. In fact, in a detailed 1963 report Garvey and Griffith reported that the journal articles sampled were estimated to be read 520 times each. Others using the same survey methodology have suggested even greater reading per article: Machlup & Leeson (1978) found that economics articles were read an average of about 1,240 times per article, and King, et al. (1978) reported that Journal of the National Cancer Institute (JNCI) articles were read an average of 1,800 times each, or 756,000 times for the entire journal over its 12 issues. All of the studies which report readings from tables of contents ignore the fact that information from articles is often circulated, and that many articles are photocopied (or printed out) and passed on to scientists through interlibrary loan, document delivery or by colleagues exchanging copied articles over coffee breaks. Another type of estimate shows that there were about 640 readings per article in 1977 and 900 readings in the late 1990s (Tenopir & King, 2000). This is important validation for the business model used by traditional journal publishers, as the 'reach' of such journals was clearly more extensive than critics had claimed.

Concerns about Journals

As suggested above, there was a period, particularly in the 1960s and 1970s, when the use, usefulness and value of scientific journals were being questioned:
* Readership was thought to be low, because some equated the amount of reading with the average number of times an article is cited.
* There was concern about an information explosion that would ultimately bury scientists in a sea of paper. In fact, as indicated earlier, the number of articles published merely reflects the growth in the number of scientists.
* There were also myths, since discredited, that scientists wrote only to achieve tenure or self-aggrandisement, and that readers already knew about the research being reported before it was published (Tenopir & King, 2000).
* One more valid concern was the time delay from research findings to initial manuscript submission to final publication.

There was therefore a short period when readers expressed concern about the usefulness and quality of journals, resulting in a hesitancy to submit manuscripts to publishers and to read these journals. However, it also seems clear that this concern began to recede quickly.

Electronic journal use

Recent surveys of scientists and medical faculty and staff at the University of Tennessee showed that 35 % of their article readings are now from electronic journals (Tenopir & King, 2002), and readership studies at the Oak Ridge National Laboratories
revealed a similar proportion of readings from e-journals (Tenopir, et al., 2000). A 2002 readership survey (by Montgomery and King) showed that over one-half of readings were from e-sources, most of them from e-journals. In 1995 the American Astronomical Society (AAS) developed an advanced electronic journal system, which quickly became an essential resource for astronomers; a late 2001 survey of AAS members (by Tenopir, Boyce and King) showed that nearly all of their article readings were from electronic sources. Between 1997 and 1998–99 the proportion of faculty from a sample of Association of Research Libraries (ARL) institutions who used electronic journals increased from 48 % to 61 %. Thus scientists appear to have adapted quickly to sophisticated electronic journals and, as older material gets digitised, a very high proportion of readings are coming from electronic journals. However, it is still surmised by Tenopir and King that both print and electronic journals may continue to be used for some time to come.

Purpose of Reading

Over the years, the useful purposes of reading journal articles have been expressed in a number of ways. These include:
• to expand research horizons (60 % of the times used), generate alternative approaches (58 %), generate critical dimensions (54 %), set limits of acceptability (50 %), test alternatives (27 %), and reject alternative approaches (13 %) (in research projects, Allen 1966);
• for research (48 %), design and development (33 %), and analysis and testing (48 %) in central laboratories;
• to form a basis for the instruction of new scientists, to become acquainted with the accumulated knowledge that exists when embarking on new research or inquiry, to facilitate day-to-day scientific work, and to advance the research front;
• general interest (46 %), research (33 %), teaching (15 %), and coursework and other purposes (6 %) (economists, Machlup & Leeson 1978);
• current research (38 %), help on the job (25 %), writing a paper or speech (13 %), general information (10 %), and teaching (5 %) in library journals;
• displaying an interest in a topic (68 %), to help in work (25 %), and researching a topic (14 %) in electronic journals.
The purposes for which articles are read vary by type of scientist and their work setting. For example, engineers are less likely to use journals for research (79 %) and instruction (21 %) than scientists (95 % and 42 % respectively) and social scientists (90 % and 73 %), although the differences are much less pronounced for general updating (71 % to 79 %) and obtaining research funds (10 % to 15 %). According to Tenopir and King, at the University of Tennessee (1993) readings by scientists were for current awareness or professional development to support research (75 %), teaching (41 %), preparing formal publications, talks or presentations (32 %), and administration (13 %). At 23 non-university settings (including the Oak Ridge National Laboratories), scientists read journals for current awareness or professional development (30 %), background research (26 %), conducting primary research (17 %) or other R&D activities (11 %), for communications-related activities (14 %) and for management (3 %) (Griffiths & King, 1993).
& D activities (11 %), for communications related activities (14 %) and management (3 %) (Griffiths & King, 1993). After electronic journals were introduced, surveys at the University of Tennessee and the Oak Ridge National Laboratory showed little differences in purposes of use or in ratings of the importance of journals. The Value of reading Journals At the University of Tennessee, in 1993, when virtually no readings of articles were from electronic sources, scientists gave articles an average importance rating of 4.83 per reading for teaching purposes and 5.02 for research purposes (with a ratings scale with 1 meaning not at all significant, to 7 meaning absolutely essential). 13 readings were absolutely essential to teaching and 23 to research. In the 2000 to 2002 surveys by King and Tenopir the ratings of importance were even greater, but they did not differ much when read from print or from electronic versions. Earlier studies had also found journals to be important. For example, in a 1959 study, journals were found to be the primary source of creative stimulation (Scott, 1959); in 1976, one study found that the quality of information is a major factor in the adaptation of innovation and another that they were the single most important source of information in achieving product innovation; in 1978 economists said that 32 % of their readings were useful or interesting, 56 % were moderately useful, and only 12 % were not useful (Machlup & Leeson, 1978). There are two types of value of information provided by journals. There is a ‘purchase or exchange value’, and also a ‘use value’. • The ‘purchase value’ is what readers (or libraries) pay in money for journals
and what readers pay in their time getting and reading articles. Placing a monetary value on scientists’ time, the average ‘purchase value’ is about $ 6,000 per scientist per year, which tends to be five to twenty times the price paid for the journals themselves (Tenopir & King 2000).
• The ‘use value’ is the outcome or consequences of reading the information content of journal articles.
According to Professor Tenopir, there are over 50 studies over the years (1958 to 2002) that observe the amount of time scientists and others spend reading journal articles. These studies report ranges from over 20 hours per month (physical sciences, Halbert & Ackoff, 1959; cancer research, King et al., 1978; physical sciences, Brown, 1999) down to only 2.2 hours per month (engineers, Allen, 1966). The median of these studies is about nine hours per month, or 108 hours per year. Tenopir/King studies have further demonstrated that the amount of time spent by scientists reading journal articles may be increasing. The trend in average annual time spent reading by scientists is shown in Figure 7.4, below. The results show an increase from about 80 hours per scientist in 1977 to over 100 hours in the 2000/2001 surveys (an amount close to the median of all reported time studies). Regardless, it is clear that scientists are willing to spend a substantial amount of their time reading articles. Another type of value is the consequence of having used the information provided by articles (i.e., use value). Over the years there have been a number of indicators of the use value of journals, as follows:
Figure 7.4 Time spent reading (average hours per annum per scientist, by survey period, 1977 to 2000–01)
• When compared with other resources used for work (i.e., computing, scientific
instrumentation, support staff, consultation from colleagues, etc.), information found in documents (largely journals) was rated second highest in importance for primary research and highest for most other tasks performed by them. • Journal reading was found to save money and staff time. The average such savings was about $ 300 per reading, recognising that a small proportion of readings resulted in most of these savings. Most of the savings were achieved by not having to do primary research or stopping an unproductive line of enquiry. • Some surveys showed that journal reading affected the timeliness and quality of research. About one-third of readings resulted in faster performance or more rapid completion of the principal activity for which the reading was done. About 44 % of readings resulted in improved quality of the activity for which the reading was done and the level of improvement was nearly 50 % in rating of quality. Recipients of awards for recognition of work tend to read substantially more articles than those not recognised. This result has been observed in over 30 surveys dating back to the 1960s. In one recent survey, 24 high achievers read about 60 % more articles than staff with equivalent degrees, fields of specialty and years of experience. In one company, a stated goal was to increase the speed of products from discovery to the marketplace. About 20 major processes were identified and 31 % of reading was found to result in work in these processes being completed faster. There are many other indicators of favourable outcomes of reading as well, including inspired new thinking or ideas (35 % of readings); narrowed, broadened or changed the focus (15 %); resolved technical problems (8 %); etc. The evidence is overwhelming that there is great value placed in scholarly articles. The library contributes to this value by facilitating journal access to users. e-Journals in industry In the pharmaceutical industry there has also been a significant adoption of ejournals. Pharmaceutical research firm Best Practices, LLC, published a report which claims pharmaceutical companies are increasingly turning to e-journals and other alternative media to speed up the scientific publication process. 50 % of companies participating in a benchmarking study indicated that they are becoming more focused on non-traditional media in order to expand and accelerate publica-
tion of clinical trial results. The study entitled ‘Scientific Publications Strategy: Managing Reputation, Clinical Trial Results and Commercial Relevance’, included companies such as Pfizer, GlaxoSmithKline, Eli Lilly and Merck. It sought to guide pharmaceutical and biotechnology companies in the transition to the new publishing environment. Using new publication outlets is only one of the ways that pharmaceutical companies are responding to a demand for faster and more transparent scientific publications. The study also found that companies are posting trial results on designated industry or government websites. Other key topic areas in the research include transition of global publication function from commercial to clinical oversight; tactics for handling publication of neutral or negative clinical trial results; optimal structure for the global publications function; and strategy changes for the new marketplace. The purposes of electronic publishing are changing rapidly in the industrial environment in particular, and this is having its consequences on the traditional central role of the journal. 7.1.3 Future of the Journal Do the four core functions of the journal, spelled out at the start of this section, stand the test of time? Are they capable of adapting to the new circumstances created by the digital revolution and the Internet? In essence the publishing community feels that these functions are as robust as ever and are still required irrespective of the media adopted. Others claim that several of the core functions could be taken over by other stakeholders in due course. Are the core functions as robust as is generally assumed? • Registration. This was certainly relevant when the issue of authorship and own-
ership of an article was clear. In the old days one or a few authors wrote the article – the structure of Science nowadays is such that co-authorship and multi-site collaboration are much more in evidence. Collaborative research is undermining some of the strength of the registration pillar from the days when a single author was dominant and that he/she sought personal prestige and recognition through sole registration. • Certification. The peer review process has become the cornerstone for ensuring that only valid articles were put into the public domain, and the poor articles would be rejected. This remains true for most disciplines, but in some the communication of results takes place well before the review process, and reviewing is used subsequently purely for status, tenure, career advancement, etc and to create the Record of Science. Also, experiments are taking place that are making the review process more open and interactive, moving away from the traditional closed peer review. • Dissemination. The rights management which publishers have put in place to protect the copyright for their and the author’s benefit has come under scrutiny in many parts of the world, and this has led to the open access publication system being considered as an alternate. One where dissemination and reach
could in theory be much wider than the traditional subscription-based system.
• Archive. Few publishers understand the mechanics of offering access in perpetuity for their material. It is a technical issue which several key libraries and other intermediaries have taken on board. Under the print model there was an archivable printed product – with digital publications a new set of technologies, foreign to the expertise and experience of publishers, has emerged.
Will these various challenges cause the journal publication model to break down? Probably not, certainly not in the foreseeable future. But there are chinks appearing in what had been, until the advent of the digital age, a solid and supported mechanism. There is the suggestion that a focused approach to doing what publishing does best – organising and administering the certification process – is where the future of journal publishers lies. But there are also suggestions that, under electronic publishing, every journal will become a channel to market which will facilitate, and be facilitated by, social networking. The ‘journals’ will become catalysts for a community, for discussion groups and other such fora. The emerging Web 2.0 world, as we will see later, is a highly viral one. But will publishers retain their relevance as supplementary data, datasets and other new innovative information formats emerge? As collaborative and social networking become more in evidence using Web 2.0 processes? As the semantic web becomes more robust? These are questions which currently concern the key strategists operating within the publishing industry.
7.2 Books and e-Books
Book publishing has moved the other way – at one stage there was a threat that books were being driven out of the market. Several of the large commercial publishers, those with a significant commercial overhead, were reluctant to take on books. They had difficulty making a viable return from a large number of single-effort publishing activities. With e-Books the commercial issues facing printed books have been taken a step further as questions about the quality of the viewing devices became important. The issue is that the small screen is inadequate to cope with the requirements of researchers, particularly when long strings of data, formulae, tables, etc are needed.
7.2.1 The e-Book phenomenon
It has therefore become almost folklore that e-Books have not achieved the level of success that early entrants into this area had assumed, for a variety of reasons. Despite the apparent attraction of having a book accessible anytime on a screen, and with the search functionality which handheld electronic systems offer, the results of initiatives in this area failed to live up to expectations. However, there is a right time for products and services, and some pundits believe that the time for e-Books has now arrived. In fact some experts close to the situation feel that
the e-Books could be bigger than any of the much-hyped social collaborative tools such as FaceBook, MySpace, etc. It could be the next main revolution in scholarly communication. This assessment was based on a major investigation of access to eBooks from a number of publishers’ lists which was undertaken in 2006/7 as part of the ‘SuperBook’ project run by University College London/Centre for Publishing. This involved the UCL library attendees being used as a test site or laboratory for analysing the search patterns of UCL staff and students. The result has been that a sizeable latent potential for e-Books has become evident once the licensing barriers were dismantled. The potential is not necessarily of the same order of magnitude as e-journals. The research sector has taken e-journals in its stride – e-Books are geared more towards the students and professionals who need immediate access to textbooks, particularly those that are recommended reading by the lecturer or course leader. These same students are now ‘paying customers’ within the UK university system, and therefore have some influence on what is purchased by the institution. Under the print paradigm the library would only hold one or a few copies of the recommended text, and place them on short loan service. This made access impossible for all but the fleet of foot and the wealthy. Instead, with e-Books, all students could access the same title through e-Book access. What has made the potential even greater is that access is not necessarily confined to the physical library. Students working from home, from student halls of residence, from the workbench, could all access the title. Suddenly e-Books become a valuable teaching tool, no longer dominated by the other elephant in the room – the e-journal. In fact in the SuperBook project 15 % of usage occurred from halls of residence. The SuperBook project included the e-book titles from Oxford Scholarship Online, Wiley Interscience and Taylor & Francis, with deep log analysis being the technique applied to assess usage. This included collecting the number of sessions, time spent viewing a page, duration of session, pages printed, etc. Two of the 1,200 titles included in the analysis accounted for 12 % of all page views, and the top 20 titles accounted for 43 % of usage. Interestingly, there was a scattering of monographs within the list of top titles accessed. Sessions lasted on average 3.5 minutes. There was volatility in the usage rates from month to month and by subject area accessed. Perhaps this is a reflection that the analysis was carried out at a single library with its unique usage profile. Usage patterns indicated an age of e-book dispersion – 17 % of views were to books less than two years old (whereas 25 % of titles were that old). This is partly due to students not being fixated with the latest or newest material; that social sciences and humanities titles often had a classical appeal, or it may be a reflection of the lag in the updating of lecturers list of recommended readings. Library catalogued books achieved over twice the usage of non-catalogued books. Again, this might suggest a role for librarians here, but in the popular Oxford Scholarship Online service most of the abstracts and keywords were supplied by the author rather than a librarian created catalogue entry. It therefore appears that information seeking within e-Books is very different from e-journals. 
Sessions are busier, older content is viewed frequently, more use occurs at weekends, and it is strongly academic/teaching biased, etc. On the basis of the evidential information gleaned and insight achieved from the year-long SuperBook project, UCL/CIBER is now about to start an extension of this research
to cover 100 university sites in the UK as part of a National E-Book Observatory study. This will involve extending the research from one university site (UCL) to over a hundred.
Ebrary results
The electronic book provider ebrary has also reported the results of an informal eBook survey that was completed by 583 international librarians. These are available at: http://www.surveymonkey.com/s.aspx?sm=kqxPd1nXcrb9lRVf9ZWjQQ%3d%3d The Global eBook Survey covers such topics as eBook usage, purchase drivers and inhibitors, digitisation and distribution platforms. It also includes an analysis of the results that was authored by Allen McKiel, Director of Libraries, Northeastern State University. Key survey findings include:
• Respondents ranked Google and other search engines as the least common ways
that patrons find eBooks. Google and other search engines were also indicated among the least prevalent factors that drive eBook usage, while the library’s catalogue and professor and staff recommendations were the most important.
• 78 % of respondents described usage of eBooks at their libraries as fair to excellent.
• 81 % of respondents indicated that the integration of eBooks with other library resources and information on the web is ‘very important’.
• Respondents were equally split between subscription and purchase as their model of choice for acquiring digital content.
• 56 % of respondents are digitising their own content or actively considering it, and 81 % indicated that they will digitise their materials in-house.
• Price, subject areas, and access models were indicated as the most important factors when subscribing to or purchasing eBooks.
ebrary currently offers more than 120,000 eBooks and other titles from over 260 publishers and aggregators. For four consecutive years, ebrary has been named as one of the eContent’s 100 list of ‘companies that matter most’ in the digital content industry. Oxford Scholarship Online This is an example of a service built by the journals division of Oxford University Press, at a cost of between £ 2–3 million, which offers greater granularity than a traditional monograph. The OSO system has been designed to treat chapters as articles, through inclusion of appropriate quality metadata. This therefore increases the range of material open to users as their search can be extended from e-journals to ebooks and e-chapters. As the entire corpus of chapters across all subjects from OUP are made accessible a valuable new resource of information has been made possible. Until September 2007 there were four main subject areas included in OSO. Since then the service has been extended to 13 areas and 1,800 books online. They have over 9 million fulltext page views, some 25,000 abstracts and 150,000 keywords to make access possible at the chapter level. It means the library loses some of its appeal as publishers become the new librarians. OSO has therefore restored the book to the level of prominence that many saw as being possible for the e-book, albeit at the price of a significant investment in the
production cycle. Despite this investment OUP claim that front-list print sales have not been compromised – print usage has in some areas increased as discoverability has been enhanced. e-Books on other platforms Included within ‘The value of online books to university libraries’, a white paper distributed by Elsevier’s ScienceDirect to their library customers in April 2006, there was an assessment made by five librarians on the reasons why they chose to buy books online. Simultaneous user access was cited as 13 % of the reasons for purchase, whereas 40 % claimed ‘ease of use’ was important and ‘content’ was cited by 36 %. Within the next five years Elsevier Science & Technology Books will move from having less than 5 % of the list available in electronic format in late 2006 to over 80 %. Similarly, there has been a white paper produced by Springer for its library customers in 2007 that analysed ‘e-Books – Costs and benefits to Academic and Research Libraries’. This was based on contributions from six university sources from across the world. They also rated e-Books according to some 11 potential benefits, with ‘enhanced user access’ being top of its benefits list, followed by enhanced book functionality and access to more content. Compared with the print edition, e-Books achieves cost savings in terms of physical handling/processing of books, followed by savings on storage/arching and then circulation. A Strategy for Book Digitisation Kevin Kelly, former editor of Wired magazine, wrote an article for the New York Times of May 14th, 2006, which reviewed the position of books in a digital world. Part of this move from a physical paper to a new paradigm is the emergence of powerful technology in the form of global search engines. A ‘universal library’ should, according to Kelly, include a copy of every book, article, painting, film, and music past and present. From the Sumerian clay tablets to the present there have been about 32 million books published, and 750 million articles and essays. Given current technological advances, all this could be compressed onto 50 petabytes of hard discs. However so far only 5 % of all books have currently moved from analogue to a digital format. The long prophesied e-Book revolution in this area of converting printed books to digital has not happened. This conversion has not happened despite the portability and utility of the digital library. The real trick will be when each word in each book is cross linked, clustered, cited, extracted, and remixed and woven deeper into the culture than ever before. Readers would be able to add their own tags – such as the hundreds of viewers adding tags in flickr. These tags will be assigned faster, range wider and serve better than out of date cataloguing schemes particularly in frontier scientific areas. The link and the tag may be the two most important inventions in the last fifty years, according to Kelly. However, another aspect is that as users click on links this ensures that every click elevates the item’s rank of relevance. “One is anonymously marking up the Web with breadcrumbs of attention”. Search engines are transforming our culture because they harness the power of relationships that clicks and links leave behind. This tangle of relationships is precisely what gives the Web its immense power.
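Kelly’s storage estimate can be checked with some rough arithmetic. The short sketch below is illustrative only: it takes the item counts and the 50 petabyte figure quoted above, assumes decimal petabytes, and works out the average storage that implies for each digitised item.

```python
# Back-of-envelope check of the storage arithmetic behind Kelly's 'universal library' estimate:
# roughly 32 million books plus 750 million articles and essays fitting on 50 petabytes.
# The item counts and the 50 PB figure come from the text; the rest is simple division.

books = 32_000_000
articles_and_essays = 750_000_000
total_items = books + articles_and_essays

capacity_bytes = 50 * 10**15            # 50 petabytes, decimal definition assumed
avg_bytes_per_item = capacity_bytes / total_items

print(f"Items in the 'universal library': {total_items:,}")
print(f"Implied average size per item: {avg_bytes_per_item / 10**6:.0f} MB")
# Around 64 MB per item, i.e. roughly what a scanned, image-based book or long article
# occupies, which is what makes the 50 PB figure plausible for a digitised corpus.
```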
According to Kelley, “In a curious way the universal library becomes one very, very, very large single text: the world’s only book”. Digitised books can be unraveled into single pages or even further into snippets of a page. These snippets will be remixed into reordered books. The universal library will encourage the creation of virtual “bookshelves”. Bookshelves will be published and swapped in the public commons. Indeed some authors will begin to write books to be read as snippets or to be remixed as pages. Once snippets, articles and pages of books become ubiquitous, shuffleable and transferable, users will earn prestige and perhaps income for creating an excellent collection. When books become a single liquid fabric of interconnected words and ideas, works on the margins of popularity will benefit from the ‘Long Tail’ – that place of low-to-no-sales where most of the printed books traditionally lived. It will deepen our grasp of history and bring the archive back into current usage. It will cultivate a new sense of authority – everything related to a particular topic will be easily accessible and on tap. It provides the infrastructure for new functions and services – such as “mash ups”. However, so far the ‘universal library’ lacks books or content. Creation of intellectual efforts commingles to create culture. Exact copies of works in pre-printing press era were impossible – each work was original. The printing press basically created cheap duplicates. Copyright bestowed upon the creator a temporary monopoly. Initially this was 14 years in the US. At the same time public libraries supported mass access to enable the works to be woven into the fabric of common culture. This became known as “fair use” or “fair dealing”. This has been undone, partly by the 1976 Copyright Act in the USA. Various iterations have pushed copyright protection from fourteen to seventy years and beyond. This made it difficult to move work into the public commons. It becomes an eternity within the time scales of the Internet. However, it has spawned a massive collection of abandoned or ‘orphaned’ works. The size of the abandoned library is huge – 75 % of all books in the world’s libraries are orphaned. 15 % are in public domain. Only 10 % are still in print. The 15 % of the world’s 32 million catalogued books are in public domain and are therefore potentially free for everyone to borrow, imitate, etc. Almost the entire current scanning effort by American libraries is aimed at this 15 %. This is in the commons. The 10 % of books in print will also be scanned before too long by Google, Microsoft, Amazon, etc. Google and Microsoft have a partnership programme by which they scan books on behalf of publishers. The real problem is the 75% whose ownership, in copyright terms, is unclear. The prospect of tracking down the copyright of the approximately 25 million orphaned books is a huge challenge. However, Google decided to overcome this problem by scanning the whole of the orphaned books collection, but only allowing snippets of the text – which match a search query – to be revealed. Google’s lawyers claim that exposing a quote or an excerpt should qualify as “fair use”. For out-of-copyright books Google will show the whole book. For in-print books, Google will work with publishers to commercialise their books in digital form. For the dark orphans Google will show only limited snippets. However, authors and publishers accuse Google of blatant copyright infringement. 
The reason is that because the dark books now have some sparks of life in them, publishers do not want to lose this new potential revenue source. The orphan
books issue – the inability to check the ownership of every one of the 75 % of books – could, however, be overcome by maintaining a common list of no-scan copyright owners (and only exposing small parts of the rest). Within the US there seems to be an East Coast/West Coast split in the industry (defenders of the book versus defenders of the screen). The copyright model of the book has allowed the flowering of human achievement and a golden age of creative works – as well as allowing authors to earn a living. But the wealth created by this print model is being used to shore it up, to protect it from copyright violation. The new model is based on intangible assets of digital bits, where copies are no longer cheap but free. As the digital world expands, the old model is dethroned. Now relationships, sharing and connections are the new assets. Value has shifted away from the copy towards the many ways to recall, annotate, personalize, edit, authenticate, display, mark, transfer and engage a work. Search engines are challenging everything. The search engines operate a new regime. Search uncovers not only keywords but also relationships. The basic premise is that the value of any work is increased the more it is shared. Things can be found by search only if they radiate potential connections. Having searchable works is good for culture. This leads to copyrights being counterbalanced by ‘copyduties’. In exchange for public protection of a work (copyright) a creator has an obligation to allow the work to be searched. No search, no copyright. Science is on a campaign to bring all knowledge into one vast, interconnected, peer-reviewed web of facts. Every new observation or bit of data brought into the web of science enhances the value of all other data points. However, the legal clash between the book copy and the searchable web promises to be a long one. Technology may resolve this – scanning technology in particular, which users will employ in increasing numbers. What this trend suggests is that copies don’t count any more. Copies of isolated books, bound between covers, soon won’t mean much. But their content will gain meaning as they are drawn into the growing web of relationships. They need to be wired into the universal library. According to Kelly, “In the next few years the lobbyists from the publishing industry will work hard to mandate the extinction of the ‘indiscriminate flow of copies’. However, the reign of the copy is no match for the bias of technology. The great continent of orphaned books, 25 million of them, will be scanned into the universal library. Because it is easy to do technically, even if the law is against it”.
7.3 Document Delivery 7.3.1 The market for article supply One of the significant changes that have occurred within the scholarly information space in recent times has been the rise and fall of document delivery. Until 1998/9, there was a gradual but consistent growth in the number of requests received by intermediaries who dealt in the supply of documents to libraries and their patrons on demand. Worldwide, by 1999, there were some 10 million requests per annum being filled by the key document delivery agencies. Since then the number has been halved. Why?
Partly because librarians, who acted as agents in the purchase of documents from centres such as the British Library, CISTI, INIST, etc, were facing budgetary problems. They were balancing their budgets as best they could between books, journals and other media, and document delivery (which did little to expand the physical collection in the library) was being squeezed. However, the primary cause for the market decline in documents was because publishers introduced the so-called Big Deals in 1998/9. These meant that a library, in subscribing to a package of journals from a particular publisher, would be able to get access to all the electronic versions of the publisher’s titles for a marginal increase in the subscription price. Besides being a Big Deal, it was also often a good deal for libraries as their collections suddenly expanded dramatically. This expansion was in peripheral journal titles, titles they would have hitherto not bothered to subscribe to, but now came almost for free. But it was these same titles that had hitherto also been responsible for much of their document delivery demand. Instead of having to buy the articles required by their patrons from intermediaries such as the British Library, InfoTrieve, CISTI and INIST, they now had these requested articles available online from the publisher direct at no extra charge. The particular challenge facing the established document delivery centres is how quickly the market will continue to decline, and whether or when it bottoms out. Key future trends in the market appear to be: • Information will become increasingly disaggregated – i.e., researchers will be
able to pick small items of information (parts of articles, tables, graphs, each transactionally identifiable through a persistent identifier such as DOI – see later). Researchers will also increasingly want strong links between content (i.e., citation links, other relevant articles, links to the data) and easy navigation to this other content.
• Traditional customers of the document delivery services (researchers at higher education and commercial institutions) will get more of this information through licensed content (Big Deals and consortium licences) and powerful new resource discovery tools such as Google. The full impact of resource discovery tools (Google etc) is yet to be felt. Traditional customers’ need to go outside to any aggregator will be low. Instead of an ‘article supplier’ they will look for suppliers who can take on their content management roles more widely (from outsourcing physical holdings through to whole outsourced management of libraries) – so that they can focus on content strategy rather than execution.
• There will be more and more sources of content, with the combination of search engines and institutional or disciplinary repositories becoming a powerful source of information (and one which will threaten traditional publishing models). These will be largely free, threatening the document delivery model further.
• Open access publishing in all its forms offers users/librarians the ability to obtain required articles at no cost – how many of them will choose to use document delivery when a service charge will be imposed for delivering the very same article? Particularly as in some cases the Creative Commons licence that the author has agreed to may prohibit the use of that article for commercial purposes (i.e. document delivery).
• There will be more researchers looking for information, so document delivery will be well positioned to supply the harder to find material (from older, smaller
publishers). In addition, because of the price advantage of library privilege supply, national libraries will be most attractive to non-commercialresearchers. However, both these older materials, and these markets, are expensive to serve (and there is a high price sensitivity for non-commercial customers too). • Demand for document delivery will continue to fall. Also, the demand patterns will significantly shift away from institutional support to non-commercial usage. New roles will emerge in the management of IRs and disciplinary repositories, in creating new focused research tools, in text mining services, in the provision of outsourced management for corporate/ HE libraries, and in the creation of information linkages to help researchers navigate between content. The trend towards e-Science in some subject areas will have a growing impact on the nature and format of information being generated and used. The standalone refereed article could lose its current pivotal role, with the consequence being the decline of the ‘traditional’ document delivery service. • The only potentially positive unknown is how far a new market will be generated by the confluence of the ‘disenfranchised’ knowledge worker – six times the number of academics – with Google and other powerful search engines. Traditionally these professionals have been locked out of the document delivery system because of the systems being library-focused, and because alerting systems were not available to tell them what was available. Both these are now changing, with the potential to bring in a new end user market. In essence, though it is still believed that document supply will continue to decline over the next few years it is also assumed by pundits that the market will probably ‘bottom out’ at some point. At this bottom level, the document delivery centres will become the ‘final backstop’, the deliverer of last resort, for those difficult, esoteric items that will be requested when all other avenues have been exhausted. The ‘easy’ documents will be fulfilled through publisher-maintained online sources, and cheaper local sources. What this means is that ‘article supply’ may no longer be seen as a market in its own right. Instead it will be part of the sector that supplies disaggregated information – i.e., unbundling of journals – which will increasingly be done with value added features such as links through to other articles or increasingly other digital objects of interest. InterLibrary Loans Total disintermediation of the document suppliers may be prevented by one feature in the communication process that has its roots in national and international copyright law. In the UK and the US in particular there are clauses in their national copyright laws that allow “fair use” (USA) or “fair dealing” (UK). This means that, if individuals are prepared to guarantee that a particular article will be used for personal and private use only – i.e. excluding commercial use – one copy of the article can be obtained from document delivery centres without having to pay a royalty fee to the publisher. Inevitably and particularly given the commercial Pay-Per-View aspirations of publishers at present, this causes concerns within the publishing sector. Publisher trade associations have been fronting discussions with library associations, public funding bodies and government agencies to ensure that the interlibrary loan activity does not undermine the commercial viability of journal publishing. At heart
there is a fear that too much or too easy interlibrary loans will mean that libraries could cancel journal titles and rely on ILL as backup to supply required articles on demand. The following chart from the Association of Research Libraries (ARL) shows how InterLibrary Loans have been a rapidly growing part of the research libraries’ activities in the US. It reflects the need to make use of support organisations to provide infill for those items that the research libraries can no longer subscribe to.
Figure 7.5 Trends in ILL acquisitions at US Research Universities
Reference was made in the 2003 UK Select Committee Enquiry (Scientific Publications: Free for all?) to the proposition that it was always possible to get hold of a copy of an article in the UK through the local public library (see art 42). Both Sir Crispin Davis (Reed Elsevier) and Dr John Jarvis (John Wiley and Sons) made this claim. The point was also made that the existing subscription and licensing of scholarly journals by publishers, did not preclude other end users from getting access to required articles through alternative means. They claimed that there was no lock out of end users. Even though relying on a public library sector to supply documents was making it difficult both for the public library and the end user who had to wait for the article ordering process to be completed. In fact a challenge was thrown down by one expert who provided evidence (Vitek Tracz) for the committee members to try and obtain an article from the local public library. In theory it is possible, in practice it was more difficult. The Committee concluded “We are not convinced that journal articles are consistently available to members of the public through public libraries” (Article 42). The British Library purchases over 40,000 serials as part of its collection, and articles from these serials can be purchased under strictly controlled circumstances.
Public libraries can order documents from the British Library providing they are registered to do so. Most public libraries are registered, but some are not. It is often the smaller, sub-branches of public libraries that are not registered, and faced with a request to supply a copy of an article will refer the requester to larger branches. Even public libraries that are registered with the BL may not always offer a service of ordering documents to anyone other than their own library card holders. In those cases where the public library does respond to a request and delivers an article it is subject to current Copyright Law. That is, unless there is a signed declaration by the end user that the item is for his/her sole use for private research, they will be required to pay copyright fees as set by the Copyright Licence Agency in the UK. If the item is required for commercial purposes, whether or not it is as a photocopied article delivered by the British Library (BL), or as a locally-held item within the public library, a sticker scheme would apply. This means that the requester would buy a sticker from the library desk as payment for the royalty fee set by the CLA, and the sticker would be placed on the article. At present, the BL does not offer Library Privilege (i.e. royalty-free articles) directly to end users but only through registered intermediaries. The text of the declaration is available at http://www.bl.uk/services/document/pdf files/ decform.pdf. The BL imposes a condition on all intermediaries who register with the BL that they must obtain a declaration for each and every LP copy – see clause 11(a) of the standard Terms and Conditions at http://www.bl.uk/services/ document/pdf files/tandc.pdf
The BL does not state that the declaration must be signed at the time of making the request but it strongly recommends that this be done. It is also recommended that declarations, all of which are paper based at the present time, should be retained by the intermediary for at least six years. The BL does not have a right of audit for examining declaration forms but it does keep records of what has been ordered by intermediaries and could, if there be sufficient grounds, generate an order trail. However, more recently the British Library has announced its intention to offer electronic article delivery to the end users through a new BL Direct service. This has been challenged by publishers who see the electronic delivery of articles something which they feel breaches the terms of the copyright act. The arguments are still being pursued and guidelines being sought from legal and governmental sources. There is still some confusion about what the law intended, and how this relates to the new and developing culture of information exchange in an Internet environment. But one fear that the supporters of the Interlibrary Loan/Document Delivery school have is that the powerful publisher lobby, formalised through the International Association of STM Publishers, will impose a licensing based solution to the delivery of articles through the Internet. This would restrict the ability to provide a quick, instantaneous and cost-effective service for end users. Subito analysis of document delivery A recent entrant to the document delivery business has been the Subito consortium of large research libraries in Germany. They have set up a mechanism to offer rapid electronic delivery of articles with only a minimal royalty being made available for publishers. The resulting price was therefore much cheaper than the prices (service charge plus full royalty to the publisher) charged by all other legitimate document delivery agencies. As a result their service volume rocketed from low levels to over
a million requests in a few years in early 2000’s. Publishers reacted to the loss of royalty and the way Subito could allegedly erode their subscription business by taking Subito to court in Germany. Meanwhile, in an issue of the Journal of Information Science (32 (3) 2006, pp. 223– 237), two Austrian representatives from the Subito collaboration reported on a study they undertook to investigate features of Subito’s document delivery activity. In 2003, Subito collated the order data for the 500 most requested journals at all of the 33 delivery libraries within the Subito network. These 500 titles, from approximately 750,000 unique titles available within Subito, accounted for 22.1 % of all orders. The nature of this research project prevented identification of the ‘long tail’, but the assumption is that it is extremely long. Of the ‘top 500’ titles, 80 % came from countries which were predominantly English-language. Elsevier alone was responsible for 14 % of all requests among the top 500 (with Lippincott, Williams and Wilkins (LWW) the next largest at 7 %). Small societies and publishers accounted for 60 %. Medical, pharmacological and biology titles were the most frequently requested titles – 72 % came from these combined areas. The study then related the pattern of document requests to citation data as obtained from Science Citation Index-Journal Citation Reports’ (SCI-JCR) top 100 journals. From this they claim there is little overlap. The top five rankings from the two lists included only one journal in common (Nature). There is also barely an indication of a relationship between the two lists if the comparisons are made within sub-disciplines. The pattern of demand, reflecting the markets served, suggested that more application-oriented journals were requested from Subito while ISI’s JCR top rankings cover more basic research journals. The study showed that the share of current publications appears much higher among ordered articles, compared with cited ones. Perhaps surprisingly, given past assumptions about the motivations for document delivery and the importance of document delivery in being held responsible for the cancellation of large expensive commercial journals, Subito data shows that there is no evidence that articles of journals with higher subscription rates are ordered more often. The article is entitled Document Delivery as a Source for Bibliometric Analyses: the Case of Subito, and is authored by Christian Schloegl and Juan Gorraiz. Despite the above, and the concerns which some document delivery agencies have about making agreements to deliver articles under a licensing arrangement, a new agreement has been reached between the publishers and Subito in December 2007. It appears that at the last minute, negotiations were concluded between the three organisations of Subito e.V., Boersenverein des deutschen Buchhandels, and the International STM Association to put e-delivery of scientific articles on a contractual or licensed basis. The last minute in this instance is that a German law on copyright was due to come into force early in 2008 which would have unknown consequences on the delivery of e-documents. The terms of the agreement permit the member libraries of the Subito consortium to reproduce and electronically deliver any article requested by students, private persons or businesses within Germany, Austria, Liechtenstein and Switzerland. 
This agreement complements the so-called international framework agreement reached between various STM publishers and Subito concluded in July 2006. The text of the framework agreement can be seen at: http://www.boersenverein.de. According to a Subito representative, the agreement provides a solution more certain and less cumbersome than the one to be introduced by the new German Copyright
Act. “It would avoid a serious rupture in the supply of electronic documents,” said Dr. Berndt Dugall, President of Subito e.V., and Director of the library of the Johann Wolfgang Goethe University, Frankfurt am Main, Germany. The agreement now endorsed enables the conclusion of individual licensing agreements for an initial five-year term. These agreements will authorise Subito and its member libraries to use the content of published scientific journals for electronic document delivery. Publishers set a rate for the delivery of documents to private persons and for businesses. Students, public and non-commercial libraries as well as academics will benefit from discounts under the agreement. The practice of a contractual licence agreement overriding national copyright laws may stick in the craw of some national document delivery organisations, however.
Chapter 8
Legal Developments
8.1 Legal Initiatives
8.1.1 Creative Commons
Humanity’s capacity to generate new ideas and knowledge is its greatest asset. It is the source of art, science, innovation and economic development. Without it, individuals and societies stagnate. The creative imagination requires access to the ideas, learning and culture of others, past and present. In future, others will use what is being done today. We are building a global society by standing on the shoulders of giants. Creativity and investment should therefore be recognised and rewarded. The purpose of intellectual property law (such as copyright and patents) should be now – as it was in the past – to ensure both the sharing of knowledge and also the rewarding of innovation. However, according to one section of the community, the expansion in the laws’ breadth, scope and term over the last thirty or so years has resulted in an intellectual property regime which is radically out of line with modern technological, economic and social trends, and stresses reward over collective knowledge sharing. Several principles have been put forward by this sector, including the principle that laws regulating intellectual property must serve as a means of achieving creative, social and economic ends and must not be seen as ends in themselves. The public interest requires a balance between the public domain and private rights. It also requires a balance between the monopoly rights implicit in IP laws and the free competition that is essential for economic vitality. These principles have become the foundation of a Creative Commons school that was led by Professor Lawrence Lessig from Stanford University. A specific outcome has been the creation of templates, available to be downloaded, which enable the author to determine which rights they are prepared to keep and which to make freely available. The Creative Commons school insists that government should facilitate a wide range of policies to stimulate access and innovation, including non-proprietary IP models, such as open source software licensing and open access to scientific literature. Intellectual property laws should also take account of developing countries’ social and economic circumstances.
8.1.2 Science Commons
Science Commons and SPARC (the Scholarly Publishing and Academic Resources Coalition) have released new online tools to help authors exercise choice in retain-
ing rights over their scholarly articles, including the rights to reuse their scholarly articles and to post them in online repositories. The new tools include the Scholar’s Copyright Addendum Engine, an online tool created by Science Commons to simplify the process of choosing and implementing an addendum to retain scholarly rights. By selecting from among four addenda offered, any author can fill in a form to generate and print a completed amendment that can be attached to a publisher’s copyright assignment agreement to retain critical rights to reuse and offer works online. The Scholar’s Copyright Addendum Engine will be offered through the Science Commons, SPARC, the Massachusetts Institute of Technology (MIT), and the Carnegie Mellon University websites, and it will be freely available to other institutions that wish to host it. It can be accessed on the Science Commons website at http://scholars.sciencecommons.org. Also available for the first time is a new addendum from Science Commons and SPARC, named ‘Access-Reuse’. This represents a collaboration to simplify choices for scholars by combining two existing addenda, the SPARC Author Addendum and the Science Commons Open Access-Creative Commons Addendum. The new addendum will ensure that authors not only retain the rights to reuse their own work and post them on online depositories, but also to grant a non-exclusive licence, such as the Creative Commons Attribution-Non-Commercial licence, to the public to reuse and distribute the work. In addition, Science Commons will be offering two other addenda, called ‘Immediate Access’ and ‘Delayed Access’, representing alternative arrangements that authors can choose. MIT has contributed by including its MIT Copyright Agreement Amendment in the choices available through the Scholar’s Copyright Addendum Engine. The MIT Copyright Amendment has been available since the spring of 2006 and allows authors to retain specific rights to deposit articles in MIT Libraries’ D-Space repository, and to deposit any NIH-funded manuscripts on the National Library of Medicine’s PubMed Central database. Science Commons’ goal is “to encourage stakeholders to create areas of free access and inquiry using standardised licences and other means; a ‘Science Commons’ built out of voluntary private agreements. A project of the non-profit copyright organisation, Creative Commons, Science Commons works to make sharing easier in scientific publication, licensing of research tools and materials, and databases.” Science Commons can be found at http://science.creativecommons.org 8.1.3 JISC and SURF’s Licence to Publish SURF and JISC – two organisations that promote the innovative use of ICT in higher education in the Netherlands and the UK, respectively – have published a model agreement that will help authors make appropriate arrangements with publishers for the publication of a journal article. This “Licence to Publish” is the result of several years of international consultation and aims to establish what they consider to be a reasonable balance of rights and interests in scholarly communications. According to the announcement the rise of digital channels of communication has meant that the process of publishing research material has been undergoing major changes over the last few years. SURF and JISC have pressed for arrangements to be made regarding copyright, with the interests of all parties being maximised. The overarching principle behind their activities is that the results of publicly funded
research should be made freely and openly available to all who want to access them as quickly as possible. The main features of the Licence to Publish are that: • copyright in the published work remains with the author • the author grants the publisher a licence to publish the work • the licence takes effect as soon as the publisher has indicated that it wishes to
publish the work • once the article has been published, the author can make it publicly accessible –
in the form in which it was published by the publisher – by making it available as part of a digital repository • if the publisher so requests, the start of such public accessibility can be delayed for a maximum of six months. It is claimed that the new model agreement will be particularly useful where articles are published in the traditional way, with publications being made available only to subscribers. The agreement is available in both Dutch and English and can be used for publications involving more than one author. Use of the Licence to Publish is supported by the Wellcome Trust. For further information see: www.surf.nl/copyrighttoolbox/authors/licence/ 8.1.4 Future of Copyright The US Copyright law dates to the birth of the Republic. Article I of the Constitution assigns Congress the right to pass laws “securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” The first copyright law was passed in 1790, and it has been frequently and confusingly amended over the years, most recently in the Sonny Bono Copyright Term Extension Act of 1998, which extended copyright terms by twenty years. The stimulus behind the development of the Creative Commons by professor Lawrence Lessig of Stanford University has been to counter the publisher lobby that has been responsible for the extension of the term of the copyright with the creation of a licence giving the author the power to decide the use being made of their work. As will be outlined later, some pundits feel that once the social collaboration and networking activity takes hold over the next decade or so, copyright as we know it now will have largely disappeared. Orphan Works One of the main issues that currently receive much attention is that of ‘orphan works’. These are publications whose ownership is unknown. In the US, a Copyright Office investigation began into this issue in 2005 and a report was produced at the end of January 2006. There remain concerns about the definitions of an orphan work. Also, whilst it is proposed that a diligent search for a copyright owner of such a work may protect the user in using content, the term ‘diligent’ is yet to be clarified. There is no proposal to create a central repository of orphan works, which might be useful for photographs in particular.
Also the current position of Section 108 of the 1976 US Copyright Act (particularly the limitations and exceptions given to libraries and archives) has been under scrutiny. NDIIP and the Copyright Office is to look into this, notably with respect to the rights libraries would have to make use of digital copies for own internal users and for other users. Issues about ‘virtual libraries’ are also being reviewed. This will bring to the fore issues such as ‘when will libraries become publishers?’ for example.
Chapter 9
Geographical Trends
9.1 Globalisation of Research Scientific research and scholarly publishing no longer operate within a village or even national mindset. William Brody, in his article ‘College Goes Global’ in Foreign Affairs, March/April 2007 (pages 122–135), states that research is now international in reach, and evaluation of research outputs is based on multiple criterion. Science has become a global industry, unfettered by national boundaries (except where issues of national security arise). Very few countries have the wherewithal to be totally self-sufficient in conducting research within a particular area. Even the United States which led the drive into Science at the time of the space race in the late 1960’s, and became dominant in some aspects of science and engineering developments, is now no longer so preeminent. Over the years the percentage of research articles produced in various disciplines has declined as other countries have taken up the baton and invested more of their national resources in research and development. Gross national product figures prominently as a key factor in determining the amount of R&D undertaken in most countries. The disparities in incomes between the average income in the USA ($ 40 k), some 20–40 times greater than India and China, has a number of consequences not least in the export of labour intensive activities to less expensive countries. We are seeing this with the transfer of much of the scholarly publishing production and editorial processes from western countries to India in particular. This indicates that the gap between the rich western countries and the emerging economies such as China, India, Brazil and Russia is narrowing, and a new feature of Change – still to be made manifest – will be the greater sharing of the scholarly communication effort around the world. The rise of China and India as key players in the scholarly research and communication processes will soon be evident, and result in a significant altering in the world publishing order. There are strong indications that this investment in the scientific effort achieves results in that innovation is stimulated and the economic health of the nation is a beneficiary. The correlation may not be direct and immediate but a healthy R&D effort within a community produces a ripple effect across a number of applied and basic research areas. Therefore Science has become a vital function within national priorities, and efforts are being made in the main industrialised nations to ensure that a balanced and realistic proportion of the nation’s gross national product (gnp) is reinvested
in R&D and science. This proportion is generally felt to need to be at least 2.5 % to 3 % of GNP. Though there may be a universal appreciation of the importance of Science and research within a country or region, there are nonetheless variations in the commitments being made to support research. Different countries have different industrial and service infrastructures, which partially dictate how they see, and put into place, a sustained research effort.
9.2 Movement of global funds for research

Another major change has been that freedom became more evident in the 20th century – freedom in the movement of capital and labour, and in the spread of knowledge and culture. As such, global economic recoveries are now less fragile – even since 9/11 – than they were at the time of the First World War. Despite current indications, levels of global poverty have fallen and the middle class has grown (and is growing rapidly in emerging countries such as China, India and Brazil). Only Africa seems to continue with a low-income profile: African countries have not reached the threshold of a per capita income of $ 4,000 above which economic improvement achieves ‘take-off’ and is stimulated. Economic growth then attracts immigration flows.

Money and capital are becoming international resources – they will flow to wherever spending on science and research is considered optimal. At a conference in London in 2007, Sir Keith O’Nions, Director of the newly established Department for Innovation, Universities and Skills (DIUS), felt that there was an important international money flow – he referred to a map (see later) which indicated that $ 24.2 billion went from the European Union to the USA for research, whereas $ 14.4 billion flowed back the other way. At present only $ 4.1 billion flows from the US to Japan, but one suspects – given the US’s strong position in Chinese research – that this Asian trend will become a significant bilateral flow in due course.

In part this is a consequence of the globalisation of service industries in general, with capital moving to where it can best be used. In some instances this can be the western developed world, where focused research on high-quality projects produces the greatest benefits, whereas in China and India we are seeing a massive flow of research activity, an increasing amount of which is collaborative and applied rather than fundamental. The point is that research will be done where it is best done, and money will move to support this. As such the UK needs to do world-class research. According to Sir Keith, given the overall Science Budget for 1997–2011, this is achievable for the UK in particular.

9.2.1 Regional variations

Nevertheless, the expansion in global research activity is moving away from the USA and Europe towards China and India. The speed with which this is taking place is raising strategic questions that traditional stakeholders in the scholarly communication process are having to address. It raises the issue of whether international collaboration is the way forward, and if so to what extent and how. It also raises the problem faced by many less-developed countries of how to retain their research talent. Attracting foreign students and postgraduates to the rich
and powerful countries enhances a country’s perceived status as a quality global research centre. For example, 40 % of PhDs doing research in the UK are from overseas, bolstered by some 220,000 overseas students. But providing incentives for these researchers to stay is difficult given competition from other countries. This involves focusing on the proportion of gross national product spent on science and R&D, such as the European Union’s challenging target of devoting 3 % of GNP to R&D. However, growth in Chinese R&D spend is over 20 % per annum, which means that – if this level is sustained – China will overtake Europe as a major research centre by 2020.

In fact the US National Science Foundation’s latest report on ‘Science and Engineering Indicators’ highlights the increasing importance which governments are attaching to knowledge-intensive economies for economic competitiveness and growth. However, the report also comments on the rapid emergence of the Asian economies as important players in the S&T system. China is growing at the most rapid pace and its government has declared education and S&T to be the strategic engines for sustainable economic development. Fragmentary data on India suggest that it is also seeking rapid technological development, focusing on knowledge-intensive service sectors and biotechnology. In a relative sense the major EU countries as a group are losing ground, whereas the United States is maintaining its position across a variety of measures. Other areas and countries do not yet play a major role in the world’s S&T system, according to the report. In terms of R&D spend, the OECD countries’ share has dropped from 93 % in 1990 to 84 % in 2003. China has become the third largest R&D performer (behind the USA and Japan).

Industrial R&D

Whilst governments are increasing their R&D funding, industry R&D has often expanded its share more rapidly, leading to a declining share for public R&D funding. In the EU the government share fell from 41 % in 1990 to 34 % in 2001. It is also claimed that industry is increasingly looking beyond national borders in the location of its R&D activities. For example, more than a quarter of the UK’s industrial R&D was supported by foreign sources in 2002. As a result, employment of industrial researchers has grown at about twice the rate of total industrial employment. For the OECD as a whole the full-time equivalent (FTE) number of researchers rose from 1 million in 1981 to 2.3 million in 2002. The USA, China and other Asian economies have shifted into high-technology manufacturing sectors more rapidly than the EU-15 or Japan.

Academic R&D

Academic R&D has grown robustly but remains less prominent in Asia. The USA and the 25 countries constituting the European Union have been spending similar amounts on academic R&D, $ 41 billion to $ 44 billion in 2003, about twice their 1990 expenditures. However, China has experienced the most rapid growth in its spending on academic R&D, from $ 1.1 billion in 1991 to $ 7.3 billion in 2002, with double-digit growth rates since 1999. Nevertheless the academic sector, where basic research is conducted in most countries, plays a relatively small role (about 10 %) in China’s R&D system.
9.3 Implications on scholarly publishing

Scientific expertise is expanding worldwide, which diminishes the US quality advantage. According to the ISI database coverage (which does not include all titles), the total number of articles published rose from 466,000 to 699,000 between 1988 and 2003. The combined share of the US, Japan and the EU-15 countries declined from 75 % to 70 % of the total, with flat US output from 1992 to 2002. This led to a drop in the US share of the world’s article output from 38 % to 30 %. Meanwhile EU-15 output rose steadily to surpass that of the USA in 1998, and Japan’s output has also continued to rise. Output from China and the Asia-8 expanded rapidly over the same period, by 530 % and 233 % respectively, boosting their combined share of the world total from less than 4 % in 1988 to 10 % by 2003.

It is apparent that the future of electronic publishing, as it impacts on scholarly and scientific publishing, is linked to the amount of R&D countries are prepared to invest, whether through public funds or through a healthy industrial R&D sector.
Figure 9.1 Market share of Publications in OECD countries, 2000–2004
Whilst US scientific output continues to receive a disproportionate share of the total citations to articles, it appears that the quality of scientific output produced outside the USA is rising. The scientific portfolios of the emerging Asian countries suggest greater specialisation in the physical and engineering disciplines than those of the traditional scientific centres. In 2003 more than half of China’s publications were in the physical sciences and another fifth in engineering; the life sciences and social sciences represented a much smaller share. In contrast, literature from the US and the EU-15 showed a heavy emphasis on the life sciences (45 % to 54 %) and a relatively lighter share
on engineering (10–13 %) and the physical sciences (22–39 %). The literature from Japan falls between these two ranges.

The Science and Engineering Indicators 2006 report suggests that there were 194 million people with postsecondary education in the world in 2000. The US share of this ‘knowledge worker’ market has fallen from 31 % to 27 % since 1980, while China’s and India’s shares have doubled, to 10 % and 8 % respectively. These numbers are being supplemented by an additional 8.7 million people per annum, with particularly strong increases in Europe and Asia. This is the ‘lost knowledge worker’ area – lost in the sense that the majority of them have been excluded from the publication system in the past. This is the market sector that Google could enfranchise. Overall, current trends in degree production, retirement and immigration within the US suggest that the number of trained scientists in the labour force will continue to increase, but at a slower rate than in the past.
9.3.1 Worldwide Trends in Article Output

The number of scientific articles catalogued in the internationally recognised peer-reviewed set of science and engineering journals covered by the Science Citation Index (SCI) and Social Sciences Citation Index (SSCI) grew from approximately 466,000 in 1988 to nearly 700,000 in 2003, an increase of 50 %. The growth of publications reflects both an expansion in the number of journals covered by the SCI and SSCI databases and an increase in the number of articles per journal during this period. The number of articles in a fixed set of journals that have been tracked by SCI/SSCI since 1985 has also risen, indicating that the number of articles per issue and/or issues per journal grew during this period. Other science and engineering journal databases that have broader and/or more specialised coverage of scientific fields also show an increasing number of publications. Data on article authorship by country provide an indication of the knowledge and research capacity of regions and countries, and data by scientific discipline provide a comparative measure of national research priorities.
9.3.2 Trends in Three Major Publishing Regions

Strong increases in science and engineering articles published in the European Union (EU-15), Japan, and the East Asia-4 economies (China including Hong Kong, Singapore, South Korea, and Taiwan) accounted for 69 % of the increase in world output between 1988 and 2003. The article output of the EU-15 grew by more than 60 % between 1988 and 2003, surpassing that of the United States in 1998, although this rate of growth slowed from the mid-1990s. Japan’s article output rose at a slightly faster pace than that of the EU-15, resulting in a gain in output of nearly 75 % between 1988 and 2003; Japan’s growth rate, however, also slowed in the latter half of the 1990s, in a pattern similar to that of the EU-15. The article output of the East Asia-4 rose more than sevenfold, pushing its share of the world’s S&E articles from below 2 % in 1988 to 8 % in 2003. By country, the increase in output was six-fold in China and the Taiwan economy, seven-fold in Singapore, and nearly eighteen-fold in South Korea, up from only 771 articles in 1988
to more than 13,000 articles 15 years later. S&E article growth in China and South Korea resulted in these two countries becoming the 6th- and 12th-ranked countries by share of world article output in 2003. On a per capita basis, the article output levels of Singapore, South Korea, and Taiwan were comparable to those of other advanced countries; China’s per capita article output, however, was far below this level.

The National Science Foundation (NSF) produced another report in 2007 which found that the number of US science and engineering articles in major peer-reviewed journals flattened in the 1990s, after more than two decades of growth, but that US influence in world science and technology remains strong. The report, Changing U.S. Output of Scientific Articles: 1988–2003, found that the changes occurred despite continued increases in funding and personnel for research and development. Flattening occurred in nearly all US research disciplines and types of institutions. In contrast, emerging Asian nations had large increases in publication numbers, reflecting their growing expertise in science and technology, and total European Union publication numbers also rose. Despite the levelling of articles published, researchers emphasise other evidence indicating that US science and technology capability remains strong. They say the decrease in the US share of the world’s articles is not a surprise in view of growing research capability around the world, nor do they view it as a cause for concern.

Europe

United Kingdom. Education and science are key planks in the new government’s agenda, with the aim of raising graduate enrolment from the current 29 % to 40 %. Science, technology, engineering and mathematics (STEM) are a particular focus. The comparison with the 2.5 million STEM enrolments in India and 3 million in China gives some urgency to reaching greater scientific penetration in the UK. The government recognises that it needs a revolution in order to return to the STEM levels that existed ten years ago, let alone to meet new and more ambitious enrolment targets. Whilst the past years have shown healthy growth in public funds in support of Science, this may not be enough for the future. The Comprehensive Spending Reviews (CSR) dictate future levels of public support for Science, and this competes with other government agencies for tight budget increases. Nevertheless, as pointed out earlier in this chapter, the head of the DIUS believes that the proposed UK Science budget will provide enough resources to keep the UK on track.

Asia

Four Asian societies – China, Singapore, South Korea, and Taiwan – out-distanced all others in the world between 1992 and 2003 with an average annual growth rate of 15.9 % in publications. The EU posted an average annual growth rate of 2.8 % during the same period, more than four times the growth rate of the US. In terms of authors, Asia is different. Evidence from a CIBER email study undertaken for the PA/STM showed that Asian authors found it difficult to get published in western journals. This is partly a language barrier, partly communications and partly the publication process. As such, Asian authors are more in favour of open access than their western counterparts. In a Publishers Association survey of 2005 involving over 500 Asian researchers, only 29 % submitted their manuscripts to
foreign journals. However, few local journals in Asia carry an SCI impact factor, and they are also not widely available. There has been an explosion in the number of STM journal titles since 1988: in 2003 there were 4,500 such titles, of which only some 200 were English-language, though many of the 4,500 are aiming to become English-language journals. A considerable number of articles are now being published in non-Chinese journals, and authors are experiencing problems in finding a place to put them.

China. The scale of scientific research and the expansion of higher education in China mean that more research output will come from the east. Some will be individual research efforts by scientists and institutions; some will demonstrate a greater global partnership between research centres. Already there is significant cooperative research between western and eastern research centres, and this is growing, but it has been suggested that the core quality basic research will come from the west and the mass outpouring of applied research findings from the east. To ensure that quality research is still produced in western countries, a continued commitment to the science budget will have to be upheld despite expected intense competition for the available public resources.

China represents a country where one is either impressed or concerned by the current situation. There is either a ‘gathering storm’ which means the west has to get its act together, or else the west needs to build a new set of collaborative relationships with China. China is developing at a stupendous pace. R&D investment has been growing at 20 % per annum since 1999. Journal article output has quadrupled since 1996. Student enrolment in 1996 was 3 million – in 2005 it was 16 million. There are currently 53 science parks with a further 30 planned. In 2004 China was the world’s top manufacturer of ICT goods. Despite this, China does not yet have a truly innovative system – there is still much to be done. For example, R&D by the business sector is low, and high-tech exports rely on foreign technology. Although patent numbers from China are rising, it is from a very low base. The impact factors of journal articles are rising, but slowly. Plagiarism and fraud still have to be conquered, and IPR enforcement is still shaky.

The next phase of Chinese development involves China committing 2.5 % of its gross national product to R&D by 2010. It is also intending to stimulate more domestic innovation and rely less on foreign technology. Finally, it aims to get industry and commerce to drive innovation – these goals are all encoded in China’s ‘Medium and Long-Term Plan for Science and Technology’. Their main goal is “the smooth integration of China into an increasingly global knowledge and innovation system” (OECD Review of China’s Innovation Policy, 2007).

Collaboration between China and the west is increasing – 25 % of articles are co-authored – and the EU is losing ground in this respect to both the USA and Japan. The Silicon Valley connection is particularly strong for China. There are unique authoring needs among Asian researchers. In China alone there are 1.3 billion people, and the style and culture of authoring for publication has a different tradition: the authors are listed with the student first, then the supervisor, then the technician, and so on, which means finding the corresponding author can be somewhat difficult. Nevertheless, China is currently publishing 60,000 papers per annum, 6.5 % of the world’s total.
This has been a major increase in a short period of time. However, in China there has been a restriction on the allocation of ISSN numbers since the explosion
in titles began. Nevertheless, there seem to be awards and funds available from the Chinese government for quality articles.

Australia

Australia has taken steps to increase the open access publications available to the Australian research community. A recent study of the submission of material to eight of the 34 universities in Australia has shown a huge difference in submission rates between those which operate a voluntary submission policy for their authors and the sole university which mandates that its authors use the local institutional repository (IR) as a depository for their works. The non-intervention policy has achieved rates of well under 15 %; Queensland University of Technology (the mandated campus) has current levels of over 60 %, with expectations that this should rise to 100 % soon.
[Bar chart comparing the percentage of DEST output deposited in 2004 and 2005 at seven Australian universities: ANU, Curtin, Melbourne, Monash, Queensland, QUT and Tasmania]
Figure 9.2 Percentage of manuscripts deposited in Australian IRs
This suggests that the success of the open access movement is not likely to come from convincing authors to change their submission habits, but rather from insisting that they change. In July 2007 a forum was convened in Canberra, Australia, chaired by Colin Steele, to address the issue of ‘Improving Access to Australian Publicly Funded Research – Advancing Knowledge and the Knowledge Economy’. The 100 delegates were mainly Australian, with a few international representatives. The overwhelming consensus from the forum appears to be that both publications and research data should be made available free on the Internet as soon as possible, justified by the public good. John Houghton, author of a major economic study in this area, made the point that the public good was synonymous with economic good – indeed Australia was losing out substantially on research impact without open access to its research outputs and research data. However, it was recognised that, as far as publications were concerned, the existence of publishers and the services they provide had to be accepted.
It was claimed by the bulk of the participants that the current business models for publications used by most publishers are unsustainable. The ‘oligopoly rents’ charged by publishers, and the bundling models (Big Deals) used to minimise competition, were simply not going to be acceptable or economically sustainable in the long term; moves to other models were inevitable, according to this particular audience. Author-side fees for ‘Gold’ OA journals would be difficult to implement in Australia: library subscription budgets are usually devolved to universities, and diverting a fraction of them to author-side fees may prove difficult. It was felt that Australia would need to do a significant amount of work on this to replicate what the Wellcome Trust had achieved in the UK.

As for institutional repositories, these have attracted little support hitherto from Australian authors, as the kudos remains with publishing in commercially available journals included in the Journal Citation rankings. Nevertheless, it was claimed that all Australian universities would have an institutional (or consortial) repository by the end of 2007, that DEST was determined to press on with its Accessibility Framework outside the RQF (the Australian research assessment exercise), and that the ‘Research Impact’ part of the RQF could accommodate publications that did not fit the paper journal mould. The adoption of ‘mandates’ seems to be accepted, though there is still some reluctance to take the final steps, with institutions seeming to prefer to let others act first. Australia may not be in the vanguard of open access changes (it produces 2–3 % of the world’s research), but it is monitoring them closely and it seems it will follow as soon as it is expedient to do so. See: www.humanities.org.au/Events/NSCF/NSCF2007/NSCF2007.htm
Whilst the national characteristics of the global research effort are important, they should not hide the fact that there is an equally if not more important difference to be found in the research effort by subject or discipline. This will be explored in the next chapter.
Chapter 10
Research Disciplines
The main stance adopted in this book is that scholars and researchers from separate disciplines have different requirements and needs for information. It would be wrong to suggest that solutions for chemists would also satisfy physicists, or even that solutions for one specialism would have equal relevance for others within the same discipline. Each sub-discipline and research area operates within a different ‘culture’, tradition and evolving mindset for communicating its research results. As such, each individual seeks different ways of meeting their information needs and there is no generically acceptable model. No single, simple pattern is currently discernible.
10.1 Sources of Funds for Research

Fundamentally, research publications are a creature of overall R&D funding. The sources of funding for research can be various, with both private and public sectors sharing the burden of financing a country’s research effort. Taking the UK as an example, the research funders spend some £ 17 billion per annum on R&D. It comes from:

Table 10.1 Expenditure on R&D in the UK by source of funds, 2000 (compiled by the Office of Science and Technology)

Funding Source             Amount (£ million)   % of total
Government Departments          2,534               14 %
Research Councils               1,259                7 %
HEFC                            1,276                7 %
Higher Education                  150                1 %
Industry                        8,648               49 %
Not-for-profit                    815                5 %
Overseas organisations          2,854               16 %
TOTAL                          17,543              100 %

As a percentage of the UK’s GDP: 1.83 %
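As a quick arithmetic check, the percentage shares in Table 10.1 can be recomputed from the amounts. The minimal sketch below simply uses the figures printed in the table; small rounding differences against the printed totals are to be expected.

```python
# Recompute the "% of total" column of Table 10.1 from the amounts (in £ million).
funding = {
    "Government Departments": 2534,
    "Research Councils": 1259,
    "HEFC": 1276,
    "Higher Education": 150,
    "Industry": 8648,
    "Not-for-profit": 815,
    "Overseas organisations": 2854,
}

total = sum(funding.values())  # roughly £17.5 billion
for source, amount in funding.items():
    share = 100 * amount / total
    print(f"{source:<24} £{amount:>6} m   {share:4.1f} %")
print(f"{'TOTAL':<24} £{total:>6} m  100.0 %")
```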
All the research funding agencies are more or less interested in ‘outcomes’ and outputs, and the UK Treasury has woken up to the need for such funding to be both sustainable and to generate impact for society. These funds provide support for some 200,000 researchers within the UK; within the UK university sector alone there are 100,000 researchers. These are the main agents of research, with a mission to create and disseminate new information and knowledge. Their scientific ethic – to see that their peer community benefits from the dissemination of their research results – is a powerful and traditional driver to communicate in the best possible manner.

An important influence on the way electronic publishing develops in the biomedical sciences worldwide is the National Institutes of Health (NIH), based in Bethesda, MD, USA. The NIH is the world’s largest funder of scientific research (not counting classified military research). Its budget last year, $ 28 billion, was larger than the gross domestic product of 142 nations, and its fund allocation is more than five times larger than that of all seven of the UK Research Councils combined. NIH-funded research results in 65,000 peer-reviewed articles every year, or 178 every day. The agenda which NIH has for publication of the research results it funds would, on the basis of sheer scale alone, have a significant impact on the industry structure.

More broadly, the OECD’s Outlook 2006 lists some US$ 770 billion as being spent on research in its member countries in 2005, of which US$ 265 billion came from the public sector. These amounts dwarf whatever publishers and libraries contribute in disseminating the resulting knowledge. However, most of the funds go on the research process itself – only 1–2 % is channelled into the infrastructure that sustains the publication of research results.
10.2 Research Trends

Though the overall funding for R&D is a significant proportion of the UK’s gross national product – over 1.8 % – funding for individual research activities ebbs and flows according to the priorities which prevail at the time among government advisors, commercial strategy departments, individual entrepreneurs and society at large. Also, progress in science is neither regular nor smooth. Rather, it can be typified as a large and expanding globe with an irregular circumference – at the circumference, nodules of intense activity distort the shape of the globe as areas of intensive effort coalesce for a period, creating spikes of activity.

Because of the nature of basic science research, with its requirement for investment in expensive equipment, the hard sciences (physics, chemistry, astronomy) and the biomedical sciences have tended to receive a greater share of fund allocations compared with the social sciences and humanities. Within the UK some 85 % of R&D funding by the government and research councils is in the STM areas, with the softer sciences receiving what is left. Applied research and development has a different information requirement, one that is specific and targeted at achieving commercial return. It requires access to patents, standards and financial information systems as well as the pure STM research books and journals.
10.3 The changing R&D process in large corporations

Research and development normally refers to future-oriented, longer-term activities in science or technology, using techniques similar to scientific research but without predetermined outcomes and with broad forecasts of commercial yield. In general, R&D activities are conducted by companies, universities and state agencies, but particularly by companies. As the earlier figures showed, in the case of the UK around half of R&D funding comes from the private sector, and a proportion of the 16 % which comes from overseas is also private sector investment. Electronic publishing therefore needs to take account of, and is impacted by, trends and attitudes in the commercial world.

In the USA, a typical ratio of research and development spending for an industrial company is about 3.5 % of revenues. A high-technology company such as a computer manufacturer might spend 7 %. Although Allergan (a biotech company) tops the spending table with 43.4 %, anything over 15 % is remarkable and usually gains a company a reputation for being high technology. Companies in this category include pharmaceutical companies such as Merck & Co. (14.1 %) and Novartis (15.1 %), and engineering companies such as Ericsson (24.9 %). Generally such companies prosper only in markets whose customers have urgent needs, such as medicine, scientific instruments, safety-critical mechanisms (aircraft) or high-technology military armaments. The extent of the need justifies the high risk of failure and consequently high gross margins, from 60 % to 90 % of revenues – that is, gross profits can be as much as 90 % of the sales price, with manufacturing costing only 10 % of the product price, because so many individual projects yield no exploitable product.

10.3.1 Social Collaboration

Web 2.0 is now having an impact on the way research is being performed in some select corporate organisations. In a recent book describing the new ‘wikinomics’ business models (Wikinomics – How Mass Collaboration Changes Everything), the authors Tapscott and Williams contend that the conditions are emerging for ‘the perfect storm’ in corporate R&D. The interactions which occur between new platforms for social collaboration; a new generation of those who are accustomed to collaborating; a new global economy which supports new forms of economic cooperation; and a new business model which aligns itself more to the world of the Internet than to the book – all these have an impact on the way research corporations conduct their R&D. The days when organisations such as IBM, Motorola, HP and even Procter and Gamble conducted their R&D efforts solely within their own labs are disappearing, according to Tapscott and Williams. They now open up their R&D efforts to the professional community at large. A collaborative approach has developed, with these companies exposing their software programmes and research results for all to use and, in so doing, improving them at a fraction of the cost it would take to do so in-house. It also enables speedier and more innovative development of the programmes, as the power of the community exceeds the power of a few dedicated in-house researchers. Google has been a classic adopter of this approach, offering its APIs for other organisations to apply to other datasets and create new ‘mash-ups’.
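To make the ‘mash-up’ idea concrete, the sketch below combines two web-delivered JSON datasets into a single merged view. It is a minimal illustration only: the endpoint URLs and field names are hypothetical placeholders, not real Google or publisher APIs.

```python
import json
import urllib.request

# Hypothetical endpoints standing in for any two open JSON data services.
ARTICLES_URL = "https://example.org/api/articles.json"  # e.g. article metadata
FUNDING_URL = "https://example.org/api/funding.json"    # e.g. funder records


def fetch_json(url):
    """Download and decode a JSON document from a URL."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def mash_up():
    """Join the two datasets on a shared 'grant_id' field (assumed for illustration)."""
    articles = fetch_json(ARTICLES_URL)
    funding = {record["grant_id"]: record for record in fetch_json(FUNDING_URL)}
    merged = []
    for article in articles:
        grant = funding.get(article.get("grant_id"), {})
        merged.append({
            "title": article.get("title"),
            "funder": grant.get("funder_name", "unknown"),
        })
    return merged


if __name__ == "__main__":
    for row in mash_up():
        print(row["title"], "-", row["funder"])
```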
Open and free innovative services such as InnoCentive enable forward-looking companies to expose their product development problems to the 100,000 scientists around the world who participate in solving tough R&D problems, not necessarily for any mercenary inducement (although a fee may be received), but because it is part of an open social network where the benefits are found in solving challenges rather than making money. ‘Ideagoras’ have emerged from this systematic tapping of a global pool of highly skilled talent, offering many times more productive outcomes than are available within one organisation alone. Commentators such as Surowiecki put it down to the ‘wisdom of the crowd’. While the old Web was about web sites, clicks, ‘stickiness’ and ‘eyeballs’, the new Web economics – a mere five years on – is about communities, participation and peering. It is the latter which is frightening some publishers. Without control over peer review, and without making money from controlling the IPR of publications, they are as nothing, according to the authors of Wikinomics. It is the exercise of such control which is anathema to social networking and wikinomics. Most technologists agree that DRM is a lost cause (due to hacker innovativeness) as well as being bad for business. Some publishers do not accept this – which is why Google, Yahoo and YouTube are driving the industry forward. New business models to keep publishers in the loop may need to be fleshed out. Meanwhile, corporate R&D is changing from a proprietary, closed activity to an open research process embracing the world’s scholarly community. This is particularly the case in the pharmaceutical industry – other areas have been slower to adapt. This is another trend which has an impact on EP – it provides support for a wider, collaborative approach to R&D, which in turn has repercussions on the communication and dissemination of research results.
10.4 Behavioural Trends

Professors Donald King and Carol Tenopir have undertaken many studies over the years to assess the differences among the disciplines, primarily in terms of how they adapted to published material in a print-derivative form. It is useful to review some of the King/Tenopir material to see how it stands up in an electronic publishing era. It is contended that some changes will occur quickly – the adoption of e-journals was an example – while other changes are less in evidence; for example, there is little support for or adoption of new forms of editorial review. But there are five main areas in which Change has manifested itself, according to Tenopir.

1. The amount of reading is going up but the time spent reading an article is going down. This is based on self-reported data going back to 1977. The average number of articles read has gone from 150 in 1977 to 271 in 2006, while the average time spent reading an article has gone down from 48 minutes in 1977 to 34 minutes in 2004–6. Combining the two shows that the overall amount of time spent reading articles has gone up over recent years (a rough calculation is sketched after Table 10.3 below).

2. Repurposing of information is increasing – there is a rise in granularity. Using the critical incident technique, whereby the last use made of the information system by a user was analysed, Tenopir found that 50 % of usage was for research purposes and 20 % for teaching, though the actual proportions varied by individual responsibility and age. Different purposes have different access routes:
   I. Current awareness (or ‘browsing’) needs are met by accessing the full journal or issue. The level of granularity is the journal.
   II. Search needs are met at the individual article level, both for new research and for writing. The level of granularity here is the article.
   III. Specific item needs can be met from a part of an article, a table or an image. The level of granularity here is even finer, at the sub-article level.

3. There is greater reliance placed on relevance aids. Help is wanted to get the user to the best possible source for the required information. Abstracts are increasingly important, as demonstrated by work Tenopir did for a US paediatrics society and by CIBER’s research: 33 % of readings are of the abstract alone (CIBER), and one third of paediatricians relied only on the abstract – 14,700 full-text articles were accessed, while in 7,200 cases readers chose to use just the abstract.

4. There is wider readership from a greater variety of sources and types. In 1977 the average number of journals consulted was 13 titles; by 1995 this had grown to 18 journals and by 2003 to 23 journals. There has also been a greater amount of multidisciplinary reading. Whereas browsing from a wide collection of material has declined over the years to 58 %, specific searching for information has grown to 28 %. Citation linking is used, but not as much as was expected.

5. There is more reading of older material. The split between reading articles up to two years old and articles older than two years is 50:50. Here it is the stage the user has reached in their career, rather than age itself, which is the key determinant. Older faculty (over 36 years) spend more of their time reading printed material and less using screen-based systems. Browsing is even more important than searching in looking at older material.

Overall there is a need to accommodate different reading patterns. As indicated, these patterns are more often related to the stage in a person’s career development than to age. The other main point which Tenopir notes is that the current growth in reading is unsustainable. This puts pressure on information providers to develop value-added features in their services which address the time constraints users face.

However, whilst the above factors reflect the changing nature of research behaviour, one feature which needs to be added to the equation is the difference which exists between subject areas. This runs as a theme throughout this book – scientists vary hugely in the way they adapt to the information supply within their respective disciplines. For example, the subject spread of articles read per annum shows great variation. The following represents Tenopir’s findings on the articles read.
Table 10.2 Number of articles read per annum (Source: Carol Tenopir data)

Subject Area      Faculty   Students
Medical             434       242
Social Science      239       177
Science             369       173
Engineering         251       149
Humanities          136       124
As shown above there is a huge difference between the reading habits (or needs) of the medical research sector as compared with engineers, and particularly the humanities. Similar data from Professor David Nicholas (University College London, CIBER) illustrate the dispersion (or ‘scatter’) among the main subject areas, based on his questionnaire surveys. Even so, comparing the CIBER and the Tennessee results also shows a difference between the two sets of samples: in the CIBER case, the humanities have increased their share of articles read at the expense of the medical respondents. Professor Tenopir also gives data on the average time spent reading an article by main subject discipline. Again, this varies significantly by subject area, with engineers spending nearly twice as much time reading an individual article as medical staff, but reading far fewer articles in total.

Table 10.3 Time spent reading an article (Source: Carol Tenopir data)

Subject Area        Time spent reading an article
Medicine            24 minutes
Science             35 minutes
Engineers           43 minutes
Social Scientists   36 minutes
Humanities          37 minutes
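Combining Tables 10.2 and 10.3 gives a rough, back-of-the-envelope estimate of total annual article-reading time per faculty member by discipline. The sketch below simply multiplies the two published figures; the only assumption is the obvious mapping between the slightly different discipline labels used in the two tables.

```python
# Rough estimate of annual article-reading time per faculty member,
# combining articles read per annum (Table 10.2, faculty column)
# with minutes spent per article (Table 10.3).
# Note: the two tables label disciplines slightly differently; the obvious mapping is assumed.
articles_per_year = {"Medical": 434, "Science": 369, "Engineering": 251,
                     "Social Science": 239, "Humanities": 136}
minutes_per_article = {"Medical": 24, "Science": 35, "Engineering": 43,
                       "Social Science": 36, "Humanities": 37}

for subject, count in articles_per_year.items():
    hours = count * minutes_per_article[subject] / 60
    print(f"{subject:<15} ~{hours:5.0f} hours of article reading per year")
```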
Table 10.4 Proportion of articles read by respondents (Source: Tenopir and Nicholas)

Subject Area     Proportion (University College London)   Proportion (Univ Tennessee)
Medical          34 %                                      19 %
Science          33 %                                      27 %
Engineering      7.5 %                                     10 %
Social Science   20.6 %                                    23 %
Humanities       4.7 %                                     21 %
Data are now also available on downloads of articles. Tenopir provides figures from four Ohio universities, part of the OhioLINK consortium, which supplied the number of downloads by staff and by students.

Table 10.5 Article downloads by subject area (Source: Tenopir)

Subject Area      Staff Downloads   Student Downloads
Medicine          35                752
Science           72                1,137
Engineering       –                 220
Social Sciences   –                 134
Humanities        –                 –

Professor David Nicholas has also analysed article decay – the extent to which the reading of an article falls off with age. He has identified that in the digital age
it is easier to get access to older material, and this corresponds to a greater use being made of such material. If a search engine is used, older material is four times more likely to be accessed than had been the case in pre-search engine days. With printed publications there is a substantial drop in usage after the first year or two of publication, indicating an enormous ‘long tail’. For downloads of articles, however, the decay curve is more gentle and older material stands a higher chance of being read. This differs by subject area: with the social sciences, for example, there is a less pronounced dip in usage over time.

These results have been corroborated by a study commissioned by the British Library. The BL study about the drivers for archive digitisation has identified some of the complexities involved while also highlighting a range of the opportunities that digitisation creates. Journal Backfiles in Scientific Publishing describes how digitisation can add value to existing deals, generate increased interest in – and links to – backfile articles, and can also help complete the existing holdings of customers. The paper points out that about a quarter of articles read by scientists are older than five years and, further, that having such content available online in a searchable format is likely to increase use substantially. Overall the study stresses that accessibility is the main issue. “Analysis of usage statistics of STM publishers’ platforms has revealed that about 20–25 % of the downloaded articles are at least five years old,” reports Jan Willem Wijnen in the study. “Surprisingly, the percentage of old article downloads is higher in biomedical areas than in humanities. Scientists and researchers are using substantial amounts of older literature. Making these articles accessible via a nearby computer instead of a distant library would certainly be attractive for them. It is very likely that older articles will be used even more extensively if additional online functionality is added to these articles, such as links.”

There is clearly no generic approach which can be adopted in meeting the scientists’ information needs.
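The decay behaviour Nicholas describes can be summarised with a simple half-life style fit. The sketch below uses invented download counts purely to illustrate the method – the numbers are not drawn from his study or from the BL report.

```python
import math

# Invented download counts by article age (years) for two hypothetical collections,
# purely to illustrate how a usage half-life can be estimated from such data.
print_era = {1: 1000, 2: 420, 3: 180, 4: 75, 5: 30}      # steep decay
digital_era = {1: 1000, 2: 700, 3: 500, 4: 360, 5: 260}  # gentler decay


def usage_half_life(downloads_by_age):
    """Fit log(downloads) = a + b*age by least squares and return the half-life in years."""
    ages = list(downloads_by_age)
    logs = [math.log(downloads_by_age[a]) for a in ages]
    n = len(ages)
    mean_age = sum(ages) / n
    mean_log = sum(logs) / n
    slope = sum((a - mean_age) * (l - mean_log) for a, l in zip(ages, logs)) / \
            sum((a - mean_age) ** 2 for a in ages)
    return math.log(2) / -slope  # years for usage to halve


print(f"Print-style decay:   half-life ~ {usage_half_life(print_era):.1f} years")
print(f"Digital-style decay: half-life ~ {usage_half_life(digital_era):.1f} years")
```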
10.5 Specific Disciplines

10.5.1 Physics and Mathematics

In the case of physics there has been a long tradition of disseminating results very early in the research cycle. This has meant sending preprints of an article to all researchers known to be potentially interested in the results, by-passing the formal publication process as a means of communication. In 1991 one of the
physicists recognised that the electronic versions of these preprints could be stored on a computer. He wrote the necessary ingest programmes, obtained funds from the National Science Foundation to upgrade the computer system, and enabled all high-energy physicists to search for and access the articles held in what has become Paul Ginsparg’s arXiv database. It is free, and was set up and is run by and for the physics community. There has been speculation on whether this has resulted in a decline in sales of research journals in the relevant physics areas. Open access advocates point to the fact that the main physics society journals (from the American Institute of Physics and the UK’s Institute of Physics Publishing) have not suffered from the effects of arXiv: subscription declines were allegedly not that different from journals in other areas. Much has been made of figures presented by Key Perspectives Ltd, a UK-based market research company, which used data provided by the Institute of Physics to reach this conclusion. However, the Institute of Physics is not necessarily convinced that there is no relationship between self-archiving by authors in open access repositories and subscription levels. The jury is still out. Their claim is that there may be many other variables at play, and that without a detailed audit of cancellation decisions by individual librarians it is premature to claim that open access and subscription sales can live in harmony.

10.5.2 Astronomy

By the same token, in astronomy there is also a strong reliance on preprints to inform the community. The following figure shows that in the early stages people use the preprints to gain information, but once an article has been formally refereed, edited and published, it is the latter which takes over as the information medium of choice.
Figure 10.1 Usage of preprints versus Articles by Astronomers, 2005
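Both the physics and astronomy communities now reach these preprints largely through arXiv’s public interfaces. As an illustration, the sketch below queries arXiv’s Atom-based API for a handful of recent astrophysics preprints; the endpoint and parameters follow arXiv’s published API documentation, but the specific category name and field tags should be treated as assumptions to be checked against the current documentation.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Query arXiv's public API (an Atom feed) for a few recent astrophysics preprints.
URL = ("http://export.arxiv.org/api/query"
       "?search_query=cat:astro-ph*&start=0&max_results=5"
       "&sortBy=submittedDate&sortOrder=descending")

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace

with urllib.request.urlopen(URL, timeout=30) as response:
    feed = ET.parse(response).getroot()

for entry in feed.findall(ATOM + "entry"):
    title = entry.findtext(ATOM + "title", default="").strip()
    published = entry.findtext(ATOM + "published", default="")
    print(published[:10], "-", " ".join(title.split()))
```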
10.5.3 BioSciences and Medicine

United Kingdom

There are some 44,000 biomedics in academia (research and teaching) in the UK. The National Health Service employs a further 33,000 consultants – mainly clinical – and they produce 15 % of all UK-originated papers. The total R&D staff employment of all pharmaceutical companies in the UK is 27,000. They publish 800 papers per annum, though this could potentially be some 2,700; in many cases corporate research results are protected by in-house commercial confidentiality. UK-based biomedics overall are expected to publish 35,600 articles in ISI-indexed journals in 2007, though the number in all titles could be 64,000 if publication in the long tail of titles is taken into account. The annual output of UK biomedical research articles grew by 2.69 % per annum during the period 1989–2000. The major areas were biochemistry, neurosciences and surgery; the top 10 fields produce 40 % of all UK-produced research articles.

The current state of biomedical publishing is in as much flux as any other discipline. Thirteen publishers are responsible for 66.9 % of all UK biomedical articles (2006); however, 2,883 biomedical articles were published by 247 publishers in the UK (2006). There is also a ‘long tail’ of biomedical funding agencies in the UK, and most of them have embraced an open access policy. These together should be linked to 13,000 papers in 2007 (or 35 % of the total).

UK expenditure on biomedical research in 2002/3 was £ 2,507 million. The main funding agencies distributing these funds were the AMRC charities, which include the Wellcome Trust (27 %), the National Health Service (18 %), the Medical Research Council (14 %), HEFCE (15 %), the pharmaceutical industry (13 %), BBSRC (5 %) and the Department of Health (2 %). The Medical Research Council is the largest single funding agency, but its annual growth has been a mere 1.15 % (1996–2005), whereas the aggregated pharmaceutical industry grew at 4.12 % and the AMRC by 3.57 %. Based on Wellcome’s Research Output Database (ROD) for 1989–2000, the major funders of biomedical research accounted for 53 % of the total spend which resulted in biomedical research papers, unfunded papers for 36 % and other funders for 11 %. Of the 26 main funders, 14 have a clear open access policy and six have one under development; most of the main funders have a ‘deposit required’ mandate. One third of papers published carry no acknowledgement of the relevant funding source. Furthermore, 58 % of externally funded UK papers in biomedicine carry acknowledgements to more than one funding agency – which reflects the growing amount of collaboration in biomedical research (27 % cite two funding agencies, 15 % cite three, 8 % cite four, 4 % give five and 4 % six or more). Note that these proportions may have changed, as the ROD, on which the figures are based, only existed for the period 1989–2000. The newly created UKPMC, initially a mirror site of the PubMed Central full-text database at the National Institutes of Health in the US, is supported by the key UK-based biomedical funders.

Less than 5 % of researchers say they actually use uncorrected article proofs or authors’ submitted manuscripts, and concerns over versions seem to play a significant part in the choices of discovery method that researchers use, hence
their reported preference for formal tools such as Web of Science over Google. Self-reported CIBER data suggest that the top three methods used to find articles are (in order of preference): (1) following up references; (2) Web of Science; (3) ‘other search engines’ such as PubMed. Right at the bottom of the twelve methods identified came the physical library. As of 2004, 26 % of researchers had made their own research material available on an institutional web site (not necessarily an IR); of those, only 7 % had made project data sets available in this medium.

10.5.4 General Considerations for Biomedicine

Faced with the proposition (as put forward in the Elsevier study of 2006) that “In the electronic age the publisher adds little value”, 25 % agreed, 19 % sat on the fence and 56 % disagreed. This is hardly a ringing endorsement of the perceived value of the current publishing system, but it stands in contrast with those surveys which suggest that authors are willing to remain with their conventional publishing methods. When asked who should bear the costs of publishing, the heaviest weighting went to research funders, followed by commercial sponsors, libraries, central government, employers, readers and (unsurprisingly last) authors themselves.

The CIBER data used came from Blackwell (336 journals), OhioLINK (1,952 journals), ScienceDirect (316 journals) and Nucleic Acids Research (one journal from Oxford University Press). The data cover user source, type of content viewed, characteristics of the articles viewed and navigational routes. From this collection of studies and data it was evident that about one third of users returned to the same online site within the same day (36 %; in the US it was 25 %). Over a longer period (six visits over a 17-month period) the return rates were high for biochemistry (74 %) and pharmacology (59 %). The length of time spent viewing a page varied by subject area: it was longer for neuroscience and pharmacology journals (a third of pages were viewed for longer than one minute), less so for biochemistry. Biochemistry saw fewer views of tables of contents (ToCs). Biomedicine saw more usage of abstracts (25 % compared with 10 % in other areas). Neuroscientists preferred HTML views (42 %), whereas for biochemists the figure was only 24 %; both were least likely to use abstract views. The US had a slight preference for PDF views (50 % versus 40 % in the UK).

In terms of the characteristics of articles, biochemists are much more interested in recent publications than other subjects such as immunology. Overall, UK biomedics were less likely to view current material (up to one year old) than their US equivalents – 27 % compared with 60 % – though the ratio switches in the 1–3 year age period. ScienceDirect data provided CIBER with the opportunity to identify pre-publication articles: immunologists seem the keenest for the earliest information, biochemists showed the least interest, and those under 36 also show a preference for the earliest exposure. In general, half of the views were of journals in the same subject area as the searcher; only pharmacology had a different pattern, with one in four titles viewed being in their area of expertise. Biochemists seemed most related to agriculture in their search preferences. In terms of searching procedures, neuroscientists, immunologists and medical scientists relied on journal home
pages (80 %). Biochemists were the ones who did the most searching (18 %) and achieved the best hit results. Neuroscientists were the most likely to come through a gateway service (69 %), whereas only 36 % of biochemists used this indirect search route. UK users were more likely to use ISI Web of Science (13 %, compared with 1 % in the US) – a feature of national licensing arrangements for UK Higher Education.
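The CIBER figures above come from ‘deep log’ analysis of publisher platform usage. A minimal sketch of how such indicators might be derived from raw access logs is shown below; the log format and field names are invented for illustration and do not correspond to any particular platform.

```python
from collections import defaultdict

# Invented access-log records: (user_id, day, seconds_on_page, format),
# purely to illustrate deep-log style indicators such as same-day return
# visits and the share of long page views.
log = [
    ("u1", "2007-03-01", 75, "HTML"), ("u1", "2007-03-01", 20, "PDF"),
    ("u2", "2007-03-01", 40, "PDF"),  ("u2", "2007-03-02", 90, "HTML"),
    ("u3", "2007-03-02", 15, "HTML"), ("u3", "2007-03-02", 130, "PDF"),
]

# Same-day return rate: users with more than one session on any single day.
visits_per_user_day = defaultdict(int)
for user, day, _, _ in log:
    visits_per_user_day[(user, day)] += 1
returners = {user for (user, _), n in visits_per_user_day.items() if n > 1}
users = {user for user, _, _, _ in log}
print(f"Same-day return rate: {100 * len(returners) / len(users):.0f} %")

# Share of page views lasting longer than one minute, and the HTML share of views.
long_views = sum(1 for _, _, secs, _ in log if secs > 60)
html_views = sum(1 for _, _, _, fmt in log if fmt == "HTML")
print(f"Views over 1 minute: {100 * long_views / len(log):.0f} %")
print(f"HTML share of views: {100 * html_views / len(log):.0f} %")
```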
10.6 Case Study

10.6.1 Case Study – Information Search (Biology)

At a conference organised by the International Council for Scientific and Technical Information (ICSTI) and held at the British Library in January 2007, Dr Sarah Coulthurst of Cambridge University gave a presentation describing a day in the information-seeking life of a biological scientist. According to Dr Coulthurst, the information requirements of a researcher vary from day to day depending on personal experiences and preferences, the field of study, the types of techniques used, career stage and project cycle. Despite this individuality, there are some common themes and approaches.

In terms of accessing information, the biological scientist seeks two categories: targeted/specific and speculative. The most important methods for locating information are citation database searches (PubMed and Web of Science) and search engines (Google Scholar); a short sketch of such a database search follows this case study. Drawbacks to keyword searching arise when two or more things with the same name are searched for, when common author surnames occur, where there is a lack of specificity in the search terms, and where multiple names exist for the same search item. Other search methods include manual examination of tables of contents (ToCs) of key journals, automatic updates (electronic ToC alerts, PubCrawler), traditional citations in related articles, research highlights in ‘news and views’, articles in other journals, and word of mouth (at conferences, seminars and journal clubs).

Primary source information acquisition via international peer-reviewed journal articles is increasingly important in certain areas, and there is growing interest in supplementary journal article information (e.g. moving images or detailed raw data) that may not be appropriate for inclusion in the main article but is often available from the journal or at the author’s website. Concerns with supplementary information include poor formatting and a tendency among high-profile journals to relegate key information to the supplementary section due to space constraints. Publicly available scientific databases (e.g. genome sequences) are a key source of data appropriate for inclusion in journal articles. However, secure deposition of data and proper citation of articles in databases are key to maintaining quality. When a particular dataset entry is associated with a quality article, Dr Coulthurst knows that expert peers have examined the data, and this serves as a form of validation.

In terms of journal selection, the following criteria affect the choice made: the journal’s area of interest, reputation/profile, impact factors/citation indices, funding agency requirements, how much the journal charges for open access, page charges, style and layout, availability and previous experience. Coulthurst’s perception of the publishing experience as positive or negative is based on the clarity of instructions to authors, the extent to which contributors are kept informed of
progress, the speed and fairness of refereeing and editorial decisions, the accuracy of the final article and the quality of communication between author and journal. Journals and journal websites now offer a broad range of services, including news and opinion articles, debate fora, job boards, funding opportunity announcements, conference and course announcements, technical updates and educational tools. All of these raise the profile of the journal within the scientific community.

In conclusion, peer review in high-quality international scientific journals is fundamental to acquiring and disseminating trustworthy, high-quality information. Multiple routes exist to identify papers of interest, and information acquisition varies across people and over time. Articles are usually obtained electronically, and the occurrence of accompanying supplementary information is increasing. Proper interaction of journals with well-maintained public databases is crucial for ensuring the high standards of both. Multiple factors are involved in the selection of a journal for publication, and good communication with the journal is key to a positive experience.
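As a concrete illustration of the citation database searches described in this case study, the sketch below queries PubMed through NCBI’s public E-utilities interface. The endpoint and parameters follow NCBI’s published documentation, but the search term is an arbitrary example and the exact parameter set (and usage guidelines) should be checked against the current E-utilities documentation before use.

```python
import json
import urllib.parse
import urllib.request

# Search PubMed via NCBI E-utilities (esearch) and print the matching PubMed IDs.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "pubmed",
    "term": "type VI secretion system",  # arbitrary example search term
    "retmode": "json",
    "retmax": "10",
}

url = BASE + "?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url, timeout=30) as response:
    result = json.load(response)

hits = result["esearchresult"]
print("Total records found:", hits["count"])
for pmid in hits["idlist"]:
    print("PMID:", pmid)
```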
10.7 Arts and Humanities

Traditionally, arts and humanities researchers have placed the greatest emphasis on monographs, followed by edited collections, then edited textbooks, only then journal articles and finally other printed output. There is no clear hierarchy of esteem in the humanities – this is easier to establish in some subject areas than others. The tendency is to consider all forms of output in a personal evaluation, and peer judgment is crucial. Impact and quality are not often seen to be the same thing, but for funding by government agencies they do tend to be conflated. There is a need for proxy metrics to support, not replace, existing quantitative data.

Why is it hard to rank journals by quality in the humanities? Older work is often cited more heavily in the humanities, with the most cited authors in this area being Marx, Lenin, Aristotle and the Bible. There is a diversity of input, with multimedia having a great impact. Not all the important journals are included in the ISI database – in philosophy the coverage has been estimated at 52 %. The mode of communication and discourse in the humanities is therefore difficult to capture through impact factors. An expert group for the UK Arts and Humanities Research Council (AHRC) concluded that outputs could be used for their RAE, but not proxies. It is unclear what will happen with the humanities, though it has been stated that by 2013 the RAE process will be metrics-based for STEM (Science, Technology, Engineering and Medicine) subjects, while evaluation by peer review will remain for the social sciences and humanities.

In Europe there is preparation for data to be submitted in the humanities, as part of the ERIH and ESF work. Fifteen disciplinary areas have been identified and expert panels set up for each one. These will list journals under three category headings – high-ranking, standard and local. Their aim is to strengthen the RAE system of peer review. The Humanities Indicators Project is looking at a range of statistics in the humanities, but not journals. Nor has the AHRC in the UK made any significant attempt in the past to provide a metric for journals – an earlier attempt to do this was abandoned.
10.8 Other Subjects

10.8.1 Chemical Biology

It has been claimed that there is no such thing as a typical day for the chemical biologist (ALPSP International Serials Seminar, April 2007), as indicated by the email responses which a speaker at this meeting received from 50 of her colleagues when asked about their working habits. They juggle many tasks and functions – teaching, research, consulting and domestic roles. On an average day a scientist may start by finding facts and images to enhance a lecture course they are giving. They evaluate textbooks, instruments, techniques, prospective collaborators, graduate students and faculty candidates by 'Googling' them. They help their graduate students to process data and determine whether the research results are new. They review the latest scientific literature of relevance to their work. All the while they are popping in and out of meetings. Some 60 % of their information-seeking time for research purposes is spent on soliciting research funds, 30 % on background reading and 10 % at committees and meetings.

The overall conclusion was that the habits of researchers in this area are changing insofar as they seek authoritative sound bites rather than having to spend a great deal of time reading. Time is of the essence, and in this constrained time period there is a need to process as much information as possible. They need to perform research at a faster pace, and as such need relevant information on demand. There is little time to read articles in full and much of their 'reading' is done online. The BlackBerry is used extensively for communication, and electronic tables of contents are also used, often delivered as RSS (Really Simple Syndication) feeds. Green Monkey, which grabs information from a variety of sources, is also being accessed. In many instances publishers' material does not appear on their radar screens, as STM output rarely finds its way to the top of the search engines' hit lists.

Seeking grants takes centre stage in their work processes. This can extend over three months and involve 150–200 hours of work for NIH grants, and it may occur as often as twice a year. In support of this, they want access to manuscripts which are authoritative, portable and personalised. Their interest in podcasts is increasing – they want information on the move. They also say they want to see interviews with authors. Traditionally, chemical biologists are not good at sharing information.

10.8.2 Geosciences

A presentation given at the winter ICSTI meeting in London (January 2007) by a scientist working in the geosciences, Dr Adrian Jones of University College London, provided examples from the perspective of geological research into volcanoes. He pointed out that geological research often requires making decisions or creating models from incomplete data. In his research capacity he makes regular use of information tools such as ScienceDirect, Web of Knowledge and Web of Science. He also routinely obtains relevant data from societies, research councils and partner institutes (e.g. the Natural History Museum). Geoscientists also call upon large-scale facilities, such as those relating to diamond research, synchrotrons (ESRF Grenoble) and Bayreuth Geophysics. Dr Jones works with the Diamond Trading Company and instrument companies such as JEOL, Bruker and Philips. The industrial link
is important when looking at exploration tools for oil, diamonds or mineral wealth. Companies have a vested interest in not making those results available, yet at the same time they want access to public information. One approach is to propose a time-limited restriction on release – an embargo. Emerging sub-disciplines within the earth sciences make use of supercomputers and have access to datasets (such as those for mineral physics, crystallography and climate). Mainstream areas, such as mineralogy, generate vast amounts of data that are largely not accessible outside of direct ownership. This in turn creates a reliance on journal trawls and older reference books. There are issues surrounding the archiving of mineral data. The tools used for mineralogy have become intermingled with other subjects such as art theft, archaeology, forensics, astrobiology and space science. His overriding hope is that more data will become accessible online and available to other people.
10.9 Summary

As was mentioned at the beginning of this chapter, no one size fits all as far as subject interests are concerned. There is a huge disparity in how scientists from different subject backgrounds approach the information search and retrieval problem. The more numerate the discipline, the more online its researchers seem to have become; but in doing so their information gathering has often moved on from the traditional print-centric approach to one which draws on other media forms – particularly datasets – to meet their needs. Biomedicine still remains a stronghold of the print-based system, and its dominance as a research area indicates that the existing publication model has some life left in it yet.
Phase Three
Drivers for Change
Change can be complex

Multiple simultaneous changes create a complex environment. This cocktail currently exists in the electronic publishing sector. There are a number of parallel changes afoot, some more significant as agents for change than others. It would be wrong to focus on any one aspect of Change, or any one Driver, as dictating the future of electronic publishing in the scholarly information sector. It is perhaps less of a cocktail, more of a cauldron, with many different interactions taking place, at different times and to different extents, all resulting in a new paradigm being created which as yet has no clear form or structure. In addition to changes to the 'traditional' publishing model, newer players, such as Google, Yahoo and Microsoft, have the potential to revolutionise particular segments of the current lengthy information chain.

So how does one come to grips with tackling the future within this industry sector? Forecasts are notoriously off the mark, and as the 'wisdom of the crowd' suggests, we may have been asking the right questions of the wrong audience. Rapid change and innovation are not always mono-featured or benign – as indicated above, they can have unfortunate outcomes across a spectrum of situations. The result is that most of us are afraid of innovation, not just challenged by it. Why? Because several strategic considerations arise:

• Change creates ambiguity – it offers too many options.
• Change creates complexity – it does not necessarily match up with our core competences.
• Change creates volatility.
• Change attacks the comfort zone of what is known and understood.
• Change may require a change in our behaviour – we may need to adapt, which can be stressful.

Some industry-watchers claim that the way new product development is currently carried out in the electronic publishing area is not conducive to success. Studies undertaken by Doblin Inc, a US-based strategic consultancy organisation, show that innovation fails in 95.5 % of instances, across all services, geographies and industries. So innovation efforts return a pathetic success or 'hit rate' of around 4 %. However, if innovation is implemented properly, 35 % to 70 % becomes the (more acceptable) success rate – some 9–17 times better than the current norm.
According to Doblin, innovation comes in a spectrum of forms:

• Finance: business models; networking.
• Process: enabling processes; core processes.
• Offering: product performance; product system; service.
• Delivery: channel; brand; customer experience.
Each of the above categories in turn requires attention, and different types of attention according to circumstances and the skills available to the organisation. So electronic publishing should avoid concentrating solely on the 'offering' category (where most of the innovative effort is currently being conducted) and instead consider looking at the extremes of the spectrum – for example at 'business models' and 'customer experience', where innovation may be more easily applied. This is where some of the most productive returns are made by some of the world's more innovative companies.

In the current situation facing electronic publishing in the scholarly arena, the emergence of new platforms may be the most challenging. Populist consumer platforms abound – YouTube, Yahoo Groups, Google, Wikipedia, eBay, Zagat, Facebook, etc. – and if one draws up a list of, arguably, the top innovations of all time, most are likely to be platforms. A significant challenge facing EP (electronic publishing) is not only the creation of platforms (or establishing relationships with them); it is also necessary to understand the role they play in disseminating all forms of publications and derivative data. The most significant task of the publishing industry will be to find the meaning amidst the vast amount of noise in these areas. But platforms alone are not the main source of Change. The following chapters identify some of the other determinants of EP in future.
Financial and Administrative Drivers
Chapter 11
Business Models as Driver for Change
11.1 Opening up the Market?

Open Access initiatives are essentially a business model that emphasises the dissemination of publicly funded research results to the widest possible audience. It challenges the traditional foundations of the journal subscription model and its online licensing derivative. The subscription model creates a limited market, controlled by access rights conferred by the publisher and based on the institution's willingness or ability to subscribe to or license a journal package within which the article is located. Take away these commercially based controls and potentially a much wider audience within society can be reached, and a broader community therefore stands to benefit. This is the promise of open access.

However, there is no solid evidence that such a wider market actually exists. Suggestions have been made that the extensive knowledge worker sector (those outside the academic/research institutional environment, the so-called 'disenfranchised') exceeds the academic/research community by a ratio of 6:1, but no one has been able to substantiate this with hard evidence. Nor has anyone been able to confirm that large potential numbers of disenfranchised users translate into actual buying power. However, in the area of medicine and health care it has been shown that there is some interest from the non-institutional user in the latest medical findings. And this has led to a few major schemes in the US and UK to use open access to drive through a change in the scholarly publishing paradigm. But in general there is little solid evidence that open access leads to a major increase in society's use of research materials. Only in the case of the academic argument put forward by experts such as John Houghton (see later) is there an attempted academic justification for the transition from a subscription or toll access (TA) model to open access (OA).

11.1.1 Open Access Initiatives

During the late 1990s a number of vocal advocates sought international platforms to promulgate their ideas of 'openness'. These arguments were based on the inequity of the existing situation – that research was being funded by the public purse and undertaken by researchers whose career progression was heavily dependent on research publication. Yet the written results of the research became owned by publishers who not only set high access prices in some cases but also applied these to the very same institutions that nurtured the research. The 30–40 % gross profit margins earned on such activities and reported by publishers such as Elsevier to their shareholders made libraries and policy makers believe that they were paying
both for the research and again for the resultant publications, while the publishers were making a handsome, monopolistic profit. Furthermore, there were restrictions on access to the results of publicly funded research, and it was felt that society's welfare would be enhanced, and its productivity and efficiency increased, if the world's research findings were made available for all to access and use.

On December 1–2, 2001, the Open Society Institute (OSI) convened a meeting in Budapest, Hungary, of leading proponents of the then disparate open access movement. The goal was to see how far the many current initiatives could assist one another and how OSI could use its resources to help the cause. The OSI was a Soros Foundation funded exercise. This Budapest Open Access Initiative was endorsed by over 300 institutions and 3,600 individuals, and was the first to propose a new way for open scholarly communication. The Budapest Initiative was followed by declarations from Berlin and Bethesda. An open letter originating in the USA, signed by 34,000 scientists in support of a plea for authors to boycott commercial publishers, disappeared into the mists of time; instead it spawned a Public Library of Science open access journal programme. Various national research agencies threw in their support, including the Max Planck Institutes in Germany. Individuals such as Jean-Claude Guedon (Montreal) and Stevan Harnad (Southampton) became idols or villains depending on one's view of the open movement – in either case they became vocal spokespersons for their particular brand of open access.

In fact there are several types of 'openness'. The following figure shows some of the main forms.
Figure 11.1 Main forms of open access of scholarly material. The figure divides 'openness' into three routes: the Green Route, comprising subject-based repositories (physics, mathematics, biomedicine, economics), media-based repositories (ETHOS), institutional repositories holding all institutional material (Southampton, Nottingham), bibliographic repositories (PMC) and data repositories (GenBank, CDS, ADS); the Gold Route, comprising OA journals (around 2,000) and 'hybrid journals' (OUP, Springer); and the Grey Route, comprising author websites and publisher-digitised backruns of scholarly material (HighWire).
The two main types of open access are the so-called Gold Route and the Green Route.
11.1.2 Open Access Journals – the Gold Route to open access

During the early years of this millennium the PR exercise in favour of open access was being won by well-argued advocates who highlighted how inequitable and restrictive the traditional 'toll-based' publishing system was. However, nothing is free, particularly the creation and support of a quality-based refereeing system which sifts the wheat from the chaff among the articles produced. This refereeing system provides benefits to authors and the community. Authors could bathe in the credibility conferred by having their article included within the 'brand' created by publishers for a particular journal title that had become well known and respected in their area. It was therefore suggested that, since authors benefited most from the publication/refereeing system, they should pay for the costs of publication.

The cost estimates varied. The first commercial venture to offer OA journals – BioMed Central – came up with a $ 500 per article processing fee. But without the 'brand' of an established journal or publisher, the build-up of interest and manuscripts was slow, and by most accounts BMC is still languishing in the financial red. A different approach has been adopted by a not-for-profit organisation based in the US, the Public Library of Science. It started its life as a lobby group against commercial publishers, led by Dr Harold Varmus, a former director of the US National Institutes of Health, and soon developed into a publisher of a prestigious online journal in biology – PLoS Biology – with the help of a $ 9 million grant from the Gordon and Betty Moore Foundation. A title in medicine has followed, as have several more recent niche-type OA journals from PLoS, to reinforce the organisation's commitment to excellence and the idea of open access journal publishing becoming, ultimately, a self-sustaining business model. The PLoS publication charge per article is $ 1,500, reflecting the extra costs associated with rejecting a high proportion of submitted works (in comparison with BMC). But these prices are considered insufficient by the professionals: even the Wellcome Trust, which strongly favours open access, seems to give greatest support to prices being set between $ 2,750 and $ 3,500 per article.

Publishers took time to respond to these initiatives. They focused on the impracticability of the central theme of the OA movement – would authors really want to pay for publication services, giving up a slice of their research budget to have their article put through an expensive refereeing system? Even though refereeing was the heart of STM publishing, the traditional 'toll-based' model gave authors the refereeing service at no additional cost to themselves. The costs were transferred to the library budget instead. Nevertheless, some commercial and learned society publishers have felt compelled, in the face of the overwhelming public campaign in favour of open access, to experiment with open access publications. Hybrid journals have also emerged, which include a mix of author-paid (OA) and toll-based (TA) article access. Oxford University Press and Springer Science & Business Media have been leading lights in this latter process.

Gold OA Journals – The Current Situation

Nevertheless, Open Access (OA) journals have not been launched with the energy and commitment for which their proponents had hoped. From Ulrich's Periodicals Directory the following comparison can be made between all refereed journal starts and OA refereed journal starts:
Year      All refereed journals      OA refereed journals
2000      370                        60
2001      341                        88
2002      287                        78
2003      232 (prov)                 47
2004      222                        50
But the numbers provided by Ulrich's differ markedly from those provided by the Directory of Open Access Journals (DOAJ), maintained by the University of Lund in Sweden. The DOAJ figures are over twice the Ulrich's numbers and presumably reflect a much broader definition and catchment of openness.

Numbers of OA journals ('gold route') launched per year (Source: DOAJ)

Year      OA journals launched
1990      19
1991      14
1992      11
1993      27
1994      28
1995      77
1996      112
1997      154
1998      151
1999      155
2000      269
2001      295
2002      281
2003      278
2004      311
2005      289
Today open access in all its various forms (gold and green) accounts for about 2 % of all newly published scholarly articles (of which there are 1.2 to 1.4 million per annum (Elsevier/Mabe) to 2.5 million (Harnad)). The author-pays model accounts for less than 1 %. A survey undertaken by ALPSP looked at the proportion of OA journals which actually charged author fees. This was undertaken in 2006. http://www.alpsp.org/ngen public/article.asp?id=200&did=47&aid=270&st=&oaid=-1)
In this study it emerged that only 48 % of the journals charged an author-side fee. The rest were funded from a variety of sources including charitable subventions, grants, etc. Most of the latter could hardly be considered sustainable in the long run. Meanwhile, the Directory of Open Access Journals (DOAJ) lists over 2,800 titles, and is growing at a rate of almost one title per calendar day. http://www.doaj.org
11.1.3 Author self-depositing articles – the Green Route to open access

This is a totally different way of implementing the open access agenda, and it must be said that there is occasional conflict between the two schools. Whilst Gold and Green supporters are united in the ultimate aim of seeing a free-to-access system for research information, the means to achieve that end differ. And in that difference lie arguments, disagreements, conflict and confusion.

Subject-based E-Print services

'Green' open access parentage goes back to 1991, when a physicist, Dr Paul Ginsparg, invited colleagues in the self-selected high energy physics community to deposit their electronic pre-publication manuscripts on his personal server at Los Alamos and allow all other physicists, worldwide, to interrogate this so-called ArXiv service for free. There was limited editorial control; what control there was came from the professionals who chose to use ArXiv for their information updates. The compact nature of the subject, the strong group control and the traditional culture of sharing printed preprints in this specialised physics area meant that the ArXiv service was widely adopted by this community. During the following decade the service outgrew the resources of Ginsparg at LANL and so obtained National Science Foundation support, and eventually outgrew Los Alamos in favour of Cornell University.

The good news for the traditional publishers was that, though the ArXiv content eventually became published in the refereed journals, subscription loss was no less or greater than for journals in comparable disciplines where no such subject-based repository existed. There was an apparent complementarity in function between such informal and formal publication methods. Or so it was claimed. It also appeared that only a few disciplines had the compactness that favoured such a non-refereed, repository-based approach. The number of discrete physics subjects in ArXiv increased, and mathematics took on a similar approach. Economics and the cognitive sciences also had strong advocates who created subject-based repositories out of nothing, but the larger STM communities remained true to the primary research journal publishing model. The strength of the refereeing system encapsulated within the primary journal was seen as inviolate.

Institutional-based repositories

Much later, the idea of each research institution having its own server holding a collection of research material (in whatever format) produced by researchers from within the institution became popular. It meant that each institution that had nurtured the research, and in some instances co-financed it, would be able to bathe in the glory as the results became accessible on the worldwide stage. The institution would become 'the publisher' – it gave a new meaning to the university press in everything but name. It enabled the institution to be seen as an eminent research centre in its own right without seeing the results distributed among the journal brands of publishers whose role in the research effort was seen as minimal. This struck a chord, and much attention is being given by funding agencies such as JISC in the UK to establishing a national infrastructure to enable all higher education institutes to have their own institutional repository (IR). Other countries are dealing with this in a less centralised way, but even in these countries there is a groundswell of support at the local level for creating such servers and making them easily accessible.
Organisers of a meeting held in Amsterdam in May 2005 on 'Making the Strategic Case for Institutional Repositories' undertook a survey of the current status of institutional repositories (IRs) in thirteen countries. The results were published in a subsequent issue of D-Lib Magazine.

Academic institutional repositories: state of the art in 13 countries – June 2005

Country                      Number     Number of       % of universities    Average number of
                             of IRs     universities    with an IR           documents per IR
Australia                    37         39              95                   n.r.
Belgium                      8          15              53                   450
Canada                       31         n.r.            –                    500
Denmark                      6          12              50                   n.r.
Finland                      1          21              5                    n.r.
France                       23         85              27                   1,000
Germany                      103        80              100                  300
Italy                        17         77              22                   300
Norway                       7          6               100                  n.r.
Sweden                       25         39              64                   400
The Netherlands              16         13              100                  3,000 / 12,500
United Kingdom               31         144             22                   240
United States of America     n.r.       261             –                    n.r.
n.r.: not reported

Source: "Academic Institutional Repositories – Deployment status in 13 nations as of mid 2005" by Gerard van Westrienen and Clifford Lynch, D-Lib Magazine, September 2005, Vol. 11 No. 9. See http://www.dlib.org/dlib/september05/westrienen/09westrienen.html
The estimate shows a spread from around 5 % in a country like Finland, where repositories are just getting started, to almost 100 % deployment in countries like Germany, Norway and the Netherlands, where it is clear that repositories have already achieved some status as part of the common infrastructure across the relevant national higher education sector. It is also evident that there are strong differences per country in terms of records being held. The average number of records per IR in the countries surveyed seems to be typically a few hundred, with the exception of the Netherlands. Here there is an average of 12,500 records per IR. However, it became evident in analysing the data that the count of “records” meant very different things to different nations. In some countries, such as the USA, there is an assumption that the contents of institutional repositories are full source objects such as papers, images or datasets; in other nations, such as the Netherlands, some records in IRs were only metadata (essentially bibliographic entries). From the average number of 12,500 records in an IR in the Netherlands, only around 3,000 also have the full object file available. As objects become complex, with versions, or hierarchical structures, or composite data streams, many legitimate different interpretations are possible. The questionnaire distinguished only a few types of material that would be fairly commonplace in repositories across the participant nations. These included:
Coverage of IRs related to type of objects (in % of total objects): the survey broke down each nation's repository content into articles, theses, books, primary data, course material, video and music, and other material.

Source: "Academic Institutional Repositories – Deployment status in 13 nations as of mid 2005" by Gerard van Westrienen and Clifford Lynch, D-Lib Magazine, September 2005, Vol. 11 No. 9.
There are therefore strong differences per country in the type of textual records held. In Norway 90 % of the current records are for books and theses, whilst in France it is estimated that 80 % of the current records are for articles. The “other” category for Germany (25 %) is textual proceedings – and the 40 % “other” for the Netherlands are mainly research reports. In contrast to some nations, in Australia 83 % of the records are primary data. The US repositories also hold a significant amount of non-textual content. This suggests that with the exceptions of Australia and the United States, currently the institutional repositories house traditional (print-oriented) scholarly publications such as grey literature, journal articles, books, theses, dissertations, and research reports. From this we can assume that open access issues in scholarly publishing may well be the key drivers of institutional repository deployment in countries other than Australia and USA, rather than the new demands of scholarly communications related to e-science and e-research. According to the report’s authors this may shift over time. For some countries that were able to indicate the disciplinary coverage there were strong differences. Australia and Italy apparently have a focus on the Humanities and Social Sciences while in a country like the UK, almost two thirds of the focus is currently on Natural Science and Engineering. In other countries, such as Sweden and the Netherlands, the distribution among disciplines is more evenly spread. The DRIVER project. DRIVER is an EC-funded project which is the largest initiative of its kind in helping to enhance repository development worldwide. Its main objective is to build a virtual, European scale network of existing institutional repositories using technology that will manage the physically distributed repositories as one large scale virtual content source.
DRIVER II, a project funded by the 7th Framework Programme of the European Commission and launched in early 2008, is the continuation of the DRIVER project. Whereas DRIVER concentrated its efforts on infrastructure building for scholarly content repositories, DRIVER-II will extend the geographical coverage step-by-step and will move from a test-bed to a production-quality infrastructure. This infrastructure will produce further innovative services meeting special demands that will be built on top. The infrastructure is complemented with several user services including search, data collection, profiling, and recommendations by the end user. Author Participation rates. Whilst the infrastructure may be in place, authors still need to change their habits and submit their material to these new entities. The choice is to place their research results with the established publishers and see their article benefit from the impact factor assessments which underlie research assessment exercises, promotion, tenureship, etc, or else merely see the potential for wider distribution of their articles from the local server or from a new OA publisher. So far this has been no contest: despite the power of the scientific ethic – when it comes to perceived personal opportunity the established journal publishing system wins hands down. So in terms of participation in IRs by the scientists the indication is that the number, as well as the percentage of total academics, is still very low. The Netherlands estimates about 25 % of the national research output, across a wide range of disciplines, is now going into its institutional repositories. Belgium gives larger estimates – 33 % in humanities and social sciences, 39 % in life sciences, 16 % in natural sciences, and 11 % in engineering. With the exception of a 10 % estimate for natural sciences in Germany and a 15 % estimate for engineering and computer science in the UK, the other estimates (when supplied) are negligible. It is clear that there is confusion, uncertainty and fear about intellectual property issues (not just getting copyright permissions to deposit, but questions about who will use material that has been deposited, how it will be used, and whether it will be appropriately attributed), about impact factors and scholarly credit. There seems to be a myth that material in institutional repositories is of low quality. There are multiples types of content some of which are more mainstream than others, and each has its own level of credibility with the research community. It is also evident that cumbersome and time-consuming submission procedures are a major barrier, and that efforts need to be made to minimise the amount of work faculty must do to submit their work into the institutional repository. The acquisition of content is still the central issue for most institutional repositories. Except perhaps for the United States and Australia, the focus seems to be almost exclusively on faculty publications. Voluntary or Mandatory? A point frequently cited is the lack of mandatory provisions in the policies of institutions or funding organisations to deposit the outcome of academic research into repositories such as IRs. However the establishment of such policies, particularly at an institutional level, continues to be controversial. Related to this is a trend towards greater accountability and evaluation of research (such as the Research Assessment Exercise in the UK) and the competition for funding. 
To the extent that IRs are directly linked to research funding and research evaluation (at the individual or institutional level), faculty have a very compelling reason to deposit material into them.
There are two aspects: (a) The belief is that authors will submit their articles for inclusion within the local repository if it is made easy to do so – if it becomes a simple derivative of the report being submitted for other purposes. (b) However, the real stimulus to an IR publishing model is if it is made mandatory for the author to do so – if criteria are set by the funders to have their published work appear in nominated services, it is likely that the authors will accede to such demands. Particularly if there is an implied stick in that future funds may not be so readily available if the author ignores the request. Between 77–96 % of authors claim they would willingly accept a mandate policy (according to a Key Perspectives Ltd report) that suggests that this is the way IRs would become successful. There is some evidence that the large funding bodies are also willing to commit to a mandatory policy. The US Congress invited the National Institutes of Health to come up with a policy that would mandate the output of all research wholly or partially funded by the NIH to be included in NIH’s central fulltext database (PubMed Central). However, the latest proposal from NIH appears weaker than it might have been: it recommended a compulsory code of deposit but with 12 months (rather than six) as the period within which deposit should be made available. This became part of the US Appropriations Act legislation signed into law at the end of December 2007. The UK Select Committee of Enquiry on scientific publications (June 2004) suggested that the government should enforce a policy of local institutional repositories at all the key HE/FE centres. The government responses were less supportive of this prescriptive route, and proposed instead to allow market forces to determine whether OA journals, IR’s, the traditional journal publishing system, or combinations thereof, should operate in the UK. JISC is helping to create the infrastructure without becoming involved in the issue of mandating where an author makes their online article available. JISC is assisting individual universities to fund their own institutional repository, and as a backstop, to create a Depot which other higher education establishments could use if they were unable or unwilling to create one of their own. Research Councils UK has basically come out in favour of open access with mandates of about six months from date of publication being mandated for its grant recipients. A strong lead has come from the Wellcome Trust. This charity has stuck to a firm policy of insisting that those researchers who benefit from Wellcome’s research budgets (over £ 300 million per annum) should make their articles publicly available within six months. In Australia, Queensland University of Technology also has a mandated policy, but appears now to be taking a more ‘softly softly’ approach. Nevertheless, from a very low base it does look as if the growth in IRs has taken off. Whilst the “tipping point” may not have been fully reached as yet, the following chart does show that there has been growth in the author archives of material on their local IRs.
Growth of Institutional Archives and Contents: a chart generated by http://archives.eprints.org/ plotting, by month (from 1990-02 to 2003-08), the number of archives containing records (left-hand axis, scaled to 140) against the total number of records they hold (right-hand axis, scaled to 350,000).
11.1.4 Harvesting the open access material

In tandem with the above dual approach to open access – the Gold and Green routes – a technical structure has been established to enable consistency in gaining access to the freely available research material. A common protocol – the Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH – was developed. This enables the header information on such articles to be 'harvested'. Though minimalist in some descriptive areas, and subject to varying degrees of quality control from the authors, the basis for accessing a large corpus of free articles on demand was ostensibly put in place. The OAI-PMH protocol was derived from the open source movement, which has become a significant force within IT. This is in principle an ideal approach, enabling a multi-institutional search for particular items. Services such as OAIster have emerged to offer such aggregated search, and Google Scholar and similar search engines are picking up the references contained within the growing number of IRs. But principle and practice are not always in sync, and there is still some suspicion that metadata processing and federated searching are not fully sustainable. A distributed approach suffers from the way it is implemented at local level, and from doubts over whether full interoperability is really achievable. The combination of limited deposit controls, broad range of content, and variable quality, quantity and consistency of metadata means that content in IRs will be difficult to use as a replacement for traditional journals any time soon.
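By way of illustration, the sketch below (in Python, using only the standard library) shows how a harvester might issue OAI-PMH ListRecords requests against a repository's base URL and walk through the results using the protocol's resumption tokens. The endpoint URL in the usage note is hypothetical, and a production harvester would add error handling, rate limiting and incremental (from/until) harvesting.

    # A minimal sketch of an OAI-PMH harvester: it requests Dublin Core records
    # ("oai_dc") from a repository and follows resumption tokens until the
    # result list is exhausted. Illustrative only.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"

    def harvest(base_url, metadata_prefix="oai_dc"):
        """Yield (title, identifier) pairs for each record exposed by the repository."""
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        while True:
            url = base_url + "?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as response:
                tree = ET.parse(response)
            for record in tree.iter(OAI + "record"):
                title = record.findtext(".//" + DC + "title", default="(no title)")
                identifier = record.findtext(".//" + DC + "identifier", default="")
                yield title, identifier
            # Large result sets are paginated via a resumption token
            token = tree.find(".//" + OAI + "resumptionToken")
            if token is None or not (token.text or "").strip():
                break
            params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

    # Usage (hypothetical endpoint):
    # for title, identifier in harvest("http://repository.example.ac.uk/oai"):
    #     print(title, "->", identifier)

Aggregators such as OAIster built their indexes by running this kind of loop across the registered repositories and normalising the Dublin Core metadata returned, which is why the quality and consistency of that metadata matters so much in practice.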
Usage of OA

Evidence does indicate that OA is taking off, but from a very low base. According to surveys by CIBER among authors of articles in the ISI database, fewer than 10 % of biomedical authors know 'quite a lot' about IRs. However, as described earlier, deposition rates vary by country, by university and by department. Un-mandated deposit is about 10–15 % of the total. There is a very low level of awareness of OA among authors, whilst others have concerns that are not being addressed. OA can be seen as competitive to traditional journal publishing by some, and as complementary by others. Some are worried about the copyright infringement issue. 'Versioning' of an article – which particular form of the article exists at a particular point in its passage from the author's desk to the publisher's online server system – is also a key issue. Author deposition intentions vary – 9 % would deposit voluntarily and 44 % do not intend to do so. This resistance to open access deposition is less evident in further education (as opposed to higher education) institutions, and also in developing countries. The young are more willing than the old (only 33 % were against deposition in the under-26 age group). However, willing authors (those prepared to deposit) are more knowledgeable (7.1x) and positive towards OA, and have concerns about scholarly communication (16.7x). Willing depositors are also more sceptical about refereeing and impact factors – otherwise the two groups are similar.

11.1.5 Open Access projects

BioMed Central. BioMed Central is a commercial organisation that was founded by Vitek Tracz, who has had a long and distinguished career as an innovator in scholarly publishing. He was an early pioneer in offering a totally new business model. He rejected the pre-packaged sale of journal subscriptions in favour of charging authors for the services that publishers provide for them. It has meant that every article published in a BMC journal – of which there are over 150 titles, mainly in the biomedical area – is paid for by the author. Having paid for the publishing services, the article then becomes freely available for all to see.

Dr Matthew Cockerill, the publisher of BioMed Central, has described the growth of BMC and has traced the increase in the author-paid article fee from $ 500 per article to $ 1,300 per article in 2005. He claims that a fee in the range of $ 1,000 to $ 2,000 would seem appropriate for an author-paid submission system (except where high rates of rejection are implemented by the publisher). The challenge facing this business model is to convince authors that they should spend part of their research budget, or their own personal funds, on submitting their material to such Open Access (OA) journals when the traditional journal already has a 'brand', a citation status and free submission as far as the author is concerned. The struggle to remain commercially viable based purely on reliance on author fees has been commented on by other traditional publishers for many years. It has resulted in BMC spreading the load somewhat by soliciting 'membership dues' from universities and university libraries. This membership, in return for an annual membership fee (a subscription?), enables researchers from that institution to get their articles published at a discounted rate. Some large research centres in the USA have questioned the value they get from such membership and cancelled their subscription (Yale University being a prime example in the summer of 2007). Currently 100 institutions are members of this 'club'. Nevertheless, BMC now provides publishing services to some 150 journal titles (60 of which are actually BMC owned), with 15,000 open access articles.
Some 69 % of the articles are paid for by institutional membership, 28 % by author-pays, and 3 % by a new supportive members' discount.

Hindawi Publishing. Ahmed Hindawi established his company in 1997 and by 2007 had created an operation which includes 100 journals publishing over 2,000 articles each year. Annual growth over recent years has been 40–45 %. Some 220 staff, mainly based in Cairo, Egypt, are employed by the company. There were three subscription-based journals within the Hindawi organisation, but in essence it is a 'gold route' open access publisher. So why change? Why give up the steady, secure subscription income in favour of developing an unproven, more risky author-paid business model? According to Ahmed Hindawi, the subscription model is highly dysfunctional. Open Access, in his opinion, has a lower financial risk and offers a more rapid return on investment. At Hindawi the article processing charges are paid by members, and the membership is largely institutional. The sources for payments are often advertising revenues and central funds provided by research funding agencies. The article processing charges vary per journal, from $ 560 to $ 840 per article, or $ 40 to $ 130 per page. The average article charge is about $ 800, significantly less than either BioMed Central (BMC) or the Public Library of Science (PLoS), its two main comparable competitors. The acceptance rate for articles submitted is 40 %. There are 21 journals in the ISI Journal Citation Reports (JCR). Hindawi is now claimed to be the second largest publisher to be included in PubMed Central. It shows how an open access publication model can thrive in a low-cost country.

SCOAP3 – OA publishing of physics journals. SCOAP3 (Sponsoring Consortium for Open Access Publishing in Particle Physics) is a new model that is being considered for possible open access publishing of journals in high energy physics (HEP). In the model, high energy physics funding agencies and libraries seek to redirect library funding that has traditionally been used for purchasing journal subscriptions to the SCOAP3 consortium, which would negotiate with publishers to make the electronic versions of their journals free to read. Authors would not be charged to have their articles published. Rather than selling subscriptions, journals would charge for the peer-review service and make the electronic versions of their journals freely available. SCOAP3 would negotiate with the major publishers in the field through a tendering process.

The development has resulted in considerable debate on the LibLicense listserv as to the viability of the model and the way it might work. Representatives from the American Physical Society have homed in on the economics of the model and the prospect of reducing the aggregate cost to libraries of HEP journals, referring to the current costs of the five target journals involved:

Journal                 $/article    $/citation    Publisher
Phys. Rev. D            1.69         0.47          APS
JHEP                    1.79         0.43          SISSA
Phys. Lett. B           10.98        2.68          Elsevier
Euro. Phys. Jour. C     18.71        7.06          Springer
Nucl. Phys. B           32.33        6.20          Elsevier
APS states that “to raise $ 3.7 M, the US part of the $ 14 M of consortium funding, SCOAP3 . . . we estimate that only about 1/3 of the US subscription revenue for Physical Review D comes from these (US) institutions, so if only they are involved, each must be asked to triple what it now pays, presumably with offsetting savings from other journals. If we are to continue to provide quality peer review, distribution, and archiving of physics research, we must recover our costs. “The current subscription-based funding model, though far from perfect, has provided adequate and stable funding, in harmony with the arXiv and with our generous self-archiving provisions. An obvious concern is that once the journals are freely available, some libraries might divert their now voluntary contributions from SCOAP3 to other more pressing needs, because doing so would bring no immediate consequences. We are gravely concerned about the difficulty of reassembling our subscription model were SCOAP3 to fail.” They went on to indicate that “the funding and sustainability of the SCOAP3 model have yet to be developed and demonstrated. If they can be, then APS would be willing to make Physical Review D freely available on our site.” Ann Okerson, librarian at Yale University Library and LibLicence moderator, reported that a meeting had been held “with a leader of the SCOAP3 initiative”, who had indicated that one of the incentives for libraries to join was that the combined SCOAP3 group of subscribers would be large enough to negotiate with the publishers and to reduce the overall subscription fees by as much as 2/3.” She stated that “This seemed hugely optimistic to me, as none of the publishers listed, whatever their $/article, is making anywhere approaching 2/3 surplus”. However, prof. Hector Rubinstein of Stockholm and Uppsala University, (Chair of the J-Journals Executive Editorial Committee, a group of key high energy physics journals), responded by indicating that “if you look at the (above) table it is no accident that the two non-profit journals, Phys. Rev. D and JHEP, cost about the same, and 20 times less than Nuclear Physics B. How can you say that costs cannot be cut by 2/3?. No one is asking to lower the price of JHEP or Phys. Rev. D., but just to force the commercial companies to be reasonable.” The SCOAP3 website reports that funding agencies and libraries are currently signing Expressions of Interest for the financial backing of the consortium. Representatives from all the European countries undertaking research in particle physics (except from the UK) agreed to support in principle the formation of a consortium to fund OA publication charges in the most important journals. The funding for the transition to OA will come partly from research institutions and partly from libraries through the conversion of subscriptions into OA publication charges. A working party is to discuss the detailed structure of the consortium and the funding required from each country, after which the participating organisations will be invited to turn the support in principle into a funding commitment. The absence of the UK research and library communities from the meeting was noted but the group still hopes that UK organisations will join the consortium when formed. 
According to Professor Rolf-Dieter Heuer, the newly elected Director General of CERN in Geneva (from January 2009), speaking at the Academic Publishing in Europe (APE) meeting in Berlin in January 2008, there are a number of reasons why SCOAP3 may work. The community concerned is small and well defined (5,400 active researchers); there is a 'reprint exchange culture'
which goes back decades; there are well-established navigational tools (SPIRES and ArXiv) in this area which represent the first port of call for this community; and the key research and funding centres (including ATLAS, ALICE, LHC and CERN itself) are supportive of the SCOAP3 initiative. The consortium has identified 6–8 publishers which dominate this area, representing 5,000 to 7,000 articles per annum. Euro 10 million per annum would need to be raised through budget sharing among the member libraries, which currently come from 10 countries. Some 27 % of the targeted market has already committed to supporting SCOAP3 and, according to Professor Heuer, commitments from a further 20–30 % will be given soon.

But this is not the end game. Professor Heuer felt that most scientists in this area want access to a broader range of media – including data, conference slides, grey literature, computer code and conference proceedings – as well as to services which will expose the content in a more effective way, such as text and data mining which break down the silos of publisher content. It was also pointed out that there is a huge latent energy of social collaboration which could be used to bring Web 2.0 into the physics information system. From a study referred to by Professor Heuer, 43 % of users would be willing to tag material for up to half an hour per week – this would immeasurably enhance the quality of metadata in particular. The new SCOAP3 consortium will also exploit the competition that is now building up between publishers to attract OA authors within hybrid journals. The European Commission apparently views this initiative as a possible new model for scientific publishing, particularly as it firmly consolidates publishers as having a place in the model. The Commission is facing strong lobbying from commercial interests not to proceed with the recommendations in the scientific publishing study published earlier in 2007.

Case Study – The US IR scene

Despite the hype surrounding open access, it appears the US has hardly taken IRs to its heart – by late 2006 there were only 96 IRs in a community which has 2,140 American colleges and universities, and those that exist are hardly stretched in terms of the content they hold. In February 2005, Dr Cliff Lynch claimed there were 40 IRs in the USA. In August 2006, the ARL SPEC report estimated between 37–40 IRs, or nil growth during the year. What therefore is the reality?

A study was undertaken at Greensboro University in 2006 to see what the situation was then. Its terms were defined as being institutional (i.e. excluding subject-only repositories); accepting multiple types of scholarly output from the faculty; having database functionality; containing the digital objects themselves, not links to them; and offering search capabilities (such as through metadata provision). The growth in IRs has been:

• November 1, 2005 – 68 (using two different software platforms).
• March 15, 2006 – 91 (22 of which were new).
• July 31, 2006 – 92 (four were lost, five were gained).
• October 31, 2006 – 96.
There has thus been fairly limited growth, and this has slowed down. Between March 2006 and the end of the year there was stagnation – 5 % growth. As far as institutional repositories as operational entities were concerned in the US, the main features in 2006 were:

• In California and New York mainly state consortia were involved. In these instances the growth had been in adding additional libraries to the service.
• 58 % of the IR sites are at public universities, 42 % at private universities.
• They cater for sites with more than 15,000 students (50 % of the total), whereas sites that have fewer than 5,000 students represented 35 % of the total. Institutions with more than 40,000 students only accounted for 3 % of the total.
• The proportion of total IRs which were ARL member libraries has fallen from 49 % in November 2005 to 44 % of the 96 institutions in November 2006.
• 69 % were doctoral-granting institutions (Carnegie classification) in both 2005 and 2006.

The number of items in each of the IRs has been few. In November 2005, 33 % of the sites (16) had fewer than 100 items in their IRs. A quarter of these have been shut down since then, and it is apparent that sites with few items never take off. Only 2 % of the sites had over 10,000 items (one of which was MIT) and 23 had fewer than 500 items. All liberal arts colleges that had IRs had fewer than 500 items. The average number of items held by all IRs was 2,067, but if MIT is excluded this average falls to 1,761. In November 2006 the number of sites with fewer than 100 items was 30 (or 31 % of the total) and only 4 % had over 10,000 items.

Consortial IRs are something of a double-edged sword. The member schools tend to invest less, and those that chip in have more to lose. Of the 96 schools in consortia, five had no items after 12 months, nine had fewer than 20 items and 46 (48 %) had fewer than 500 items. IRs with the most material were at research-intensive sites, with nine of the largest IRs at schools which were in the top 100 in the USA. The average growth was 60–100 items deposited per annum, and 50 % of the sites had more than 18 items deposited every six weeks. Though IRs are barely growing in number, the number of existing IRs with a percentage growth in items deposited is growing, with 15–20 % in this category. Of the original 48 IRs, the average number of items held is now 1,100 (with a median of 366). Thirty of these original sites have hardly doubled in size.

Of the content included in IRs:

• Electronic theses represented a large component of what is deposited; 37 % of the material was ETDs (41.5 % of which are student works). Sixteen of the IRs were responsible for more than 80 % of all ETDs.
• 37 % of the material was scholarly communications:
  ◦ 13 % was peer reviewed (mainly published in journals).
  ◦ 23 % were working papers.
  ◦ 1 % was light grey literature.
MIT and Caltech between them represented 50 % of the entire scholarly communication collection. Michigan dominated in the area of working papers and technical reports.

• Pictures and images represented 13 % of the material.
• Non-scholarly texts represented 4.5 % of the material. These were newsletters, promotional material and other items of questionable use.
• Historical texts represented 3 % of the collection (primarily the result of local digitisation projects).

Looking specifically at the combined resources of the five largest sites (which include MIT, Ohio and Georgia Tech):

• 53 % of the material was student-produced material (including ETDs).
• 21 % were images.
• 14 % were half-peer reviewed.
• 7 % were historical texts.
• 5 % were non-scholarly materials.
ETDs and dissertations represent the 'low-hanging fruit' for IRs.

11.1.6 Economics supporting open access

John Houghton, professorial fellow at the Centre for Strategic Economic Studies at Victoria University, Melbourne, Australia, has produced a number of reports and papers that analyse macro-economic factors and the impact which open access could have on a nation's economy. In reports such as "The Economic Impact of Enhanced Access to Research Findings" (CSES Working Paper 23, July 2006) he freely admits that the data are not yet robust enough to be conclusive. His general impression is that open access will produce a net improvement in society's welfare. However, he does marshal some figures that – in theory – point in this direction. His argument is based on a number of examples which suggest that existing government-funded R&D produces both a private and a social rate of return of around 25–50 %, but that by applying growth models which incorporate 'access' and 'efficiency' into the equation, substantial additional net benefits to national economies can be achieved. He further speculated that the improved access and efficiency factors could be attributable to the substitution of a closed subscription model by an open access business model. Using statistical equations, models and formulae, Houghton attempts to convince readers that the benefits obtained by the change to open access could be substantial for all countries, and in particular (a rough arithmetic sketch follows the list below):

• In Germany, the $ 58.7 billion investment in government-funded R&D could be improved to the extent of $ 3 billion in 2003 from a 5 % increase in access and efficiency.
• In the United Kingdom, the government's $ 33.7 billion investment in R&D could be increased by $ 1.7 billion from a 5 % increase in access and efficiency.
• In the United States, the government's $ 312.5 billion could be improved by $ 16 billion as a result of a 5 % increase in access and efficiency.
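The short Python sketch below is a back-of-envelope reconstruction of how the headline figures above can be reproduced; it assumes, purely for illustration and not as Houghton's full growth model, a 50 % social rate of return on public R&D and a 5 % gain in both accessibility and efficiency.

    # Rough reconstruction of the headline gains quoted above - an illustrative
    # simplification, not Houghton's full model:
    # gain = rate_of_return * R&D * ((1 + access) * (1 + efficiency) - 1)
    def access_and_efficiency_gain(rd_billion, rate_of_return=0.50,
                                   access=0.05, efficiency=0.05):
        return rate_of_return * rd_billion * ((1 + access) * (1 + efficiency) - 1)

    for country, rd_spend in [("Germany", 58.7),
                              ("United Kingdom", 33.7),
                              ("United States", 312.5)]:
        print(f"{country}: ~${access_and_efficiency_gain(rd_spend):.1f} billion per annum")
    # Prints roughly $3.0, $1.7 and $16.0 billion respectively, matching the
    # figures quoted in the text.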
Houghton's treatise focuses on the way a social return of 50 % on R&D in general, and a further 5 % increase in access and efficiency as a result of open access, was arrived at. As an academic exercise in support of open access it has been pioneering, though the statistical models and equations do not necessarily relate to the real world. Nevertheless, it is another powerful method used in support of a migration from subscription to open access publishing. However, the analytical concepts used by Houghton et al. are being challenged for some of their assumptions and recommendations. The work uses some spurious facts and models (which could become folklore if left unchallenged) to make an economic case for open access and mandates. It invalidates the current publishing industry system and proposes a more central approach and governmental control.

11.1.7 Impact of OA on Publishers

In a study undertaken in 2006 by SIS (Scholarly Information Strategies, Oxford) for the Publishing Research Consortium (PRC), the issue of self-archiving and journal subscriptions – coexistence or competition – was investigated. The fear which stimulated this research project from the publishers' side was that 'good enough' articles appearing for free on IR sites would result in journal cancellations. The issue was not about paid-for OA journals – just the material that was self-archived by authors on their institutional websites. This posed a methodological challenge, as there was a need to assess the value of the IR postings. The SIS team therefore used 'conjoint analysis' as the technique for measuring value. This allows users to determine which features of a product have more or less relevance for them. Using a number of features provides better results than one or a few features only. It allows the model to be recast again and again until an accurate picture of value emerges. The features used include:

• Version availability.
• Percentage of articles available.
• Reliability of the process.
• Quality of the content.
• Cost.
• Recency.
Other features which could have been included, but weren’t in this instance, were the amount of activity and the importance to the collection. Each user was asked to select which feature they preferred most, and which they preferred least and the list of preferences was rotated eight times. In total, 424 responses were received to the questionnaire. North America and Europe’s responses each amounted to 172 and Asian respondents amounted to a further 26. Of these responses, 59 % came from academic libraries, 15 % from other academic and 5 % from corporate libraries. In terms of subject coverage, 48 % were multidisciplinary, 15 % were each from science and medical libraries, and 10 % from social sciences. The results can be illustrated on a six-point star configuration. From the returns, quality accounts for 24 % of the importance; cost 19 %; recency (or currency) for
18 % etc. It also appeared that there was very little difference between version options (except a poorer showing for raw author manuscripts). A simulator can be created from the results. 59 % of the respondents favour free access to published material, whereas 41 % prefer the current paid system. There are regional differences – Asian libraries are more concerned with price, whereas Australians are more interested in quality. Some of the other results are:
• There is a challenge to traditional publishing posed by self-archiving (81 %).
• The content available on IRs is considered reliable.
• 40 % believe the library is wasting money on subscriptions.
• Self-archiving will have an impact on the system.
• It will not only impact low-quality journals.
An extended embargo on the journal is the biggest factor protecting the journal subscription. The quality of peer review was not considered as a feature to be measured. What emerged was that there is a scale issue – once open access reaches a significant amount of material, the challenge to the traditional publishing sector becomes severe. Whilst it remains peripheral, the two business models coexist – but for how much longer?
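The PRC report does not spell out its scoring algorithm, but the mechanics of turning the repeated ‘most preferred / least preferred’ choices described above into importance shares of the kind quoted (quality 24 %, cost 19 %, recency 18 %) can be sketched with a simple best–worst count. Everything below is illustrative: the tallies are invented rather than the SIS survey data, and the square-root ratio estimator is just one common approximation, not the team’s actual conjoint model.

```python
from math import sqrt

# Invented tallies of how often each feature was chosen as "most" and "least"
# important across all respondents and rotations (not the SIS survey data).
best  = {"version": 160, "coverage": 180, "reliability": 170,
         "quality": 310, "cost": 245, "recency": 230}
worst = {"version": 175, "coverage": 150, "reliability": 160,
         "quality": 60,  "cost": 95,  "recency": 105}

def importance_shares(best, worst):
    """Approximate best-worst importance shares via the sqrt(best/worst)
    ratio estimator, rescaled so the shares sum to 100 %."""
    raw = {f: sqrt(best[f] / worst[f]) for f in best}
    total = sum(raw.values())
    return {f: 100 * v / total for f, v in raw.items()}

for feature, share in sorted(importance_shares(best, worst).items(),
                             key=lambda kv: -kv[1]):
    print(f"{feature:12s} {share:5.1f} %")
```

A full conjoint or maximum-difference analysis would fit respondent-level utilities statistically (and would support the ‘simulator’ mentioned above), but the counting logic is the same: features picked often as ‘most’ and rarely as ‘least’ important earn the largest share.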
11.1.8 Trends favouring Open Access
Nevertheless, the emerging welter of conflicting practical and operational trends might not give comfort to either friends or foes of OA, or to anyone trying to forecast the future. However, it does indicate that we are in a period of dynamic flux. There are a number of factors that might explain why OA is not moving faster or slower than it in fact is.
• First, there are the many trends created by OA proponents themselves: the growing number of OA repositories, OA journals, OA policies at universities, OA policies at public and private funding agencies, and public endorsements of OA from notable researchers and university presidents and provosts. Funding agencies are now considering OA policies in part because of their intrinsic advantages (for increasing return on investment by increasing the visibility, utility, and impact of research) and in part because other funding agencies have already adopted them.
• Secondly, although knowledge of OA among working researchers is still low, surveys show an increasing rate of deposits in OA repositories and submissions to OA journals. The absolute numbers may still be low, but the trajectories are nevertheless up. More scholars are posting their articles online even if they do not have their publisher’s permission.
• Thirdly, subscription prices are still rising faster than inflation after more than three decades. Rapidly rising prices undermine the sustainability of the subscription model. They undermine publisher arguments that all who need access can get access.
• Fourthly, the cost of facilitating peer review is coming down as journal management software improves, especially the free and open source packages such as DPubS, E-Journal, ePublishing Toolkit, GAPworks, HyperJournal, OpenACS,
SOPS, TOPAZ, and Open Journal Systems. This reduces the cost of publishing a peer-reviewed journal, improves the financial stability of peer-reviewed OA journals, and multiplies the number of business models that can support them.
• Fifthly, more publishers are launching hybrid OA journals, which will make any of their articles OA if an author or author-sponsor pays a publication fee. Even with the current low uptake they will increase the volume of OA literature, (slowly) spread the OA theme to more authors and readers, and (slowly) give publishers first-hand experience with the economics of one kind of OA publishing.
• More journals are willing to let authors retain key rights, especially the right of postprint archiving, and more are willing to negotiate the terms of their standard copyright transfer agreement. More authors are willing to ask to retain key rights and more institutions are willing to help them. More organisations are drafting “author addenda” (contract modifications to let authors retain key rights), and more universities are encouraging their faculty to use them. There are now major addenda from SPARC, Science Commons, OhioLINK, the Committee on Institutional Cooperation, and a handful of individual universities.
• More and more subscription-based journals are dropping their print editions and becoming online-only. A Wiley-Blackwell representative predicted that 50 % of scholarly journals will become online-only within the next 10 years. As high-quality, high-prestige journals make this transition, scholars who still associate quality and prestige with print will start to unlearn the association.
• More journals, both open access and subscription-based, encourage OA to the data underlying published articles. Major publisher associations such as ALPSP and STM, which lobby against national OA policies for text, encourage OA for data. Even when these policies do not cover peer-reviewed articles, they accelerate research, demonstrate the benefits of unrestricted sharing, and build expectations and momentum for OA in other categories.
• More journals (OA and subscription-based) are integrating text and data, with links between text and data files, tools to go beyond viewing to querying data, and dynamic charts and tables to support other forms of analysis.
• Thomson Scientific is selecting more OA journals for Impact Factors, and more OA journals are rising to the top of citation impact in their respective fields. For scholars and institutions using Impact Factors as crude metrics of quality, this trend legitimates OA journals by showing that they can be as good as any others. There are other gains as well. Because OA increases citation impact (studies put the differential at 40–250 %, depending on the field), high-quality OA journals can use citation impact to shorten the time needed to generate prestige and submissions commensurate with their quality.
• New impact measurements are emerging that are more accurate, more inclusive, more timely, and less expensive than Impact Factors. These include Eigenfactor, h-Index, Journal Influence and Paper Influence Index, Mesur, CIBER, Usage Factor, Web Impact Factor, and Y Factor. What most of them have in common is the harnessing of new data on downloads, usage, and citations made possible by OA. In this sense, OA is improving the metrics and the metrics are improving the visibility and evaluation of the literature, especially the OA literature.
• Download counts are becoming almost as interesting as citation counts.
Not only are they being incorporated into impact metrics, but a CIBER study
(September 2005) discovered that senior researchers found them more credible than citations as signs of the usefulness of research. A study produced by Brody, Carr and Harnad from Southampton University (March 2005) found that early download counts predict later citation counts. No one thinks download counts mean the same thing as citation counts, but they’re easier to collect, they correlate with citation counts, and they’re boosted by OA. In turn they boost OA; repository managers have learned that showing authors their download tallies will encourage other authors to deposit their work.
• Market consolidation is growing, monopoly power is growing, and bargaining power by subscribers is declining. It gives the players representing research rather than for-profit publishing (universities, libraries, funders, and governments) additional incentives to support open access. It also gives the smaller, non-profit publishers, excluded from big deals and competing for limited subscription funds against the large publishers, reasons to consider the large publishers more threatening than OA and reasons to consider OA a survival strategy.
• More for-profit companies are offering services that provide OA or add value to OA literature: for example, repository services, search engines, archiving software, journal management software, indexing or citation tracking services, publishing platforms and print preservation. These services create or enhance OA literature, fill the cracks left by other services, create a market for OA add-ons, and show another set of business judgments that OA is on the rise.
• More mainstream, non-academic search engines such as Google, Yahoo, and Microsoft are indexing institutional repositories and open access journals. This makes open access content easy to find for users unacquainted with more specialised tools. This in turn helps persuade even more publishing scholars that open access increases visibility and retrievability.
• Whilst open access journals and institutional repositories continue to develop, other vehicles which adopt the open access principles are mushrooming. These include blogs, wikis, ebooks, podcasts, RSS feeds, and P2P networks. In some instances this is driven by a desire to find ways to bypass barriers to communication, collaboration, and sharing. Like cell phones, wifi, and the Internet itself before them, these tools are overcoming the stigma of being trendy and moving from the periphery to the mainstream.
• Since the rise of peer-reviewed journals in the 17th century, most publicly disseminated works of scholarship have been refereed and distributed by publishers. Letters and lectures were exceptions. Today, the categories of exceptions, the volume of research-reporting they represent, and their integration into the workflow of ordinary research, are all growing.
• New and effective tools for collaboration are also triggering adoption and excitement. Social tagging, searching by tags, open peer commentary, searching by comments, social networking, community building, recruiting collaborators, facilitating work with established collaborators, following citation trails backwards and forwards, following usage-based “similar to” and “recommended” trails, open APIs, open standards, and ‘mash-ups’ – these have opened up a whole new vista in electronic publishing (EP).
• A new generation of digital scholars is building on the new collaboration services that build on OA.
• Huge book-scanning projects, particularly those from Google, the Open Content Alliance, Microsoft, The European Library, the Kirtas-Amazon partnership,
and Project Gutenberg, increase the number of print books available in some free-to-read digital form. Also, the price of book scanning is dropping quickly as large organisations see the investment return potential of large-scale digitisation projects.
• Evidence is mounting that OA editions increase the net sales of print editions for some kinds of books, including scholarly monographs. This not only enlarges the corpus of OA literature, but chips away at the fear that OA is incompatible with revenue and profit.
• University presses are exploring this space by creating imprints dedicated to dual-edition monographs (open access editions alongside priced/printed editions). There are also now major projects producing OA textbooks.
• More universities and independent non-profits are creating open courseware, OA teaching and learning materials, and other open educational resources (OERs).
• There is a rising awareness of copyright issues in the general public, and rising support for remedies by governments (legislation) and individuals (Creative Commons licenses and their equivalents).
One potentially promising future for non-OA publishers is to shift from priced access to priced services for adding value to OA literature; such new services could help opponents become proponents.
11.1.9 Implications for Authors
It is unclear whether authors fully appreciate the slippery road they may be going down when committing their articles to open public use. In its purest form – open access under a Creative Commons 3.0 licence – the reader will be allowed to take the information published under open access, to copy it, manipulate it, mash it up, and rewrite it in another form, particularly when the article appears in an XML format. It will happen. How will the original author take this? Will it slow down the publication process as authors wait until they have fully mined their research results in all possible ways before sending them for publication? There are many issues which still need resolution before there is total and universal compliance with open access as the sole business model available to the research community.
11.1.10 Implications for Publishers
In a recent issue of The Journal of Electronic Publishing (Michigan), available at http://journalofelectronicpublishing.org, Donald Waters summarised the situation as follows:
“It is all too easy to focus on the trendy, glitzy, heart-pounding rhetoric about the initial step of making materials freely available, especially those materials that “your tax dollars helped make possible,” and to trust that only good consequences will follow downstream. It is much harder to focus strategically on the full life cycle of scholarly communications and ask hard questions such as: open access for what and for whom and how can we ensure that there is sufficient capital for continued innovation in scholarly
publishing? One worry about mandates for open access publishing is that they will deprive smaller publishers of much needed subscription income, pushing them into further decline, and making it difficult for them to invest in ways to help scholars select, edit, market, evaluate, and sustain the new products of scholarship represented in digital resources and databases. The bigger worry, which is hardly recognized and much less discussed in open access circles, is that sophisticated publishers are increasingly seeing that the availability of material in open access form gives them important new business opportunities that may ultimately provide a competitive advantage by which they can restrict access, limit competition, and raise prices.”
11.2 Online Advertising as a new business model
Besides openness, there is also the potential for publishers to embrace online advertising as a new business development.
11.2.1 Online Advertising
The Internet advertising market in the UK represents a growing segment in an otherwise depressed advertising market. In the first half of 2007 Internet advertising in the UK grew from 10.5 % to 14.7 % of the total advertising market, reaching over £ 1.3 billion in spending. The total advertising market achieved growth of 3.1 %, whereas the online part grew 41 % over the equivalent period in 2006. Internet advertising has grown faster than any other mainstream advertising medium according to the IAB (Internet Advertising Bureau); press, TV, radio, cinema and direct mail all experienced falling revenues. The main formats within online advertising have also shown growth:
• Display advertising accounts for 21.5 % of all online advertising
• Paid-for listings remained the largest online advertising format, with a 57.1 % share
• Classified advertisements amounted to 20.8 % of the online advertising market
• Email is a new and growing category, however, still accounting for only 0.6 % of online ad spend
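As a quick consistency check on these figures – our arithmetic, not the IAB’s – the four format shares account for essentially the whole online market ($21.5 + 57.1 + 20.8 + 0.6 = 100.0\,\%$), and the half-year online spend and share together imply a total UK advertising market of roughly

$$\frac{£1.3\text{ billion}}{0.147} \approx £8.8\text{ billion}.$$

This total is inferred from the percentages quoted above rather than reported directly by the IAB.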
There are a number of drivers for future growth in online advertising. With over 52 % of the adult population in the UK having broadband at home (an increase from 13 % three years earlier), advertisers are able to be more creative in their online advertising campaigns. Also, on average 26 % of Internet users’ media day is spent online; as such, the Internet is second only to TV as the most consumed medium. Nor is this just a young-generation issue – the over-50s account for nearly 30 % of all time spent online.
11.2.2 Advertising in the scholarly area
However, it has been a historical feature of scholarly publications that advertisers, and online advertisers in particular, are not enamoured with the opportunities arising from the small number of buyers/subscribers which are typical of the
scientific journal, and that this low subscription number is also distributed globally. The impact from an online advertisement attached to a specialised research journal was seen as low in comparison with other media. But given the robustness of online advertising and its rate of growth a number of experiments are now being conducted which impinge on the research sector. Reed Elsevier is trying out an experiment in the USA that stands this model on its head. In September 2007 it introduced a Web portal, www.OncologySTAT.com, that gives doctors free access to the latest articles from 100 of its own medical journals and that plans to sell advertisements against the content. The new site asks oncologists to register their personal information. In exchange, it gives them immediate access to the latest cancer-related articles from Elsevier journals like The Lancet and Surgical Oncology. Prices for journals can run from hundreds to thousands of dollars a year. Elsevier hopes to sign up 150,000 professional users within the next 12 months and to attract advertising and sponsorships, especially from pharmaceutical companies with cancer drugs to sell. The publisher also hopes to cash in on the site’s list of registered professionals, which it can sell to advertisers. Mainstream publishers have wrestled for years with the question of how to charge for online content in a way that neither alienates potential readers nor cannibalises their print properties. So far, few definitive answers have emerged. Reed Elsevier is taking a risk that its readers will drop their paid subscriptions and switch allegiance to the new Web site. However, Elsevier will select the articles that go on to the ad-supported site. This means that they forecast few subscription losses from libraries, as libraries will want the entire journals, not just a selection. If this can be made to work, then the advertisement revenue will be seen as additive – as one would expect, considering Elsevier’s history and status as a publicly traded company. A boon for shareholders, but little relief for librarians. Elsevier will supplement their own journal content with summaries of the content from third-party journals. This means that part of the readership of those journals will find those articles through Elsevier’s portal. A good marketing position, as it permits Elsevier to sell ads against summaries of others’ content (as search engines do). This will have the effect of weakening the economics of those third-party journals, which will lose some advertising potential to Elsevier. The target market is those clinicians not associated with an institution. This is, in other words, a new market, for which ad-supported media rather than subscriptions provide the best return. Long term, this strategy will improve the editorial quality of Elsevier’s journals (at least for articles on oncology) at the expense of competitors’ journals. This is because authors will seek the best of both worlds: institutional representation through Elsevier’s subscriptions, and Open Access through the ad-supported portal. At present only Elsevier can offer this. Thus authors will migrate to Elsevier publications, enhancing the prestige of Elsevier’s programme, and enabling Elsevier to continue to raise prices on subscriptions. All this needs to be set in the context that several large newspapers in the USA are considering switching their business plans from relying on online subscriptions to online advertising.
The New York Times was the first to make this major move, in the summer of 2007, and there are rumours that the Wall Street Journal might make the same move once Rupert Murdoch takes over. It could be that the WSJ would be taking a bigger risk, given that the NY Times is foregoing $ 10 million in annual TimesSelect subscriptions, whereas the WSJ online subscription service is pulling in an estimated $ 50 million annually. This is definitely an area worth monitoring, as the implications for the business practices of electronic publishers could in future be dramatic.
11.3 Summary
The shock of the new is wearing off. OA is gradually emerging from the fog of misunderstanding, and time itself has reduced the panic surrounding it. Everyone is getting used to the idea that OA literature can be copyrighted, that OA literature can be peer-reviewed, that the expenses of producing OA literature can be recovered, and that OA and subscription-based literature can co-exist. As such the paradigm is changing, and open access is a key driver in making this happen. How significant this will be in the long term may be questionable, but in the short and medium term it will dominate the agenda for electronic publishing, and it has become a crucial issue within the corridors of those who can make an impact on the publishing scene – the large research funding agencies. Only when this impact drills down to change the behaviour of research authors – when they are either persuaded or forced to change their views on where their research results are made optimally visible (by their own criteria) – will open access make the breakthrough which pundits have long claimed and advocates have long hoped for.
Chapter 12
Funding Research as a Driver for Change
The development of laws and public policies usually lags behind the emergence of new technologies. That is certainly the case with digital technology and the network revolution. Governments have been trying to keep pace with the rapid changes of the digital age and to regulate the implementation and uses of these technologies at the national and international levels. However, in some areas political intervention is proving to be prescriptive rather than reactive, and as far as the adoption of new publication business models is concerned this has become contentious.
12.1 Political developments In effect politics is giving traction to the open access movement. Open access needs a powerbase of support, and this is coming in part from a greater national and international political awareness and involvement. International collaboration among supporters of the open access movements has come to the fore. Several international meetings have been held at which the outlines for a common approach to open access have been signalled. There have been many international conferences held on the subject, notably at CERN, Geneva, in Switzerland – CERN being a leading advocate of open access within the global physics community. Besides which there are listservs, moderated lists, newsletters, etc, being spawned around the subject, led by vocal supporters such as Stevan Harnad, Jean-Claude Guedon and Peter Suber. All-in-all, there has been a dramatic rise in a high-level focus around open access for scholarly publishing, with the aim of winning the hearts and minds of decision makers in government and science policy implementation, as well as researchers and authors, to commit them to and actively support this new business model. However, it is not that easy. Government agencies do not necessarily have a free or even independent hand in effecting a change in the business model of scholarly communication. Namely, if the public is paying for the research effort, the public should be allowed to access the published results without restriction. But there are other issues at stake. Governments have other policies to consider, some of which may be counter to their adopting an open access agenda. For example, the economic wealth of the nation could suffer if the traditional publishing industry was fundamentally changed. Governments stand to be accused of interference in the scientific research process if they pushed for a particular untried business model.
One particular example of the difficulties faced by a national organisation in setting its policy on open access can be seen in the US with the actions of the National Institutes of Health (NIH). In 2005 the US Congress authorised the NIH, funder of some 65,000 research projects each year, to ensure that the recipients of NIH grants deposit a postprint version of the resultant article in the NIH’s central repository, PubMed Central (PMC). The postprint is a pre-publication version of the article which includes refereeing but excludes final formatting for online access. The NIH, under pressure from the powerful publishing lobby, watered down the ‘demand’ to deposit to a ‘voluntary request’. The result was that barely 4 % of grant recipients complied, and instead they continued to submit their manuscripts to the established publishing channels. It became clear that voluntary strictures were not appropriate and that a mandated demand was required for significant compliance. This has now been adopted by the NIH: under the US Appropriations Act passed in December 2007, all grant recipients are henceforth required to deposit their published research results into PMC within 12 months of publication. Other countries have learnt from this example. In the UK, the Research Councils have largely followed the mandating path. Not all eight research councils have adopted a mandated approach, but most of them have, including the powerful Medical Research Council, ESRC and BBSRC. They were preceded by the Wellcome Trust, which has gone further than most other funding agencies by not only mandating deposit within a subject-based repository but also being instrumental, along with the research councils, in setting up a UK mirror site of the NIH-based PubMed Central. This was launched in January 2007.
12.2 Open Access Initiatives The Budapest Open Access Initiative (BOAI) was a conference convened by the Open Society Institute on December 1–2, 2001. This small gathering of individuals is recognised as one of the major historical, and defining, events of the open access movement. The opening sentence of the Budapest Open Access Initiative encapsulates what the open access movement is all about, and what its potential is: “An old tradition and a new technology have converged to make possible an unprecedented public good.” The Budapest Open Access Initiative spawned a number of other similar open access conferences. Following on from the spirit of the Budapest Open Access Initiative was the ECHO Charter and the Bethesda Statement on Open Access Publishing. The Max Planck Gesellschaft (MPG) was instrumental in the Berlin Declaration that emerged from a conference held in Berlin in October 2003. Further conferences along the same theme of open access then took place at CERN (May 2004), Southampton (February 2005), Golm (2006) and Pardua (2007). All these took as an expression of faith that the Internet had fundamentally changed the practical and economic realities of distributing scientific knowledge and cultural heritage. For the first time ever, the Internet offered the chance to constitute a global and interactive representation of human knowledge and the guarantee of worldwide
access. In October 2003 there were 19 signatories to the Berlin Declaration – in March 2006 this had climbed to 155 agencies. As important as access to the research articles is the access to data, to supplementary material, to new research tools. With this as background, there are some countries that have become involved in making changes to the way electronic publishing is implemented. Their influence in effecting such change at a global level relates to the scale of their scientific R&D and their output of scholarly material.
12.3 Ranking countries by research output The US has produced a huge number of scientific papers over the past decade – more than 2.9 million – and took the lead in total papers among the top countries. Japan ranked second with 790,510 published papers, or roughly one-third of the US publication output. The ranking of total papers is given below. Total Papers in All Fields from 1996–2006: Country United States Japan Germany England France China Canada Italy Spain Australia India South Korea Taiwan
Total Papers, 1996–2006 2,907,592 790,510 742,917 660,808 535,629 422,993 394,727 369,138 263,469 248,189 211,063 180,329 124,940
This gives an indication of the past and current position of the countries as contributors to the world’s scholarly knowledge base. But it does not necessarily reflect how important the countries will be in the future. The changing proportion of R&D activity on a regional basis will have some significance, for example, as will the quality of the national infrastructures, the support given to academic and industrial research in each country and many demographic and educational features.
12.4 National and International government initiatives 12.4.1 A model for a new electronic publishing paradigm In the Summer of 2006, a 150 page report was published by John Houghton, Colin Steele and Peter Sheehan from Victoria University, Melbourne, Australia, which sought to establish whether there are new opportunities and new publishing models for scholarly communication that could enhance the dissemination of research
findings and thereby maximise the economic and social returns to public investment in R&D. Houghton claimed that it is impractical to look at individual segments of the scholarly communication process in isolation – as the market changes so money moves around the system, and it is the size and extent of the research budget overall which is important. The Australian estimate for the total spend on scholarly communication in all its forms – from author creation, through publishing, to storing and reading the research results – was put at Aus$ 3.6 billion (plus or minus 50 %). The mean cost of staff time spent reading published research was put at A$ 2.7 billion; writing costs at A$ 480 million; refereeing at A$ 100 million; library acquisition costs at A$ 317 million; and ICT infrastructure at over A$ 1 billion. Houghton then estimated the benefits from a one-off increase in efficiency and found that:
• with public sector R&D expenditure at Aus$ 5.9 billion (2002/3) and a 25 %
rate of social return to R&D, a 5 % increase in efficiency would be worth Aus$ 150 million a year;
• with higher education R&D expenditure at Aus$ 3.4 billion, and a 25 % rate of social return to R&D, a 5 % increase in efficiency would be worth Aus$ 88 million per year.
These are sizeable figures. The team then estimated that it would cost Aus$ 10 million over 20 years to convert the scholarly communication system to open access (through mandated institutional repositories). Setting these costs against the above efficiency gains:
• the cost/benefit ratio for open access in public sector research is 1:51 (i.e. 51 times more benefits than costs);
• the cost/benefit ratio for open access in higher education is 1:30.
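Using the same back-of-the-envelope relation as before ($\Delta B \approx \rho R (g_a + g_e)$), the quoted annual gains are reproducible if the 5 % improvement is read as applying to both access and efficiency against the 25 % social return – an interpretation on our part rather than a statement of the report’s exact model:

$$0.25 \times A\$5.9\text{ bn} \times 0.10 \approx A\$148\text{ m}, \qquad 0.25 \times A\$3.4\text{ bn} \times 0.10 = A\$85\text{ m}.$$

The cost/benefit ratios of 1:51 and 1:30, by contrast, come from the report’s own 20-year present-value modelling of recurring gains against implementation costs, which is not reproduced here.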
Both these cost/benefit ratios are very high, suggesting that – on the basis of the Australian experience – the case for moving to open access is heavily supported by the social benefits. An international group of economists is currently taking the model and refining it. (See: “Research Communication Costs in Australia: Emerging Opportunities and Benefits – a report to the Department of Education, Science and Training”, September 2006.)
12.4.2 European Commission FP7 e-infrastructures
On 6 February 2007, a virtual information day was hosted by the EC in Brussels, at which the first calls for proposals under the e-infrastructure topic of the ‘Capacities’ Specific Programme within Framework Programme 7 (FP7) were announced. This will be a seven-year programme (2007–2013) with a budget of at least € 570 m; the overall budget for FP7 will be in excess of € 50 bn. The e-infrastructures area supports a number of interrelated topics designed to foster the emergence of a new research environment in science in which ‘virtual communities’ are expected to play a leading role. In this context, the term ‘science’ is intended to cover all disciplines from arts and humanities, through social sciences,
to science and engineering. In short, e-infrastructure “aims to create in all fields of science and technology new research infrastructures of pan-European interest needed by the European scientific community and to help industry to strengthen its base of knowledge and technological know-how.” The first call had four themes: • Scientific digital repositories (€ 15 m). This involves a coordinated approach to
the development of digital repositories for the scientific community by pooling existing resources at European level and supporting data storage, archiving, interpretation, interoperability, management and curatorial activities, and contributing to common open standards and their widespread adoption. • Deployment of e-infrastructures for scientific communities (€ 27 m). This activity aims to reinforce the impact, adoption and global relevance of the e-infrastructure across various areas of science and engineering by addressing the specific needs of new scientific communities and providing advanced capabilities and applications to more researchers. • Design studies (€ 15 m). The aim is to support feasibility studies, based on totally new ideas for new research infrastructures that are of a clear European dimension and interest in the long term. Major upgrades of existing infrastructures may also be considered when the end result is intended to be equivalent to, or be capable of replacing, a new infrastructure. All fields of science and technology could be considered, but such studies should address as a priority topics identified as ‘emerging’. • Preparatory phase for ‘Computer and Data Treatment’ research infrastructures in the ESFRI Roadmap (€ 10 m). Calls for proposals for the preparatory phase of the construction of new infrastructures will be restricted to the list of projects identified by the Commission. A subsequent call will be made in the autumn of 2007. See http://cordis.europa.eu/ist/rn/ri-cnd/fp7info-day.htm
The Sixth Framework programme ended in 2006, and preparations are underway for the Seventh Framework (2007 to 2013) that as indicated above will considerably expand the budget available for the EU Research directorate. The existing budget for Research, which already is the third largest expenditure area in Europe (after agriculture and regional support), absorbs Euros 5 billion. As a result of the Lisbon Strategy, this will rise to Euros 10 billion. Euros 1.5 billion of this will be set aside in a new basic research programme. The stimulus for this activity is recognition that research is a major source and stimulus for innovation and economic development. The Research directorate commissioned a survey to look at the scientific publications market to see whether there are structural or economic factors that formed a barrier to the flow of scientific information from creator to user. It is believed that the European approach is weaker than in the US, and the level of public understanding of the role of Science in society is poor. A number of activities are planned besides the scientific publications study. A survey that covered 30,000 respondents in 32 countries was completed, which explored public perception of Science. Initial results suggest that, compared with an earlier survey in 1992, there has been a general decline in public interest in science issues.
Currently the Research directorate funds some 2,000 research projects per annum, most being published through the RTD and the special Publications division of the EU. 1,600 staff administer the research programmes from the offices in Brussels.
12.4.3 EU Study of Scientific Publishing (2006)
The University of Brussels, together with a team of economists at the University of Toulouse, was commissioned by the EU to analyse the European scientific journals market in 2005. The remit of the study was to undertake an economic assessment of the current system, assess proxies for other models, and analyse the new models, including open access. The certification process was seen to be a key element of the study. In essence Professor Legros, who headed the project, confirmed that it was an economic study, with apparently little focus on the operational intricacies of publishing or use of operational data collected from the publishing sector. The focus was on identifying barriers to entry to the industry and, if these barriers exist, what impact this has on the efficiency of STM publishing within Europe. The team looked at the changes that took place, the forces which drove these changes, the resistances which existed and what the consequences were for users. The methodology adopted was to look at the evolution of the market structure (as an indicator of the future), current pricing policies, the copyright and licensing situation, and the new business models. It became evident that several other economists held considerable sway with those undertaking the EU study – Bergstrom and Bergstrom, and McCabe (BB&M). This was unfortunate for the publishers, as there was much that publishers had been challenging about the writings and economic modelling of BB&M. Collection of data from the publishers by the EU project team had been more at the instigation of publishers and publisher associations than of the research team itself. Nevertheless, from the data sources used, the strong emphasis was to look at the differences in price per page and price per citation between the commercial publishers and the not-for-profit publishers. They came up with the following table, which is indicative of their general approach and findings:

Journals with many and few citations

Publisher and Sector        Journals with ‘many’ citations    Journals with ‘few’ citations
For Profit                              59                                65
  Elsevier                              31                                18
  Academic Press                         8                                 3
Not for Profit                          35                                10
  University Chicago                     7                                 0
  Oxford Univ Press                      9                                 2
  Cambridge Univ Press                   3                                 5
  American Economics Assoc               3                                 0
In effect, from their analysis they deduce that it is easier for the ‘for profit’ publishers to expand their title output – that they are more active or aggressive compared with the ‘not for profit’ sector. In terms of price per citation, they have produced the following:

Price per citation

For Profit                  1.32
  Elsevier                  1.34
  Kluwer                    1.70
  Springer                  2.64
Not for Profit              0.08
  American Econ Assoc       0.01
  Univ Chicago Press        0.05
  Oxford Univ Press         0.37
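Read in this way, price per citation is simply a journal’s list price divided by the citations it attracts over the study window (the report does not state the currency unit), so the headline contrast between the two sectors is

$$\frac{1.32}{0.08} \approx 16.5,$$

while on the price-per-page basis quoted in the following paragraph the equivalent ratio is $0.75 / 0.16 \approx 4.7$. These are simple arithmetic readings of the published figures rather than additional results from the study.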
On a price-per-page basis, the For Profit publishers average out at 0.75, whereas the Not for Profits are 0.16. On these bases, the researchers claimed that the For Profit publishers were more expensive. They also claimed that the more expensive For Profit publishers were also the ones more dynamic in starting new journal titles. High prices equates with greater innovation in their analysis. The point is also made that the For Profit publishers discriminate when it comes to citations – if the citation count grows by 1 % to a title, the price is increased by 0.22 %. With not for profit publishers, or indeed abstract and indexing services, no such correlation is discernible. According to Professor Legros, new technology allows everyone to become an author – the entry to the market is technically easier. New business models are emerging – not just open access but also models that unbundle the production from the certification. The Faculty of 1,000 (BioMed Central) and the New Journal in Economics provide different types of sifting and certification. But are there barriers that prevent such new business (and functional) models being adopted? The team looked into the barriers to competition. The requirements for scientific publication include: • Need for some element of certification. This requires a network or pool of quality
referees. Once established this will generate good authors and readers • Need to provide access to a stock of knowledge. This favours those publishers
that have a large portfolio of journals (a reader requirement). The current publishing system gives a powerful competitive advantage to those publishers already firmly entrenched in the market – they have the stock of publications, they have the referees. The question is do they leverage this advantage? Yes, according to Professor Legros most notably through the ‘Big Deals’. Librarians have become enamoured (though decreasingly so) with obtaining a large number of titles at a low price. The marginal cost for additional titles in the Big Deal bundle is low. However, this causes a barrier to entry to the market from those smaller publishers that are not part of a Big Deal. The Deal protects the revenue base of the
larger publishers whilst leaving less library funding available for the small, specialised publisher. Professor Legros suggests that high prices in themselves are not always in violation of competition law. Relatively high prices (in comparison with other sectors of the market) suggest the existence of entry barriers and the creation of natural barriers. The Big Deal powers are particularly unattractive in his view. His aim was to see a more competitive market for publications in Europe, and a reduction in the natural barriers. To achieve this he felt that:
• new copyright rules should be introduced
• Open Archives should be supported
• the access rights problem should be dealt with
• public funds should be leveraged to support dissemination
• maintenance of, and access to, repositories should be enabled.
It seems that the authors of the report dwelled on market imperfections on the supply side, and felt that buyer power was limited. However, the multiple pricing options offered by publishers, not confined to the Big Deal, do reduce some of the market imperfections assumed by the research team. Also, slower start-up rates for new journals among Not for Profit publishers, also used as an indication of imperfection, may be as much related to the interdisciplinary nature of commercial publishers compared with learned societies – the greater ability of For Profit publishers to move with new disciplines. Buyer power is also being exerted through listservs and open discussion of particular publishers’ pricing models, in some cases with successful changes being effected. One other key obstacle that did not appear on the research team’s radar screen was the market distortion caused by the different VAT rates applied to print and electronic journals. In the UK the 15–20 % price reduction which could be achieved by libraries in switching from print to electronic is negated by the 17.5 % VAT applied to the e-version (whilst the print version is zero-rated). How this applies across Europe seems unclear – whether the same rules, under which universities are not able to reclaim VAT, apply elsewhere, and whether this is an economic issue that the research team could usefully have analysed and presented to the EU in their report, remains to be seen. The report was published in July 2006. It identified the potential policy issues and gave recommendations, but the final decisions will be taken by the EU and could involve multiple directorates general. As suggested above, there was substantial criticism voiced against the report from the publishing sector. It was felt wrong for the report’s authors to equate citations with circulation of journals, a key feature of the economic model adopted in the report. It was felt that inadequate attempts had been made to obtain journal circulation figures either from published sources or from publishers themselves. In some low-citation but high-circulation medical journals, advertising revenues were a feature that was ignored. No reference was made to Tenopir and King’s work in this area. All this led to questions about the validity of claiming that For Profit publishers were 2.7 times more expensive than Not For Profit publishers – a key suggestion by the report’s authors. According to the publishers, the emphasis by the Brussels/Toulouse team on looking solely at publishers’ list prices for titles meant that the results and conclusions from their work on the pricing issue were questionable. It was also suggested that there was inherent conflict between the
recommendations proposed – for example, the first recommendation proposed that mandates should be introduced for authors – yet this contravenes the ‘level playing field’ to which they aspired in the second recommendation. In general it was felt by sectors of the publishing industry that the authors insufficiently understood the reality of publishing practice, and the impact on learned societies was overlooked. Many related questions were either ignored or dismissed. Having said that, there was support for the report being a platform on which a new and cooperative approach could be made to some of the challenges facing scientific publications. There were some good recommendations that could be worked on – generating access to public funded publications, perennial access to material, interoperability, etc. The above EU report focused on classical economic issues. The assumption was that there were market constraints, and the report dwelt on this. There were barriers to entry given the monopolistic nature of journals, and publishers used ‘value pricing’ rather than cost pricing as a form of self-protection. There was a steep rise in journal prices between 1975 and 1995, and these were only ameliorated in recent years by ‘bundling’ and Big Deals. Nevertheless, For Profit (FP) publishers remained 180 % more expensive than Not For Profit (NFP) publishers, according to the report. 12.4.4 EU open access developments Stemming from the above report a conference was convened in Brussels by the EC from 15–16 February 2007, entitled ‘Scientific Publishing in the European Research Area – Access, Dissemination and Preservation in the Digital Age’. Some 500 participants attended from 50 countries, representing leading stakeholders in the scientific publications sector. The key issue was whether the EC would commit to an OA mandate and, in particular, whether the results of research conducted under the Framework Programme 7 (FP7) would have an OA deposit mandate. The initial auguries were not good for those publishers relying on a subscription/licensing business model for their survival. Intense activity by open access advocates in the months running up to the conference highlighted the concerns about the dysfunctional scientific publication process. Just prior to the conference, a petition was presented to the commissioner responsible for science and research (Janez Potocnik) registering the support of 18,500 signatories – including 750 from institutions – for free and open access to European public-funded research results. Stevan Harnad (from universities Montreal and Southampton) even went to the lengths of committing the petition results to a multimedia film at the end of the conference. The publication of the European Research Advisory Board (EURAB) report in December 2006 also endorsed open access. Acknowledging the complexity of the issues involved, it was suggested by EURAB that the Commission should consider implementation of a mandated approach to article inclusion in repositories on a phased basis, starting with research funded by the European Research Council. However, a communication which was issued by the EC just prior to the conference failed to offer a clear ruling on an open access mandate, much to the chagrin of the open access school. The communication, On Scientific Information in the Digital Age: Access, Dissemination and Preservation (COM (2007) 56), outlined the actions it
proposed to take at a European level to help increase and improve access to and dissemination of scientific information. The intention, the Commission says, is not to mandate open access publishing and digital preservation, but to promote best practice and initiate a policy debate on these matters. The conference itself might have resulted in definitive policies in favour of OA being produced, but in the event this did not happen, and the two commissioners who opened and closed the conference outlined the need to strike a balance between providing OA and destroying the present system. Further research (to provide more evidence) and further dialogue amongst all stakeholders were the main recommendations. Whilst the EC broadly embraces the move towards open access to scientific knowledge, it recognises the fact that, in the European Union, 780 scientific publishers employ 36,000 persons and produce 49 % of world scientific publication output. The sheer size of the industry and the state of current information makes it unlikely that there could ever be outright support at the present time. This leaves the field open to self-regulation, which means that research funders, research organisations, universities and libraries will have to negotiate the way forward. Though the publishers and their associations may not have been overtly responsible for this ‘wait and see’ policy by the EC, they were active just prior to the conference, presenting a Brussels Declaration on STM Publishing signed by 43 companies and publishing organisations. This put into context the concerns of the publishing industry that frenzied attempts were being made to impose OA on the scientific publishing sector. Ten major principles were identified in the declaration covering what the mission of publishers is; the increase in access which is being achieved; the need to ensure that quality control through an effective peer review system is sustained; and that all this incurs costs – there is no ‘free lunch’. The statement ‘one size does not fit all’ was repeated a number of times during the conference. The conference did not end without some proposals being promulgated. The frequent concerns which were expressed about the need to provide preservation of digital objects were addressed, and commissioner Viviane Redding (information society and media) made much of this with the commitment of co-funding research infrastructure development (€ 50 m for digital repositories, € 25 m for digital preservation and collaborative tools and € 10 m for access and use of scientific information). Much was also made of the need for datasets to be brought into the open access debate, and recognition of the importance of such raw data in the scholarly communication process. However, the supporters of open access failed to achieve a major breakthrough in EC mandating of publications arising from Commission-funded research. Stevan Harnad, who claimed that the meeting was hijacked by discussions of revenues and business models, and did not focus on issues of optimal access, lamented upon this. Additionally, Harnad felt that publishers had an advantage in having contributed to the drafting of the Communication that was rushed out on the first morning of the conference. So, whereas the open advocacy battle still seems to being won by the open access movement (as reflected in the now over 20,000 signatories to the petition), this particular skirmish in Brussels seems to have been won by the publishing industry on points. 
However, this is clearly not the last word from the EU on the scholarly communication issue.
12.4.5 European Research Council
In fact it was not long before one of the European agencies disclosed its hand. At the end of December 2007 the European Research Council (ERC) stated its intentions, publishing its position paper on mandating self-archiving for open access of research articles which derive from ERC-funded projects. Its recommendation is that “All peer-reviewed publications from ERC-funded research projects be deposited on publication into an appropriate research repository where available, such as PubMed Central, ArXiv or an institutional repository, and subsequently made Open Access within 6 months of publication.” See: http://erc.europa.eu/pdfScC Guidelines Open Acces/s revised Dec07 FINAL.pdf
To place the ERC in context, the European Commission (EC)’s Directorate General for Research has a budget of 50 billion euros (US$ 73 billion) for the years 2007–2013. Recipients of an EC Research Grant sign a copy of the EC’s Standard Model Grant Agreement. This does stipulate that electronic copies of the published versions or the final manuscripts accepted for publication shall be provided to the Commission. It does not, however, mandate OA. The wording of the Grant Agreement was fixed in 2007 and is most unlikely to change before the year 2014. The European Council (together with the European Parliament) exercises democratic control over the European Commission. It does not administer a research budget as such. As for the European Research Council itself: in December 2006, the ERC and its seven-year 7.5 billion euro budget got the seal of approval from the European Council. On 17 December 2007, the ERC issued mandatory requirements for OA to all peer-reviewed publications from ERC-funded research projects. As with the recently amended mandate on Public Access agreed under the delayed 2007 US Appropriations Act (passed on December 27, 2007) for the National Institutes of Health (NIH), the ERC mandate is an immediate-deposit mandate, with the allowable embargo applying only to the date on which the deposit is made open access, not to the date it is deposited (which must be immediately upon publication). The ERC embargo is also shorter (6 months, whereas for the NIH it is 12 months). The ERC mandate also differs in that it includes the option of an institutional deposit, not just a central subject-based repository, which will please the European ‘Green’ open access lobby.
12.5 Publisher Initiatives 12.5.1 US publishers’ PR campaign According to an article in a 24 January 2007 issue of Nature (DOI 10.1038/445347a), public relations specialist Eric Dezenhall had been approached to help US publishers counter the sort of PR activity undertaken by the open access movement within the United States. Dezenhall made a name for himself in the US by helping companies and celebrities protect their reputations. He worked with Jeffrey Skilling, the former Enron chief now serving a 24-year jail term for fraud, and Dezenhall’s activities have resulted in his acquiring the title of ‘the pit bull of public relations’.
The Nature article claims a group of scientific publishers had hired ‘the pit bull’ to take on the open access movement in the US. Some traditional journals, which depend on subscription charges, say that open access journals and public databases of scientific papers, such as the National Institutes of Health’s (NIH’s) PubMed Central, threaten their livelihoods. Furthermore it is anticipated that federal activity will make this even more of a threat in months to come. It is claimed that staff from Elsevier, Wiley and the American Chemical Society met with Dezenhall in July 2006 at the Association of American Publishers (AAP) meeting. A follow-up message from Dezenhall included a robust strategy that publishers could adopt. There appeared to be one simple message that Dezenhall suggested should be promulgated by publishers – to whit: ‘public access equals government censorship’. He further suggested that publishers should attempt to equate traditional publishing models with peer review, and ‘paint a picture of what the world would look like without peer-reviewed articles’. Dezenhall’s view is that if the other side is on the defensive, it doesn’t matter if they can discredit your statements: ‘media messaging is not the same as intellectual debate’. Dezenhall also recommended joining forces with groups that may be ideologically opposed to government-mandated projects such the Competitive Enterprise Institute, a conservative think-tank based in Washington DC, which has used oil-industry money to promote sceptical views on climate change. Dezenhall’s fee for orchestrating this campaign is estimated to be $ 300,000–$ 500,000. Nature admits that no confirmation was obtained on the details of the meetings between AAP and Dezenhall, but there are indications of some discussions. The AAP is ratcheting up its public relations activity against serious competitors to the journal subscription model (even though most of the significant publishers in the biomedical area are currently cutting deals with PubMed Central on having their titles included). “We’re like any firm under siege,” said AAP’s Barbara Meredith. “It’s common to hire a PR firm when you’re under siege.” This initiative reflects how seriously publishers are taking recent developments on access to published research. In the UK, similar concerns by publisher trade associations led to an ‘Apocalypse Now!’ meeting convened by publishers in early 2007 to focus attention on the growing challenge to the scholarly publishing industry of open access in all its forms. However, and perhaps not unexpectedly, there was comment from the open access movement that the funds being made available for the ‘pit bull’ may have been better invested in publishing. However, the late Peter Banks (consultant) pointed out: “It is quite astounding to hear the outcry over publishers engaging in ‘media messaging’ rather than ‘intellectual debate’. For years, the OA camp has used media messaging – with its attending distortions and gross simplifications – to great effect. Consider a pearl like, ‘Taxpayers have the right to access research they have already paid for.’ Indeed they do. They can look at exactly what they have paid for – which is research up to the stage of preprints. They have not, however, paid for peer-review, copy editing, composition, or any of the other value that a publisher adds to the manuscript. 
That inconvenient fact has not, however, stopped OA advocates from disingenuously implying that publishers are cheating taxpayers from something they already own. (By this logic, one might argue that citizens have the right to free bread for having paid agricultural subsidies.)
Before OA advocates start huffing about the need for ‘intellectual debate’, they need to demonstrate their own intellectual integrity.” It appears that this particular argument is resolving into a few fundamental issues and becoming somewhat testy, and that the publishing industry is finally taking on, in a hard-nosed way, the emotive advocacy promulgated by the OA movement in recent years.

12.5.2 PRISM – Advocacy programme from the publishers

Following on from the above AAP initiative, a new project was launched in August 2007 which aimed to bring together scholarly societies, publishers, researchers and other professionals in an effort “to safeguard the scientific and medical peer-review process and educate the public about the risks which the proposed government interference would have on the scholarly communication process.” The Partnership for Research Integrity in Science and Medicine (PRISM) is a coalition launched with support from the Professional & Scholarly Publishing Division of the Association of American Publishers (AAP) “to alert Congress to the unintended consequences of government interference in scientific and scholarly publishing.” The group launched a website at http://www.prismcoalition.org/, where it articulated the PRISM principles, which included a robust statement of publishers’ contributions to science, research and peer review, and a call for support for continued private sector efforts to expand access to scientific information.

Patricia Schroeder, President and CEO of the AAP, claimed that “Only by preserving the essential integrity of the peer-review process can we ensure that scientific and medical research remains accurate, authoritative, and free from manipulation and censorship and distinguishable from junk science.” According to Schroeder, there have been recent legislative and regulatory efforts in the US to compel not-for-profit and commercial journals to surrender to the federal government published articles that scholarly journals have paid to peer review, publish, promote, archive and distribute. An example of this was the NIH mandate to have the article outputs from the research it funded included in the public PubMed Central database. Schroeder stressed that such government interference in scientific publishing would force journals to give away their intellectual property and weaken the copyright protections that motivate journal publishers to make the substantial investments in content and infrastructure needed to ensure widespread access to journal articles. It would jeopardise the financial viability of the journals that conduct peer review, placing the entire scholarly communication process at risk.

Critics argue that peer-reviewed articles resulting from government-funded research should be available at no cost. However, the expenses of peer review, promotion, distribution and archiving of articles are paid for by private sector publishers, and not through taxes. Schroeder pointed out that these expenses amount to hundreds of millions of dollars each year for non-profit and commercial publishers. “Why would a federal agency want to duplicate such expenses instead of putting the money into more research funding?” she said. The PRISM website includes factual information and commentary designed to “counter the rhetorical excesses indulged in by some advocates of open access, who believe that no one should have to pay for information that is peer reviewed
at the expense of non-profit and commercial publishers.” See: http://www.prismcoalition.org/
12.5.3 The European PEER Project

In Europe there has been similar activity by publishers on the PR front. It has been clear that the European Commission has felt uncomfortable with the criticisms voiced by the publisher trade associations of the report produced by the Brussels and Toulouse economists, and has been reluctant to take the advice of the pro-open access lobby to mandate open access for the published output of research funded by the Commission. So far, at least, there are suspicions that the Commission’s collective heart is more in support of the open access arguments. Discussions have been held to provide more quantifiable evidence of the impact that open archives, particularly in institutional repositories, will have on the publication process of journals. This has led to the PEER project.

In order to resolve the conflicting views on the commercial impact of the deposit of manuscripts within institutional repositories, the international STM Association has developed the PEER project for consideration (and funding) by the European Commission. This large-scale collaborative pilot would enable publishers and the scientific community at large to gather quantitative evidence on the impact of such deposits and form the basis for negotiated (as opposed to legislated-only) solutions. A proposal was submitted in October 2007 for EC eContentplus funding, intended to secure buy-in from as broad a cross-section of the scholarly journal publishing sector as possible. An outline proposal had already been circulated to publishers in July 2007 summarising the project and the issues, and publishers have been invited to provide lists of journals to be included in the project. It is intended that some 300 titles would be involved from a cross-section of publishers, though a strong emphasis on European-focused journals may be more appropriate given the interests of the proposed funding authority. Attempts to distort the results in publishers’ favour, through the selection of candidate titles that prove their case, were to be avoided. Nevertheless, some issues do warrant investigation, and some of these were listed for PEER participants:
• Whether the deposit (of stage-two manuscripts) will harm publishers – STM believes it will, through a resulting decline in downloads at publisher sites
• Will this generate “new” usage or just migration from publisher sites?
• Will such deposits in institutional repositories raise the productivity of researchers?
• Will it be cost effective for publishers to set up a system to deposit the manuscript? PEER will test this out and assess the additional costs
• One size does not fit all and patterns will vary per subject area, e.g. differential usage, author deposits, optimum embargo times, etc.
Though a three-year timeframe was discussed informally with representatives from the EC, it was also felt that such a short timeframe (given the nature of the problem to be tested) would not be sufficient to study the impact on citations.
12.5.4 Publishers’ White paper on academic use of journal content

The International Association of Scientific, Technical and Medical Publishers (STM), the Professional and Scholarly Publishing Division of the Association of American Publishers (AAP/PSP) and the Association of Learned and Professional Society Publishers (ALPSP) have released a white paper on the academic use of journal content. The paper seeks to “create a more balanced understanding of the actual rights policies in place at most journals, and to temper the often overheated rhetoric regarding the role of copyright in scholarly communication”.

The joint white paper notes that the vast majority of academic publishers offer a high level of usage rights to authors and their institutions, including use within the classroom and internal postings for scholar-friendly uses. It also points out that the principles of ‘fair use’, ‘fair comment’ or ‘fair dealing’, and the fact that copyright protection does not extend to underlying facts or ideas (but only to their expression), give academics and critics the freedom to note and comment on research developments.

The position paper lays out general terms for the appropriate balancing of rights in academic journal publishing. According to the paper, academic research authors and their institutions should be able to use and post the content that such authors and institutions themselves provide for internal institutional non-commercial research and education purposes. It further states that publishers should be able to determine when and how the official publication record occurs. They should also be able to derive the revenue benefit from the publication and open posting of the official record (the final published article), and its further distribution and access, in recognition of the value of the services they provide.

Copyright transfers or exclusive licences, even with rights reserved by authors for academic uses, provide the legal basis for subscription and licensing activities, whether in print or digital environments and whether for journals or individual articles. Transfers or exclusive licences ensure that publishers have the right to deal with uses beyond the ‘first publication right’, to facilitate electronic delivery and investments in such systems, and to manage permissions and similar rights management systems. Exclusive rights also provide a legal basis for publishers to administer copyright and permissions matters for authors and enforce copyright claims with respect to plagiarism and related ethical issues.

Recently, a number of major funding agencies have asserted the right to control the distribution of articles that result from funded research programmes. Publishers claim to recognise the importance of research funding, and the public interest involved, but are concerned about the potential to waste money on unnecessary duplicate systems, to confuse the scientific record, and to undermine journal revenue. Many publishers also question whether the goals of these agencies could be better met through alternative means (the posting of abstracts or pre-prints, links to publishers’ own websites, or the creation of more consumer-oriented content).
Publishers are concerned that, on the one hand, their investments in and contributions to the editing and peer-review systems are dismissed as trivial, while on the other hand, these agencies insist that nothing will help to meet the agencies’ goals other than open public access to the articles that benefit the most from publishers’ contributions. The majority of publishers do recognise that most academic or scholarly uses by authors of their own papers are appropriate and unlikely to harm business
models. Typically, publisher policies and publishing agreements note the retention or granting of permission for extensive use of authors’ papers within the author’s institution, notably for teaching purposes, and for posting some version of the paper in institutional repositories and on the author’s personal web pages. See: Author and Publisher Rights For Academic Use: An Appropriate Balance at www.alpsp.org/ForceDownload.asp?id=391
12.6 Library Initiatives

12.6.1 SPARC

SPARC, the US-based Scholarly Publishing and Academic Resources Coalition, is an international alliance of academic and research libraries working to correct imbalances in the scholarly publishing system. Developed by the Association of Research Libraries, SPARC has become a catalyst for change. Its focus is to stimulate the emergence of new scholarly communication models that expand the dissemination of scholarly research and reduce financial pressures on libraries. Action by SPARC in collaboration with stakeholders – including authors, publishers, and libraries – builds on the opportunities created by the networked digital environment to advance the conduct of scholarship.

Today membership in SPARC numbers nearly 800 institutions in North America, Europe, Japan, China, and Australia. SPARC worked with the Ligue des Bibliothèques Européennes de Recherche (LIBER) and other European organisations to establish SPARC Europe in 2001. SPARC is also affiliated with major library organisations in Australia, Canada, Denmark, New Zealand, the UK and Ireland, and North America. SPARC finances its efforts through coalition member fees that support operating expenses and help build a capital fund to provide start-up money for its programmes. SPARC also seeks grants to augment the capital fund. The key to SPARC’s success, however, is the commitment of its approximately 200 coalition members to support SPARC initiatives.

SPARC’s role in stimulating change focuses on educating stakeholders about the problems facing scholarly communication and the opportunities for change. It also advocates policy changes that harness the potential of technology to advance scholarly communication and that explicitly recognise that dissemination is an essential and integral part of the research process. SPARC also incubates business and publishing models that advance changes benefiting scholarship and academe. Since its launch in June 1998, the SPARC coalition, subject to the fiscal oversight and controls of ARL, has advanced this agenda by:
• demonstrating that new journals can successfully compete for authors and quickly establish quality;
• effectively driving down the cost of journals;
• creating an environment in which editors and editorial board members claim more prominent roles in the business aspects of their journals;
• stimulating the development of increased publishing capacity in the not-for-profit sector and encouraging new players to enter the market;
• providing help and guidance to scientists and librarians interested in creating change.
12.6.2 SHERPA/OpenDOAR

SHERPA is a 33-member consortium of research-led universities which specialises in promoting and advising on the development of open access repositories. Other services developed by SHERPA include JULIET and RoMEO. SHERPA and its services provide summaries of:
• open access to research results
• what funders want authors to do
• what publishers will allow authors to do
• which repository to use
• where to go for help
The OpenDOAR directory, also created by SHERPA, identifies over 1,000 repositories from across the world. As OpenDOAR forms a major quality target resource for UK national information services such as Intute RS and the Depot, 1,000 entries is felt to be a significant step forward in enabling the global virtual repository network to cooperate in new and innovative ways.

OpenDOAR aims to create a bridge between repository administrators and the service providers that “harvest” repositories. The typical service provider is a search engine, indexing the material that is held. General Internet searches often bring back too many “junk” results. Information from OpenDOAR enables a search service to provide a more focussed search by selecting repositories that are of direct interest to the user – for example, all Australian repositories, or all repositories that hold conference papers on chemistry.

OpenDOAR can also be used by researchers to check whether their institution has a repository. Each entry classifies whether archives hold research papers, conference papers, theses and other academic materials that are available as “open access”. Some of these archives hold material on a single subject; others are based in universities and hold information from across many different subjects. Each of the repositories listed by the OpenDOAR service has been visited by project staff to ensure accuracy and precision of the gathered information. This in-depth approach gives a quality-controlled list of repository features.
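By way of illustration, the kind of “harvesting” of repositories referred to above is typically carried out over the OAI-PMH protocol, using the repository details that directories such as OpenDOAR list. The sketch below is a minimal, hypothetical example (in Python, standard library only) of how a service provider might pull Dublin Core records from one repository; the endpoint URL is a placeholder, and a real harvester would add resumption-token paging, error handling and politeness delays.

    import urllib.request
    import urllib.parse
    import xml.etree.ElementTree as ET

    # Hypothetical OAI-PMH endpoint of an institutional repository (placeholder URL).
    BASE_URL = "https://repository.example.ac.uk/oai"

    # Namespaces used in OAI-PMH responses carrying simple Dublin Core metadata.
    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"

    def harvest_titles(set_spec=None):
        """Fetch one page of Dublin Core records and return their titles."""
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            params["set"] = set_spec  # e.g. a subject- or document-type-based set
        url = BASE_URL + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url, timeout=30) as response:
            tree = ET.parse(response)
        titles = []
        for record in tree.iter(OAI + "record"):
            title = record.find(".//" + DC + "title")
            if title is not None and title.text:
                titles.append(title.text.strip())
        return titles

    if __name__ == "__main__":
        for t in harvest_titles():
            print(t)

A search service would run requests like this across many repositories selected from OpenDOAR, index the returned records, and so offer the focussed searching described above.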
12.7 Global Research Trends

Given the massive commitments to science and R&D by China and India, the feeling is that they could transform the global balance of scholarly information creation and dissemination during the next decade. In response to this realisation, the Research Councils of the UK (RCUK) established offices in Beijing and Washington in 2007, with a possible New Delhi office to follow in 2008. These will help foster international collaboration and partnership in science. A further message coming through is that, in the near future, the core high-quality basic research could come from the western economies while the mass outpouring of applied research findings emerges from the east. However, to ensure that quality research is still undertaken in Europe and North America, a continued financial commitment to the science budget would have to be upheld.
Within the UK, however, there are indications that the commitment to STM research faces declining interest from the current student population. A revolution is needed to get younger people excited about STM, and this needs to be addressed in schools. Unless this is done, the government’s aim to increase graduate enrolment levels from the current 29 % to a targeted level of 40 % may not be achieved. Science has to be seen to become more desirable. Meanwhile China is turning out 3 million graduates each year and India 2.5 million.

A further aspect of the globalisation of science is that money and capital are also becoming international – they will flow to where funds spent on science and research are optimised. So far, the UK is punching above its weight, with 1 % of the world’s researchers producing 9 % of the world’s published papers (in ISI) and achieving 12 % of the global citations. However, China is catching up, rising from 2 % of published output ten years ago to 6.5 % in 2006. But science cannot be considered insular any more – the RCUK’s new international strategy is in some ways dictated by the high proportion of allocated grants which involve international collaboration. Government sources in the UK claim they need to raise the proportion of GDP spent on R&D to 2.5 % (and the EU has an even more challenging target of 3 %).

The newly appointed Director General at the Department for Innovation, Universities and Skills (DIUS) in the UK, Sir Keith O’Nions, claimed during a conference on Science and Innovation in October 2007 that there is a significant international money flow with regard to innovation. He presented a map indicating how R&D money is flowing on an international basis. The point he made is that research will be carried out where it is done best, and money will move to support this. As such, nations need to be seen to perform world-class research.

The role of the government as far as product innovation is concerned is to support the economic drivers. These include investment in research, investment in education and training, and providing support for IPR. The influences which are brought to bear on the science budget include the macro-economic climate, regulations and legal structures and, in the case of the UK, the government’s £ 130 billion procurement spend. University spin-offs are also a crucial incentive for universities to support innovation and R&D. The last 25 such spin-offs from UK universities have produced a stock market valuation of £ 1.5 billion.

There are some severe societal challenges that STM research still needs to address. These include obesity, which is set to rise from 21 % currently to 30 % in 2010. The Thames Barrier is another important issue, as the cost of getting this wrong could amount to £ 30 billion. The ageing population means a rise in the retired population from 7.4 million in 1971 to the current 9.7 million, or 31 % of the labour force. Security is another problem that STM can address. Overall it is the quality of research, not the cost of research per unit, that is important.
12.8 Research funding as a driver for change

Research is funded from a number of sources, and electronic publishing deals with some of the output of this research. As we have seen, in the UK public funding is a leading source, but there are others. As noted in an earlier chapter, industry is a major source of R&D funds, providing almost 50 % of the total, and overseas funds add a further 15 % or more.

12.8.1 Publicly funded R&D in the UK

The influence that research bodies could have on the way electronic publishing is implemented within the publishing and library communities is therefore considerable. It is a top-down influence, given that the decisions of a few people in select central organisations could have a significant bearing on EP implementation. These few influencers are the target of lobbying from stakeholders who have vested interests in seeing their respective visions not only of research directions but also of electronic publishing implemented. They are the target for advocacy programmes from all sides.

An example of how different standpoints from people in authority can affect the outcome can be seen from the UK House of Commons Select Committee on Science and Technology enquiry in 2004, during which, after much analysis and debate, the committee came down in favour of implementing an open access policy for the UK. However, the government department responsible for looking after the economic welfare of the nation was influenced by the publishing industry – which made a significant tax contribution and employed a sizeable labour force. Its concern was that open access could destroy the economic base of British publishing. As such, the then Department of Trade and Industry (DTI) disagreed with the Select Committee’s report and the government failed to give its support to the open access movement. Economics and politics are therefore at the heart of funding agencies’ activities, and this has a bearing on electronic publishing.
12.8.2 Structure of Research Funding in the UK

The following table summarises the dispersion of research funding by source of funds and by subject area within the UK in 2005.

Distribution of Research Funding in the UK, 2005 (£ million)

Subject Area                     Higher Education   Research Councils (and gov)   Corporate
Science, Technology, Medicine    £ 1,202            £ 1,661                       £ 9,600
Social Sciences                  £ 314              £ 96                          £ 1,900
Arts and Humanities              £ 279              £ 569                         –
Business                         £ –                –                             –
Other                            –                  –                             –
As far as public funding of research is concerned, there is a dual research funding system operating in the UK, and increasingly in other countries where measurement of output from research is critical. The essence of the dual support system is that one funding stream (provided in the UK by the Higher Education Funding Councils) is a block grant, dispersed under the Research Assessment Exercise (RAE), to higher education institutions for university management to reallocate to departments. Though the RAE is about to be revised, to be made more metric-based, the current method of allocating funds under the RAE costs £ 14 million for an amount allocated of about £ 1.4 billion. The second support stream is through the former Department of Trade and Industry (now called BERR) and disperses a similar amount through the seven Research Councils (RCUK). However, the cost of the review mechanism used by the Research Councils, in their dispersion of £ 1.5 billion per annum, is £ 196 million. On the figures quoted, this implies a transaction cost of around 13 % for the Research Councils compared with about 1 % for the Funding Councils. The different costs of disbursement are attributable to the different processes adopted for evaluating research projects, with the research councils supporting a more intricate peer review mechanism. At the time of the change in government leadership in the UK in the summer of 2007, the two funding sources for research were brought together under a single new department, the Department for Innovation, Universities and Skills, which will maintain the current dual support system and the assessment methods both streams adopt.
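As a rough worked check on the transaction-cost comparison above (a sketch using only the approximate figures quoted in this section):

    \text{Funding Councils (RAE): } \frac{\pounds 14\,\text{m}}{\pounds 1{,}400\,\text{m}} \approx 1\%
    \qquad
    \text{Research Councils: } \frac{\pounds 196\,\text{m}}{\pounds 1{,}500\,\text{m}} \approx 13\%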
12.9 Research assessment

Both assessment methods are being looked at and could see some change over the next few years. The issue is that they both incur costs in the process of making assessments, and whether these costs are justifiable.

12.9.1 A Dangerous Economy (RCUK)

“A Dangerous Economy” is the title of a booklet issued in early 2007 by Dr Bahram Bekhradnia, director of the Higher Education Policy Institute (HEPI) in Oxford, which assessed the implications of proposed reforms to the UK Research Councils’ (RCUK) peer review system. At stake is the system for evaluating research proposals submitted to the Research Councils by individuals for financial support. Assessments are made on a project-by-project, individual-by-individual basis. As suggested above, the seven research councils in the UK have a much higher assessment cost than the panels used in the Higher Education Funding Councils’ Research Assessment Exercise. Faced with the huge differential in costs between the two systems, the RCUK has come up with a set of proposals to reduce the costs of its review system by controlling the number of applications. These include:
• Allocating a larger proportion of funds to multi-project awards
• Setting quotas for applications from each institution
• Banning resubmissions
• Recommending greater submission of outline proposals
Most of the above involve cost cutting, getting the costs for assessing each individual project submission down to something nearer the RAE costs, rather than making the Research Councils’ evaluation system ‘distinctive’. This is where the ‘dangerous economy’ comes in. According to HEPI, the proposals will cause the roles of the Funding Councils and Research Councils to converge to the extent that dual support will exist in form only. The other consequence is that research agendas in the future will be driven by ‘research stars’ who lead the celebrated five-star research groups – a small sector of the 40,000 research-active staff in UK academia.

To remain distinctive and to support the continuation of the dual support system, it behoves the research councils to direct a portion of public funding for research at public priorities (as opposed to the research priorities set by the researchers themselves or their institutions). This places emphasis on the ability of the RCUK to anticipate which fields of research will prove most productive – to ‘cherry pick’. HEPI does not suggest the abolition of the Research Councils, but it does claim that they should seek a distinctive role rather than focusing just on cost cutting. There is still a need to ensure that the overall level of support for researcher-driven research across both sides of dual support does not fall.

12.9.2 The Death of Peer Review (RAE)

Equally evocative was the title carried in an issue of EducationGuardian.co.uk which commented on the planned overhaul of the Research Assessment Exercise (RAE) itself. In a pre-Budget announcement in early December 2006, Gordon Brown placed ‘research’ at the top of his list of priorities for helping the UK compete with the burgeoning economies of India and China. There were three specific announcements:
1. Gordon Brown decided to scrap the old system of assessment review as conducted under the RAE. From 2010–11, science, engineering, technology and medicine subjects will be assessed using statistical indicators. These will include:
◦ the number of postgraduate students in a department
◦ the impact of papers
◦ the amount of money a department brings in through its research
The aim is to reduce bureaucracy in this area as well as the assessment costs for the Research Councils (see above). However, some agencies feel it is a bad move. The Royal Society, for instance, claims there is no substitute for the expert judgement of the RAE in assessing research. Universities UK took a much more relaxed attitude, stressing that bureaucracy would be minimised.
2. There was also an announcement in the pre-Budget report that a new £ 60 million fund would be made available for applied research.
3. An independent review had recommended that the research budgets of the National Health Service (NHS) and the Medical Research Council (MRC) be brought together. The aim is to ensure that more of the basic research discoveries coming out of the MRC are fed into useful applications for the NHS.
Brown reflected these announcements in his March 2007 Budget statement. In it he confirmed that 2008 would be the last RAE using the traditional ‘gargantuan’ system of research assessment, and that a metrics-based, quality-related (QR) method would be adopted. It will, however, take some time to sort out the detailed implications of the above. One thing is clear – the mechanism of research assessment will never be the same again in the UK, and this has an impact on the output of research information and publishing.

12.9.3 The 2008 RAE

The RAE is purely a mechanism for allocating the £ 1.4 billion in research funds per annum (and growing) and should not be confused with having any other aims, such as introducing quality within teaching, for example. In many respects the RAE has been a success – over the five previous RAEs there has been an increase in the UK’s share of citations in the ISI Journal Citation Reports from 5 % in 1989 to nearly 12 % in 2001. In addition, the RAE has been a British export (to New Zealand, Australia, Hong Kong, etc.). The basis for the RAE is panel evaluation (there are nearly 70 of them, subject-based, and a further 15 overarching).

However, it can be claimed that the RAE has distorted the supply of published material according to assumptions about what the panels want. This has meant that journal articles have been given greater prominence in the past and the authorship of books in particular (notably textbooks) has suffered. Applied research has also been marginalised as universities seek to enhance their basic research standings (and funding) through their institutions’ block grants.

What will be different in 2008 is that articles (in journals) will no longer necessarily hold centre stage, particularly for the hard and life sciences. It was mentioned by the chairman of one of the panels (Sir John Beringer from Bristol University) that digital objects such as patents, plant life, or similar objects would also be included in the assessment. That is, not just the main four articles need to be submitted – other relevant digital outputs could be included as proof of research achievement and excellence. This could impact on the need to ensure that multimedia content is produced in support of science.
12.10 Other funding agencies

12.10.1 JISC in the UK

In 2006, JISC committed £ 15 million towards Digital Repositories and Preservation activity, to support the UK educational community in realising the benefits of digital repositories. As part of this work, JISC has now established JISC RepositoryNet, which brings together a number of activities funded by JISC, including:
• the Repositories Support Project
• the Depot, a central repository for UK researchers who do not have access to a local repository
• Intute: a repository search system
• a Repositories Research Team
JISC is also funding a number of universities and colleges to set up or further develop repositories for their research and learning assets. The aim of JISC RepositoryNet is to form an interoperable network of repositories. It will do this by providing UK universities and colleges with access to trusted and expert information about repositories and by supporting some key services that form the building blocks for a network of repositories.

12.10.2 MPS and DFG in Germany

The Max Planck Society (MPS) has indicated that, with an annual € 1.3 billion budget funding 80 institutes in Germany with 4,300 scientists and 10,900 doctoral candidates, it takes a serious interest in the paradigms that are emerging. There is strong interest in the serials crisis, in the preprint culture, the wide range of openness movements and failing quality control. The emerging forms of publication being monitored include new forms of peer review, inclusion of access to primary data, providing infrastructure for research and publications, and producing “living reviews”. MPS is looking beyond text, and beyond text plus data, to virtual information systems. MPS has also approved work to be done on a Digital Library concept that would affect the current € 9 million being spent on MPS’s 72 specialised libraries. In moving the paradigm forward MPS is aware that there is potential confrontation with publishers on such issues as open access, new business models and new types of document delivery, but it looks to collaborate in achieving a paradigm shift in science.

The Deutsche Forschungsgemeinschaft (DFG) has also outlined the expenditures it is making on various information activities. The DFG Committee on Academic Libraries and Information Services involves an expenditure of € 34.1 million, of which € 13.9 million was spent on traditional library services, € 9.5 million on cultural heritage, and € 5.13 million (in 2005) on new forms and methods of publication. The DigiZeitschriften project, analogous to JSTOR and covering 15 disciplines, is also being developed by the DFG, as are a number of subject-specific virtual libraries. However, the DFG can only be involved in creating the infrastructure – the universities themselves need to maintain it.
12.10.3 Charities

In the UK another major funding source for R&D is the charity sector, and a leader in this field is the Wellcome Trust. With over £ 300 million invested in biomedical research every year, any conditions it chooses to impose on grant recipients and their publication of output could be important. The Wellcome Trust has taken a strong position on the business model applied to the publication of research results from its grants: not only for those grant recipients who are totally dependent on Wellcome funding for all their research work, but also for those who receive only part of their money from the Trust.

Stunned by the inability of some of its research workers to get free access to research reports that the Wellcome Trust itself had funded, the governing body at Wellcome has thrown its support behind the open access agenda. It has mandated that the published results of projects that it funds or co-funds be made available in a freely accessible digital repository within six months of publication. In order to make the choice of repository easier, Wellcome was instrumental in establishing the UK PubMed Central service. This has become the subject repository for the biomedical research publications that it and related research councils (such as the MRC) support. This reflects the power that funding agencies can have – if they have a mind to it, they can mandate the way research reports are accessed and, in Wellcome’s case, can also influence the mechanism into which the reports are deposited.
12.11 Summary

As many studies have indicated, there is a strong correlation between R&D growth, the number of research staff and published research output. As such, careful attention needs to be given to the collective mindset of funding agencies and the agendas they will adopt with respect to some of the other drivers described in subsequent chapters. It is fairly clear that most of the main funding agencies are aligning themselves behind an open access business model (see earlier) and this has an impact on EP. But whilst open access may be the flavour of the moment, the same research agencies may also be key drivers behind other developments: data sets, e-Science, social collaboration, Web 2.0 and the semantic web may all be the focus of their future attention. Their priorities may well change, and infrastructural developments could take precedence over short-term business models in their funding.
Technological Drivers
Chapter 13
Efficiency Improvements as a Driver for Change
13.1 Industry Collaboration to achieve improved efficiency in EP

13.1.1 Trade Associations

The publisher and library trade associations play a significant role in easing their members’ transition from the print to the electronic publishing world.

Publisher Trade Associations

The publishing industry has several active associations which represent its interests to the outside world. Their success has been in lobbying, frequently behind the scenes, to convince policy makers that those who propose changes to the industry structure are usually impractical in their claims. Publishers are not idealists – they are pragmatic. This pragmatism has its roots in the fact that in leading western countries the publishing industry contributes significant taxes to government coffers and also provides considerable direct and indirect employment. The idealists have no such concrete arguments on which to fall back. The key trade associations which represent the publishers’ interests include:
• The International STM Publishers Association, with a team of fulltime staff, mainly headquartered in Oxford, England. It is a global professional association with all the largest scientific, technical and medical commercial publishers involved, as well as some of the larger learned society publishers. Membership is shaped by the subscription fee, which precludes many publishers in the ‘long tail’ of scholarly journal publishing from becoming sustainable members. The current chief executive is Michael Mabe, formerly with Elsevier Science, who is a major contributor to the understanding of the historical development of the STM industry and to the case for evidence-based research before making extravagant claims about alternative business models and new technological solutions. The association is very proactive in protecting the intellectual property rights over its members’ publications.
• ALPSP – the Association of Learned and Professional Society Publishers. This has for many years been a trade association which looked after the interests of small, learned publishers primarily in the UK. In recent years it has extended its brief to include an international target group of learned society publishers, and has local partnerships and affiliations in the United States and Australia. It also has
a fulltime secretariat, with Ian Russell (formerly with the Royal Society) as its current chief executive.
• The Publishers Association (UK). This association, through its SPE directorate, represents the interests of UK scholarly book and journal publishers and is led by its director, Graham Taylor.
• The Association of American Publishers, which has a division specialising in serving the interests of professional and scholarly publishers (AAP/PSP), with Barbara Meredith as its permanent representative.
• The Society for Scholarly Publishing. This is a US-based association of scholarly publishers which took its lead from ALPSP in the UK. It now has an active programme of support for publishers which is organised by a largely volunteer group. There are no fulltime staff.
There are additional trade associations specialising in particular sectors, such as university presses, medical publishers, and other national memberships. Whilst these are individually and in aggregate powerful defenders of the current publishing business models and practices, in many respects they have been eclipsed by the researcher, library and policy-making communities in their advocacy. There has been a much more strident voice coming from these other agencies, a voice which has reached a broader audience with its message of promoting a new information paradigm that claims the moral high ground. Besides their behind-the-scenes lobbying activities, the trade associations all provide training and education programmes to help their members migrate from print through hybrid to electronic publishing. As such they are important mechanisms for improving the efficiency of the scholarly publishing sector.

Library Trade Associations

These are also seeking to make a case, and the case in this instance is usually to support an open access mandate. The serials crisis, and the current concerns about the Big Deal and the domination this gives to a few large commercial and society publishers, is a major stimulus for them.
• The Association of Research Libraries. This represents just over 100 of the largest research libraries in North America, with huge combined spending power. They have instigated a number of research projects to get things changed for the better, one of which is:
• SPARC. This US-based organisation has a strong agenda in support of open access. It is led by a former director of BioOne, a consortium of small biomedical publishers. SPARC now has a UK base as well, which serves the European research library sector and also has fulltime staff.
• The American Library Association. This is the leading professional association for librarians in the United States and as such has a broad remit. There are sectoral associations in the US as well, such as the Medical Library Association, the Special Libraries Association, the Canadian Library Association, etc.
• The UK Serials Group and the Charleston Group (the latter with a spin-off known as the Fiesole conference) are also active players in effecting change, but in these cases there is more interaction with other stakeholders, including publishers, which reduces the extent of polarisation in approach to key and potentially sensitive topics.
As with the publishers, the library associations assist their members to adapt to the changing world, and do so through briefings, training programmes, education, etc. They also have effective advocacy, often coming from the members themselves.

13.1.2 Research Information Network (RIN)

The Research Information Network (RIN) was set up in the UK as a result of the Follett Review into libraries in the 1990s. RIN began operations in 2005 and is supported by a consortium of UK sponsors: the four Higher Education funding bodies, the three National Libraries, and the seven Research Councils. RIN’s mission is: “To lead and co-ordinate new developments in the collaborative provision of research information for the benefit of researchers in the UK”.

The key role of the RIN is to give the strategic leadership required to establish a national framework for research information provision, and to generate effective and sustainable arrangements for meeting the information needs of the professional research community. It covers all disciplines and subjects. Consequently, the RIN addresses the different and distinct information requirements of the various research communities, not only in higher education but in a range of other sectors. The RIN also takes into account the needs of other users, such as teachers and distance learners.

RIN’s remit also covers a wide range of information sources, providers and types. It includes published books and serials; manuscripts; museum collections; grey literature; sounds and images; and datasets produced and held in a wide range of formats. Such resources may be digital or not; the hybrid world of non-digital and digital resources acts as the backcloth to its activities.

RIN is supported by two groups: the Funders’ Group, comprising representatives of the RIN’s sponsors, and the Advisory Board, which consists of sixteen members from various sectors of both the research community and the library and information community. The Advisory Board’s role is to advise on and review the development of the RIN’s strategy and workplans, within an overall framework of objectives and budgets set by the Funders’ Group. In addition, the RIN has established four consultative groups to provide vital bridges between different sectors of the research community on the one hand and the library and information community on the other.

According to the RIN website, the main projects completed or in the course of being studied include:
• Researchers’ use and perception of library services. Jointly with the Consortium of Research Libraries (CURL), the project incorporates detailed survey work with both researchers and librarians.
• Research funders’ policy and practice on management of research outputs.
• Discovery services: user behaviour, perceptions and needs.
• Data concerning scholarly journal publishing. This study was jointly organised with Research Councils UK and the Department of Trade and Industry. It analysed information about scholarly journal publishing, assessing the data available about the process and the reliability of that data.
• National framework for collaborative collection management. This laid foundations for development, building on the achievements of the CoFoR (Collaboration for Research) initiative.
• Usage and impact of e-journals in the UK. RIN has commissioned a study based on deep log analyses. By relating to usage data, the project will tap into actual researcher behaviour, and thereby provide some evidence in an area where – as identified in the RIN’s earlier analysis of scholarly journal publishing – there is currently a significant paucity of evidence.
• Economics of all stages in the lifecycle of the scholarly communications process in the UK.
• A review of the availability, scope and quality of finding aids to enable researchers to discover information about collections of physical objects and artefacts of relevance to their research.
• Data publication. This is an investigation of the range of arrangements for making research data as widely available as possible and the role that data outputs currently play alongside or as an alternative to conventional publications in the research communication process. Associated with this, the project will investigate current practice for ensuring the quality of such data.
• Coverage of online catalogues. Identification of the priorities for researchers in cataloguing library holdings that are as yet uncatalogued, and in converting manual catalogues to digital form accessible over the internet (retroconversion).
• Impact of training in research information methodologies and tools. Reviewing the extent and quality of the training provided for academic researchers.
The above list of past and ongoing projects funded by RIN demonstrates the cross-sectoral nature of the organisation’s activities.

13.1.3 Publishing Research Consortium

This collaborative network of publishers and learned societies has come together through the auspices of the ALPSP, PA and STM publisher trade associations to share the results of research, using evidence as a basis for establishing actionable and well-supported conclusions. There are also corresponding partners to the PRC, including the AAUP and AAP/PSP, which for legal or regulatory reasons could not be full members. The group was formed in the autumn of 2005. The key is using unbiased data and objective analyses to promote a better understanding in society of the role of publishing. There have been five main projects thus far:
• AAP/PSP Statistics Programme. This attempts to define baseline statistics in
publishing. With buy-in from the PA and STM, there are now 45 publishers participating in this statistical collection; 65 % of CrossRef journals are included in this group, and 43 % of the journals in the ISI database.
• NIH Authors Postings Study. 1,100 responses were received as a result of an email to biomedical authors participating in the NIH public access policy. Though there was a high awareness of the NIH, there was a low understanding of the open access policy. In general, past attempts by the NIH to communicate open access have not been effective. It was concluded that education and advocacy, rather than policy, is the key issue.
• Journals and Research Productivity. This study tried to identify the barriers that
stood in the way of undertaking research in immunology and microbiology. The biggest issue by far was ‘funding’. This was followed by administrative issues. Gaining access to journals represented a low barrier; in fact, high value was attributed to the role of the journal.
• Impact of Author Self-Archiving. Responses were received from librarians to test whether self-archiving could have an impact on cancellations of journal subscriptions. Inevitably the results, which indicated that such an effect could occur, have been disputed in some library quarters.
• The Peer Review System. See above for the results of Mark Ware’s summary of how the global scientific author community views the existing and alternative peer review mechanisms.
The net effect of the trends reported above is that a volatile and changing structure of publications exists, aimed at meeting the changing needs of users. Packaging information in a digital world has become a sophisticated technique, one that is increasingly focused on end users’ real needs. This is in contrast to the old days when the format of the publication was set – books or journals – and the publishers’ role was that of identifying new titles rather than exploring new physical containers. This has changed, and the publishers that are adapting better to the challenges of the new information system are those that are experimenting with new ways of packaging and delivering research results.

13.1.4 Publishing Cooperatives

One of the originators of the debate about ‘open access’ was a consultant advising SPARC, the US-based library consortium. Raym Crow produced a thought piece which some claim initiated the change in the journal publishing business model. Crow has returned to the scene, having written another report for SPARC, summarised in a recent issue of First Monday, about the need for publishing cooperatives to support the smaller not-for-profit publishers. This item has also generated a flurry of discussion.

According to Crow, publishing cooperatives operating in support of not-for-profit publishers would provide an organisational and financial structure which would assist in balancing society publishers’ twin objectives of maintaining financial sustainability whilst not compromising their main society missions. “Publishing cooperatives would allow society publishers to remain independent while operating collectively to overcome both structural and strategic disadvantages and to address the inefficiencies in the market for academic journals”.

Pursuit of a profit-maximising strategy can result in pricing and market practices that compromise the society’s mission by limiting its ability to disseminate research broadly in its field. At the same time, competitive market pressures require society publishers to operate efficiently to ensure financial sustainability. The market power wielded by large commercial publishers, combined with the structural limitations of non-profit organisations, hinders non-profit publishers’ attempts to sustain their journal publishing programmes. The collective power of cooperatives could help non-profit publishers counter these market constraints and imbalances. The benefits which a cooperative might provide include production cost reductions, access to business management services and resources, increased
market presence and greater access to markets, risk sharing and mitigation, and alignment with the society’s mission and non-profit ethos. The roles of organisations such as ALPSP, J-STAGE, BioOne, Project Euclid, SSP and SPARC are somewhat underplayed in Crow’s detailed exposé. They all provide elements of support services for smaller not-for-profit publishers, as does HighWire Press (Stanford University Libraries) to some extent.

In subsequent debate on the listservs, Joseph Esposito (a US-based consultant) suggested that an alternative approach would be for the major research universities to make substantial commitments to their university presses, which would aggregate large numbers of society journals, yielding the efficiencies Raym Crow outlines in his report, even as they continue with their mission-based programmes. This would be good for the professional societies, good for the universities, and good for the academic libraries. This may be another example of the sort of business model that the new technologies and market trends may sustain. Esposito goes on to point out: “Wouldn’t it be a great thing if Harvard decided to put its balance sheet to work?”

The smaller publisher, unable to fund these investments, can follow two routes. The first is to club together with other like-minded small publishers in an independent service which offers them the technical expertise and infrastructure which they can ill afford on their own. Services such as HighWire Press, Atypon and Ingenta (now part of Publishing Technology Ltd) have come into being to provide such assistance. Publishers do not own these organisations; rather, they act as a publishing support service. These support services compete among themselves on the functionality they offer to small publisher clients and on the price they charge for providing an online delivery mechanism for the publishers’ products.

With the exception of HighWire Press, an outgrowth from Stanford University Libraries, the other main service companies in this area face a particular problem. When a publisher is ‘small’ it needs the help and technical support provided by Ingenta, Atypon, etc. However, once it has achieved a critical size it has the potential to trade its dependency on the technical service, with which it shares its brand to some extent, for full independence. At this point the publisher has reached the size or scale where it can create and maintain its own online service. As technical costs continue to fall relative to the costs of production, the option of setting up its own independent file server becomes ever more attractive. Once this level of self-sufficiency has been achieved, the need for a third party technical support agency falls.

As a case in point, Ingenta was established in 1999 to provide such support services for publishers, and its initial client base consisted of the main publishers including Blackwell Scientific, Taylor and Francis, etc. Over the years the client base has changed from the large publishers, each of which has created its own proprietary service, to the small publishers. This has produced a commercial conundrum. Whilst the few large publishers were able to divert significant resources into their own online activities managed by a third party such as Ingenta, the smaller publishers have smaller discretionary budgets for IT support.
Even in aggregate the numbers do not add up as each individual small publisher, with few titles, involves more work (both in selling and supporting) than a few large publishers with many titles. Increasingly Ingenta was in the business of scraping the barrel for ever smaller and less profitable customers. It now has a large number
of publishers using its services, but most of them are careful with their budgets. It is these publishers who are most affected by the squeeze in the market, and it is these small publishers who represent an increasingly difficult market sector to sustain. Meanwhile, several of these smaller publishers are being integrated into the large publisher sites, losing some of their commercial independence but gaining an immediate financial payback and more guarantees of sustainability for their titles.

Despite, and because of, some of the major challenges which face publishers and librarians, there is an ongoing process of adapting to electronic publishing through internal, in-house developments. One of the most significant is the changes being made to formats and to the means of navigation, which are enablers for some of the changes stimulated by external market conditions and trends.
13.2 Changes in Format

In the print-only publication system the printing technology largely determined the presentation of the content, and the trend in recent decades was from letterpress/hot press to cold press to camera-ready copy. Cost reduction was high on the agenda in determining the method of printing adopted, with the less expensive method being increasingly adopted where appropriate. In a digital world there are different drivers. Increased functionality is becoming more important as new functions are demanded of articles, journals and books. This adds an additional cost element, as functionality trumps finances in the new electronic publishing scenario.

One major development was the introduction of the portable document format (PDF), which provided a picture of the printed page but disseminated in electronic mode. The Portable Document Format (PDF) is a file format created by Adobe Systems in 1993 for document exchange. PDF is a fixed-layout document format used for representing two-dimensional documents in a manner independent of the application software, hardware and operating system. Each PDF file encapsulates a complete description of a two-dimensional document (and, with Acrobat 3D, embedded 3-D documents) that includes the text, fonts, images, and 2-D vector graphics that compose the document. PDF is an open standard moving towards ISO 32000.

However, there has been criticism of the reliance on PDF as the electronic publication format. It lacks the functionality which allows for detailed interrogation of the content, or for linking between digital objects. Some EP pundits have claimed PDF is a ‘disaster’ as an enabler for effective scholarly communication (Peter Murray-Rust, Cambridge University). Markup languages are crucial to providing added functionality.

13.2.1 Markup Languages

A markup language provides a way to combine a text and extra information about it. The extra information, including structure, layout, or other information, is expressed using markup, which is typically intermingled with the primary text. The best-known markup language in modern use is HTML (HyperText Markup Language), one of the foundations of the World Wide Web.
Originally markup was used in the publishing industry in the communication of printed work between authors, editors and printers.
One of the main focuses of attention is the move towards XML. The Extensible Markup Language (XML) is a general-purpose markup language. It is classified as an extensible language because it allows its users to define their own elements. Its primary purpose is to facilitate the sharing of structured data across different information systems, particularly via the Internet. It is used both to encode documents and to serialise data. In the latter context, it is comparable with other text-based serialisation languages such as JSON and YAML. It started as a simplified subset of the Standard Generalised Markup Language (SGML), and is designed to be relatively human-legible. By adding semantic constraints, application languages can be implemented in XML. These include XHTML, RSS, MathML, GraphML, Scalable Vector Graphics, MusicXML and thousands of others. Moreover, XML is sometimes used as the specification language for such application languages. XML is recommended by the World Wide Web Consortium. The W3C recommendation specifies both the lexical grammar and the requirements for parsing. It is a fee-free open standard.

13.2.2 Metadata

Metadata is data about data. An item of metadata may describe an individual datum, or content item, or a collection of data including multiple content items. Metadata is used to facilitate the understanding, use and management of data. The metadata required for effective data management varies with the type of data and the context of use. In a library, where the data is the content of the titles stocked, metadata about a title would typically include a description of the content, the author, the publication date and the physical location. In the context of an information system, where the data is the content of the computer files, metadata about an individual data item would typically include the name of the field and its length. Metadata about a collection of data items, a computer file, might typically include the name of the file, the type of file and the name of the data administrator.
Metadata is the key to unlocking the content within electronic publishing systems. The more descriptive and accurate the metadata, the more effective access to the content will be. But such metadata creation can be expensive, since in many cases it is manually based. This has limited the widespread application of high-quality metadata. It requires resources which are often unavailable to small publishers and to content providers of other digital objects. This raises questions about whether the traditional ‘cottage industry’ approach which has been typical of publishing in earlier decades can be migrated over to the evolving electronic publishing world. Is the structure of publishing appropriate for the new challenges?
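To make the distinction between content and metadata concrete, the sketch below is a minimal, purely illustrative example: the element names are invented for this purpose and do not follow any particular publisher's DTD or schema. It uses Python's standard xml.etree.ElementTree module to wrap a fragment of article text in XML together with a few descriptive metadata fields.

```python
# A minimal sketch of XML markup and metadata for a journal article.
# Element names are illustrative only; real publishers work to an agreed
# journal article DTD or schema rather than ad hoc tags like these.
import xml.etree.ElementTree as ET

article = ET.Element("article")

# Descriptive metadata: who wrote it, when, and how to identify it.
metadata = ET.SubElement(article, "metadata")
ET.SubElement(metadata, "title").text = "An Example of Structured Markup"
ET.SubElement(metadata, "author").text = "A. N. Author"
ET.SubElement(metadata, "publication-date").text = "2008-01-15"
ET.SubElement(metadata, "identifier", type="doi").text = "10.9999/example.0001"

# The content itself, marked up by structure rather than by appearance.
body = ET.SubElement(article, "body")
section = ET.SubElement(body, "section", title="Introduction")
ET.SubElement(section, "paragraph").text = (
    "Markup separates the structure of a document from its presentation."
)

# Serialise to an XML string that any conforming parser can process.
print(ET.tostring(article, encoding="unicode"))
```

Metadata of this kind is what downstream services – search engines, link resolvers and alerting services – actually consume; creating it accurately is where much of the new cost of electronic publishing lies.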
13.3 Structural Efficiencies

13.3.1 Mergers and Acquisitions

Consolidation in STM publishing took a significant step forward with the announcement of Wiley’s acquisition in 2006/7 of Blackwell Publishing for £ 572 m. Blackwell, privately owned, had been a target of takeovers by rivals for a number of years. In 2002 Taylor & Francis made an informal offer of £ 300 m. But it is another business with strong family involvement that has now succeeded (US-based John Wiley and Sons) – at a considerably increased price. The price paid is around 2.7 times Blackwell Publishing’s 2005 revenues of £ 210 m and 16 times its operating profits.
Blackwell Publishing had sales of £ 210 million in 2005. Its revenues roughly equate to those of Wiley’s STM business. The company publishes about 600 books a year and has a backlist of about 6,000 titles. Wiley-Blackwell will have a stable of some 1,250 journals (Blackwell’s 825 and Wiley’s 425). Blackwell Publishing’s 825 journals are fairly evenly distributed between STM and the social sciences and humanities. A thousand staff are employed worldwide, with operations in Europe, the U.S., Australia and Asia. Blackwell Publishing was formed in 2000 as a result of the merger of Blackwell Science (founded in 1939) and Blackwell Publishers (founded in 1922).
Wiley has publishing, marketing and distribution centres in the U.S., Canada, Europe, Asia and Australia and has a staff of 3,600. Its European operations are based in Chichester, in the U.K. Wiley’s total sales in 2005 were $ 974 million (around £ 504 million).
How does this compare with other S+T publishers? Reed Elsevier’s total revenues in 2005 were £ 5.1 billion, with revenues in Elsevier, its STM division, at $ 2.8 billion (£ 1.43 billion). Springer’s approximate consolidated sales in 2005 were € 838 m (£ 567 m), whilst Taylor & Francis reported revenues of £ 729 m for 2005. (However, differing business segments in the companies mean that these are not direct comparisons of scholarly and research publishing sales.) According to EPS’ MarketMonitor for STM Information (June 2006), Elsevier dominated the market with a 24.6 % share of STM industry revenues. Springer has 7.4 %. Wiley and Blackwell together would also have a market share of 7.4 %. Wolters Kluwer, no longer involved in STM publishing, had revenues in 2005 of £ 2.28 billion, whilst Thomson, predominantly offering databases, had revenues of £ 1.74 billion. Elsevier has around 2,000 journal titles; Springer’s website indicates 1,450 and Taylor & Francis publishes around 1,100.
According to investor reports, Informa has a market value of some £ 2.35 billion. Informa, which bought Taylor & Francis Plc in 2004, publishes more than 2,000 titles and organises more than 10,000 conferences annually around the world. Its first-half sales more than doubled to £ 533 million after it bought conference organiser IIR Plc. Springer Science & Business Media was spun off from Bertelsmann AG’s book unit under the name BertelsmannSpringer. It was sold in 2003 to Candover and Cinven, which merged it with Kluwer Academic Publishers the following year. With 5,000 employees worldwide it publishes more than 1,400 journals and 5,000 new book titles a year, according to its web site.
Colin Steele, Emeritus Fellow at the Australian National University and former Chief Librarian at ANU, stated on one of the listservs with reference to the Blackwell acquisition by Wiley: “This is yet more evidence of the increasing domination of,
particularly STM, scholarly publishing by a small number of international publishers . . . the trends identified will undoubtedly continue, arguably to the budgetary detriment and accessibility of content of smaller publishers, learned societies and regional publishers in the social sciences and humanities. There is clearly a need to continue to debate and adopt more strategic and holistic approaches to scholarly communication frameworks”.

13.3.2 Economies of Scale

Assuming finite library budgets (see Tragedy of the Commons earlier), success for the large publishers in increasing their revenue base means draining institutional budgets of funds from other library-operated services. Although the library will be getting much more through these big deals, it is not necessarily for less – in fact there is usually a percentage increase in price to get the full publisher’s e-journal list. And whilst this may be attractive (much more material at a lower per-unit cost) it denudes the library budget and prevents other spending on print journals from smaller publishers and on books that are not part of such Big Deals – particularly as library budgets have not been increasing in line with output. This library budget squeeze is to the detriment of small publishers that are not part of the licensing process (thereby extending industry polarisation).
In fact the weak link in the publishing infrastructure is now claimed to be the learned society publisher. Though such societies have a mandate to provide a range of services, the loss of real contact with their members, and the hiving off of their publication programmes to commercial publishers to achieve market reach, is leaving them exposed. As the market size – determined essentially by the budget constraints facing research libraries – is static or in many cases in relative decline, there are few opportunities for any publishers, but particularly the small, specialised publishers, to grow. The large are getting larger at the expense of the small. The indications are that there are economies of scale in an electronic publishing environment.
Following on from the above, there has been debate on the industry listservs over whether the current market conditions – a static overall market with the larger publishers squeezing the small learned society and university presses out of existence – are a valid assumption. The scenario was initially hypothesised by Joseph Esposito, an independent consultant based in the US, and a number of critics have surfaced to question this view. Chris Armbruster (MPG, Germany) has suggested that the theory that publishing is a no-growth market is based on the now faulty assumption that the Oldenburg model of combining peer review with dissemination in a final archival publication may be transposed to the Internet era. According to Armbruster this kind of publishing may indeed be a no-growth market, and he goes even further by saying that in the medium term it has no future at all.
Armbruster’s argument is that digital technology and economics favour the severance of certification from dissemination. In that scenario, the functions of registration, dissemination and archiving will lie with (digital) libraries-cum-repositories, whereas certification and new kinds of value-adding navigation services will be a growth market for publishers among others. He therefore suggests that we will witness the renaissance of society publishers and the return of the library for scientific and scholarly publishing.
In this construct he is more afraid for the big commercial publishers that, because of their size and inflexibility, might find themselves in a big squeeze quite soon.
Whilst this is an interesting construct, the evidence points in another direction: small publishers are fleeing from independence to shelter behind the power, reach and financial muscle of the larger publishers. The recent indications are also that the few large publishers are getting larger, with growth rates in excess of the market norm, which leaves the smaller independent, specialised publishers seeking ever more viable niches outside the radar of the market leaders. As indicated earlier, in 2005 the top five publishers generated 53 % of industry revenues, compared with 30 % for the next 15 largest suppliers. Furthermore, the publicly traded STM publishers generated $ 5.4 billion in revenues in 2005 with 8.6 % growth. This is greater than the overall market expansion, which suggests that some of the other stakeholders are finding trading conditions difficult.
Further indications of market consolidation can be found by inspecting the financial reports of the main publishers during the first half of 2007. The big three publishers (Reed Elsevier, Wolters Kluwer and Thomson) have all exited, or are in the process of exiting, the educational publishing market. Each has now divested its education assets and is in a strategically interesting position. Each has indicated that resources are now available with which to invest in its areas of perceived strength and growth. It appears that acquisitions are a possibility in the foreseeable future, but each will be selective in its investment strategy. Thomson has its hands full with the proposed acquisition of Reuters in 2007/8, which at present is to be reviewed by the U.S. Department of Justice, the European Commission and the Canadian Competition Bureau. The former CEO, Richard Harrington, was “confident that the transaction will be approved”, but in the meantime any further major acquisition activity seems unlikely.
However, at the same time as it released its recent results Wolters Kluwer announced the acquisition of 55 % of the International Centre for Financial and Economic Development (IFCED), a Russian publisher operating in the tax, accountancy and human resources sectors with annual revenues of € 50 million. WK has the right to purchase the remaining 45 % at a later date, and plans to integrate the business with its existing Russian operations. Reed Elsevier’s acquisitive intentions have also been highlighted, with Sir Crispin Davis, the company’s chief executive, reportedly considering acquisitions larger than the recent purchases of BuyerZone and Emedia, both of which were less than $ 100 million.

13.3.3 Why is market consolidation taking place in scholarly publishing?

Until the early 1990s scholarly journal publishing was seen as something of a cottage industry. There were thousands of small journal publishers competing with the few large commercial publishers on a fairly level playing field. Brand recognition and the quality of editorial boards within the specialised audience being targeted counted for as much as financial muscle. Print-based publishing conferred few economies of scale, or where it did, they were not such as to prevent other competitors co-existing in the market place without a serious lack of competitiveness. The onset of electronic publishing has changed that, and there are substantive economies arising from the investment in sophisticated production systems and IT that allow electronic products to be accessed and disseminated online.
These investments are beyond the reach of the average publisher or those who are part of
the ‘long tail’. Smaller players were therefore absorbed into the larger publishers, or grouped together with other small publishers in mutual defence, or agreed to have their publications managed by the large publishers while retaining their brand. In any event the smaller publishers faced some difficult questions.
Significant among these publishers were the specialised learned societies. These were often dependent on one or a small number of journals to provide them with revenues. They were constrained by the subject discipline in which they operated – unlike the large commercial publishers they could not forage in the wider world of science for more lucrative, expanding areas in which to publish.

13.3.4 Why are the larger publishers able to succeed where small publishers find it difficult?

Part of the reason goes back to a commercial strategy adopted by the larger publishers in parallel with, but independent of, the migration to online delivery. As was commented earlier, the library market has faced budget constrictions, and is unable to keep pace with the growth of scholarly research output. In the 1990s many libraries had to undertake swingeing cancellation exercises – to cut their book programmes to the bone, and to cancel journal subscriptions whose prices were rising out of sync with their budget growth. This has occurred just as new cost factors are being introduced in order to make publications suitable for electronic publishing systems. The cost of formatting to XML levels, and of providing quality metadata, are new costs being borne by content providers at a time of library budgetary restriction.
During the latter years of the 1990s a few publishers, led at the time by Academic Press, introduced the concept of the Big Deal. Overnight the number of titles the library would have in its collection would soar – its budget would cover a greater number of titles, and the actual average cost per title acquired would fall. It would also absorb much of the document delivery and interlibrary loan budget. The Big Deals took off. They rapidly became a feature of the sales tactics of those large publishers, with enough titles in their programmes, who could justify bundling their journals in this way. The librarians were happy as they were getting more bang for their buck.
Combining three separate features – (a) the growth of the large publishers establishing their own file servers, (b) the increasing costs of producing EP-formatted publications, and (c) the establishment of Big Deals for libraries – meant that the large publishers achieved dominance over smaller publishers. Even small publishers seeking security by (a) including their titles within an aggregated online support service (Ingenta, Atypon) or (b) including their titles within consolidated Big Deal services, such as those offered by ALPSP (the publisher trade association primarily representing small learned and professional societies), found this insufficient to afford them protection from the larger publishers.
Hence the concerns about the future for small specialised publishers and smaller learned societies who only publish a few titles.
13.4 Standards and Protocols

13.4.1 ONIX for Publisher Licences

Whilst pressure is being exerted on the principle of Big Deals, there is also considerable cross-industry cooperation in making the process of licensing electronic products more efficient. This is focused through standards-setting organisations such as NISO, and facilitators such as EDItEUR, whose output includes ONIX. ONIX is a family of standards for communicating rich metadata about books, serials and other published media, using common data elements. The ONIX standards include ONIX for Books, ONIX for Serials, and ONIX for Licensing Terms.
ONIX for Serials is a group of XML formats for communicating information about serial products and subscription data using the design principles and many of the elements defined in ONIX for Books. Initial work concluded that a standard exchange format for serials data would benefit all parties in the supply chain. Since then, three sets of application messages have been (or are being) defined and piloted with business partners:
• Serials Products and Subscriptions (SPS)
• Serials Online Holdings (SOH)
• Serials Release Notification (SRN)
Each will be supported by a specification, an XML schema and full documentation. All three will share a common glossary and a common file of coded data elements, including permitted code values for each element. ONIX for Serials might be viewed as a growing ‘toolkit’ of individual and composite elements and content definitions for constructing messages for a variety of applications.
ONIX for Licensing Terms is also being pursued, and the first manifestation of this electronic expression of licensing terms – ONIX for Publisher Licences – is now available. It was premised on the observation that publishers have a variety of digital products to be licensed, and that licensing can be a long and tedious process when faced with many different libraries, each with their own requirements. By the same token, libraries face licences from many publishers, each different in some respects, some marginally, some significantly. ONIX for Publisher Licences is not a Digital Rights Management (DRM) system – rather it is a communication of usage terms. It is a mechanism for enabling the publisher’s licence to be merged within the ERM (electronic resource management) systems which library automation suppliers provide. One feature of this work is that the language of licences has become plain ‘English’ rather than being wrapped up in legalistic jargon. The licence will also carefully explain what can and cannot be done with the online material. There will be links from relevant parts of the licence to the resource. Editing tools are also in the process of being developed for ONIX-PL. Both JISC and the Publishers Licensing Society in the UK are co-funding this development. Oxford University Press, Cambridge University Press, Springer and Serials Solutions are all active in implementing ONIX for Publisher Licences.
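To give a flavour of what a machine-readable expression of licence terms looks like, the fragment below builds a deliberately simplified, hypothetical structure in Python. The element names are invented for illustration and are not taken from the ONIX-PL schema; the point is only that usage permissions which today sit in legal prose can be encoded as structured data that an ERM system can ingest automatically.

```python
# A simplified, hypothetical sketch of licence terms expressed as XML.
# Element names are illustrative only and do NOT follow the ONIX-PL schema.
import xml.etree.ElementTree as ET

licence = ET.Element("publisher-licence", publisher="Example Society Press")

terms = ET.SubElement(licence, "usage-terms")
ET.SubElement(terms, "permitted").text = (
    "Download and print single copies for private study")
ET.SubElement(terms, "permitted").text = (
    "Supply single copies for inter-library loan within the licensee's country")
ET.SubElement(terms, "prohibited").text = (
    "Systematic downloading or redistribution for commercial purposes")

# An ERM system could read a file like this and display the relevant terms
# to library staff or end users at the point of use.
print(ET.tostring(licence, encoding="unicode"))
```

In the real ONIX-PL work the terms are drawn from a controlled dictionary of defined usages, which is what allows licences from different publishers to be compared side by side.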
13.4.2 ACAP (Automated Content Access Protocol)

This is a similar development in that it is also a cross-industry collaboration to establish an open business model which will allow for the expression of permissions and other terms related to a particular digital resource. ACAP was launched on 29 November 2007 as a workable, non-proprietary global permissions tool, primarily to facilitate the relationship between content owners and search engines. Devised by publishers in collaboration with search engines after an intensive year-long pilot, ACAP will provide structure to the creation, dissemination, use and protection of copyright-protected content on the web. Its backers intend ACAP to become the universal permissions protocol on the Internet: an open, non-proprietary standard through which content owners can communicate permissions for access and use to online intermediaries. In the first instance, ACAP provides a framework that will allow any publisher, large or small, to express access and use policies in a language that search engines’ robotic “spiders” can be taught to understand, so that they know what they can and cannot do with the publishers’ content. During the next phase of ACAP, the scope will be extended to other business relationships and other media types, including the music and audiovisual sectors. Technical work is ongoing to improve and finesse ACAP V.1.
ACAP arose from a protocol (the Robots Exclusion Protocol) that was developed in 1993, when search engines were more primitive and crawling remote sites was still emergent. The early protocol essentially offered a limited range of options when search engines came to crawl – either they were allowed in or they were denied access. It was a blunt instrument. ACAP aims to extend the protocol to provide more options, and perhaps more protection, through an agreement on the metadata standard that the search engines would interrogate at ACAP-supported sites. It aims to provide greater technical flexibility – it will enable the providers of all types of content published on the web to communicate permissions information (relating to access and use of that content) in a form that can be automatically recognised and interpreted, so that business partners can systematically comply with publishers’ policies. There is nothing in ACAP that addresses business models.
Scholarly publishers are involved in ACAP through the participation of Wiley-Blackwell, Macmillan/Holtzbrinck and Reed Elsevier, as well as through the International Publishers Association (IPA) and the Publishers Licensing Society (PLS). Their feeling seems to be, however, that without the agreement and support of the major search engines, the attempt to set limits on what can and cannot be harvested would fail, and the ACAP concept with it. With ACAP, publishers will now be able to make more content available to users through the search engines, and to continue to innovate and invest in the development of business models for network publishing without losing control over their investments. With ACAP, the online publishing environment will become as rich and diverse as the offline one.
Whilst the ACAP project might originally have been seen as a way in which publishers, particularly newspaper publishers, were seeking to prevent the large search engines stealing their jewels, in fact ACAP is more than that. It has more affinity with the formalisation of relationships which is part of the ONIX programme. It complements ONIX-PL. They will share a dictionary and adopt compatible data models.
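Since ACAP builds on the robots.txt convention, a rough sketch of the idea can be given in code. The crawler-side logic below parses a small permissions file and decides whether a given action is allowed; the ACAP-style directive names are invented for illustration and should not be read as the published ACAP syntax.

```python
# Sketch of a crawler honouring a robots.txt-style permissions file.
# The "acap-" directive names below are invented for illustration; the real
# ACAP specification defines its own vocabulary of usage terms.
PERMISSIONS_FILE = """
User-agent: *
Disallow: /subscriber-only/
acap-allow-index: /abstracts/
acap-disallow-follow: /preprints/
"""

def parse_permissions(text):
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field, _, value = line.partition(":")
        rules.append((field.strip().lower(), value.strip()))
    return rules

def may_crawl(rules, path):
    """Return False if any plain Disallow rule covers the path."""
    for field, value in rules:
        if field == "disallow" and value and path.startswith(value):
            return False
    return True

rules = parse_permissions(PERMISSIONS_FILE)
print(may_crawl(rules, "/abstracts/2008/article17"))         # True
print(may_crawl(rules, "/subscriber-only/fulltext/17.pdf"))  # False
```

The original protocol stops at this binary allow/deny decision; ACAP's contribution is a richer vocabulary of conditions layered on top of it, which compliant crawlers would interpret in the same automatic way.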
The pilot phase will provide a robots.txt file where even the layman can enquire in more detail about what may be done with the content. The main initial experiments within the ACAP group are focused on web sites and the British Library. ACAP is, in fact, not just about newspapers, nor about locking content away from the robots: it is a communication tool, not a protective device. What will make ACAP succeed is whether a critical mass of adoption can be achieved. At the moment the three main search engines are observers to the project – with critical mass and industry support they may be willing to buy more fully into ACAP.

13.4.3 Refereeing

Nature’s open peer review

At the end of December 2006, Nature announced that it would not be continuing with the open peer review trial that ran from June to September 2006. The system allowed authors to choose whether to allow their paper to be posted online; anyone could then comment on these papers, provided that they signed their contribution. Despite considerable efforts by Nature to stimulate usage of the open review system, take-up from both authors and potential reviewers was low. Of the 1,369 papers published by Nature during the period, only 71 authors (5 %) made their papers available for open comment. Of these, 33 received no comments at all, with the remaining 38 receiving a total of 92 comments, of which 49 were about eight papers. All comments were rated by Nature editors on a five-point scale (from ‘actively unhelpful’ to ‘directly influenced publication over and above reviewers’ comments’). None of the comments was considered by Nature editors to warrant the most helpful category (5).
As with many features of the scholarly communication process, it may be that there is a distinction to be drawn between what authors/readers like to do themselves and what they would like others to do (as exemplified in the Elsevier study by Michael Mabe on What Authors Want (http://hdl.handle.net/1885/44485), or by the studies on access to data sets, where open access is preferred as long as it is another person’s data). Participating authors were surveyed following the trial, with some expressing concern over possible misuse of the results prior to publication (for patent applications, for example). Nature also found that its research suggests that researchers are ‘too busy, and lack sufficient career incentive, to venture onto a venue such as Nature’s website and post public, critical assessments of their peers’ work’.
13.5 New Technical Offerings

13.5.1 Current Awareness and Alerting

There is a range of services emerging that enable individuals to be targeted with information relevant to their needs. The vast noise of unwanted publications is filtered out by these new intelligent services. But this is no more than a re-invention, using the power of current search technology, of a concept that existed in the 1970s and 1980s. Then it was known as SDI – Selective Dissemination of Information – in which the database owner maintained a profile of each user’s research interests. When new content entered the database and matched an individual’s interest profile, the item was despatched to that individual without having to be asked. It was information delivery in anticipation of demand.
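The underlying mechanism is simple enough to sketch in a few lines of code. The example below is illustrative only – real alerting services sit on top of full search engines and much richer profiles, and the profile terms and items shown are invented – but it matches newly ingested items against stored interest profiles and 'despatches' anything that qualifies, which is exactly the push model that SDI pioneered and that modern e-mail alerts and RSS feeds re-implement.

```python
# A minimal sketch of SDI-style alerting: match new items against stored
# interest profiles and push anything relevant to the user unasked.
# Profile terms and items are invented for illustration.
profiles = {
    "researcher_a": {"crystallography", "synchrotron"},
    "researcher_b": {"proteomics", "mass spectrometry"},
}

new_items = [
    {"title": "Advances in synchrotron crystallography",
     "keywords": {"crystallography", "synchrotron", "diffraction"}},
    {"title": "A survey of library budgets",
     "keywords": {"libraries", "budgets"}},
]

def despatch(item, user):
    # In a real service this would send an e-mail alert or an RSS entry.
    print(f"Alert for {user}: {item['title']}")

for item in new_items:
    for user, interests in profiles.items():
        if interests & item["keywords"]:   # any overlap with the profile
            despatch(item, user)
```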
13.5.2 Publishing a semantic journal

A new generation of journals is being made possible through the adoption of standards and procedures that enable computers as well as humans to ‘read’ the articles. This requires more sophistication than is currently applied to keywords, tags or even identifiers. It demands a much more intense analysis of the text, with particular items of information being classified rigorously so that the computer can read them without any misconceptions as to their meanings. A proven ontology is necessary – one which is both structured to cope with all possible applications for words and digital objects and yet flexible enough to grow with the discipline.
One example of a semantic journal initiative is Project Prospect from the Royal Society of Chemistry. This approached the structuring of the text in such a powerful way that it earned the ALPSP award for Innovative Product of the Year in 2007. It meshes vocabularies with ontologies in an HTML environment to allow the journal content to be fully explored by computers. In addition Project Prospect generates extensive abstracts and descriptions which are not only searchable by all forms of third-party search engines but also permit RSS (Really Simple Syndication) feeds of relevant material to individuals. The information is pushed rather than just pulled from the RSC site.
Although this process could be challenged insofar as it amounts to a large in-house investment in the application of codes to the relevant parts of the articles, much of which is then made accessible for free, the business model still stands up. Some of the data that are extracted from the article will give the remote end user a clearer picture of what is in the article, but not necessarily everything. The contextual aspects of the received data feed would still need to be acquired from the publisher.
There is potential for such semantic journals to be networked together in a version of CrossRef to provide a rich new search environment. But it is all contingent on sophisticated and consistent markup of the text being applied in a rigorous rather than a socially collaborative way. This requires more effort for little immediate commercial gain, though the better offering may result in additional pay-per-view sales. A more detailed description of the semantic web and its implications for scholarship in general is given in a later chapter.

13.5.3 Publishing in virtual reality

Second Life is an Internet-based virtual world launched in 2003, developed by Linden Lab (Linden Research, Inc.) in the USA, which came into prominence through mainstream news media in late 2006. A downloadable client programme called the Second Life Viewer enables its users, called “Residents”, to interact with each other through their ‘avatars’, providing an advanced level of social network service.
Residents can explore, meet other Residents, socialise, participate in individual and group activities, and create and trade items (virtual property) and services with one another. Second Life is one of several virtual worlds that have been inspired by the cyberpunk literary movement, and particularly by Neal Stephenson’s novel Snow Crash. The stated goal of Linden Lab is to create a world like the Metaverse described by Stephenson, a user-defined world in which people can interact, play, do business, and otherwise communicate. Second Life’s virtual currency is the Linden Dollar (Linden, or L$), which is exchangeable for real-world currencies in a marketplace consisting of residents, avatars, Linden Lab and real-life companies.
While Second Life is sometimes referred to as a game, this description does not fit the standard definition. It does not have points, scores, winners or losers, levels, an end-strategy, or most of the other characteristics of games, though it can be thought of as a game on a more basic level because in many instances it is “played for fun”. In all, more than 9.8 million accounts have been registered, although many are inactive, some Residents have multiple accounts, and there are no reliable figures for actual long-term consistent usage. Despite its prominence, Second Life has notable competitors, including There, Active Worlds, and the more “mature” themed Red Light Center.
Whilst Second Life emerged from the Silicon Valley culture and has a strong entertainment bias, it and its related services also have potential within the research environment. After all, who, other than the geeks themselves, is better able to understand and experiment with the rich opportunities afforded by Second Life than the research community, notably that part of it with strong IT skills? And so it appears that experimentation with Second Life is taking place among some scholarly publishers.
Examples of more professional content in Second Life are an international space flight mission centre, which includes visits to full-sized replicas of space shuttles; a schizophrenia house, which emulates how someone suffering from schizophrenia would see the world; and a four-dimensional house through which one can learn to navigate. Some people are able to make a commercial return from participating in Second Life by converting accumulated Linden dollars into real-world currency.
A notable experimenter is the Nature Publishing Group. With comparatively little up-front investment, NPG is using Second Life to experiment with the creation of global virtual conferences and meetings on scientific subjects. One such example is to provide an alternative way to take part in the Bali global warming conference held in late 2007. The carbon footprint saved by using avatars instead of real-life people flying long distances has so far not been measured but could be considerable. NPG has also bought islands on which services such as M4 (Magical Molecular Model Maker) and Bacterium operate. However, the main focus is on creating virtual reality conferences and meeting areas, in the areas of cell biology and chemistry. These meetings or events can extend over a lengthy period and are not tied to one fixed location. They can include multimedia presentations, discussions, questions and answers, and exchanges with peers and colleagues. Currently there is a technical limit of 50 participants in the NPG services, but this is likely to be raised in the future.
Not that Second Life has achieved technical perfection. In many respects it is ‘clunky’. It is also sparsely inhabited, to the extent that it has been referred to as ‘The Lonely Planet’ in Wired magazine. It also faces the challenge of creating a sustainable business model. NPG rationalises its involvement with Second Life in terms of visibility and credibility within the social collaborative community. It also foresees sponsorships emerging in due course in support of arranging virtual conferences. Linden Lab is a commercial company and struggles to balance its need to achieve a return on its investment with remaining in tune with the open access movement. At present Second Life can also be bewildering to the new participant, which may explain why there is not a great deal of regular participation. However, in defence of Second Life and its supporters, one could envisage a scenario whereby, if one were considering how best to disseminate research communication in the current environment, services such as Second Life might well be considered as much a starting point as scholarly journals.
13.6 Summary

It is tempting to overlook the considerable amount of activity which is taking place within the research information industry when reviewing the major new challenges on the horizon. But much is being done to take a traditional industry which has survived for generations in a print paradigm and move it to a new set of conditions appropriate to a hybrid, and eventually an all-electronic, world. This cannot be done overnight – much internal restructuring is needed to enable the transition to be made as painlessly as possible. One could question whether enough resources are pouring into the process of making the publishing industry more efficient, and whether current activities to effect greater efficiency are fit for purpose. The following chapters explore some of these evolving trends. But in the meantime one cannot say that the industry is lying still and allowing the steamroller of new technological and market conditions to ride right over it.
Chapter 14
Technology as a Driver for Change
14.1 Background

Recent years have seen significant changes in information communications technology (ICT), more contextual than some of the changes described in the previous chapter. These also have a profound impact on the way scholarly information is created, formatted and disseminated to the global research community. These changes have occurred as improvements have been made in electronic publishing methods, as powerful networks and the Internet have emerged, and as standards have been developed to underpin the increasingly robust global telematic infrastructure.
The combined effects these have had on the information-gathering habits of scholars and researchers have varied, as outlined in previous chapters. For most STM (scientific, technical and medical) areas the transition from a print to an electronic information system has not been a simple trajectory. Some disciplines have adapted more easily than others, and the reasons for this have not always been technological. Nevertheless, technological advances are powerful drivers for change, though in the intricate world of electronic publishing relationships they work together with other stimuli to bring about change. It is worth focusing on new and emerging technologies to see if there are any obvious directions in which technology is leading us. But first, as is traditional, it is useful to look at recent history to assess the impact of technology on e-publishing.
14.2 Past impact of Technology

Electronic publishing of bibliographic data has been around for decades. Large stores of searchable abstracts and indexes have been a feature of much of scholarly information, notably in chemistry, physics and biomedicine. More recently these have been added to by vast accumulations of raw data and graphics which various, often public, agencies have collected almost en passant. Some of these collections have been lost to posterity and a great deal has been hidden in the deep web, the vast underlay on which the World Wide Web currently rides. Nevertheless, we have seen a growing data-centric emphasis in search behaviour in areas such as astronomy, bioinformatics, crystallography, etc. The trend towards aspects of e-Science – an era when all required information and manipulative tools will be accessible from the desktop of the researcher in a networked environment – is emerging, though it is not yet entirely with us.
Whilst this progression from print to digital information has been taking place, attention has been focused on making sure that the many new sources are discoverable through online navigation services. Undiscoverable information is worthless – new resource discovery tools running on the web are required to give life to old or under-utilised data. Google has been a pioneer in this area. The launch of Google Scholar in late 2004 indicates the focus that is being given to collecting not only the formal literature that is available digitally but also to penetrating that hitherto vast deep web of hidden, grey literature of relevance to scholars. Google is not alone, with both Yahoo and MSN launching and improving their own comparable services in this area. Recognition that this is an important service for scholars is reflected in the fact that one of the main publishers of both printed and electronic publications, Elsevier, has launched its own generic resource discovery tool – Scirus.
Underlying these new approaches to making scholarly information available has been the need for standardisation and interoperability. The issue of formats and metadata has been commented on in a previous chapter. However, behind the scenes there have been many other worthy attempts to ensure that key developments are not bespoke, but have universal applicability and long-term sustainability. The adoption of the DOI as a persistent identifier by publishers, leading to the creation of CrossRef, which provides an almost barrier-free networking of references between 10,000 journal titles; the agreement on a COUNTER code of practice (to collect and monitor usage statistics in a consistent way); and the emergence of open access procedures, particularly protocols for metadata harvesting, are potentially creating an even more solid technology framework for the future.
14.3 The technological infrastructure

In their book “Blown to Bits – How the New Economics of Information Transforms Strategy” (Harvard Business School Press, 2000), Philip Evans and Thomas Wurster show how technology has brought about a new way of using information. Underlying the spread of electronic information is the powerful force of Moore’s Law. Gordon Moore, co-founder and later chairman of Intel, observed that every 18 months it was possible to double the number of transistor circuits etched on a computer chip. This ‘law’ has held for over four decades – a tenfold increase in memory and processing power every five years. By 2012 computers will be some 40 times more powerful than today. By then one terabyte of storage will cost $ 10. This extraordinary technological driver is behind the current fall in prices of personal computers in particular and the democratisation of remote access to large collections of bibliographic data.
At the same time as Moore’s Law was making its impact on the hardware available for a whole range of support activities in electronic publishing, a similar technical development was occurring in telecommunications. The total bandwidth of the US communications industry – driven by improvements in data compression and by the fibre optic strands through which information can now pass – is tripling every year. This effect is often referred to as “Gilder’s Law”.
A further ‘law’ is important here. Robert Metcalfe, developer of Ethernet, has observed that the value of the resulting network is proportional to the square of the number of people using it. The value to one individual of a telephone is dependent on the number of friends, relatives and acquaintances who also have phones – double the number of friends and the value to each participant is doubled, and the total value of the network is multiplied fourfold. This is known as Metcalfe’s Law.
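The arithmetic behind these claims is easy to check. The short calculation below is a back-of-the-envelope sketch; the 40-fold figure assumes a baseline a few years before this book's publication.

```python
# Back-of-the-envelope checks on Moore's and Metcalfe's Laws.

def moore_growth(years, doubling_period_months=18):
    """Growth factor after `years` if capacity doubles every 18 months."""
    return 2 ** (years * 12 / doubling_period_months)

print(round(moore_growth(5), 1))   # ~10.1 -> the 'tenfold every five years'
print(round(moore_growth(8), 1))   # ~40.3 -> roughly the '40 times by 2012'
                                   #          figure, from a mid-2000s baseline

def metcalfe_value(users):
    """Network value proportional to the square of the number of users."""
    return users ** 2

# Doubling the number of users quadruples the total network value.
print(metcalfe_value(200) / metcalfe_value(100))   # 4.0
```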
Combining these technologies and concepts, one sees that there is an escalating process whereby technology advances the scope for electronic publishing. Though content is being driven by other (editorial) forces, technological advances along the lines of the laws advanced by Moore, Gilder and Metcalfe provide the infrastructure whereby the content – the books, journals, articles, data and supporting multimedia – can flow quickly and efficiently.

14.3.1 Digital Rights Management (DRM)

Electronic publishers who have invested considerable sums in creating an information product or service have a natural inclination to ensure that only those who are willing to pay a contribution towards that investment are able to access the material. Those who are unable or unwilling to pay the access fee are locked out by a digital rights management system.
Digital rights management (DRM) is an umbrella term that refers to access control technologies used by publishers and copyright holders to limit usage of digital media or devices. It may also refer to restrictions associated with specific instances of digital works or devices. To some extent, DRM overlaps with copy protection, but DRM is usually applied to creative media (music, films, etc.) whereas copy protection typically refers to software.
The use of digital rights management has been controversial. Advocates argue it is necessary for copyright holders to prevent unauthorised duplication of their work and so ensure continued revenue streams. Opponents maintain that the use of the word ‘rights’ is misleading and suggest the term ‘digital restrictions management’ be used instead. Their position is essentially that copyright holders are attempting to restrict use of copyrighted material in ways not included in the statutory, common law, or constitutional grant of exclusive commercial use to them. This has become a battleground for conflict between publishers and the open access movement, which has taken many guises over the years.
In support of the DRM process, however, there are systems which have been put in place within libraries and end-user institutions to ensure that the ‘rights’ of publishers are in fact observed, by restricting access to the underlying content. The two most prominent of these are Athens and Shibboleth.

14.3.2 Athens

Athens is an access and identity management service supplied by Eduserv in the UK which enables institutions to provide single sign-on to protected resources combined with full user management capability. Organisations adopting the Athens service can choose between the Classic Athens service, where usernames are held centrally by Eduserv, and Local Authentication, where usernames are held locally and security tokens are exchanged via a range of protocols.
Over 4.5 million users worldwide can now gain access to over 300 protected online resources via the Athens service. Athens replaces the multiple usernames and passwords necessary to access subscription-based content with a single username
and password that can be entered once per session. It operates independently of a user’s location or IP address. There are therefore two main elements to Athens. Firstly, there is the ability to manage large numbers of users, their credentials and associated access rights, in a devolved manner where administration can be delegated to organisations, or within an organisation. Secondly, Athens provides a managed infrastructure which facilitates the exchange of security tokens across domains in a secure and trusted way.

14.3.3 Shibboleth

A shibboleth has a number of meanings, some linked to antiquity, but in present informatics vernacular its wider meaning refers to any “in-crowd” word or phrase that can be used to distinguish members of a group from outsiders. Within the field of computer security specifically, the word shibboleth has been applied to the general concept of testing something and, based on the response, taking a particular course of action. The most commonly seen usage is logging on to one’s computer with a password. If you enter the correct password you can log on to your computer; if you enter an incorrect password, you can go no further.
In technical terms Shibboleth is an Internet2 Middleware Initiative project that has created an architecture and open-source implementation for a federated identity-based authentication and authorisation infrastructure based on SAML (Security Assertion Markup Language). SAML is an XML standard for exchanging authentication and authorisation data between security domains, that is, between an identity provider (a producer of assertions) and a service provider (a consumer of assertions). SAML is a product of the OASIS Security Services Technical Committee. Federated identity allows information about users in one security domain to be provided to other organisations in a common federation. This allows for cross-domain single sign-on and removes the need for content providers to maintain usernames and passwords. Identity providers supply user information, while service providers consume this information and gate access to secure content.
Shibboleth emerged as the frontrunner among the most widely adopted standards-based approaches. Shibboleth separates authentication from authorisation. Authentication is controlled by the user’s home institution, and authorisation is based on user attributes and controlled by the service provider. Users don’t have to acquire and remember a separate identity for accessing protected services – they simply use their local institutional username and password. This should increase the use of subscribed services.
Shibboleth is therefore now being applied to a system whereby there is a decentralised allocation of access rights by the institution, and the institution itself is responsible for ensuring that access rights to an external online service are updated and accurate. It is the common framework for access management that is being increasingly adopted by the education and commercial sectors across the world. It is held together by trust. Some 150 institutions have currently adopted Shibboleth, including UK higher education institutes, and many of the leading information providers (Elsevier Science Direct, Ovid, Ebsco) have adapted their services to Shibboleth authentication. It is more library-centric, very complex to implement, covers both open source and commercial sources, and has a growing amount of support from both library and publisher communities.
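At the heart of the Shibboleth flow is the identity provider sending the service provider a signed SAML assertion about the user. The fragment below is a heavily simplified, unsigned sketch built with Python's ElementTree: the institution and attribute value are invented, the attribute shown is merely one commonly released in academic federations, and a production assertion would carry digital signatures, validity conditions and considerably more detail. It nonetheless conveys the idea that the publisher's platform sees attributes ("member of staff at a subscribing institution") rather than a username and password.

```python
# A heavily simplified, unsigned sketch of a SAML-style attribute assertion
# of the kind an identity provider releases to a service provider.
# Institution and attribute value are invented for illustration.
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:SAML:2.0:assertion"

assertion = ET.Element(f"{{{NS}}}Assertion")
ET.SubElement(assertion, f"{{{NS}}}Issuer").text = (
    "https://idp.example-university.ac.uk/shibboleth")

statement = ET.SubElement(assertion, f"{{{NS}}}AttributeStatement")
attribute = ET.SubElement(statement, f"{{{NS}}}Attribute",
                          Name="eduPersonScopedAffiliation")
ET.SubElement(attribute, f"{{{NS}}}AttributeValue").text = (
    "staff@example-university.ac.uk")

# The service provider grants or denies access on the basis of attributes
# like this, without ever seeing the user's institutional password.
print(ET.tostring(assertion, encoding="unicode"))
```

Which attributes are released, and to whom, is a matter of local policy agreed within the federation – which is where the 'trust' mentioned above comes in.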
14.3.4 UK Access Management Federation

The Federation is the framework for the new authentication and authorisation system for access to e-resources for UK higher education. The UK Access Management Federation uses Shibboleth software. Supported by JISC and Becta, and operated by UKERNA, the Federation provides a single facility for accessing online resources and services for education and research.
In setting up the Federation and adopting Shibboleth, JISC migrated away from Athens as the main system for authentication in the UK. JISC claims that there are a number of advantages for institutions and users in adopting a federated access management system based on Shibboleth technology, in particular the evolving needs of the e-learning and e-research communities for a single access management system that supports a range of authentication scenarios, including access to internal resources, external resources and collaborative requirements.
However, as part of its programme for moving to ‘federated access management’ for UK education and research, JISC has been seeking an extension to the Federation Gateway Services contract with Eduserv which would allow Athens to work in the open standards environment of the UK Access Management Federation. They have, however, not managed to reach ‘an affordable agreement’ for the provision of the service, and it will no longer form part of the JISC Services portfolio from 1st August 2008. Athens will not disappear completely, however, and will be available beyond 2008 on a subscription basis. In March this year, Eduserv launched OpenAthens, which enables institutions to gain secure access to Shibboleth-protected UK Federation resources and to maintain access to Athens-protected resources. Other countries have also developed their own solutions to the problem of accessing multiple resources with a single identity.

14.3.5 OpenID

The other main authentication system is OpenID. The web-development community has come up with this option, and it has been incorporated within Microsoft Vista. OpenID is a decentralised single sign-on system. Using OpenID-enabled sites, web users do not need to remember traditional authentication tokens such as a username and password. Instead, they only need to be previously registered on a website with an OpenID “identity provider” (IdP). Since OpenID is decentralised, any website can employ OpenID software as a way for users to sign in; OpenID solves the problem without relying on any centralised website to confirm digital identity. OpenID is increasingly gaining adoption among large sites, with organisations like Yahoo, AOL and Orange acting as providers. This is a user-centred approach, simple to implement, incorporating a growing number of open access resources, and it can be adapted quickly. However, it does not address the issue of ‘trust’.
14.4 Technology and Standards

14.4.1 Digital Object Identifier (DOI)

The DOI is a persistent identifier which links through a redirection service to a source item. Unlike uniform resource locators (URLs), which in many cases have a tendency to become broken after several months or years, the DOI is ‘persistent’ and follows the item wherever it goes. The process is coordinated through the International DOI Foundation (IDF), established by a group of publishers in the late 1990s. Since then the number of journal articles with registered DOIs has grown to close to 30 million (mainly through a service application known as CrossRef), and there are procedures in place to include books, conference proceedings and possibly other digital grey literature, including PhD theses.
The procedure is based on the Handle technology, invented by Bob Kahn, an early Internet pioneer, and now operated through the Corporation for National Research Initiatives (CNRI). This acts as the catalogue repository for DOIs and the switching mechanism for matching the DOI with the appropriate (text) item. For example, if a journal moves from Elsevier to Springer, the DOI assigned to each article remains unaltered; what changes is the location to which it resolves, so that existing links follow the article from the Elsevier to the Springer site.
A key element in this chain (which in many respects is a version of ISSNs operating in a digital environment) is the role of Registration Agents. These are empowered by the IDF to allocate prefixes to those organisations that wish to assign DOIs to their published output (currently CrossRef deals with over 2,400 publishers, representing 18,000 journal titles). There is a cost to information providers in allocating DOIs to their digital objects. This cost is justified on the basis that the digital object becomes part of the network of linked items, and will benefit from being accessed as part of the network. There are other financial consequences involved, with registration agencies paying fees to the IDF.
The final step in the chain is the e-commerce that is made possible by each DOI becoming a transactional item – a price can be set on the purchase of the digital object, whether it is an article, a part of an article or a non-text item. New business rules are therefore a crucial element of the DOI process, which includes business models, pricing structures and the (re)packaging of digital items. Rights of access are “attached” to the object by its unique identifier. By adopting open standards such as IOTP (the Internet Open Trading Protocol), XML and ebXML, sellers of digital items can create complex business models employing hybrid subscription and transactional models, free and pay models, etc. This ties in with a general trend of ‘issue deconstruction’ – whereby the volume or issue of a journal title is no longer a logical or even viable entity. The article (or even parts of the article) becomes the key item, separately identified by a DOI. This identification also ties in with an equally important aspect of information – linking from one item to another. There is a new twist to the old Internet concept: if you aren’t linked you don’t exist! DOIs enable secure, permanent linking to be implemented.
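In practice, resolution is just an HTTP redirect: the DOI proxy looks the identifier up in the Handle system and forwards the user to whatever URL is currently registered for it. The sketch below demonstrates the principle; it assumes outbound network access, uses the public resolver address in use at the time of writing, and the DOI shown should be read as illustrative.

```python
# Resolve a DOI by asking the DOI proxy server where it currently points.
# The DOI below is illustrative; substitute any registered DOI.
# Assumes outbound network access.
import urllib.request

doi = "10.1000/182"
request = urllib.request.Request("http://dx.doi.org/" + doi, method="HEAD")

with urllib.request.urlopen(request) as response:
    # urllib follows the redirect chain; the final URL is wherever the
    # publisher (or its successor) has registered the object.
    print(response.geturl())
```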
Concerns

The DOI highlights the issue of the ‘appropriate copy’ (the so-called Harvard problem). A library user will have a hierarchy of sources for obtaining an article that the user wants. The preferred choice would be to point the user to the local copy (within the institution) on the library shelves; but as the DOI system is owned and controlled by publisher interests, DOIs point to the publisher site, which may mean the library/user paying twice for the same item. To overcome this problem, OpenURL services such as SFX, 1CATE, ArticleLinker, SIRSI and LinkSource have been rapidly adopted, allowing multiple choices to be given to end users. Some of these choices enable the user to acquire a version of the article, or part of an article, at no charge; other pointers are to the authorised, final publication of record on the publisher’s site, but often at a price.
Whilst journal articles currently represent an important source of research material, there are indications that a Grid-type approach and a change in user behaviour are opening up opportunities for a variety of other, more ephemeral digital objects to become either important supplementary resources or resources in their own right. Grey literature is one such resource. According to a recent Outsell report, “Information Objects are hot; Documents are not!” This reflects the changing nature of research information, with data also being sought increasingly in its ‘raw’ state (to enable local emulation of a research project in line with specific needs) and broken down into more useful nuggets. This is all tied into a growing adoption of XML production techniques within the publisher/content provider communities.
In the long term, the adoption of a persistent identifier for digital objects seems inevitable. A proprietary system of persistent identifiers runs counter to the growing adoption of open systems and interconnectivity. The issue is how much support there is for the publisher-controlled DOI system and whether more open approaches will have an impact over time.

14.4.2 CrossRef

CrossRef has consolidated its position in the past few years: by the summer of 2007 it had 2,400 participating publishers, covering 18,000 journals, and had issued 27 million DOIs. Its importance is that it facilitates linking between items with DOIs.
Early in 2007 CrossRef, otherwise known as the publishers’ citation linking service, reported that over 25 million content items had been registered at that stage in the CrossRef system since its inception in early 2000. Although the majority of these Digital Object Identifiers (DOIs) are assigned to online journal articles, there are over 2 million DOI strings assigned to conference proceedings, components and books, at chapter as well as title level. CrossRef has also been supporting the assignment of DOIs to technical reports, working papers, dissertations, standards and data elements.
CrossRef hit the 10 million DOI mark back in January 2004, after roughly four years in operation. Since then, the rate of growth in DOI creation across the scholarly publishing community has accelerated considerably, with the next 10 million DOIs being created and registered in just over two years. In April 2006, CrossRef registered the 20 millionth DOI. Of the five million DOIs created and assigned during the past year, a large number are associated with archival journal articles. The Royal Society, for instance, recently registered its complete journal back-file. In so doing, it joins several CrossRef member publishers who have completed, or are in the midst of,
vast retro-digitisation initiatives, including Elsevier, Springer, Sage, Kluwer, Wiley, Blackwell, the American Association for the Advancement of Science, and JSTOR, among others. Other features currently being added to the system include multiple resolution, which allows a single DOI to be associated with multiple elements, such as additional URLs, email addresses, pointers to other metadata records, etc.; parameter-passing, which allows a key or some encoded text, such as information about the source of the incoming link, to be sent along with a DOI; and matching of inexact queries, in which every value provided in a query is considered in a weighted manner, adding greater flexibility and accuracy to the DOI matching process. See: http://www.crossref.org.

14.4.3 Other Identifiers

Crystallography is a subject area where standardisation of data is important. The International Union of Crystallography (IUCr) publishes eight journals, two of which are wholly electronic, and most of which have links to data. The data is all in the public domain. As early as 1991 the IUCr developed the CIF (Crystallographic Information File), which created a standard for data in this area. Since then the IUCr has developed online-only journals, has retro-digitised all its journals back to the first edition and since 2004 has offered an open access option. The International Tables for Crystallography will soon be available online (but under subscription). Nearly all the journals are available on subscription, though for £ 500 (€ 800) the author has the option of making an article open access.
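To make the identifier mechanics described in this section more concrete, the short sketch below asks the public DOI proxy (doi.org, which sits on top of the Handle system) where an identifier currently points, and simply reports the redirect target. It is a minimal illustration under the assumption that the proxy answers with a standard HTTP redirect, as it normally does; the DOI used is only an example and any registered DOI could be substituted.

```python
# Minimal sketch: resolve a DOI via the public proxy and report where it points.
import http.client

def resolve_doi(doi: str) -> str:
    """Return the URL the DOI proxy currently redirects to (empty if none)."""
    conn = http.client.HTTPSConnection("doi.org")
    conn.request("GET", "/" + doi, headers={"User-Agent": "doi-demo"})
    response = conn.getresponse()
    # A registered DOI normally yields a 3xx response whose Location header
    # points at the landing page most recently registered for the object.
    location = response.getheader("Location", "")
    conn.close()
    return location

if __name__ == "__main__":
    # 10.1000/182 is widely quoted as the DOI of the DOI Handbook itself.
    print(resolve_doi("10.1000/182"))
```

If the journal in the earlier Elsevier-to-Springer example changed hands, the DOI passed to this function would stay the same; only the returned location would differ.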
14.5 New Products and Services If access to vast amounts of electronic publications becomes all-pervasive, how does one locate trusted sources of information that are relevant to users? Web 2.0 is allegedly not working effectively across all frontiers, so what will be next? Some of the future technical developments could include elements of the following. It is felt by some pundits that Web 3.0 will involve a ‘smart search’ and will fix the failures of tagging. The semantic web will also come along but will be in evidence first in the corporate sector before it migrates over to the academic research area. Another development will be web operating systems. The web will become the computer, with all the necessary applications resident on the web. Metadata and data will become more prolific with more descriptors of data being made. Digital asset management will also emerge, to enhance services such as Flickr. Lower cost portals will emerge, moving away from the highly structured services that currently exist. Vocabulary editors will be embedded in the web and the information scene will become embedded in the social collaboration process. The publishing industry could suffer because of its inherent ‘silo’ approach. The article within an issue that is part of a journal will become redundant. Instead users will ‘subscribe’ to those items that are specifically relevant to their needs, irrespective of source. Open web publishing will facilitate this. As for the delivery device, the mobile technologies such as iPod, cellphones and their derivatives will take on a greater role in future.
However, it is speculated that a range of information delivery devices appropriate to different contexts will emerge. Besides searching, users also need to communicate, collaborate, be educated, be informed, etc. In particular, there is the need to consider the role of the user as being part of a collaborative team. Existing networks are used to communicate with established peers, not new users, but in future this will change. YouTube, MySpace, Flickr, Facebook – these are what the new generation are using for communication and collaboration. However, there are ownership issues – how much information does Google own? – and privacy issues. One feature of Facebook was recently taken off because of abuse of individuals’ profiles. Stalking is an example of unintended consequences. Some further forecasts are also worth considering:
• Computation will shift back to the mainframe (browser and web server). Because it is proving easier to network, large players may move out of browsers.
• Privacy will survive, despite current concerns. A Declaration of Rights might be created.
• Digital independence will rise, with individuals controlling their own web sites.
• Individuals will have more remote control over devices in their own homes.
However, the latter means that individuals should take care not to allow their own information to be used by someone else. But will the future perception of privacy change because the emerging digital natives have a different set of needs and wishes? What might happen is that the tools they use will change. Despite the above there is no dominating influence – there are many trends, some of which will be seen as useful and successful, while others will be adopted by a few and have a limited but nevertheless enduring life.
14.6 Other Technical applications

Some of the new trends which may have an impact on user behaviour and the market structure include:
• The creation of specialised web crawlers that create ‘dossiers’ about individuals. Google has taken this up through its acquisition of http://www.answers.com.
• The development of federated search engines, where a single front-end interface provides access to a variety of different datasets (a minimal sketch of this approach is given after the list). In this area, Yahoo has made a link with Verity.
• The development of ‘Answer’ search engines is becoming popular, as exemplified by AskJeeves.
• A further development is Dynamic Query Modification. This is a service that trains the user to be more selective in the choice of search terms before clicking on the search button. There are also various new applications available for individual Desktop Search.
• Auto-Summation is the process of creating new works using natural language technology. This is an advanced search technique within MSN.
• Multimedia Search. This includes voice recognition, for example. Yahoo VideoSearch is an instance of this.
• Personalisation, customisation and remote storage are also further aspects of the search process. MyYahoo does this.
• Work Bench support. The Research Information Center being developed by Microsoft and the British Library is an example of a service offering information support at all the stages of the research cycle and not just bibliographic search.
• Visualisation of output, with search engines such as Vivisimo and Clusty. Engineers at Sun use Grokker visualisation tools; Stanford use Socrates; NCSU use Endeca. This may be uncomfortable for the bibliographic community but it is the way users are increasingly expecting their results to be delivered.
• Seamless Search, through sites such as Blinc.com, makes implicit queries based on current work activity.
• Beyond Search. Text mining of documents. Includes Corpora’s Jump (for pharmaceuticals). It offers analysis of information results and extranet themes.
• Ontology mediators. These include text mining techniques, linguistics and extrapolation of different ontologies across different areas.
• Faceted browsing, which ties the search to the work activity.
• Search and broadcast. Integration with devices such as iPod, TiVo and print.
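The federated search idea referred to in the list above – one front end querying several otherwise separate datasets – can be sketched as follows. The three "connectors" and the shape of their results are invented for the illustration; a real service would call the search APIs of actual catalogues, e-journal platforms or repositories.

```python
# Hedged sketch of a federated search front end: one query, several sources,
# results merged and de-duplicated. Connectors below are placeholders.
from concurrent.futures import ThreadPoolExecutor

def search_catalogue(query):       # placeholder connector
    return [{"title": f"Catalogue record for '{query}'", "source": "catalogue"}]

def search_ejournals(query):       # placeholder connector
    return [{"title": f"Journal article about '{query}'", "source": "e-journals"}]

def search_repository(query):      # placeholder connector
    return [{"title": f"Preprint on '{query}'", "source": "repository"}]

CONNECTORS = [search_catalogue, search_ejournals, search_repository]

def federated_search(query):
    """Send the query to every connector in parallel and merge the answers."""
    with ThreadPoolExecutor(max_workers=len(CONNECTORS)) as pool:
        result_lists = list(pool.map(lambda fn: fn(query), CONNECTORS))
    merged, seen_titles = [], set()
    for results in result_lists:
        for record in results:
            if record["title"] not in seen_titles:   # crude de-duplication
                seen_titles.add(record["title"])
                merged.append(record)
    return merged

if __name__ == "__main__":
    for hit in federated_search("protein folding"):
        print(hit["source"], "->", hit["title"])
```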
Mobile search and delivery (such as using MP3 players to get delivery of information) are also receiving much attention. This includes SMS and WAP Tools, PodCasting and using an RSS feed which addresses MP3 technology requirements. Text and Data Mining are also significant areas for further development. Is there potential to mine the freely accessible data to establish new relationships and links? Similarly, is there potential to build on these knowledge discovery tools, mashed up with other material and linked to the APIs of major search engines, to create entirely new innovative information services branded by completely new information providers? Fundamentally, if the user is to dictate what is required, how are their desires and needs to be monitored? This is the question at the heart of the move towards electronic publishing by both publishers and libraries. 14.6.1 The “Cloud” A new service which warrants monitoring is that of the ‘cloud’. It involves a dynamic, self-adjusting system of creating a technological infrastructure which is highly distributable. It is powerful and highly scaleable. Google has announced its work on ‘the cloud’ and Microsoft is also looking at this technology. Even now, most of Google’s computing power is not on the Google campus. It is distributed in small, medium and large servers all over the world, over 1 million it is claimed, and is known as ‘the cloud’. This ‘cloud’ has become the basis for a new programme – Google 101 – and a new partnership with IBM to plug universities around the world into Google-like computing clouds. Unlike many traditional supercomputers, Google’s system never ages. When its individual pieces die, usually after about three years, engineers replace them with new, faster boxes. This means the cloud regenerates as it grows, almost like a living entity. If this concept expands, Google’s footprint could extend beyond search, media and advertising, leading Google into scientific research and other
new businesses. In the process Google could become, in a sense, the world’s primary computer. Amazon has also recently opened up its own networks of computers to paying customers, initiating new players to cloud computing. Some users simply park their databases with Amazon and others use Amazon’s computers to mine data or create web services. In November 2007, Yahoo opened up a cluster of computers – a small cloud – for researchers at Carnegie Mellon University, and Microsoft has deepened its ties to communities of scientific researchers by providing them with access to its own server farms. As these clouds grow, new web start-ups will emerge, many in science and medicine, as data-crunching laboratories searching for new materials exploit the power of the clouds. For now, Google remains rooted in its core business, which relies heavily on advertising revenue. The cloud initiative barely registers in terms of current investment, but according to an article in Business Week it is claimed it bristles with future possibilities. The initiative started with one senior programmer using his 20 % ‘free’ time at Google to link together some 40 computers outside Google’s main service as a test bed for a new ‘cloud’ which became ‘MapReduce’. This divides each task into thousands of tasks, and distributes them to legions of computers. In a fraction of a second, as each one comes back with its nugget of information, MapReduce assembles the responses into an answer. Other programmes do the same job but MapReduce is faster and appears able to handle near limitless work. This is where IBM came in – to share the concept of developing a prototype university cloud. The work involved integrating IBM’s business applications and Google servers, and equipping them with a host of open-source programmes. The universities would develop the clouds, creating tools and applications while producing computer scientists to continue building and managing them. As the mass of business and scientific data rises, computing power turns into a strategic resource, a form of capital. “In a sense,” says Yahoo Research Chief Prabhakar Raghavan, “there are only five computers on earth.” He lists Google, Yahoo, Microsoft, IBM and Amazon. Few others, he says, can turn electricity into computing power with comparable efficiency. What will research clouds look like? Tony Hey, Vice-President for External Research at Microsoft, says they will function as huge virtual laboratories, with a new generation of librarians – some of them human – “curating” troves of data, opening them to researchers with the right credentials. Authorised users, he says, will build new tools, haul in data, and share it with far-flung colleagues. In these new labs, he predicts, “You may win the Nobel prize by analyzing data assembled by someone else.” It augers in the age of the global collaboratory as the centrepiece for much of future scholarly research.
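The MapReduce pattern just described can be illustrated in miniature. The sketch below is not Google's implementation, merely a toy version of the same divide, distribute and assemble flow: term frequencies are counted across a small document collection, with a pool of worker processes standing in for the 'legions of computers'.

```python
# Toy MapReduce: split a job into many small map tasks, run them on a pool of
# workers, then reduce the partial answers into a single result.
from collections import Counter
from multiprocessing import Pool

DOCUMENTS = [
    "data and datasets as a driver for change",
    "text mining of documents and data",
    "data curation and data preservation",
]

def map_task(document):
    """Map step: turn one document into partial term counts."""
    return Counter(document.split())

def reduce_counts(partials):
    """Reduce step: assemble the partial answers into one tally."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    with Pool() as workers:                       # stand-in for a compute cloud
        partial_counts = workers.map(map_task, DOCUMENTS)
    print(reduce_counts(partial_counts).most_common(3))
```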
14.7 Three predictions on scholarly communication technology

The Economist (December 27th) produced an end-of-year assessment of what it believes are the three main drivers for change in technology, which will consequently have an impact on the scholarly communication arena during the coming year. These were:
1. Surfing the net will slow down The internet is not about to grind to a halt, but as more and more users take advantage of the many new services to download information, the information superhighway will begin to crawl with bumper-to-bumper traffic. The biggest road-hog will remain spam which accounts for 90 % of traffic on the internet. Phone companies and other large ISPs (internet service providers) have tolerated it for years because it would cost too much to fix. The big fat pipes used by ISPs operate symmetrically, with equal bandwidth for upstream and downstream traffic. But end-users have traditionally downloaded megabytes of information from the web, while uploading only kilobytes of key strokes and mouse clicks. When spammers dump billions of pieces of e-mail onto the internet, it travels over the phone companies’ relatively empty upstream segments. But this will change as the new services require automatic update of their content, and users start uploading as well as downloading in quantity (through FaceBook, MySpace, YouTube, etc).with a consequential increased uploading flow. “Today, music videos and TV episodes of hundreds of megabytes are being swapped over the internet by BitTorrent, Gnutella and other file-sharing networks.” And it is all two-way traffic. The result is a potential gridlock. While major internet service providers like AT&T, Verizon and Comcast all plan to upgrade their backbones, it will be a year or two before improvements begin to show. “By then, internet television will be in full bloom, spammers will have multiplied ten-fold, WiFi will be embedded in every moving object, and users will be demanding yet more capacity”. Meanwhile surfing the web is going to be more like traveling the highways at holiday time – one will get there eventually, but one cannot be sure when. 2. Surfing will be done on mobiles Earlier this month, Google bid for the most desirable chunk (known as C-block) of the 700-megahertz wireless spectrum being auctioned off by the Federal Communications Commission (FCC) in late January 2008. The 700-megahertz frequencies used by channels 52 to 69 of analogue television are being freed up by the switch to all-digital broadcasting in February 2009 in the US. The frequencies concerned are among the world’s most valuable. The 700 megahertz band is also the last great hope for a “third pipe” for internet access in America. Such a wireless network would offer consumers a serious alternative to the pricey and poor DSL (digital subscriber line) services they get from AT&T and Verizon, and to the marginally better cable broadband Comcast provides. Speculation around Google’s involvement in the mobile phone business as a result of its bid for C-block may be unfounded – according to the Economist its recent unveiling of its Android operating-system for mobile phones is not to make phones but rather make it easier for others to do so. The aim is to flood the market with “open access” phones that have none of the restrictions that big carriers impose – all because Google’s core business is claimed to be organising knowledge and giving users access to. Android has been made available to a group of manufacturers orchestrated by Google and known as the Open Handset Alliance. Currently Google has no way of getting at the billion users who rely more on mobile phones than personal computers to organise their lives. This is therefore perhaps time to muscle into the mobile-phone business by the dominant content
provider of information. The winner of the C-block of frequencies, whoever that may be (and Verizon is the odds-on favourite), will have to open the network to any device that meets the basic specification because of the Google impact. In short, win or lose, Google has already achieved its objective – internet searches will doubtless be as popular among mobile-internet surfers as among PC users. Owning at least 60 % of the mobile search market is the prize Google has been after all along.
3. Surfing will become even more ‘open’
Linux has become the operating system of choice for low-end PCs. It started with Nicholas Negroponte, the brains behind the One Laptop Per Child project that aims to deliver computerised education to children in the developing world. His XO laptop, costing less than £ 100, would never have seen the light of day without its Linux operating system. But Negroponte has done more than create one of the world’s most ingenious computers. With a potential market measured in the hundreds of millions, he has frightened a lot of big-time computer makers into seeing how good a laptop, with open access, can be built for less than £ 250. Neither Microsoft nor Apple can compete at the new price points being set by companies looking to cut costs. With open-source software maturing fast, Linux, OpenOffice, Firefox, MySQL, Evolution, Pidgin and some 23,000 other Linux applications available for free seem more than ready to fill the gap. From the technological analysis offered by The Economist it appears that the pressure on the scholarly communications industry sub-sector in particular to change its ways – to move away from licensed, restricted-access business models and embrace the general IT trend towards openness – is unavoidable.
Chapter 15
Data and Datasets as a Driver for Change
15.1 Background

The research communication process is facing a challenge in how it adapts to the emergence of non-textual information (data and other media) as a primary source of information and knowledge. ‘Data’ now comes in a variety of forms and media – from spreadsheets to video clips, from experimental observations to chemical compounds. In the print paradigm, limited scope existed for representing graphical, highly visual or programmatic information (in the form of detailed graphs, photographs, software to manipulate data online, etc.). Collections of data in support of a research project were either treated as ‘supplementary information’ to be managed by the author, left in ‘data graveyards’ or dropped into a subject-based repository which had been established as a central resource for this purpose. Data are at risk for a number of reasons. They are in danger of being lost because the media on which they are recorded may be reused, corrupted, made obsolete by the passage of time or simply mislaid. The interpretation and validity of the data can also be lost due to missing calibration, provenance and other vital metadata.
15.2 Main data centres

There are a number of centres throughout the world that act as major repositories for a variety of scientific data. Examples are listed in a report produced by Professor Tony Hey and Anne Trefethen entitled “The Data Deluge: An e-Science Perspective” (Hey, T., Trefethen, A. “The Data Deluge: An e-Science Perspective,” Grid Computing: Making the Global Infrastructure a Reality. (Chichester: Wiley, 2003), pp. 809–824. http://www.rcuk.ac.uk/cmsweb/downloads/rcuk/research/esci/datadeluge.pdf). These examples of data compilations come from different subject areas and include:
• Astronomy. There are several examples of datasets in this discipline. ‘Virtual Observatories’ are being funded in the USA (NVO), Europe (AVO) and the UK (AstroGrid). These NVO projects will store 500 TBytes of data per year. The Laser Interferometer Gravitational-Wave Observatory (LIGO) will generate 250 GBytes per annum. The VISTA visible and infrared survey telescope will generate 10 TeraBytes of stored data per annum.
• Bioinformatics. The Protein Data Bank (PDB) is a database of 3D protein structures, comparatively small with storage in the GigaBytes. However, SWISS-PROT is a protein sequence database with storage in the 10s of GigaBytes. The EMBL Nucleotide Sequence Database in the UK contains data on humans, mice and other organisms with a total size in the order of TeraBytes; it is currently replicated in the USA and Japan. The GeneExpression Database includes image data produced from DNA chips and microarrays. Data storage requirements are predicted to be in the range of PetaBytes per annum.
• Environmental Science. One massive expansion in environmental data comes from weather prediction. The European Centre for Medium-Range Weather Forecasts (ECMWF) handles about 0.5 TeraBytes of new data per day with storage in excess of 330 TeraBytes, and with an 80 % per annum increase in data. NASA predicted a ten-fold rise in data volumes for the five-year period 2000 to 2005, and the Goddard Space Flight Center predicted that its holdings would increase by around a factor of 10, from 154 TeraBytes in 2000 to about 1.5 PetaBytes in 2005. Meanwhile the European Space Agency (ESA) satellites are generating around 100 GigaBytes of data per day.
• Particle Physics. The BaBar experiment has created the world’s largest database: 350 TeraBytes of data are stored in an Objectivity database, and this number will be greatly increased when the Large Hadron Collider at CERN is fully operational.
• Medicine and Health. With the introduction of electronic patient records and improvements in medical imaging techniques, the quantity of medical and health information that will be stored in digital form will increase dramatically. Radiological images in the US exceed 420 million and are increasing at 12 % per annum.
• Social Sciences. In the UK the total storage requirement for the social sciences has grown from around 400 GigaBytes in 1995 to more than a TeraByte in 2001.
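To put figures like these in perspective, the short calculation below projects how an archive of the size reported for ECMWF would grow at the stated 80 % per annum rate. It is only arithmetic on the numbers quoted above, not a forecast from the centre itself.

```python
# Project archive growth from the figures quoted above: roughly 330 TeraBytes
# held, growing at about 80 % per annum. Purely illustrative arithmetic.
def project_growth(start_tb=330.0, annual_rate=0.80, years=5):
    size = start_tb
    for year in range(1, years + 1):
        size *= 1 + annual_rate
        label = f"{size / 1024:.1f} PB" if size >= 1024 else f"{size:.0f} TB"
        print(f"after year {year}: {label}")

if __name__ == "__main__":
    project_growth()   # the archive passes one PetaByte during the second year
```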
These vast aggregations of data have become not only a work tool for researchers in those areas but a mechanism for communication and information exchange which parallels, and in many instances replaces, the book and journal. Even though five exabytes is claimed to represent the total sum of information of all types produced every year, this pales into insignificance when compared with some of the data produced in western society. For example, all the digital recording mechanisms on the US road system produce a vast amount (and redundancy) of information, reaching into exabytes. In this era of massive-scale computing, the central question is ‘what do we keep?’ Given the breadth and amount of such digital output being generated, we will need to decide what to choose to keep and maintain. As the data deluge increases, and as the ‘firehose’ of data becomes an increasingly dominant feature of the information landscape in many scientific research areas, the way data and datasets are treated in future becomes one of the crucial issues facing electronic publishing.
15.3 The Data Challenge

It is not hyperbole to claim that changes in data collection, data management and data dissemination are creating a paradigm shift in the way research is being done in many fields of science. Declan Butler (Nature correspondent) has claimed that “by 2020, data networks will have gone from being the repositories of science to its starting point” (Declan Butler, Nature News Feature, “2020 computing: Everything, everywhere”, http://www.nature.com/news/2006/060320/full/440402a.html).
This will be aided by the expansion of the use of workflow tools to help manage data through the entire research process, such as the increased use of Electronic Laboratory Notebooks (ELN), and the adoption of real-time collections of ‘just enough’ data and information. Also ‘the long tail’ of research projects, each of which creates its own unique set of data and which is generally lost, will be retained in the future. This latter is the real challenge facing ‘small science’. These many small research projects produce 2–3 times more data than the well known ‘Big Science’ data sets, but they are more diverse, diffuse and at risk. There are processes at play such as ‘policy-led deletion’ – when a researcher completes a project, or leaves the institution, all associated data is deleted, however relevant it may be for future research projects. According to a UKOLN report published early in 2007, there are several dimensions to the dataset issue. Research data is a diffuse and varied set of results that is created as a by-product of an individual research project and is often put on one side (in drawers, or deposited in local institutional repositories) as the researcher moves on to other areas. This amounts to thousands of datasets, often with inadequate descriptions and incompatible standards and access protocols, yet also involving substantial investment of science-based research funds over the years. At the second level are the community datasets, more often used and updated by the community, but facing the challenge of not being interoperable or managed as a long-term resource. The third level is the reference data, the highly sophisticated datasets which are moderated either by the community itself or by an accepted editorial apparatus, and which are often of a large scale, as evidenced in the areas of astronomy, high energy physics, bioinformatics and some areas of chemistry. Here the challenge is to ensure continuity of access with trusted partners who will ensure that permanent links with other forms of media are established and maintained. The Research Information Network (RIN) has also funded research in this area of data management. In August 2007 it commissioned Key Perspectives Ltd to undertake a study into the Publication of Research Data Outputs that will highlight further issues. Meanwhile it has laid down five major principles for the stewardship of research data, which include:
1. The roles of researchers, research institutions and funders should be clearly defined
2. Research data should be collected in accordance with international standards and using appropriate quality assurance
3. Research data should be easy to find, but access should also protect the interests of those with legitimate ownership of the data
4. Models for enabling access to data must be effective
5. Research data should be preserved for future generations
However, whilst this is the theory, practice falls far short of these goals and much research data is unexploited and under-utilised. This is often because science policy and funding agencies lack a clear commitment, and the policies, to bring data to the fore. The dream of highly efficient research reusing prior experimental results and observations is only apparent in a few narrowly defined disciplines. But there are indications that this may be changing. The tragedy is that the investment in creating a data set may be a considerable proportion of a research project’s budget and effort, and that every time a similar research project is undertaken the original data may have to be re-created at great expense. Also, longitudinal studies that could identify valuable trends are destroyed by the relevant data sets being lost.
15.4 Standards and Procedures

To create an environment where data has the best chance of being reused, the data must:
• be machine processable
• be freely accessible
• have a unique persistent identifier
• be accompanied by sufficient metadata to correctly interpret the underlying data
• include provenance information
• conform to widely accepted standards

15.4.1 Data Management Systems

One proposed hierarchical scientific data management system has the following attributes:
(a) Distributed Data Collection – data is physically distributed but described in a single name space
(b) Data Grid – integration of multiple data collections, each with a separate name space
(c) Federated Digital Library – distributed data collection with services for manipulation, presentation and discovery of digital objects
(d) Persistent Archives – digital libraries that curate the data and manage the problem of evolution of storage technologies
According to Dr Tony Hey (Microsoft), the scientific data – held in file-stores, databases or archival systems – together with a metadata catalogue will become a new type of federated digital library.

15.4.2 Metadata of Data

For datasets to be a valuable resource the expert community has to work together to define standards. The existence of such standards for the data and metadata will be vital to enable proper identification of the content of the file. Standards are also important to enable interoperability between, and federation of, data held in different formats in file systems, databases or other archival systems. Creating, managing and preserving appropriate metadata for any information is nevertheless still a difficult topic. It generally has to be created at the same time as research is performed. Improved tools may be needed to make it consistent and accessible. The metadata needs to offer advantages to the user and should be captured automatically or semi-automatically if possible.
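As a concrete, if simplified, illustration of the kind of metadata envisaged here, the sketch below assembles a minimal descriptive record for a hypothetical dataset – identifier, units, calibration notes, provenance and format – and serialises it for exchange. The field names are loosely modelled on common descriptive elements; they are illustrative assumptions, not a prescribed schema.

```python
# A minimal, illustrative dataset description: enough metadata to find the
# data, interpret it and trace where it came from. Field names are indicative.
import json

def describe_dataset():
    record = {
        "identifier": "doi:10.1234/example-dataset",   # hypothetical persistent ID
        "title": "Sea-surface temperature readings, 2006-2007",
        "creator": "Example Oceanographic Group",
        "format": "CSV; one row per hourly reading",
        "units": {"temperature": "degrees Celsius", "depth": "metres"},
        "calibration": "Sensor calibrated against reference buoy on 2006-01-05",
        "provenance": [
            "2006-01-05: raw readings collected from buoy array",
            "2006-01-06: de-spiking filter applied, version 1.2",
        ],
        "rights": "Freely accessible for research use",
        "conforms_to": "community CSV profile (assumed)",
    }
    return json.dumps(record, indent=2)

if __name__ == "__main__":
    print(describe_dataset())
```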
Besides metadata, Electronic Laboratory Notebooks (ELN) are increasingly in use, particularly in pharmaceutical research, where conformance to well-defined research procedures is mandatory. However, there is as yet little uptake of ELNs within research laboratories in academia.

15.4.3 Data Webs

Increasingly, research data is posted and made freely available from individual author websites. The researchers take it upon themselves to make the raw experimental data which supports a final published report available, for as long as the web address exists. To fully utilise these data sources they need to be locatable by others. This is where ‘Data Webs’ come in. These ‘Data Webs’ utilise a central registry that harvests the metadata (a minimal sketch of such a registry follows the list below). Various steps have been identified towards establishing Data Webs (or the ‘Web of Data’ in Sir Tim Berners-Lee’s terms). A registry is needed to collect, marshal and integrate the metadata. A central access point is required and the metadata must be rigorously monitored. The primary data producers will retain their intellectual property rights (IPR) over the data but will treat the metadata as an open access resource. How does this differ from Google and other search engines?
• The Data Webs provide access to data in the Deep Web that the search engines currently do not trawl.
• Each Data Web has a specific domain of knowledge (and therefore has less ‘noise’ per subject area).
• There is semantic coherence.
• There is programmatic access to the metadata.
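The registry-and-harvest pattern just described can be sketched as follows. The provider names and the shape of the metadata they return are invented for the illustration; in practice a Data Web would fetch records over HTTP from real repositories, for example using a standard harvesting protocol such as OAI-PMH.

```python
# Hedged sketch of a Data Web registry: harvest metadata from several
# (hypothetical) data producers, index it centrally, and offer programmatic
# search over the metadata while the data itself stays with its producers.
PROVIDERS = {
    "crystallography-lab": [
        {"id": "xtal-001", "subject": "crystallography", "title": "CIF files for compound A"},
    ],
    "ocean-group": [
        {"id": "sst-2007", "subject": "oceanography", "title": "Sea-surface temperatures 2007"},
    ],
}

def harvest():
    """Collect metadata records from every registered provider."""
    registry = []
    for provider, records in PROVIDERS.items():   # a real registry would fetch over HTTP
        for record in records:
            entry = dict(record)
            entry["provider"] = provider          # remember where the data itself lives
            registry.append(entry)
    return registry

def search(registry, subject):
    """Programmatic access to the harvested metadata, filtered by subject."""
    return [r for r in registry if r["subject"] == subject]

if __name__ == "__main__":
    reg = harvest()
    for hit in search(reg, "crystallography"):
        print(hit["provider"], hit["id"], "-", hit["title"])
```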
15.5 The Researcher’s wishes

In a study commissioned in 2001 it was established by one UK funding agency that:
• 30 % of research grant recipients do preserve the primary data
• 12 % enable use by third parties of their data
• 67 % agree archiving of data would be beneficial
• None showed any support for preservation
This current absence of grass roots level support for data sharing or data archiving is somewhat worrying. There is no ‘success factor’ in evidence for researchers to commit to a data maintenance policy. This is partly because there is a “promiscuous data mining problem”. Authors are concerned about the misuse of their prized data, something that was evident in the Elsevier survey of its authors (2005). The Medical Research Council is leading a two-year project to identify a business model for data management. Unlike deposit mandates for publications in some disciplines, currently there is no mandate in place for data set deposition. Refereeing and peer review is important – they need to be applied to datasets as well as printed articles.
According to MRC all research proposals should include a “Data Sharing and Preservation” plan. However, to make this happen it is necessary to provide obvious benefits to the independently minded researchers. Unless researchers perceive a real benefit from dataset deposition the researcher-funders will probably have to develop mandatory policies to achieve compliance. NERC also asks for a data management plan from their grant recipients. But there is huge cultural problem to be overcome. Particularly in requiring the researcher to spend time to generate really good metadata. Researchers want to extract the most from their own experimental results rather than enabling others to do so. More generally, it is difficult to get the right professional-support to provide quality metadata. As data becomes more extensive, more elaborate, more community-based, more mediated by software, the relationships between articles and the data upon which they are based is becoming more complex. Implicit in these relationships are a whole series of disciplinary norms and supporting organisational and technical cyberinfrastructure services. For newly publicised data there are a range of approaches. Some journals offer to accept data as “supplementary materials” that accompany the article, but often with very few commitments about preserving the data or the tools to work with it, as opposed to the article itself. Not all journals offer this as an option, and some place constraints on the amount of data they will accept, or on the terms of access to the data (e.g., subscribers only). For certain types of data, specific communities – for example crystallographers, astronomers, and molecular biologists – have established norms, enforced by the editorial policies of their journals, which call for deposit of specific types of data within an international disciplinary system of data repositories, and have the article make reference to this data by an accession identifier assigned upon deposit in the relevant repository. Clearly, this works best when there are well agreed-upon structures for specific kinds of data that occur frequently (genomic sequencing, observations of the sky, etc). Another alternative is for the authors to store the underlying data in an institutional repository. While in some ways this is less desirable than using a subject repository (due to the potential for economies of scale, easy centralised searching of material on a disciplinary basis, and for the development and maintenance of specialised discipline-specific software tools, for example) the institutional repository may be the only real option available for many researchers. Over time individual researchers may move from institution to institution; technical systems evolve, organisations change mission and responsibilities, and funding models and funding agency interests and priorities shift – any of which can cause archived data to be migrated from one place to another or reorganised. The ability to resolve identifiers, to go from citation to data, is highly challenging when considered across long time horizons. Just because a dataset has been deposited into a repository does not automatically mean that other researchers (or indeed the public broadly) can have access to it. 
There are certain trends in the research community – most notably university interests in technology transfer as a revenue stream, and the increasing overreach of some Institutional Review Boards in restricting the collection, preservation and dissemination of materials dealing in any way with human subjects – which run counter to the bias towards greater openness.
15.6 Reproducible Results

According to one expert in the field of bioinformatics (Dr Jill Mesirov of the Broad Institute of MIT and Harvard), research results must be reproducible or else there are no results. Information must be made available to enable the end user, with the same equipment, to make sure that the results can be replicated in different settings for different purposes. One must be able to compute in the same way as the original author, using the same data and computational software. Not only does this reduce the risk of fraud (see later) but it also builds a solid framework of reproducible results. In this respect there is also the need for persistence: the links to the raw data would need to be maintained in a way that is currently not being done. The same approach will need to be taken for data as is currently taken (by CrossRef) for the related textual publication.
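One small, practical ingredient of such reproducibility is being able to verify that the data a later user recomputes from is byte-for-byte the data the original author used. The sketch below simply records and checks a cryptographic fingerprint of a data file alongside a persistent link; it is an illustrative fragment, not a description of any particular journal's or repository's workflow, and the file names and identifiers are invented.

```python
# Record a fingerprint of the exact dataset behind a published result, so that
# anyone re-running the analysis can confirm they hold the same bytes.
import hashlib
import json

def fingerprint(path: str) -> str:
    """SHA-256 checksum of a data file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_provenance(data_path: str, data_link: str, software: str) -> str:
    """Bundle the checksum with a persistent link and the software version used."""
    return json.dumps({
        "data_link": data_link,          # e.g. a DOI or repository accession
        "sha256": fingerprint(data_path),
        "software": software,
    }, indent=2)

def verify(data_path: str, recorded_sha256: str) -> bool:
    """True if the local copy matches the fingerprint published with the results."""
    return fingerprint(data_path) == recorded_sha256

if __name__ == "__main__":
    with open("example.dat", "wb") as fh:      # stand-in dataset for the demo
        fh.write(b"1,2,3\n4,5,6\n")
    print(record_provenance("example.dat", "doi:10.1234/example-dataset", "analysis-tool 1.0"))
```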
15.7 Integration between Data and Text

Furthermore, it is also essential to effect integration between the two resources of text and data. This is, according to Dr Philip Bourne of the University of California at San Diego, a vital aspect of scholarly communication, one which has hitherto received precious little attention. There is no seamless integration of the related items of text and data, something which is as important as the local re-computation of someone else’s research results.
15.8 Business Model for Data

If the data deluge is upon us, how will it be managed, filtered, quality assured, coordinated and disseminated? Who should play the primary role in this area? Certain pre-conditions have been established for any major player in this field. One of these is that data is usually ‘open’ and not locked behind authentication and authorisation protocols. Several organisations have made declarations with respect to the need for the openness of data. The Organisation for Economic Co-operation and Development (OECD) undertook a study that came up with a set of recommendations encapsulated within the Principles and Guidelines for Access to Research Data from Public Funding. There were originally 10 items included in these recommendations, but these have since been expanded to 13. The recommendations try to ensure a balance between ‘openness’ and ‘commercial exploitability’ of research information. Recommendations have also been made by the UK Office of Science and Innovation’s e-Infrastructure programme. This programme has some six working groups, one of which focuses on data and information creation. The report’s recommendations for data and information creation were:
1. Initiating a strategic approach to digitisation and repurposing of data as a means of enabling access and new forms of analysis
2. The need for standards, data integration and certification
3. Credits for use and citation of all forms of research outputs and data (as well as publications), negative results, patents, etc.
Data, when it is generated as a result of public funds, is becoming increasingly open access. This poses a challenge to the existing information system with its tradition of publishers and librarians relating to each other through a complex licensing and subscription-based model. Data available for free does not accommodate such a business model, which leads one to question whether the traditional publishing infrastructure is able to meet the challenge. Quality control is a key ingredient of the current publishing as inaccuracies, noise and irrelevancies are weeded out through the refereeing of research articles and books. This structure has not been translated to the data sector and the fear is that the research system could be stymied by a superfluity of poor quality data in circulation. Also, some of the datasets are immense in size and complexity. Specialist curatorial skills are required to manage these – that, or a reliance on strict standards and interoperability that would be required of those custodians (or trusted partners) who would bear responsibility for ensuring the persistence of particular datasets. 15.8.1 NSF funds for data compilations A reflection of the commitment that the US is taking to the dataset challenge comes from work undertaken by and for the National Science Foundation. A key report giving the US vision is contained in a report produced by Dan Atkins on the Cyberinfrastructure requirements in the USA (Cyberinfrastructure vision for 21st century discovery, July 2006, http://www.nsf.gov/od/oci/ci-v7.pdf). More recently the NSF has issued a call for proposals for a programme entitled Community-based Data Interoperability Networks(INTEROP). According to this latest proposal, digital data are increasingly both the products of research and the starting point for new research and education activities. The ability to re-purpose data – to use it in innovative ways and combinations not envisioned by those who originally created the data – requires that it be possible to find and understand data of many types and from many sources. Interoperability is fundamental. This NSF cross-sectoral programme supports community efforts to provide for broad interoperability through the development of mechanisms such as robust data and metadata conventions, ontologies, and taxonomies. Support will be provided for Data Interoperability Networks that will be responsible for consensus-building activities and for providing the expertise necessary to turn the consensus into technical standards with associated implementation tools and resources. Examples of the former are community workshops, web resources such as community interaction sites, and task groups. Examples of the latter are information sciences, software development, and ontology and taxonomy design and implementation. Approximately ten awards in each of the fiscal years 2008, 2009, and 2010 will be made, subject to the quality of proposals received and pending the availability of funds. The anticipated awards may be up to $ 250,000 total cost per year for three to five years. All communities whose science and engineering research and education activities are supported by the National Science Foundation (NSF) are encouraged to participate in this programme. Networks that provide for broad interoperability across a wide variety of disciplinary domains, topic areas, and/or data types and sources are particularly encouraged to take part. Achieving interop-
erability on an international basis is among the goals of the programme and it is anticipated that the Networks will include worldwide participation. However, it is expected that the activities of the international partners outside the US will be supported by funds from their own national sources and programmes. See: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=502112&org=OCI&from=home
15.8.2 Google and Datasets Google has disclosed that their domain, http://research.google.com, will provide a home for terabytes of open-source scientific datasets. The storage will be free to scientists and access to the data will be free for all. The project, known as Palimpsest was previewed to the scientific community at the Science Foo camp at the Googleplex in the summer of 2007. Building on the company’s acquisition of the data visualization technology, Trendalyzer, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features. The storage would fill a major need for scientists who want to openly share their data, and would allow other scientists access to an unprecedented amount of data to explore. For example, two planned datasets are all 120 terabytes of Hubble Space Telescope data and the images from the Archimedes Palimpsest, the 10th century manuscript that inspired the Google dataset storage project. One major issue with science’s huge datasets is how to get them to Google. They are providing a 3TB drive array (Linux RAID5). The array is provided in “suitcase” and shipped to anyone who wants to send their data to Google. Anyone interested gives Google the file tree, and they copy the data off the drive.
15.9 Impact of data on Libraries

Given the sheer growth of digital output, what options are available to the library to cope with all this? There seem to be three fundamental strategies:
1. Libraries ignore it and stick to what they know best: managing the published textual output of research. However, these outputs will become less relevant over time in some disciplines, and the science community is desperate for help in coping with the growing flood of research data.
2. Libraries limit the extent to which they become involved. More information is needed about what is required to cope with the massive injection of data before policies can be implemented.
3. Libraries embrace the data problem wholeheartedly. In fact, according to many librarians, with T. Scott Plutchak at the vanguard, libraries should adopt ‘Participating Librarianship’, which includes becoming active in data curation.
In the past the research library community has collectively made a clear commitment to the long-term preservation of access to the traditional scientific literature; the assumption of a similar ultimate responsibility for scientific and scholarly data is highly controversial in some circles, however.
15.9.1 Data Curation

Whilst the researcher may be interested in the processes of information creation, analysis, synthesis and access, the researcher may be less interested in archiving, computational and curatorial services. The UK Digital Curation Centre focuses not so much on the creation of data sets but rather on their curation. Highly structured and refined data sets are required (and preferred) as these enable datasets to be combined and new evidence to be created from ‘mashing up’ the data from different sources. But data itself is also meaningless without context. This is where ‘computational lineage’ comes in – where the provenance of the data is important. Even this raises issues – when does the act of ‘data publishing’ take place? There is a need for the same citation system to underpin data as it does text: a citation should link in perpetuity to the cited object. There are some players in this area. R4L (Repository for the Laboratory) has been created. The eCrystals federation (University of Southampton) is also involved. DSpace, the repository being run at Cambridge University, is also part of this data scene (the latter holds some 200,000 chemistry files). On the other hand, it was recently pointed out that only five institutional repositories in the UK claim to hold data – Bristol, Edinburgh, Southampton, Oxford and Cambridge. Research libraries have a potential role with regard to sustaining datasets – the legacy issue in particular. There are agencies such as the DCC, DPC and the British Library that have a role in data management, as do AHDS, ESDS, NERC, university data repositories and international communities. The latter include CERN, EBI, OECD, the Canadian Association of Research Libraries and the National Science Foundation. A jumble of acronyms constitutes the bedrock of current data preservation activity, with at present little overall coordination occurring between them.
15.10 Impact of data on Publishers Adapting to the availability of datasets and the management of them will be a key challenge facing publishers. Do they decide to use their skills at organising the mechanism for stamping quality on the data or do they allow other agencies to take over the management and curation of data? In one respect publishers do face a significant dilemma. As one industry observer has pointed out, the research article – the current focus of much of scholarly and research communication – is merely an exercise in rhetoric. It has a specific function of ‘showing off to the world’ a particular set of results, some of which are non-transferable, some of which are topic specific (Dr David Shotton, Oxford University). The support for the printed article or chapter has been oftentimes tied in with the grant allocation system that gives precedence to the textual report of a research process. But if grant allocating bodies, and researchers themselves, rely on the primary research output in terms of data rather than text as the main means for evaluation, then the citadel on which the article, journal and book sits begins to crumble. One of the significant changes in the past year or so has been the emergence of a new breed of scholarly information providers who are competing with the established journal publishers in terms of size. Whilst Elsevier remains dominant with
a 17.4 % share of the market, followed by another Dutch conglomerate (Wolters Kluwer) and Thomson each with just over 6 %, the interesting development is the new entrants within the top 10. These are ‘data’ companies, hitherto only minor players in scholarly communication. Organisations such as WesternGeco (Schlumberger), now the 4th largest STM publisher, IHS Inc (6th ), Petroleum Geo Services (7th ) and TGS NOPEC Geophysical Company ASA (10th ) have suddenly appeared on the radar, with annual growth rates over 40 %. This compares with the organic growth of about 6–7 % per annum for traditional commercial journal publishers. There are also other players and communities involved in the data scene. In archaeology the dataset structure is much more developed, with trained archaeologists managing and staffing dataset curation. In the UK, this is dependent on support given by research councils. In atmospheric science there is also a welldeveloped dataset culture. In pharmacology there is an international scientific union (IUPMAR). Social sciences is covered by the UK Data Archive which has a restricted data base. In crystallography the international publication of the relevant data is a prerequisite for the publication of the paper. Other players are TNA (The National Archive), OCLC (though still searching for an underlying business case), Portico (for e-journals, with good funding from both libraries and publishers) and Iron Mountain, a data management system for records management. This indicates a new era for STM where data sources become as important, if not more so, than text-based publishing activities. This is a trend that has been suspected for some time but now there is tangible evidence that scientific research is moving back up-stream within the research process for its evidential information and relying less on published interpretations. (See Outsell report on Scientific and Technical Information, 2006 Results, June 7, 2007). Publishers may have a role to play, though this may be questionable in some eyes. There is a potential for them to provide coherence and coordination based on a model of multiple communities and organisational structures. In fact silos of information, similar to the current publishing system, could flourish but with the data and other information elements focused on the specific needs of the community.
15.11 Impact on other institutions

15.11.1 Research Councils

Many of the research councils in the UK now see themselves as being in the business of supporting the complete life cycle of research information. The newly formed Science and Technology Facilities Council (formed out of CCLRC and PPARC) is currently managing 2 petabytes of data; in a few years’ time this will rise to 10 petabytes, and when the Large Hadron Collider comes on stream in early 2008 the role of data at STFC will be even greater. With this as an example, there is evidence that the research councils in the UK are insisting on ‘Data Management proposals’ being included in current and future requests for grants. These are still early days for some – such as the Biotechnology and Biological Sciences Research Council (BBSRC) – whereas in others a clearer policy is emerging, such as with the Medical Research Council (MRC) and the Natural Environment Research Council (NERC). However, in a recent NERC survey of 800 geoscience projects, only 60 replied with positive statements about data management processes. Still early days,
but indicative nonetheless of a new approach to dealing with the data issue at a policy level. One of the key centres for data collection in the arts and humanities is the Arts and Humanities Data Service (AHDS). Almost in defiance of the above suggested need for focus on the data issue, the AHRC is to cease funding the database from 31 March 2008, JISC has also decided that it is unable to fund the service alone and that therefore funding of the service will cease on the same day. In its eleven years of existence, the AHDS has established itself as a centre of expertise and excellence in the creation, curation and preservation of digital resources. It has been responsible for a considerable engagement by the arts and humanities community with ICT and a significant increase in that community’s knowledge and use of digital resources. This is a slap in the face for dataset management within the arts and humanities community in the UK.
15.12 Summary

Geometric advances in computer storage and processing power are making the creation of massive datasets within all research disciplines possible. However, the total lifecycle cost of managing these datasets easily outstrips the initial cost of creation. Technical advances are still making striking progress, following Moore’s Law and the geometric increase in computing power, but as technology’s frontiers expand the issue of digital preservation also comes to the fore. Other issues also emerge from this deluge in digital output, such as privacy, ethics of involvement, public versus private data, and other unintended consequences. Concerns arise, such as security and political control over data. Data management is far from an easy issue to resolve as it embraces technological, economic, administrative and political issues. Persistent linking between data and text is also becoming a critical issue, and data and text mining are threatening to become powerful new services that will identify new scientific relationships which are undiscoverable using traditional search techniques. This is a key development path for scholarly communications in future (see the recent OSI e-Infrastructure Report and the NSF Cyberinfrastructure report). Data has a central role to play in future research. In summary:
1. Much of the valuable data currently remains unpublished
2. Creating good metadata is an essential requirement
3. The large Science disciplines sometimes have access to large data sets
4. The legacy and sustainability issue of data needs to be resolved.
Chapter 16
Mining of Text and Data
The pattern which seems to be emerging is that the changing paradigm is placing more emphasis on the provision of ‘information services’ in future, targeted at specific research needs, rather than on offering content from which end users are expected to find what is relevant. Broader support for an information market whose users face more information, in various formats, but have only a constant amount of time available to find what they need, has become an emerging theme. This has given rise to new forms of services which look at content and users in different ways. One of these is text and data mining, a process which seeks to optimise use of the available content.
16.1 Background The research process itself often involves making connections between seemingly unrelated facts to generate new ideas or hypotheses. However, the burgeoning growth of published text means that even the most avid researcher cannot hope to keep up with all the reading in a field, let alone adjacent fields. Nuggets of insight or new knowledge are at risk of languishing undiscovered in the literature. The existing information system is not optimal in enabling the vast corpus of knowledge to be analysed effectively and quickly. However, text mining offers a solution to this problem by replacing or supplementing the human reader with automatic systems undeterred by the text explosion. It involves analysing a large collection of documents to discover previously unknown information. The information might be relationships or patterns that lie buried in the document collection and which would otherwise be extremely difficult, if not impossible, to discover. Text mining can be used to analyse natural language documents about any subject, although much of the interest at present is coming from the biological sciences. As an example, the interactions between proteins is an important area of research for the development of drugs to modify protein interactions that are linked to disease. Text mining can not only extract information on protein interactions from documents, but can also go one step further to discover patterns in the extracted interactions. Information may be discovered that would have been extremely difficult to find, even if it had been possible to read all the documents. This information could help to answer existing research questions or suggest new avenues to explore.
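As a toy illustration of the kind of pattern discovery described above, the sketch below scans a small set of abstracts for sentences in which two known protein names co-occur and tallies the implied interaction pairs. Real biomedical text mining uses trained entity recognisers and far larger corpora; the protein names and abstracts here are invented for the example.

```python
# Toy text-mining pass: find sentences where two known protein names co-occur
# and count the candidate interaction pairs. Names and abstracts are invented.
from collections import Counter
from itertools import combinations
import re

PROTEINS = {"p53", "MDM2", "BRCA1", "RAD51"}

ABSTRACTS = [
    "We show that MDM2 binds p53 and regulates its stability. RAD51 foci were also observed.",
    "BRCA1 interacts with RAD51 during repair. Loss of BRCA1 sensitised cells.",
    "p53 activation was independent of MDM2 in this setting.",
]

def candidate_pairs(text):
    """Yield unordered pairs of known proteins mentioned in the same sentence."""
    for sentence in re.split(r"[.!?]", text):
        mentioned = {p for p in PROTEINS if re.search(r"\b%s\b" % re.escape(p), sentence)}
        for pair in combinations(sorted(mentioned), 2):
            yield pair

if __name__ == "__main__":
    counts = Counter(pair for abstract in ABSTRACTS for pair in candidate_pairs(abstract))
    for (a, b), n in counts.most_common():
        print(f"{a} - {b}: co-mentioned in {n} sentence(s)")
```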
16.2 Implications
However, such an expansive view of scholarly communication is very likely to put a strain on traditional relationships. What is needed is a system which allows:
• access to a vast corpus of multi-disciplinary information in a consistent and interoperable form
• free access, without authentication controls
• coverage of text, data and other media sources
• material unprotected by copyright controls (over the creation of derivative works)
• a single point of entry with a powerful and generic search engine
Defining such a specification requires a new means of collaboration between existing and future stakeholders if data and text mining are to be accepted as effective and legitimate processes. In particular, stakeholders will want assurance that such mining does not eliminate any significant role they currently perform, that it does not raise new challenges and barriers to text/data mining applications, and that it does not threaten the very existence of publishers and librarians. There is the rub. The battle will be over whether the advantages which text and data mining confer are sufficiently powerful and attractive to the research community to force objections aside. At present all we can hypothesise is that data and text mining will happen – is already happening in select areas – and will be another Driver for Change in Electronic Publishing over the next few years. But how soon depends on a number of factors, with intellectual property rights and their protection at the forefront.
16.3 The mechanism of Text Mining
Text mining involves the combination of a number of techniques, bringing them together in an integrated workflow.
Information Retrieval (IR) systems identify the documents in a collection which match a user's query. This includes the results from search engines such as Google, which list those documents on the World Wide Web that are relevant to a set of given keywords. It also covers access to documents held in digital libraries. IR systems enable a set of relevant documents to be identified. As text mining involves applying computationally-intensive algorithms to large document collections, IR can speed up the analysis considerably by reducing the number of documents to be analysed.
Natural Language Processing (NLP) is part of artificial intelligence and involves the analysis of human language so that computers can understand natural languages as humans do. Although this goal is still some way off, NLP can perform some types of analysis with a high degree of success. For example:
• Part-of-speech tagging classifies words into categories such as noun, verb or adjective
• Word sense disambiguation identifies the meaning of a word, given its usage, from among the multiple meanings that the word may have
• Parsing performs a grammatical analysis of a sentence. Shallow parsers identify only the main grammatical elements in a sentence, such as noun phrases and verb phrases, whereas deep parsers generate a complete representation of the grammatical structure of a sentence
The role of NLP in text mining is to provide the systems in the information extraction phase with the linguistic data they need in order to perform their task. Often this is done by annotating documents with sentence boundaries, part-of-speech tags and parsing results, which can then be read by the information extraction tools.
Information Extraction (IE) is the process of automatically obtaining structured data from an unstructured natural language document. This may involve defining the general form of the information that we are interested in as one or more templates, which are then used to guide the extraction process. IE systems rely heavily on the data generated by NLP systems. Tasks that IE systems can perform include:
• Term analysis, which identifies the terms in a document, where a term may consist of one or more words. This is especially useful for documents that contain many complex multi-word terms, such as scientific research papers
• Named-entity recognition, which identifies the names in a document, such as the names of people or organisations. Some systems are also able to recognise dates and expressions of time, quantities and associated units, percentages, and so on
• Fact extraction, which identifies and extracts complex facts from documents. Such facts could be relationships between entities or events
Data Mining (DM) (also known as knowledge discovery) is the process of identifying patterns in large sets of data. The aim is to uncover previously unknown, useful knowledge. When used in text mining, DM is applied to the facts generated by the information extraction phase. By applying DM to a large database, it may be possible to identify patterns in the facts. This may lead to new discoveries about the types of interactions that can or cannot occur, or the relationship between types of interactions and particular diseases, and so on.
Visualisation. The results of the DM process can be fed into another database that can be queried by the end-user via a suitable graphical interface.
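To make these stages concrete, the following minimal sketch (in Python, assuming the open-source spaCy library and its small English model are installed) annotates text with part-of-speech tags, recognises named entities, and applies a crude template to extract candidate protein-interaction 'facts'. It is purely illustrative and is not the software used by any of the systems described in this chapter; the list of interaction verbs is an invented placeholder.

```python
# A minimal, illustrative text-mining pipeline: NLP annotation followed by
# naive template-based information extraction. Assumes spaCy is installed
# (pip install spacy) together with its small English model
# (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")

documents = [
    "BRCA1 interacts with RAD51 during the repair of damaged DNA.",
    "The study was carried out at the University of Manchester in 2007.",
]

# Hypothetical trigger words signalling an interaction between two entities.
INTERACTION_VERBS = {"interact", "bind", "inhibit", "activate"}

for text in documents:
    doc = nlp(text)

    # Part-of-speech tagging: each token is classified as noun, verb, etc.
    print("POS tags:", [(token.text, token.pos_) for token in doc])

    # Named-entity recognition: people, organisations, dates and so on.
    print("Entities:", [(ent.text, ent.label_) for ent in doc.ents])

    # Crude 'fact extraction': an (entity, verb, entity) template matched
    # against the grammatical subject and object of an interaction verb.
    for token in doc:
        if token.lemma_.lower() in INTERACTION_VERBS and token.pos_ == "VERB":
            subjects = [c.text for c in token.children if c.dep_ == "nsubj"]
            objects = [c.text for c in token.children if c.dep_ in ("dobj", "pobj")]
            # 'interacts with RAD51' attaches RAD51 via a preposition, so
            # also look one level down the prepositional phrase.
            for prep in (c for c in token.children if c.dep_ == "prep"):
                objects.extend(c.text for c in prep.children if c.dep_ == "pobj")
            for subj in subjects:
                for obj in objects:
                    print("Extracted fact:", (subj, token.lemma_, obj))
```

The extracted (subject, verb, object) triples are the kind of structured 'facts' over which the data mining stage then looks for patterns.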
16.4 Recent History
Forms of text and data mining have been around for some fifty years. The intelligence-gathering community was early to recognise the usefulness of the technique, and artificial intelligence and diagnostics have also employed text and data mining. In the 1980s abstracts in the Medline database were used as a platform against which to test text mining approaches. Life science text has been at the front-end of studies employing text mining largely because the payoffs in terms of drugs and health care are so high.
All this was a prelude to a shift in the way users came to terms with the information explosion. There were two elements.
• The first is that 'collecting' digital material became different in some respects from the way physical collections were built up and used. In the print world users filled filing cabinets with printed artefacts and, through the very process of collecting the material, absorbed its content by some unclear form of osmosis. Now people find things online, and they similarly build up collections of digital items on their computers and laptops. The difference is that these personal libraries – which often still go unread – are interrogated using better search and retrieval software. Even so, there is still material which remains hidden on users' machines.
• The second change is that there is a new approach to digital 'computation'. Google came along with its range of services which raised the searching stakes, yet some of the expensive published material remains hidden from the search engines, and the processes of search and collection became disentangled. Google can only compute on what it can access and index. Even Google does not rely solely on metadata; it indexes the full text. But there is still scope for misleading and irrelevant search hits. There is an issue of 'trust'. We assume Google can reveal all the hidden secrets in the documents. This is not always the case, and it is the application of full-text mining software and data mining procedures which exposes more of the relationships hidden in and between individual documents. Text mining builds on Google's existence – it does not replace or compete with it.
To be really effective, text and data mining requires access to a large amount of literature. This is the real challenge to widespread adoption of the technique.
16.5 Challenges facing Text and Data mining
The potential value of automatically searching across expansive information resources is conditional on structures being in existence which will support this. There are a number of issues which make this difficult.
16.5.1 Structure of database
The key issue is whether text mining is undertaken against a single large accumulated database held centrally, or whether a federated search system is adopted, with knowbots launched to pull in results from remote and privately held databases. Computation can take place in a more controlled environment on a single aggregated database, though this may not always be possible for a variety of technical and IPR reasons. A distributed model raises issues of data normalisation, of performance levels and of other standardisation matters. A centralised database, on the other hand, raises issues of resources – not only the infrastructure to support a large central file but also the support services necessary to run it. A federated or distributed system requires conformity by all involved to metadata standards, to allow effective cross-referencing and indexing. The issue of compatibility of formats between external data resources is made more complex still if databases created by the organisation itself are also to be made available and included within a single text mining operation.
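As a purely illustrative sketch of the federated alternative, the Python fragment below fans a single query out to several remote sources in parallel and merges whatever comes back. The endpoint URLs and the JSON response shape are hypothetical placeholders rather than real services; a production system would also have to handle authentication, metadata normalisation and de-duplication, which is exactly where the issues discussed here arise.

```python
# Illustrative federated-search fan-out: one query, several remote sources.
# The endpoints below are hypothetical placeholders, and each is assumed to
# return a JSON body of the form {"results": [{"id": ..., "title": ...}, ...]}.
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ENDPOINTS = [
    "https://repository-a.example.org/search",
    "https://repository-b.example.org/search",
    "https://data-archive.example.net/query",
]


def query_source(endpoint, term, timeout=10.0):
    """Send the query to one remote source and return its result records."""
    url = f"{endpoint}?{urllib.parse.urlencode({'q': term})}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
        return payload.get("results", [])
    except OSError:
        # A federated system must tolerate sources that are slow or offline.
        return []


def federated_search(term):
    """Fan the query out in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(ENDPOINTS)) as pool:
        result_lists = pool.map(lambda endpoint: query_source(endpoint, term), ENDPOINTS)
    return [record for results in result_lists for record in results]


if __name__ == "__main__":
    for record in federated_search("protein interaction"):
        print(record.get("id"), record.get("title"))
```

The sketch also illustrates why trust in remote sources matters: any endpoint that is offline or inconsistent simply drops out of the merged result set.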
If a federated approach is relied upon, the issue of trust arises – trust that the remote database of text and data will always be there, curated and consistent in its approach to metadata creation and full-text production. The issue of trust will be commented on later – it is a key underpinning of an acceptable information system. Furthermore, a plan needs to be devised which allows new 'partners' to be included within the text mining environment, and old remote data services which have changed their operations to be removed, according to a well conceived set of rules.
16.5.2 Legal and licensing framework
Having established a basis for including databases in a particular service, the issue of access rights to those databases has to be resolved. Most databases which embody 'sweat of the brow' activity may only be accessible if the customer has paid a subscription or licence fee. Even if this hurdle is overcome, the terms of the subscription or licence may be such that the owner of the database will not allow reformulation of the material in any way. The rights issue has been highlighted by the concerns several commercial journal publishers have raised with respect to the creation of 'derivative works' that could undermine the commercial or service opportunities of their own works. Derivative works are perceived as a threat to publishers' revenues, and publishers prevent, through the terms of their licences, the mining of their articles to establish new relationships. Several publishers have recently reached an agreement with the funding agency, the Wellcome Trust, to allow text mining to take place on works which Wellcome has funded (through payment of author fees), but only within the terms of the licences agreed with each publisher. This is very restrictive as far as text mining is concerned. Licences would need to be changed to accommodate text and data mining, opening up the database to mining activity whilst protecting the commercial interests of the database owner.
16.5.3 Legal status of text mining
There is precious little case law to draw on as to whether text and data mining as a process is legally sustainable. The challenge raised against derivative works is an indication of the lack of clarity. There is, however, a question about whether any one publisher's work has been included in any one text mining output. Though the publisher's server may have been interrogated by the text mining software, connecting the results of the mining process back to the information sources may prove extremely difficult. Multiple results may have been derived from a wide variety of text sources, so how can any one source be given credit for any one item? In many instances a trawl through a publisher's database may prove fruitless. Computers themselves are logical, not creative, and computation using text mining to create a derivative work is essentially a mechanical activity. Derivative works can therefore be based on hundreds or thousands of separate copyrighted works – does this make them something new and entirely different?
The legal basis of text and data mining also needs to encompass the creation of extracts, translations and summaries of developments in various fields. Some of these derivative works are mechanically produced, but others, such as translations, still require elements of human creativity – so much so that copyright may be vested in the derived translation. Who can determine what has been included from whom (which copyright owner) in a newly derived work?
16.5.4 Computation by machines
It is now possible for both humans and machines to compute from a large text base. Is there therefore a need to consider changing the process of literature creation to make it 'friendlier' for both humans and machines? A great deal of work is taking place in text mining to disambiguate the text and place text items in contextual relationships. Some approaches to disambiguating items in text and in data (gene names, names of species, astronomical objects) have already been addressed. Dr Peter Murray-Rust of the University of Cambridge has done much to advance the cause of text and data mining in the field of chemistry.
16.6 Practical Examples
A project entitled METIS provides a practical example, from the US, of how aspects of text and data mining have been applied to a particular problem. The problem was to investigate all the factors which could be involved in breast cancer, as identified in past research publications. Run from the University of North Carolina by Professor Julia Blake, this is an instance where an IT expert addressed a biomedical application – it was not a library- or publisher-driven project. Professor Blake's approach was to sift through the vast amount of published knowledge on the topic of breast cancer and to manage the information flow. In so doing the key issue became one of sorting out the information retrieval aspect and then applying a new function of information synthesis. The starting point was to collect a corpus of relevant documents in the breast cancer area – some 100,000 were included – and to see what information could be extracted. A meta-analytical tool was used: METIS included the synthesis of detailed secondary data (metadata) from the corpus of literature. The text mining software used was off-the-shelf IBM software. Other visualisation tools also exist, but these are often inappropriate for scholarly or scientific researchers. Meta-analytical synthesis involves comparing the selected information source with a control group. From the external database the contextual factors need to be extracted on the basis of the control group comparison. The meta-analysis then throws up results which have to be evaluated to determine whether, and to what extent, they are reliable and valuable. The point is that documents are not pre-selected by keywords: every document, including those suspected of having only partial relevance, is analysed.
TEMIS is another example of text mining in practice. TEMIS is a software organisation, established in 2000, which has centres in France, Germany and the USA. It focuses on pharmaceutical and publishing applications and has a client base which
includes Elsevier, Thomson, Springer and others. The TEMIS approach is to move away from delivering information in a document-centric way towards producing finely grained information specifically relevant for a given purpose. It incorporates natural language processing technology. When run against a data resource, TEMIS is able to detect new terms, detect relationships between different objects and assess document availability. Thomson Scientific uses TEMIS to rescue data which had been captured in another format (for example, the BIOSIS format) and restructure the database according to the Thomson data house style; a clean, new database is created. MDL, a former Elsevier company, uses TEMIS to extract facts automatically, creating a new database from the analysis of text documents. Springer uses TEMIS to enrich journals with hyperlinks into major reference works. Gradually we are seeing more and more peripheral use of text mining for specific applications. So far it has not reached mainstream publishing activity, for the reasons outlined above.
16.7 Implications in applying text mining
Text and data mining creates a new way of using information and opens up researchers' horizons. But to appreciate the full scope of the technology researchers need some training, and the cycle of meta-analysis and synthesis needs to be included within their research process. Beyond that, it needs access to a large document database and, as has been mentioned, this creates problems with regard to licensing. Text miners need text, and they need it in a form which is useful for text mining systems. There are many centres of excellence worldwide which are pushing at the frontiers of text and data mining. In the UK two such centres are the European Bioinformatics Institute, part of the European Molecular Biology Laboratory, and the NaCTeM (National Centre for Text Mining) group based at the University of Manchester.
16.8 The Future
For text and data mining to thrive a new electronic publishing structure must emerge, and one will emerge because the situation is currently so changeable and fluid. This structure must effectively address the following issues:
• Literature is not always useful because of licence restrictions
• The literature is subject to mining, particularly by machines
• This highlights the complex questions which exist about licensing and information delivery arrangements
• This will have impacts on authors and publishers
• Where is all this heading? Should the relevant parts of the industry be reactive or proactive?
There are more questions than answers.
Does this mean that the research process itself will become different? Probably not, as the same author-to-community communication activity will take place. But there will be a change in style and features – text and data mining will become more ingrained within the emerging research process.
16.9 Impact on Libraries
Librarians need to be aware of research trends and emerging techniques, and in particular to take into account the trend-lines described above. They need to monitor what faculty really require and how researchers are building up their own knowledge resources. This will provide evidence on which to negotiate future licensing and subscription rights. The extent to which librarians are involved in the research process will determine whether the correct rights are secured for future research applications. Similarly, when digitising the library's or institution's own internal physical resources, consideration should be given to the rights attached to the resultant digital resources. Standards setting and monitoring will become important for interoperability and for advancing the art of text and data mining. Assisting in this standards-setting process could become a responsibility of the library profession, as could helping with the creation of ontologies and appropriate mark-up languages.
16.10 Impact on Publishers
It is less easy to be sanguine about how text and data mining will impact on publishers. There is a distinct sense that publishers see it as a threat and are trying to restrict access to their published material until such time as a business model can be constructed which allows them to derive commercial benefit from the results of text and data mining. Whilst such mining activities flourish in an open access world, the opportunities for publishers lie in how they can provide support services in this area. With several thousand scholarly publishers worldwide, each with their own silo of digital data, it would take a substantial change in the industry mindset to create a large, consistent database sufficient to make text mining an effective service. Cooperatives of publishers are few and far between, and the industry record on cooperation has (with a few exceptions) been poor.
Chapter 17
E-science and Cyberinfrastructure as Drivers for Change
17.1 Background
e-Science is a relatively new concept in electronic publishing, and originates with the recent substantial investments being made in the national and international collaborations required to tackle some of the more intractable scientific and technical problems. It brings information and computing technology face-to-face with the traditional, cosy world of scholarly publishing and potentially creates another 'perfect storm'. It is an aspect of electronic publishing which suggests that the paradigm for publishing will change in a big way within a short period of time. It is a driver for change which would appear to have one of the greatest impacts on the way researchers conduct their research and information gathering – but only in those subject areas where the scale of investment in the technical infrastructure is large. As such it is a technical driver for change of some significance.
17.2 The e-Science Challenge
The term e-Science describes computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing. E-Science sometimes includes technologies that enable distributed collaboration, such as the Access Grid. Examples of the kind of science to which the term is applied include social simulations, particle physics, earth sciences and bio-informatics. Particle physics has a particularly well developed e-Science infrastructure due to its need for adequate computing facilities for the analysis of results and the storage of data originating from the CERN Large Hadron Collider, which is due to start generating data in 2008. e-Science as a term was first coined by John Taylor, the Director General of the United Kingdom's Office of Science and Technology, in 1999. It was used to describe a large funding initiative which started in the UK at that time. e-Science, together with the large datasets referred to in the earlier chapter, represents a significant new challenge to Electronic Publishing. It creates a new IT dimension, one that EP needs to take into account. There are broad social and political forces at work, independent of, but often finding common cause or at least compatibility with, the new e-Science developments. New capabilities of global high-performance networking and other information technologies are, for the first
time, making fundamental shifts in the practices and structures of scholarly communication possible.
17.3 Visions for e-Science
In the US the term e-Science is synonymous with cyberinfrastructure, first described by Dan Atkins in a report to the National Science Foundation. Dan Atkins joined the NSF in June 2006 as director of the Office of Cyberinfrastructure (OCI). Created in June 2005, OCI gives awards for the leading-edge, IT-based infrastructure that is increasingly essential to science and engineering leadership in the current millennium. Cyberinfrastructure in the US includes supercomputers, data management systems, high-capacity networks, digitally-enabled observatories and scientific instruments, and an interoperable suite of software and middleware services and tools for computation, visualisation and collaboration. OCI has a budget of $ 120 million. In early 2003, Atkins chaired a panel which issued a highly influential report, titled Revolutionizing Science and Engineering Through Cyberinfrastructure, that recommended a major development programme.
The European Strategy Forum on Research Infrastructures (ESFRI) was launched in April 2002. It brings together representatives of EU Member States and Associated States, appointed by Ministers in charge of Research, and one representative of the European Commission. The role of ESFRI is to support a coherent approach to policy-making on research infrastructures in Europe, and to act as an incubator for international negotiations about concrete initiatives. John Woods produced a report on "The European Roadmap for Research Infrastructure" for ESFRI in 2006, in which he identified 35 facilities as being required.
In the UK, the Office of Science and Innovation (OSI) e-Infrastructure Working Group also produced a report – Developing the UK's e-infrastructure for science and innovation – that sets out the requirements for a national e-infrastructure to help ensure the UK maintains, and indeed enhances, its global standing in science and innovation in an increasingly competitive world. Made up of senior representatives from JISC (Joint Information Systems Committee), the Research Councils, RIN (Research Information Network) and the British Library, the Working Group was formed in response to the Science and Innovation Investment Framework 2004–2014, published by the Treasury, the DTI and the DfES in 2004, to explore the current provision of the UK's e-infrastructure and help define its future development.
However, e-Science is now entering uncharted waters. It is coming to terms with three interlocking processes – e-Science as an enabler for research activities, e-Science as support infrastructure, and the usage of e-Science within the community. In trying to bring these together, those who determine e-Science policy are facing a diversity of applications, many of them still at the prototype stage and in the academic rather than the operational domain.
17.4 Overall context of e-Science
In addition, because the same technical and economic drivers have fuelled much of the commitment to e-Science, other exogenous factors that are shaping the future
of scholarly communication are often overly identified with e-Science itself. Notable examples include demands (particularly in the face of some recent high-profile cases of scientific fraud and misconduct) for greater accountability and auditability of science, through structures and practices that facilitate the verification, reproducibility and re-analysis of scientific results. They also include efforts to improve the collective social return on investment in scientific research through a recognition of the lasting value of much scientific data and of the way the investment it represents can be amplified by disclosure, curation and the facilitation of reuse. Welcome to the world of e-Science!
17.4.1 Public Engagement
Nevertheless, it seems the public is by and large sceptical of the role of e-Science within the community. This is leading to a focus within the UK on taking e-Science and some of its success stories into schools and colleges, to attract and stimulate the young to take Science seriously. Schools need to participate in e-Science, not just observe its results, for effective public engagement to take place. Public engagement is also being sought with some of the e-Science experiments themselves. Without public interest and public participation archaeology, for example, would be the poorer as a scientific discipline. Visualisation technologies are being employed to stimulate such public interest and awareness. For example, there are some 30 virtual globe products which can be used either on their own or as part of 'mashups' to bring excitement to a project which involves a geographical dimension. These include Google Earth, NASA World Wind, Microsoft Virtual Earth, Free Earth and others.
17.5 Future role of e-Science
e-Science is a leading driver for change in EP. As with data and datasets, there is substantial investment in the infrastructure needed to support effective research, and in the process e-Science is rewriting the way EP operates. Not only does EP need to keep up-to-date with the requirement for speedy dissemination of time-sensitive research findings, it also needs to build on the technology and standards being adopted in the research process.
Chapter 18
Workflow Processes and Virtual Research Environments
18.1 Integration into Work Flow Process
Research has become increasingly collaborative over time, driven by the need to embrace a broad range of disciplines in order to push Science forward. Much of the most innovative research takes place in the gaps between traditional disciplines, or in the overlap between them. One of the key lessons from this is the need for more collaboration between subject areas and procedures which in the past followed separate and different paths – collaboration which embraces the different cultural approaches within the disciplines. This can be brought about by more and better 'team work', and this has major implications for the way research projects will be organised and managed in future. It underpins the large, collaborative, global research projects which are increasingly coming to the fore. One indication of such global collaboration, of such team work, is that there is more joint authorship of individual published articles, with the joint authors coming from different countries and different disciplines. The following chart shows that the average number of authors per article has risen from about two in 1966 to nearly four in 2000.
Figure 18.1 Average authorship per article
18.2 The research process
The research process, particularly in large collaborative projects, encompasses a variety of steps. From the initial inception of an 'idea', the research progresses to an exploratory phase of investigating who is doing similar work. This can include searching for patents and identifying relevant standards and protocols, as well as a literature search within relevant databases. In some cases a market investigation may also be undertaken or commissioned. Then comes the research itself, which can involve external tools and software to manipulate raw data. This is followed by the collation of results and the collaborative writing and editing of the research outputs before the manuscript is considered suitable for final publication. There are software services which enable the manuscript to be formatted to meet publishers' house styles and to ease submission. If the research generates practical, commercially viable products or services, the results may also be submitted for patent protection, which involves a different set of procedures. Each phase or stage may need access to different types of information – not just published information but also standards and financial information, market data, raw data and competitive results. The more extensive the research process, the wider the range of information sources which may need to be drawn on. In the past these have not been available in one place. They can be distributed over many parts of the information industry, located in different media types, each with different access permissions. It is getting more and more difficult to be on top of all the information needs of collaborative research. The work bench approach focuses on this problem. It is an attempt to integrate within one system, one work station, access to all the main sources of external information which a researcher or research group may need to complete their research project.
18.3 Examples of a work bench approach
Several large commercial publishers have caught onto this challenge in recent years. Both Elsevier and Thomson have sought to build on their traditional published resources to extend into other aspects of the work flow. Their focus tends to be industry-specific, with the pharmaceutical industry being seen as a viable sector for such a work bench approach. There are also instances of a broader work bench approach. One example is the project developed by the Technical Computing unit at Microsoft and the British Library. Known as the Research Information Center (RIC), it was first released as a pilot at the All Hands conference at the University of Nottingham in September 2007. The aim of RIC is to offer support services and functionality for the four main phases which a researcher goes through in completing a project. These four phases are:
• idea discovery and design
• getting funds to conduct the research
• experimentation, collaboration and analysis
• dissemination of the findings of the research
All too often the focus of publishers and information providers is on the final stage of supporting the research community, when the other three pose equal challenges. RIC enables each member of a research team to share information with the other members of the team using the SharePoint system developed and supported by Microsoft.
18.3.1 Virtual Research Environments (VRE)
Virtual Research Environments (VREs) figure prominently as test beds for community reaction to the technology inherent within e-Science. The purpose of a VRE is to help researchers in all disciplines manage the increasingly complex range of tasks involved in carrying out research. A VRE will provide a framework of resources to support the underlying processes of research on both small and large scales, particularly for those disciplines that are not well catered for by the current infrastructure. A VRE is best viewed as a framework into which tools, services and resources can be plugged. VREs are part of the infrastructure, albeit in digital form, rather than a free-standing product, as with the workbench. A VRE has more in common with a Managed Learning Environment, which provides a collection of services and systems which together support the learning and teaching processes within an institution. The VRE, for its part, is the result of joining together new and existing components to support as much of the research process as is appropriate for any given activity or role. It is usually assumed that a large proportion of the existing components will be distributed and heterogeneous. A VRE that stands isolated from existing infrastructure and from the research way of life will not be a research environment but probably only another underused Web portal. The emphasis in a VRE is on architecture and standards rather than on specific applications. The VRE presents a holistic view of the context in which research takes place, whereas e-infrastructure is focused on the core, shared services over which the VRE is expected to operate. Also, given the multiple roles that members of the research and research-support communities tend to have, both within their own institutions and within multiple inter-institutional research activities, the convergence between local/national middleware, access management and VRE development activities is obvious. An example of a VRE can be found in archaeology – the Virtual Research Environment for Archaeology (VERA) project. It will address user needs, enhancing the means of efficiently documenting archaeological excavations and their associated finds, and create a suitable Web portal that provides enhanced tools for the user community. VERA aims to develop utilities that help encapsulate the working practices of research archaeologists unfamiliar with virtual research environments. However, the concept of the VRE is evolving, and the intention is not to produce a complete VRE, but rather to define and help develop the common framework and its associated standards, and to encourage others to work within this framework to develop and populate VREs with applications, services and resources appropriate to their needs.
18.4 Summary
The following chart shows how the increased specialisation of scientific research is leading to a greater emphasis on serving the work flow process rather than just providing access to content. The user desktop becomes the key target for the new collaborative information services. These services currently take a multitude of forms.
Chapter 19
The Semantic Web as a Driver for Change
19.1 The Challenge of the Semantic Web
The original Scientific American article on the Semantic Web appeared in 2001. It described the evolution from a Web that consisted largely of documents for humans to read, to one that includes data and information for computers to manipulate. The Semantic Web is a Web of actionable information – information derived from data through a semantic theory for interpreting the symbols. This is still a challenge for the future. It may take some ten years before the full effects of the semantic web are felt. It remains a distant vision of some of the leading communication scientists, one which is worth considering in developing strategies for publishers and libraries for the mid to long term, but its impact currently remains marginal. The semantic web is in fact an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a format that can be read and used by software agents, thus permitting them to find, share and integrate information more easily. It derives from W3C director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange. At its core, the semantic web comprises a philosophy, a set of design principles, collaborative working groups, and a variety of enabling technologies. As suggested, some elements of the semantic web are expressed as prospective future possibilities that have yet to be implemented or put into practical effect. Other elements of the semantic web are expressed in formal specifications. Some of these include the Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), and notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain.
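As a purely illustrative sketch of what such formal descriptions look like in practice, the following Python fragment (assuming the open-source rdflib library is installed) builds a tiny RDF graph describing a journal article with Dublin Core terms and serialises it as Turtle. The article identifier, DOI and author are invented for the example.

```python
# A minimal RDF example: describe a (fictitious) journal article using
# Dublin Core terms and print the graph as Turtle. Assumes the rdflib
# package is installed (pip install rdflib).
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS, FOAF, RDF

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("foaf", FOAF)

# Hypothetical identifiers, used purely for illustration.
article = URIRef("https://doi.org/10.9999/example.2008.001")
author = URIRef("https://example.org/person/j-smith")

g.add((article, RDF.type, DCTERMS.BibliographicResource))
g.add((article, DCTERMS.title, Literal("Text mining in scholarly communication")))
g.add((article, DCTERMS.creator, author))
g.add((article, DCTERMS.issued, Literal("2008")))
g.add((author, RDF.type, FOAF.Person))
g.add((author, FOAF.name, Literal("J. Smith")))

# Serialise the graph in Turtle, one of the notations mentioned above.
print(g.serialize(format="turtle"))
```

Statements of this kind, once shared and linked across repositories, are what would allow software agents to find and integrate information about articles, authors and data without human mediation.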
19.2 Critiques
The Semantic Web remains largely unrealised. Shopbots and auction bots abound on the Web, but these are essentially handcrafted for particular tasks; they have little ability to interact with heterogeneous data and information types. Because large-scale, agent-based mediation services have not yet been delivered, some pundits argue that the Semantic Web has failed to materialise. However, it could be
argued that agents can only flourish when standards are well established, and that the Web standards for expressing shared meaning have progressed steadily over the past five years. Furthermore, the use of ontologies in the e-Science community is only just emerging, presaging ultimate success for the Semantic Web – just as the use of HTTP within the CERN particle physics community led to the revolutionary success of the original Web. Other critics question the basic feasibility of a complete, or even partial, fulfilment of the semantic web. Some base their critique on the perspective of human behaviour and personal preferences, which ostensibly diminish the likelihood of its fulfilment. Other commentators object that there are limitations that stem from the current state of software engineering itself. Where semantic web technologies have found a greater degree of practical adoption, it has tended to be among core specialised communities and organisations for intra-company projects. The practical constraints on adoption have appeared less challenging where the domain and scope are more limited than those of the general public and the world wide web. The original 2001 Scientific American article (from Berners-Lee) described an expected evolution of the existing Web into a Semantic Web. Such an evolution has yet to occur; indeed, a more recent article from Berners-Lee and colleagues stated that: "This simple idea, however, remains largely unrealised."
19.3 Web Science Research Initiative
Nevertheless, to apply these concepts and procedures to the scholarly information world and to electronic publishing in general, the University of Southampton and the Massachusetts Institute of Technology (MIT) have announced the launch of a long-term research collaboration that aims to produce the fundamental scientific advances necessary to guide the future design and use of the World Wide Web. The collaboration includes Sir Tim Berners-Lee. The Web Science Research Initiative (WSRI) will generate a research agenda for understanding the scientific, technical and social challenges underlying the growth of the web. Of particular interest is the volume of information on the web. WSRI research projects will weigh such questions as: How do we access information and assess its reliability? By what means may we ensure its use complies with social and legal rules? How will we preserve the web over time?
19.4 Examples of Semantic Web in scholarly publishing
One specific example of a semantic web approach in operation has been alluded to in an earlier chapter (on the semantic journal): the Royal Society of Chemistry has introduced a set of automated procedures for its service entitled Prospect (see earlier). Another example comes from the author sector rather than from a publisher. Professor Peter Murray-Rust of Cambridge University has been a longstanding advocate of using new informatics and unconventional techniques to disseminate chemistry information online. This is in part related to the wastage which currently goes on in the reporting of chemical research. He claims that 85 % of
the crystallographic data produced at Cambridge University's chemical labs are thrown away, and for spectral data this rises to 99 %. In part the current practices of publishing are at fault. The reliance on PDF for publication is, in his opinion, nothing short of a disaster: it makes life difficult if one's intention is to use and reuse information in a multiplicity of ways. XML should become the basis for publication, involving more cost but allowing greater interoperability. According to Murray-Rust, the process of communication should involve not only humans but also machines. Underlying all this is the need for a dynamic ontology. At Cambridge University Murray-Rust has been involved in a project which applies the semantic approach to doctoral theses. It is an open system which makes use of OSCAR as the editing system, a system written by undergraduates and supported by the Royal Society of Chemistry. The 'machine' reads the thesis, tabulates it, and adds spectral and other data where appropriate. If there are any mistakes the robot finds them. The document is composed in XML. The whole process is dynamic, not a series of static pictures as with PDF documents. One of the students at Cambridge has also developed a semantic overlay journal which focuses on crystallographic data. CrystalEye involves a robot collecting and referencing data which it obtains from the web. It also makes use of free, open software developed by the community; in this case some 20 people write and maintain the BlueObelisk software 'for fun'.
19.4.1 Knowlets
Within biomedicine, new ways are being explored of communicating research results more quickly and efficiently than book and journal publishing allows. One of these is a project being run from Rotterdam in the Netherlands. Dr Barend Mons leads a research team called KNEWCO, which combines knowledge of biomedical research requirements with a proposal for a system which would unite aspects of the current journal publishing system with web 2.0, wikis and semantic web developments. His view is that there is an unstoppable move away from a text-based publishing system to one which deals with small nuggets of fact within a social networking system which would provide the quality control and commentary and thereby propel biomedical research forward. Though articles would still be required in future, reading through a long article on, say, malaria, only to find one small but useful factoid near the end, is a waste of time and resources. Such nuggets would become Knowlets™, and would be included within an OmegaWiki-like database, to be commented on by wiki authors with authority. These Knowlets would have unique identifiers. After the recognition of individual concepts in texts, the Knewco Knowlet™ technology makes reference to these concepts in an associative matrix of all concepts. This matrix contains the 'associative distance' between each pair of concepts. Using Knewco's meta-analysis algorithms, a multidimensional 'concept cloud', or Knowlet, of the indexed paper is created. The semantic representation contains information that is based not on the document alone, but also on the entire set of common, established and potential knowledge about the subject. In the case of the biomedical life sciences, Knowlets™ comprise the established knowledge in the Medline space and therefore include an extra element of 'interpretation' over thesaurus-based and disambiguated concept lists.
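The general idea of an associative matrix between concepts can be illustrated with a very small sketch. The Python fragment below counts how often pairs of concepts are annotated on the same document and turns the counts into a crude association score; it is a toy illustration of concept-to-concept 'distances', not a description of Knewco's actual algorithms, and the concept annotations are invented.

```python
# Toy illustration of an associative matrix between concepts: score each
# concept pair by how often the two concepts are annotated on the same
# document. The annotations below are invented for the example.
from collections import Counter
from itertools import combinations

# Each document is represented by the set of concepts recognised in it.
document_concepts = [
    {"malaria", "Plasmodium falciparum", "chloroquine"},
    {"malaria", "Plasmodium falciparum", "artemisinin"},
    {"breast cancer", "BRCA1", "tamoxifen"},
    {"malaria", "artemisinin", "drug resistance"},
]

pair_counts = Counter()
concept_counts = Counter()

for concepts in document_concepts:
    concept_counts.update(concepts)
    for a, b in combinations(sorted(concepts), 2):
        pair_counts[(a, b)] += 1

# A simple association score: co-occurrences divided by the smaller of the
# two individual concept frequencies (1.0 = always seen together).
for (a, b), together in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
    score = together / min(concept_counts[a], concept_counts[b])
    print(f"{a} <-> {b}: co-occurrences={together}, association={score:.2f}")
```

In a real system the association scores would be continuously updated as new literature and community annotations arrive, which is what allows the 'concept cloud' for a paper to reflect established knowledge rather than that paper alone.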
These Knowlets, or 'barcodes of knowledge', address some of the complexities of biomedical research – complexities arising from the data involved, incompatible data formats, multidisciplinarity and multilingual papers, ambiguity in terminology and the failure of the current system to share knowledge effectively. Knowlets are currently identified by and for life scientists, and wikis are used to comment on them within the community. There are wiki proteins, wiki authors (which include their unique IDs and publication records), wiki medical/clinical, and wiki phenotypes – each being exposed to the 'million minds' approach. Essentially the aim is to eliminate the barriers which stop people getting immediate access to research results. Only respected life scientists will be part of this wiki community. For more information see www.wikiprofessional.info. These are early examples of where the semantic web approach is beginning to bite in the traditional scientific publishing arena. These experiments are supported by the community – in the case of the Murray-Rust work, with financial assistance from JISC, the Royal Society of Chemistry and Unilever PLC. Microsoft is also involved in getting the latest versions of Microsoft Word adapted to link into the XML process. It is also working on an Open Science for Chemistry initiative.
19.5 Implications of Semantic Web for EP
As illustrated above, parts of the semantic web are being applied selectively in scholarly communication, and over the years the progress being achieved in the development of reliable standards, protocols and procedures will inevitably impact on the scholarly communication system in some way. But when and how remain open questions. The Web 3.0 which will harness the promise of the semantic web is still a glimmer in a few enthusiasts' eyes.
Chapter 20
Mobile Devices as Driver for Change
20.1 Background
Handheld or mobile devices may be the next new frontier for information delivery. The ubiquity of mobile phones poses a challenge for electronic publishing: with an estimated 3 billion mobile users worldwide, the challenge is to ensure that EP comes to terms with a consumer revolution in personal communications. It is not immediately obvious how the two will marry. Although the power and functionality of mobile phones are increasing rapidly, the big limitation is the size of the display. With such a small screen some of the requirements which a researcher may have – full page display, ease of navigation within a document, linking between items, access to the supporting data, and the ability to manipulate and compute online – are missing. These and other functions need to be on tap, and a mobile phone can at best deal with only a few of them. A distinction may need to be made between the application and the background technology. The application is the handheld device; the technology is wireless. The latter is the mechanism for accessing the material without the use of hard wiring.
20.2 The wireless economy
In terms of technologies, wireless is up-and-coming. Two-thirds of users currently use wireless-type devices (of which 70 % involve the supply of content). The younger generation, under 30 years old, are more likely to use wireless devices than older users, though not to the total exclusion of the over-50s; the latter often have time on their hands to experiment with new devices. The next main stimulus to wireless device usage will be the integration of content delivered through wireless devices into the workflow. Wireless technology is becoming generally available – whole institutions, even cities, are going wireless. Content could therefore become available to everyone, at any time. There are several aspects of wireless technology:
• Wireless – as used for accessing the Internet without a cable connection
• Mobile – which involves cellphone-based access
• Ubiquitous computing – where computers are embedded in the web itself
• Tetherless – where information is everywhere and accessible without a plug, e.g. where clothing has computers built in.
The impact on the domestic household will be dramatic. Soon fridges, clocks, air conditioning and the like will be operated via the web. This will lead to a massive decentralisation of information. The device 'formerly known as the cellphone' will take over, and buildings will react to a person's presence as appropriate. As far as the publication of books and journals is concerned, growing attention is being focused on providing mobile access to scholarly information which has traditionally been static in delivery terms. How this will manifest itself is still being resolved.
20.3 Intelligent spectacles?
At the Academic Publishing in Europe conference in Berlin (January 2007), Dr Hermann Maurer of the Technical University of Graz described a system in which the researcher would receive information through spectacles which not only displayed data but also allowed the researcher to interrogate remote databases through 'virtual' keyboards, with the results transmitted directly onto the retina. This development may well be beyond the time frame of strategists working on the practical realities of scholarly communication, but it is a 'vision' nonetheless which indicates where technology may be taking the community.
20.4 Amazon’s ‘Kindle’ More specifically and practically, at the end of 2007 Amazon launched its digital reader, dubbed the “Kindle”, in what appears an attempt to do for literature what the iPod has done for music. The 101/4 oz (290 g) Kindle can download a book wirelessly in under a minute and store up to 200 volumes to be read on its “electronic ink” screen. The name Kindle apparently came from the “concept of kindling the love of reading”. Amazon has spent three years developing the device with a technology called “E Ink”. The Kindle went on sale in 2007 in America for $ 399 (£ 195). It offered access to about 90,000 books and a dozen daily newspapers, including The Wall Street Journal and The New York Times. Customers can download best-sellers for a discounted $ 9.99, and classics from such as Dickens sell for $ 1.99 each. Single copies of leading newspapers cost the user 75 cents, or customers can pay a monthly subscription. The device also offers access to the Wikipedia online encyclopedia and about 300 blogs. It also plays MP3 music files and has a slot for a memory card so that it can hold hundreds more books. The hi-tech device has a wedge shape that feels like a paperback book. The left side is about as thick as a book’s spine while the right side is thinner, like opened pages. “We want to make sure everything anyone wants to read is on the Kindle”, said Laura Porco, Amazon’s director of digital text on the launch of Kindle. “We won’t stop until we can offer millions of books.” A key technical feature is the E-Ink system. E-Ink particles are activated electronically to form the words on the page, giving the screen the matte quality of ink on paper. But because there is no
backlighting, the 6 in screen is easy on the eyes. The battery can last for up to a week of reading. The user cannot scroll down, as on a computer, but instead must flip the whole page backwards or forwards using controls that run along the side of the device. Kindle users connect to Amazon's online bookshop to browse and pick the volumes they want, and the Kindle is equipped with mobile phone technology which means that the user can download books anywhere within the USA. Its EV-DO phone technology is not supported in Europe, and at the time of writing Amazon officials have refused to say when they plan to market a European version of the digital book. Not all publishers have agreed to sell to Kindle users. Penguin USA in particular has balked at Amazon's prices for its best-sellers.
20.5 Google’s Open Handset Alliance Meanwhile Google has announced the launch of its Open Handset Alliance to create an open platform (to be called Android) for a Linux phone that can run mobile Google applications and others. The 34 existing partners include T-Mobile, Sprint Nextel, NTT Docomo, China Mobile, Telefonica, Telecom Italia, Motorola, Samsung, HTC, Qualcomm, Intel, and Google itself. No mention was made of Verizon, AT&T, Vodafone, or Nokia (which is pushing its own development platform). Google itself claims that it is not following Apple by announcing a Gphone. However, they claim that their Open Handset Alliance and Android is more significant and ambitious than a single phone. In fact, through the joint efforts of the members of the Open Handset Alliance, they hope Android will be the foundation for many new phones and will create an entirely new mobile experience for users, with new applications and new capabilities inconceivable today. Android is the first truly open and comprehensive platform for mobile devices. It includes an operating system, user-interface and applications – all of the software to run a mobile phone, but without the proprietary obstacles that have hindered mobile innovation. This is despite reports that Google is ready to announce its Gphone. It is more a reference design, than a single phone. Android-based phones will start to come out on the market in the latter half of 2008. The software development kit will be available in late 2007. There is nothing concrete in terms of products or services, but going mobile represents a major growth opportunity for Google, which wants to bring the Internet (along with search and contextual ads) to one’s phone. Google CEO Eric Schmidt has been reported as saying: “We want to create a whole new experience for mobile users. This will be the first fully-integrated software stack, including an operating system and middleware, being made available under the most liberal open-source license ever given to mobile operators (and handset makers).” However, he also claimed “This is not an announcement of a Gphone”.
20.6 Future Developments
The unveiling by these two big operators of their responses to the challenge of wireless meeting EP is likely to presage a number of similar initiatives. Each of these will assist in the wider adoption of electronic publishing services delivered through mobile devices using wireless technology. Support for handheld devices will also come from the parallel developments in major book digitisation programmes, notably from the Internet Archive, Google and Microsoft. These and similar initiatives will provide more digitised material available for delivery through a range of devices, including mobile ones. This is where archiving and digitisation meet wireless technology. The developments in archiving and preservation will be explored in the next chapter.
Chapter 21
Archiving and Preservation as Drivers for Change
21.1 The Challenge of Archiving and Preservation
It appears, according to pundits, that an investment of about US$ 750 million would be enough to cover the digitisation of all books in the USA. This compares with an annual library spend in the US of about US$ 12 billion. The Internet Archive (IA) is attempting this, and currently has 150,000 books in eight different collections in its store. Other formats also being investigated include:
• Audio – around 2–3 million discs have been made in total (which would amount to 1 gigabyte each uncompressed). 40,000 of these are concert recordings. These would cost US$ 10 per disc to digitise, or a total investment of US$ 20–30 million. 100,000 are currently available on the Internet Archive.
• Films – there are approximately 200,000 films in existence, 1,000 of which are included in the Internet Archive. Many of these relate to old advertisements, training and other non-scholarly information items.
• Television – there are 400,000 TV channels, of which 20 are currently being archived on the IA. This would amount to 1 petabyte of information. It would cost $ 15 per video per hour to digitise.
• Software – there are some 50,000 software titles available.
• The Web – for some time the IA has been taking snapshots of available websites.
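As a rough back-of-envelope check on figures of this kind, the short Python fragment below multiplies out the per-disc digitisation cost and storage estimates quoted above; the figures are taken from the text and the calculation is purely illustrative.

```python
# Back-of-envelope digitisation estimates using the figures quoted in the
# text; purely illustrative.
def audio_cost(discs, cost_per_disc=10.0):
    """Total cost of digitising a collection of audio discs (US$)."""
    return discs * cost_per_disc

# 2-3 million discs at US$ 10 per disc.
low = audio_cost(2_000_000)
high = audio_cost(3_000_000)
print(f"Audio digitisation: US$ {low:,.0f} - US$ {high:,.0f}")  # US$ 20-30 million

# Storage: at roughly 1 gigabyte per uncompressed disc, the whole audio
# collection would occupy on the order of 2-3 petabytes.
gb_per_disc = 1
print(f"Audio storage: {2_000_000 * gb_per_disc / 1e6:.0f}-"
      f"{3_000_000 * gb_per_disc / 1e6:.0f} petabytes (uncompressed)")
```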
Much of this needs to be preserved and archived. Using the Library of Alexandria as the model (and in fact using the new Alexandria Library as a mirror site for the Internet Archive holdings), Brewster Kahle from the Internet Archive is aiming to offer bulk access through his Wayback Machine. Besides the technical issues which are being addressed by the Internet Archive there are also many political and social issues which have to be considered. How much of the archived material should be public and how much private? Should it be open or proprietary? There are still many issues to be investigated.
21.2 Preservation and Access With regard to keeping scholarly publications available for posterity, there is a whole set of questions which librarians in particular have – such as permanent access after licensing terms have expired, and whether librarians can take over some of the dissemination functions involved in preservation and access; publishers
want to guarantee permanent availability of their publications; and authors want everlasting availability but also the ability to reuse information. This is not a simple matter – deterioration of digital media in storage is more pronounced than with paper, and the issue of technical obsolescence, where contemporary applications can no longer view old file formats, also has to be addressed. There are a number of strategies that can help improve the probability of achieving future access. These strategies include:
• Changing the physical carrier (disk, CD-ROM, tape, etc)
• File format migration, so that contemporary applications can be used to view the content
• Emulation of the original computing environment, so that all the original software layers (operating system, middleware, applications software, etc) can be used to view the content
An ongoing, permanent and sustained R&D effort is required to turn these strategies into operational reality. Meanwhile, there are several main services currently available for e-journal content:
• LOCKSS and CLOCKSS (a collaboration of libraries organised by Stanford University library)
• Portico, a publisher and library collaborative, established by Ithaka in the US
• Safe Places
The Koninklijke Bibliotheek (KB), the Dutch national library, with its e-Depot dark archive, fits into the third of these alternatives. Many of the main commercial and learned publishers, such as Elsevier, Kluwer, BMC, Blackwell, OUP, T&F, Sage and Springer, have bought into KB’s e-Depot scheme since its launch in 2002. However, there are still challenges to be met. For example, there is still a major organisational challenge – the output of scholarly communications is for international consumption, which demands a global archiving/preservation response. Another challenge is to avoid duplication of the same research results at different stages in the publication cycle (the ‘versioning problem’). Morgan Stanley, in one of its financial overviews of the industry, has pointed out that US$ 1 billion worth of scholarly publications are now born digital, with no print version as the primary output. All this requires an international commitment to preservation, which would involve a small number of permanent archives around the globe, each with a sustained R&D effort to ensure consistency. Currently the e-Depot imposes restrictions on access (it is a dark archive), with onsite use and interlibrary loan within the Netherlands (for a print derivative) being the sole access routes. Each side – publisher and KB – currently pays its own costs, and there are no incentives to provide access. However, for the B2B advantages conferred by a scheme such as e-Depot to be realised, the business model would have to change; devolving (small) charges onto the publishers may have to be introduced. Such a project was launched in 2007 by the British Library, which has offered to guarantee access to individual publishers’ archives in return for a fee paid by the publisher. The long-term preservation of electronic scholarly
resources will require deliberate, careful, and sustained effort that extends beyond the harvesting of web pages or reliance upon any single organisation. As a community, we are obviously still wrestling with how to preserve our growing number of important electronic resources. We are still trying to imagine what shape reliable archives of these materials might take. The Wayback Machine offers one example; LOCKSS, national libraries, and institutional repositories offer other models. The critical question now is how will we assess the viability of any particular approach? What elements are necessary to ensure the long-term preservation of and access to electronic scholarly materials? If we are to effectively preserve these resources for the long term – to “archive” them – then as a community we must have a broad-based and thorough understanding of the characteristics of a trusted, credible archive. There are several components which must be present in any trustworthy archive. The 1996 Report of the Task Force on Archiving of Digital Information and the 2002 report Trusted Digital Repositories: Attributes and Responsibilities offer clear and useful descriptions of these elements.
21.3 Archive requirements Long-term preservation of and ongoing access to digital materials requires at a minimum five organisational components specifically dedicated to or consistent with the archival objective: mission; business model; technological infrastructure; relationships with libraries; and relationships with publishers. Without at least these five, the future of an electronic resource cannot be assured. There may be other important components as well, but these offer a necessary foundation.
1) Organisational mission: This component is absolutely critical because it drives the resource allocation, decision-making, and routine priorities and activities of the organisation. When an organisation’s mission is to be an archive it will by necessity dedicate its available resources to this core activity, avoiding the all too frequent competition between preservation needs and other priorities. Similarly, when long-term preservation is mission critical, preservation values and concerns will necessarily inform the shape of an organisation’s routine procedures and processes.
2) Business model: An archive must generate a diverse revenue stream sufficient to fund the archive, including both the considerable cost of developing the archive’s basic infrastructure and the ongoing operation of the archive over the long term. A single source of funding – a single donor, a government agency, or a foundation – should be evaluated carefully for its ability to support the longevity of the archive. Noble efforts tend to come and go with the shifting priorities of those who control the purse strings.
3) Technological infrastructure: This infrastructure must support content ingest, verification, delivery, and multiple format migrations in accordance with accepted models such as OAIS and best preservation practices. It must include and support the automated and manual quality control processes necessary to protect the
ongoing integrity of the materials and to protect against format or hardware obsolescence.
4) Relationships with libraries: The archive must meet the needs of the library community, and it must find a way to balance these needs with those of other participants in the scholarly communication process, taking into account, for example, what content should be preserved for the long term.
5) Relationships with content producers: The archive must establish agreements for the secure, timely, and reliable deposit of content, and it must work with publishers and other content producers to secure the rights necessary to archive the material entrusted to its care.
These components could be implemented in any number of organisational models. Indeed, the community will be best served by having multiple organisations serving as trusted archives. But if we are to develop a network of trusted archives we must first find a way to evaluate the efficacy and reliability of proposed archiving models. Doing so is an essential step toward an important goal: a trusted, reliable, and long-lived record of scholarship. According to Brewster Kahle, with current digital technology we can build digital collections, and with digital networks we can make these available throughout the world. The current challenge seems to be establishing the roles, rights and responsibilities of our libraries in providing public access to this material.
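The third component above, like the preservation strategies listed in section 21.2, comes down to routine, automated quality control: fixity checking and watching for format obsolescence. As a purely illustrative sketch – the file layout, manifest format and watch-list of at-risk formats are assumptions, not the practice of any particular archive – such a check might look like this:

```python
# Minimal sketch of two routine preservation checks: fixity (has the bit
# stream changed since deposit?) and format obsolescence (is the file in a
# format flagged for migration?). File names and the watch-list are invented.
import hashlib
import json
from pathlib import Path

AT_RISK_FORMATS = {".wpd", ".sgml", ".rm"}   # hypothetical migration watch-list

def checksum(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large objects fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit(archive_dir: str, manifest_file: str) -> None:
    manifest = json.loads(Path(manifest_file).read_text())  # {name: sha256}
    for name, recorded in manifest.items():
        path = Path(archive_dir) / name
        if not path.exists():
            print(f"MISSING  {name}")
        elif checksum(path) != recorded:
            print(f"CORRUPT  {name} (fixity check failed)")
        elif path.suffix.lower() in AT_RISK_FORMATS:
            print(f"MIGRATE  {name} (format on obsolescence watch-list)")
        else:
            print(f"OK       {name}")

# audit("archive/aip", "archive/manifest.json")   # hypothetical paths
```

A real OAIS-style repository would of course wrap such checks in ingest workflows, preservation metadata and audit trails; the point here is simply that the “sustained effort” the text describes is, at the technical level, continuous and automatable.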
21.4 International Collaboration The scale of the problem has been recognised and a number of collaborative groups have been established to ensure that archiving and preservation issues are not ignored by policy makers. Two key groupings have been established on both sides of the Atlantic during 2007 and early 2008. 21.4.1 US-based Task Force on sustainable digital preservation and access A ‘Blue Ribbon Task Force’ has been set up to develop recommendations for the economic sustainability, preservation of, and persistent access to, digital information. The Blue Ribbon Task Force on Sustainable Digital Preservation and Access is co-chaired by Fran Berman, Director of the San Diego Supercomputer Center at the University of California, San Diego, and a pioneer in data cyberinfrastructure, and Brian Lavoie, a research scientist and economist with OCLC. The Task Force will convene a broad set of international experts from the academic, public and private sectors who will participate in quarterly discussion panels. It will publish two reports with its findings, including a final report in late 2009 that will include a set of recommendations for digital preservation, taking into account the prevailing economic conditions.
The Blue Ribbon Task Force on Sustainable Digital Preservation and Access was launched by the National Science Foundation and the Andrew W. Mellon Foundation in partnership with the Library of Congress, the Joint Information Systems Committee of the United Kingdom, the Council on Library and Information Resources, and the National Archives and Records Administration. Their two-year mission is to develop a viable economic sustainability strategy to ensure that data will be available for further use, analysis and study. The first meeting of the Task Force was held in Washington, DC in late January 2008. The group will also establish a public website to solicit comments and encourage dialogue on the issue of digital preservation.
21.4.2 The European Alliance for Permanent Access The Blue Ribbon Task Force is complemented by similar developments in Europe. During 2007 senior executives from some of the main European-based research centres, funding agencies, libraries and archives have been working on the establishment of what has become known as the Alliance. The Alliance is about to be launched in early spring 2008 with a permanent director and a board which represents the key players in the European sector. The aims of the foundation are: • To support the development of a sustainable European Digital Information
infrastructure that guarantees the permanent access to the digital records of science, whether documents or data, across all fields of research, scholarship and technology.
• To be a strategic partner for the Commission of the European Union and national governments to strengthen European and national strategies and policies and their implementation in the area of long-term preservation of, and access to, the digital records of science, and thereby contribute to Europe as an Information Society.
• To be a platform for enabling key stakeholders in the world of science and scientific information to cooperate amongst themselves and with other organisations on digital repositories for science.
• To strengthen the role of European parties in worldwide efforts in long-term preservation and access of the digital records of science.
These two initiatives on both sides of the Atlantic demonstrate the growing awareness of the importance of digital archiving and preservation as a feature of the scholarly and research processes.
Social Drivers
The Google Generation
Reference has already been made in Chapter Three to the emerging digital native, the millennium or Google generation, as a feature of the information industry. We return to this theme because, besides being a structural feature, it is also a driver for change in the industry. It was pointed out that there is a new typology, differentiating the market for information according to age or generation. It was loosely defined as the ‘X’ generation being those born before 1980; the ‘Y’ generation being those born during the 1980s and early 1990s; and ‘the Google Generation’ being those born since 1993. This typology was meant to highlight the behavioural differences which could be a result of the media types each generation has been confronted with while growing up. The UCL Consultants/CIBER study pointed out that most students – 89 % of the total – begin an information search with the major search engines such as Google, and only 2 % start with the library web site. More particularly, according to the UCL study, there has been the emergence of ‘promiscuous’ search behaviour – flitting between various information resources, bouncing, horizontal in nature, spending little time on each resource but downloading extensively and squirreling away the results. A ‘power browsing’ mentality has taken hold. This may suggest a new way of acquiring and assimilating information. However, despite being exposed to new forms of visual and interactive entertainment at an early age, the Google Generation is not any better or worse in its information literacy than earlier generations. What it does do is rely on comprehensive ‘brands’ such as Google to be its first and in some cases only source for information gathering. GoogleGen makes little use of advanced search facilities, assuming instead that search engines ‘understand’ their queries. Nor have social collaborative tools come to their aid as yet, according to UCL Consultants. The overall results from the UCL study suggest there has been an overestimate of the impact of ICT on the young (and an underestimate of its effect on the older generations). So we start this section of the book with a conundrum. On the one hand there is evidence, much of it anecdotal, that the new generation of end users is driving forward a massive change in the electronic publishing industry ‘because they are different’ from the X and Y generations. On the other hand the UCL study suggests that the differences are not that significant – in overall terms – and that we see a behavioural conservatism in the information market, one which is not propelling the industry forward into a new and unknown future. We need to look at some of the social drivers to see how relevant they are in effecting industry change.
Chapter 22
Findability as a Driver for change
22.1 The rise of Search Engines Though search engines have existed since the first bibliographic databases were developed in the 1970’s, these were usually specific to a particular subject or database. Only in the past two decades have powerful aggregated search engines emerged, with Dialog and SDC battling it out in the early 1980’s. Both offered what were in many respects complex search procedures to access the packages of databases they had gathered together in their collections, and there was little interoperability between them and other, more specialised search services. It was not until the early 1990’s that the large, generic, easy-to-use search engines appeared on the market, with AltaVista, AOL and Yahoo leading the charge into a more mass-market search system. Only towards the end of the 1990’s did we see another large jump in search technology. Google came out with a unique way of prioritising search results that met user expectations: comprehensive content coverage combined with selectivity based on a logical formula offered an irresistible service to the growing number of online users worldwide. Search engines emerged as a powerful new tool in driving EP forward.
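The ‘logical formula’ referred to here is, at its core, a link-analysis calculation: a page ranks highly if highly ranked pages link to it. A stripped-down sketch of that idea follows; the four-page ‘web’ and the damping factor are purely illustrative, and Google’s production system uses many additional, proprietary signals.

```python
# A minimal sketch of link-based (PageRank-style) ranking by power iteration.
# The tiny link graph and the damping factor are invented for illustration.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                       # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(sorted(pagerank(web).items(), key=lambda kv: -kv[1]))
```

Running the sketch shows page C accumulating the highest score because the most pages point to it – the intuition behind ranking by the structure of the web rather than by the text of a page alone.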
22.2 Resource Discovery and Navigation As first conceptualised by Evans and Wurster (“Blown to Bits”), once the glue has been melted between the descriptors (headers, metadata) and the ‘stuff’ (the item, the article), a new business is born. Amazon showed how quickly the traditional bookselling business could be overtaken by new services dealing solely in book metadata; Amazon’s corporate value soon outpaced the value of the entire physical bookselling industry. Using the same formula, Google within ten years achieved a corporate net worth greater than that of the entire electronic publishing industry on which it relied for some of its content. The emergence of search engines provided a single interface to search the vast amount of descriptive text that arose from this new part of the electronic publishing sector. The ease with which this has been done, and the sophisticated technology for indexing and tracing relationships between the indexed items, provided a new and powerful insight into information and data. Suddenly a whole new audience emerged, one that had never adapted to the complex search routines of traditional abstracting and indexing services. These
new users went for simplicity, ease of use and a ‘something is good enough’ approach.
22.3 How users find information Outsell has investigated where Internet users go for their information. An interesting finding according to Outsell is that the Internet/web has declined as the main source of information from 79 % to 57 % in their user sample. Increasingly this ‘user sample’ goes to their intranet (5 % rising to 19 %). The library has remained stable at about 3–4 %, whereas work colleagues have risen from 5 % to 10 %. ‘Others’ have been 8–10 % (of which vendors account for about 4 %). Users seem to be increasingly enthusiastic about local intranets – they are getting more sophisticated, and it is a case of ‘if it is not accessible through the intranet it doesn’t exist’. This finding contradicts claims by Google and other major search engines that they dominate the information acquisition function. The proportion of an academic’s total work time taken up in seeking information has risen from 44 % to 55 %, but it is felt that this is not an efficient use of such time. The overall search failure rate is put at 31 %. There are more and more information sources, hence decision-making becomes more and more difficult. If users can’t find information they tend to ask a colleague (64 %) as the main second source. E-mails are the principal form of alerts being received (77 %). Blogs represent 45 %, whereas RSS feeds only 20 %. Blogs and podcasts have stronger use among the under-30’s category. The main message from the Outsell report is that it is essential to look at what the young are doing as drivers for new format adoption. Hugh Look, Senior Consultant, Rightscom, has also reported on ‘Researchers and Discovery Services Survey’, a report Rightscom undertook for the Research Information Network (see http://www.rin.ac.uk/researchers-discovery-services). The aim was to assess the use and perception of a selection of resource discovery services by academic researchers in the UK. The results were also intended to help determine priorities in the development of future services. The study was based on a telephone survey of 450 research-related personnel in UK universities, 395 of whom were researchers (at PhD level and above) and 55 librarians and information officers across all disciplines. The term ‘user’ was broadly defined. In-depth interviews with postdoctoral researchers complemented the main interviews and were used to assess differences between those who had grown up as researchers in the Internet environment and those who had not. ‘Resource discovery services’, also broadly defined, included bibliographic A&I services, general Internet search services, dedicated guided portals (Intute in the UK), institutional library catalogues and portals, and libraries and librarians themselves. The main results were: • General search engines are the most used, among which Google is used more
than any other. Within the library community, the internal library portal was the next most used service. This is in contradiction to the Outsell findings in the USA.
• Satisfaction with discovery services is high, predominantly among researchers and scientists. In arts and humanities there were more concerns about gaps in service coverage.
• The issue of access (e.g. accessing a document once located) generated greater frustration among researchers and librarians than that of discovery. Another frustration concerned the lack of clear delineation between means and ends (between discovery services and what is being discovered).
• Most researchers rely on a range of resource discovery tools and select an appropriate tool for a specific inquiry. Researchers in the social sciences appear to use a wider range of resource discovery services than those in other disciplines. The most heavily used resource discovery sources include general search engines, internal library portals and catalogues, specialist search engines and subject-specific gateways.
• The pattern of researchers’ named discovery resources is expressed by a long tail; a very few resources – Google, Web of Science/Web of Knowledge and ScienceDirect – are named by a large number of researchers. Researchers also showed reliance on interfaces such as Athens (an authentication system) that they inaccurately identified as being content services.
• Among the range of resources found through the use of discovery services, journal articles are the most important. Virtually all researchers (99.5 %) rely on the journal article as a key resource. Over 90 % also use chapters in multiple author books, organisation websites and individual expertise. The next most cited resource – monographs – is mentioned by only 32 % of researchers.
• Peers and networks of colleagues are shown to be extremely important for virtually every type of inquiry. Research colleagues feature as important providers of information about resources and tools and new services, and this is particularly the case for postdoctoral researchers. Some researchers use email listservs; however, online social networking services have been less popular. Colleagues are relied upon for locating individuals, initiating research, discussing research funding and locating data sets.
• The majority of researchers work by refining down from large sets of results. Surprisingly, researchers were more concerned about missing important data than they were about the amount of time spent locating information. Concerns were also expressed about being overwhelmed by email, and in every discipline researchers bemoaned the number of irrelevant results delivered by general search engines.
• With respect to emerging tools, blogs were shown to be little used. A majority of researchers (62 %) obtain regular information updates and alerts from services pushing information to their desktops, and email is the preferred tool for this (not RSS feeds). A smaller number use alerts on funding sources from research councils or specialist services. Sources for keeping up to date include journals themselves, email alerts, conferences and conference proceedings, among a wide range of ‘other’ sources that were not discussed in detail.
• The focus of library activity has shifted and library support is now being delivered more often through the services provided than personal contact. Librarians’ and researchers’ views diverged on a number of key issues including quality of discovery services, availability of resources, and gaps and problems. Researchers do their own searches in the vast majority of cases. Librarians overrated the importance of datasets to researchers and they used general search engines far less frequently than researchers. Librarians perceived researchers as conservative in their use of tools and were concerned that they were not reaching all researchers with formal training. Researchers did not perceive this to be a problem.
• Specific gaps in provision included access to foreign language materials, lack of distinction between actual sources and discovery services, difficulties in locating specific chapters in multiple-authored works due to lack of general indexes, and too-short backfiles of journals. A plea for ‘one stop shops’ was made across the board.
• Researchers have come to agree that ‘the more digital, the better’. Most expressed concern about not having access to a sufficient number of digital resources. Problems cited included institutions not subscribing to the full text of the e-journal, and overly short electronic backfiles.
• The data showed fewer differences between experienced cohorts than one might expect. Frequent and regular use by experience or age did not play a significant role, although the younger group stands out clearly in the use of blogs.
• Differences between disciplines are somewhat more marked. Researchers in the life sciences make more use of their colleagues than in other disciplines. In the physical and life sciences, researchers tend to use general search engines more than average. The library portal is used more frequently by arts and humanities lecturers.
• Google is not being used for mission-critical applications. Rather it is relied on, often in combination with other tools, to locate organisations and individuals, references, or to research a new area. A wide range of resources are used, including bibliographic databases, Google, internal portals and Web of Science/Web of Knowledge. The category ‘other’ (46 %) reveals a wide variety of discipline-specific resources.
Hugh Look also indicated that the boundaries between resources themselves and discovery services are increasingly permeable, a trend that is likely to continue as new forms of content aggregation are developed. With so much information being produced, attention becomes scarcer and the demand for greater relevance grows. There is therefore a need to optimise the delivery of content by making the discovery process easy, building in relevance and ensuring continued engagement.
22.4 The Findability Challenge Currently the dominant player in the findability area is Google. Though as an organisation it is often coy about its operational features, it had a 2005 revenue of $ 6,138 million, which two years later had more than doubled to $ 13,430 million; it has 16 locations within the USA employing 3,500 staff, and a further 3,000 staff in the rest of the world. It has a market capitalisation of $ 164 billion and a 32.5 % operating margin. Although in 2005 it had 2,230 new jobs available, it received over 1.1 million applicants – which says a lot about the company’s image and its extensive employee benefits. But the real attraction of Google is its dominance in the search and online advertising sectors, with a range of innovative new information services being churned out in rapid succession.
However, Peter Morville wrote in his book “Ambient Findability” about the challenges of finding one’s way through the mass of data and information currently available. He points out that this is evolving and not necessarily consistent – he quotes William Gibson: “the future exists today – it is merely unevenly distributed”. In his book Morville challenges the suggestion that Google offers a good service – “if you really want to know about a medical complaint you don’t rely on Google but rather on NIH’s PubMed database”. People are in general faced with a bewildering array of information formats (magazines, billboards, TV, etc) which Morville claims leads to a loss of literacy. Ambient findability is less about the computer than about the complex interactions between humans and information. Not all our information needs will be met automatically. Information anxiety will intensify, and we will spend more time rather than less searching for what we need. Search engines are not necessarily up to the task of meeting future needs; they tend to be out of date and inaccurate. However, they are trying to rectify some of these emerging weaknesses by improving their technology – for example, by refining their ranking software so that the top ten results returned are those most relevant to end users. Publishers, in turn, try to anticipate this through search engine optimisation (SEO) – making sure that they understand the new algorithms and include data in their published material (metadata) that ensures a high hit rate. Whilst the search engines pride themselves on speed, this excludes the subsequent activity the end user has to go through in bypassing splash pages and other interferences in reaching the data. But even here, according to Bill Gates, what is important is not the search but getting the answers. Faced with the above, there are opportunities for new ‘vertical’ search services: Google cannot be as precise and filtered as a targeted vertical search service. There are few examples of such services out there at present; the question is whether they can pull back search activity from the entrenched position the large generic search engines now occupy. The problem is that this requires hard work and investment – few publishers have shown any inclination to create such a platform either individually or in unison.
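To make the contrast with a generic engine concrete, the sketch below shows what a ‘vertical’ service does differently: it restricts itself to a curated, subject-tagged corpus and ranks on domain fields rather than general web popularity. The tiny corpus, its fields and the scoring rule are all invented for illustration; a real vertical service would index full text, controlled vocabularies and citation data.

```python
# Toy vertical search: a curated, field-aware index over a small corpus.
# Records, fields and weights are hypothetical.

RECORDS = [
    {"title": "Protein folding kinetics", "subject": "biochemistry",
     "abstract": "Folding pathways of small proteins measured by NMR."},
    {"title": "Folding bicycles for commuters", "subject": "consumer",
     "abstract": "A review of folding bicycle designs."},
    {"title": "Misfolding and disease", "subject": "biochemistry",
     "abstract": "Protein misfolding in neurodegenerative disease."},
]

def vertical_search(query, subject):
    terms = query.lower().split()
    hits = []
    for rec in RECORDS:
        if rec["subject"] != subject:          # the 'vertical' restriction
            continue
        text = (rec["title"] + " " + rec["abstract"]).lower()
        score = sum(text.count(t) for t in terms)   # crude relevance score
        if score:
            hits.append((score, rec["title"]))
    return [title for score, title in sorted(hits, reverse=True)]

print(vertical_search("protein folding", subject="biochemistry"))
```

The ‘consumer’ record about folding bicycles never appears, however well it matches the query words – which is precisely the filtering a generic engine cannot guarantee.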
22.5 Case Study: The Google mantra Historically, Google is a creature of the Internet. The web went mainstream in 1994 with Netscape Communications’ browser, and Yahoo created a hierarchical structure for content. Between 1995 and 1998 two doctoral candidates at Stanford University (Larry Page and Sergey Brin) developed a ranking system that incorporated ‘relevance’ (defined by a unique and proprietary listing of 100 items) together with PageRank as the underlying search technology. The system made its debut in 1998 and the market immediately swung away from services such as AltaVista in favour of this new ranking methodology. Viral marketing ensured that it took off. Google has single-handedly transformed the scholarly information landscape through a number of new ventures launched in recent years, including Google Print, Google Scholar and Google’s library digitisation programme.
There is little doubt that Google benefits from the strength of its brand recognition. This came out strongly during some tests undertaken by Vividence in the US. The answers that a number of different search engines came up with were compared, and it was concluded that the differences were not great. But according to Vividence, Google shone through in terms of the more subjective customer satisfaction rating, which goes to show that “search is a potentially fickle brand game, resting on perceptions and preferences rather than performance” (Outsell). As a result Google is often the first source for scholarly information for academics and researchers. Where does the revenue come from, given that it is a search engine operating in what is essentially part of the ‘free access to information’ ethos of the Internet? In 2001, 77 % of Google’s revenues were generated through advertising, with the rest coming from enterprise search and other services. By 2003 advertising had risen to 95 %, and looks to be continuing to increase. Google’s current destiny is heavily reliant on the ability to link an advertisement directly to the search needs or profile of the online user. There have been many books written about Google. It has single-handedly transformed the structure of the information industry. One insightful analysis was written by John Battelle in 2005. In his book “The Search – How Google and its Rivals Rewrote the Rules of Business and Transformed our Culture” Battelle showed how rapid the transformation made by two entrepreneurs was, and how their efforts have made one company larger than the whole of the scholarly communications industry within ten years. They have also, as a result, potentially opened up the scholarly communications industry to a much larger audience. Larry Page first met Sergey Brin in the summer of 1995 when the idea of Google was not even a glimmer in their respective eyes. Within a decade their organisation and these entrepreneurs had changed the map of the information industry. They have transformed the individual user’s Database of Intentions into a multibillion dollar operation – not bad for two engineers who were more academic and geeky in approach than possessed of proven professional business acumen. Search as a process, according to Battelle, has not always been seen as the powerful force it has recently become. In the late 1990s and early years of this decade the concept of the portal, attracting the eyeballs of the user and leading them through a range of owned and proprietary services, captured the imagination of entrepreneurs and venture capitalists alike. However, the young entrepreneurs at the helm of the Google company, which in 1999 had a handful of employees and a rented office suite in a private house, had the idea of developing the mathematical algorithm which became PageRank, the heart of the current Google search process. Page and Brin, perhaps arrogantly according to Battelle, cocked a snook at the rest of the industry and remained focused on their core activity at that stage – the less interesting search process. Everyone else was trying to develop locked-in communities and portals. But whether it was inspiration, arrogance or pure good fortune, search engines evolved and became the key players during the past few years, with their millions of dedicated followers and tremendous traffic. Portals have been relatively marginalised.
Using these many digital users as the platform, Google eventually built the AdWords and AdSense contextual advertising systems, which provided a massive injection of revenues and led to a large number of the then 1–3,000 employees
becoming millionaires overnight when the company undertook its controversial IPO stock auction in August 2004. But as the Search book reflects, it has not been an easy ride. Conflicts in culture emerged as the original motto for the company – ‘Don’t be evil’ – came up against the hard world of commerce. It came partially to the fore over the advertising issue – Page and Brin, with a professional CEO then acting as part of the triumvirate, seemed almost apologetic about initially taking advertising, about issuing an IPO which ignored Wall Street practices, and about its dealings with China, which demanded censorship of certain sites. The less contentious corporate aim of ‘organising the world’s information and making it accessible’ has become their widely used mission statement, and the company may have made its peace with the devil. However, Google is not out of the woods yet. The Patriot Act, whereby the federal government increased its potential for tapping into not only telephone conversations but also e-mails and web usage data, highlighted the sensitive nature of the data available within the banks of parallel-running computers at Google and with other major search engines. A digital footprint follows every user of the service and can be mapped and used for a variety of purposes. The clickstream has become “the exhaust of our lives” and is scattered across a wide range of services. Should this data stream or personal digital footprints be made available to the government? Is trust being broken by doing so – trust that users can feel that their searches are not being monitored by a Big Brother? This is a heavy issue and one which is still running its course. Also, Google is making enemies in its own and adjacent industries. Google evokes ambivalent feelings. Some users now keep their photos, blogs, videos, calendars, e-mail, news feeds, maps, contacts, social networks, documents, spreadsheets, presentations, and credit-card information – in short, much of their lives – on Google’s computers. And Google has plans to add medical records, location-aware services and much else. It may even buy radio spectrum in America so that it can offer all these services over wireless-internet connections. Google could soon, if it wanted, compile dossiers on specific individuals. This presents “perhaps the most difficult privacy issues in all of human history,” says Edward Felten, a privacy expert at Princeton University. Speaking for many, John Battelle, the author of a book on Google and an early admirer, recently wrote on his blog that “I’ve found myself more and more wary” of Google “out of some primal, lizard-brain fear of giving too much control of my data to one source.” For the future, Google must maintain or improve the efficiency with which it puts ads next to searches. Currently it has far higher “click-through rates” than any of its competitors because it made these ads more relevant and useful, so that web users click on them more often. But even lucrative “pay-per-click” has limits, so Google is moving into other areas. It has bought DoubleClick, a company that specialises in the other big online-advertising market, so-called “branded” display or banner ads (for which each view, rather than each click, is charged for). Google also now brokers ads on traditional radio stations, television channels and in newspapers. The machinery that represents the fixed costs is Google’s key asset. Google has built, in effect, the world’s largest supercomputer.
It consists of vast clusters of servers, spread out in enormous data centres around the world. The details are Google’s best-guarded secret. But the result is to provide a “cloud” of computing power that is flexible enough “automatically to move load around between data
centres”. If, for example, there is unexpected demand for Gmail, Google’s e-mail service, the system instantly allocates more processors and storage to it, without the need for human intervention. This infrastructure means that Google can launch any new service at negligible cost or risk. If it fails, fine; if it succeeds, the cloud makes room for it. Beyond its attempts to expand into new markets, the big question is how Google will respond if its stunning success is interrupted. “It’s axiomatic that companies eventually have crises,” says Eric Schmidt, Google’s CEO. And history suggests that “tech companies that are dominant have trouble from within, not from competitors”. In Google’s case, he says, “I worry about the scaling of the company”. Its ability to attract new staff has been a competitive weapon, since Google can afford to hire talent pre-emptively, making it unavailable to Microsoft and Yahoo!. Google tends to win talent wars because its brand is sexier and its perks are lavish. Googlers commute on discreet shuttle buses (equipped with wireless broadband and running on biodiesel) to “GooglePlex”, which is a playground of lava lamps, volleyball courts, swimming pools, free and good restaurants, massage rooms, etc. In theory, all Googlers, down to receptionists, can spend one-fifth of their time exploring any new idea. New projects have come out of this, including Google News, Gmail, and even those commuter shuttles and their Wi-Fi systems. But it is not clear that the company as a whole is more innovative as a result. It still has only one proven revenue source and most big innovations, such as YouTube, Google Earth and the productivity applications, have come through acquisitions. As things stand today, Google has little to worry about. Most users continue to google with carefree abandon. The company faces lawsuits, but those are more of a nuisance than a threat. It dominates its rivals in the areas that matter, the server cloud is ready for new tasks and the cash keeps flowing. The test comes when the good times end. At that point, shareholders will demand trade-offs in their favour and consumers might stop believing that Google only ever means well.
22.6 Other search engines There will be a range of currently inaccessible information made available to future search engines, information which is partly stored on one’s own PC, but also part of the immense mountain of grey literature which surpasses what is currently available on the web by a factor of ten. Future search engines will parse all that data “not with the blunt instrument of a Page-like algorithm, but with subtle and sophisticated calculations based on your own clickstream” (Battelle). A more personalised and customised set of alerting and delivery services will be developed. Also, as the cultural diversity of the different research disciplines emerges, specialised search engines will be created which delve much deeper into the grey literature of a subject than the large search engines will ever reach. Google, Yahoo and MSN will remain as the umbrella services, pulling together specialised search services for specific sectors, but seeking a mass appeal. Not to be outdone by Google, Microsoft announced (in early February 2005) its MSN Search Service. It offers powerful web, news and image searching, adds in links to freely accessed articles from Microsoft’s own Encarta encyclopaedia, and will even allow searching of the user’s desktop with a free download of the beta of the MSN Toolbar Suite. The MSN Service encourages users to develop advanced
search skills using the Search Builder facility. This allows the user to be highly selective about the domains to be searched, and to use sliding bars to weight results according to popularity or immediacy. Microsoft further claims it will outperform search rivals such as Google and Yahoo! as its search bots will crawl the entire web every 48 hours for updates, rather than the industry standard of two weeks. According to Bill Gates, “Searching the Internet today is a challenge, and it is estimated that nearly half of customers’ complex questions go unanswered”. Providing the answer, rather than just offering the search process, seemed to be a theme. Even newer tools are emerging which are causing further changes to occur in findability:
• More powerful and explicit search engines.
• Syndication, RSS and free access to APIs (a minimal feed-polling sketch follows this list).
• Tracking, such as with newsYahoo and newsGoogle.
• Web 2.0 services such as NewsVine, Topix, Medadiggers, Reddit, Digg, Memtrackers and Tailtracker.
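As an illustration of the syndication point above, the sketch below shows how a reader’s alerting tool might poll an RSS or Atom feed for new items. It assumes the third-party feedparser package, and the feed URL is a placeholder, not a real service.

```python
# Minimal sketch of consuming a syndicated (RSS/Atom) feed.
# Requires the third-party 'feedparser' package; the URL is hypothetical.
import feedparser

FEED_URL = "https://example.org/journal/latest-articles.rss"  # placeholder

def new_items(url, seen_links):
    """Return (title, link) pairs not seen before; remember what was seen."""
    feed = feedparser.parse(url)
    fresh = []
    for entry in feed.entries:
        if entry.link not in seen_links:
            fresh.append((entry.title, entry.link))
            seen_links.add(entry.link)
    return fresh

seen = set()
for title, link in new_items(FEED_URL, seen):
    print(f"NEW: {title} -> {link}")
```

The significance for findability is that the content comes to the user rather than waiting to be searched for – the ‘push’ model referred to in the survey findings earlier in this chapter.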
There is a power that comes from being discovered by such social media services. Given that, according to the “Pew Online Activities and Pursuits” in the USA (March 2007) some 29 % of American men are online, and 27 % of American women, this indicates that there is a huge potential market evolving. Whether this use is fuelling adoption of such services, or such services fuelling use is perhaps unimportant. In effect, long-term progress will come from improving the relevance and engagement of landing pages and by intelligently changing content to suit either the source of a reader or their behaviour. But there is a long way to go to achieve this.
22.7 Impact of search engines on publishers Hidden within the pages of John Battelle’s book on ‘Search’ (see earlier) is a concept that also potentially challenges scholarly publishers in particular. Battelle ruminates on the future for the news services, given the way newspapers can be by-passed by the new decentralised information collection and customised dissemination through the web. As the web site http://www.epic.2015.com suggests, there could be an ultimate confrontation between the newspaper industry and Googlzon (a merger of Amazon and Google) later this decade, and that the search engine will win in the courts. By the same token, the inference is that when scholarly publications no longer become a destination site (as news has) but become, thanks to the search engines, a commodity, how can traditional publishers continue to exist if there is no longer a branded journal per se to purchase? As publishers continue to protect their journal subscription streams, so the argument goes, the information is no longer picked up by search engines for the future generation of digital scholars who are wedded almost exclusively to their preferred resource discovery system. The published research articles are no longer identified, are not part of the conversation within scholarly peer groups, and new channels emerge. Battelle’s recommendation for the news industry is to open up the sites, allow deep linking, and seek new value-added services. By implication, this could be the route for publishers if Battelle’s vision is brought to fruition.
The ‘long tail’ of publications – some no longer in print – can be made live again. The ‘long tail’ is particularly pertinent in the scholarly publication sector where a real business can be made from serving the needs of the esoteric, infrequently used publications, as services such as Amazon and e-Bay have demonstrated.
22.8 Book digitisation and the Copyright issue One area where Google’s actions are causing concern is in its ambition to digitise the world’s books. Google intends to scan every book ever published, and to make the full texts searchable, in the same way that Web sites can be searched on the company’s search engine at google.com. No one knows accurately how many books there are. The largest number of volumes listed in any catalogue is thirty-two million – the number in WorldCat, a database of titles from more than twenty-five thousand libraries around the world, collated by OCLC. Google aims to scan at least that many within ten years. Google’s is not the only book-scanning venture. Amazon has digitised hundreds of thousands of the books it sells, and allows users to search the texts; Carnegie Mellon is hosting a project called the Universal Library, which so far has scanned nearly a million and a half books. The Open Content Alliance, a consortium that includes Microsoft, Yahoo, and several major libraries, is also scanning thousands of books. There are many smaller projects in various stages of development. Still, only Google has embarked on a project of a scale commensurate with its current corporate philosophy: “to organize the world’s information and make it universally accessible and useful”. ‘Don’t be evil’ has apparently been put on the back burner. Because of this latest ambition, Google’s endeavour is encountering opposition. Authors and publishers are challenging Google’s aspirations in court. However, it could be claimed that being taken to court on charges of large-scale copyright infringement might be the best thing that has happened to Google, as it would deter its competitors from going down the same road. In 2002, Google made overtures to several libraries at major universities. The company proposed to digitise the entire collection free of charge, and give the library an electronic copy of each of its books. They experimented with different ways of copying the images, and a pilot project commenced in July 2004 at the University of Michigan library. It is intended that the seven million volumes held at Michigan will be digitised within six years. In addition to forming partnerships with libraries, the company has signed contracts with nearly every major American publisher. When one of these publishers’ books is called up in response to search queries, Google displays a portion of the total work and shows links to the publisher’s Web site and online shops such as Amazon, where users can buy the book. One Google executive has claimed that “The Internet and search are custom made for marketing books. When there are 175,000 new books published each year, you can’t market each one of those books in a mass market. When someone goes into a search engine to learn more about a topic that is a perfect time to make them aware that a given book exists. Publishers know that ‘browse leads to buy’. This is living proof that the Long Tail can be made advantageous for publishers”. However, some publishers, including Simon & Schuster, the Penguin Group, and McGraw Hill, are taking Google to court not for its activities with publishers
on the Google Book Search project but rather for the systematic digitisation that Google is undertaking with the large libraries. The vast majority of books (approximately 75 %) belong to a category known as ‘orphaned works’. These are still protected by copyright, or of uncertain status, and out of print. These books are at the centre of the conflict between Google and the publishers. Google is scanning these books in full but making only “snippets” (the company’s term) available on the Web. However, according to the plaintiffs, the act of copying the complete text amounts to an infringement, even if only portions are made available to users. Google asserts that its use of the copyrighted books is “transformative”, that its database turns a book into essentially a new product. Harvard, Stanford, and Oxford University Libraries have prohibited Google from scanning copyrighted works in their collections, limiting the company to books that are in the public domain. However, several of the public institutions that are Google’s partners, including the Universities of Michigan, California, Virginia, and Texas at Austin, are allowing scanning of copyrighted material. Because of the vagueness of copyright law, and the extension of protections mandated by the 1998 act, it is not always clear which works are still protected. In 2005, Microsoft announced that it would spend $ 2.5 million to scan a hundred thousand out-of-copyright books in the collection of the British Library. At this rate, scanning thirty-two million books – the number in WorldCat’s database – would cost Google $ 800 million, a major but not unachievable expenditure for a multibillion-dollar corporation. Among Google’s potential competitors in the field of library digitisation are members of the Open Content Alliance, which facilitates various scanning projects around the country and overseas. Funded largely by Microsoft and the Alfred P. Sloan Foundation, the OCA has formed alliances with many companies and institutions, including the Boston Public Library, the American Museum of Natural History, and Johns Hopkins University. For the moment, though, the OCA’s members are copying only material in the public domain (and works from copyright owners who have given explicit permission), which limits the scope of the projects substantially. Perhaps Google may not get book search right, as it didn’t with its own home-grown attempts at video and with blogs. However, if Google strikes a deal with the publishers which allows them to earn revenues from book digitisation, this destroys the chances of other organisations competing on a level playing field with Google and denies more innovative approaches to delivering the content of digitised books to users. This argument goes on and on, but meanwhile Google continues to expand the list of libraries it is including in its book digitisation programme.
22.9 What of the Future? The basic premise is that there is much information available and no-one knows what is relevant. This creates a role for librarians to “get in the faces” of the bemused researchers and provide the required relevance. Some of the strategic issues that arise are:
1. The main general-purpose web search engines do not effectively tackle the large and diffuse ‘invisible web’. Not only is there information which is not crawled because the robots cannot reach them, but there is also formally published material which is not picked up on the top 10–20 hits, and therefore lies ignored in the lower rankings of search results. 2. What has not changed much in seven years is how little people are willing to work at searching. The researchers Spink and Jansen found that people averaged about two words per query, and two queries per search session. “The searches are taking less than five minutes and they are only looking at the first page of results”. As a result, more power is being put into creating search engines and tools which make life easier for the typical user, and give a better ‘user experience’. Some of the features which are part of the newer search engine approach include a focus on providing ‘an answer’ rather than just a list of hits of variable relevance. A related feature is greater customisation and personalisation in the presentation of information to the user, often in anticipation of demand. Use of federated search engines (using a single front-end interface) is also on the increase, as is multimedia searching and visualisation of the output. Text mining is also being addressed to make the ‘user experience’ that much more productive in future. As a key driver for change in the electronic publishing sector, the search engines remain one of the most significant.
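The federated search approach mentioned in point 2 – one front-end fanning a query out to several sources and merging the results behind a single interface – can be sketched as follows. The three ‘back-end’ functions are stand-ins for real connectors to, say, a library catalogue, an A&I database and a web engine, which would normally be queried over the network.

```python
# Toy federated search: one interface, several back-ends queried in
# parallel, results merged and de-duplicated. Back-ends are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def search_catalogue(q):
    return [f"Catalogue record for '{q}'"]

def search_abstracts(q):
    return [f"A&I abstract mentioning '{q}'"]

def search_web(q):
    return [f"Web page about '{q}'", f"Catalogue record for '{q}'"]  # duplicate

BACKENDS = [search_catalogue, search_abstracts, search_web]

def federated_search(query):
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        result_lists = list(pool.map(lambda fn: fn(query), BACKENDS))
    merged, seen = [], set()
    for results in result_lists:
        for item in results:
            if item not in seen:        # crude de-duplication across sources
                seen.add(item)
                merged.append(item)
    return merged

print(federated_search("open access"))
```

The user sees one result list from one search box – which, as the chapter notes, is what the typical two-word, first-page-only searcher actually wants.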
Chapter 23
Web 2.0 and Social Collaboration as Drivers for Change
23.1 Wisdom of the crowds ‘The Wisdom of Crowds – Why the Many are Smarter than the Few’ (Little Brown, 2004) is the title of a book by James Surowiecki, who drew attention to the wisdom which comes from asking a wide group of people their opinion on a specific topic. The theory is based on observation in practice. In 1906 Francis Galton, in Plymouth, saw bets being placed on the weight of an ox, and though he had little faith in the typical man’s intelligence, he found that the average assessment from the 800 participants was remarkably accurate. The conclusion from this and many similar experiments was that groups do not have to consist of clever people to be smart. The book suggests that one should stop chasing an expert and instead ask the crowd. However, Gustave le Bon (1895) was one of many who claimed that the crowd became an independent organism and acted foolishly. But the classic study of group intelligence is the jelly-beans-in-a-jar estimate, which can be surprisingly accurate. Collective intelligence can be brought to bear on (a) cognitive problems (where there is a definitive solution), (b) coordination problems among diverse groups, and (c) co-operation problems that involve distrustful groups. Groups work well under certain circumstances and less well under others. Diversity and independence are important; people must make individual, independent guesses. A few individuals may do better than the group. Also, the larger the crowd the better. According to Surowiecki, in the show ‘Who Wants to Be a Millionaire?’, ‘ask the expert’ got it right 65 % of the time, but ‘ask the audience’ got it right 91 % of the time. There are several prime conditions that characterise wise crowds:
• diversity of opinion
• independence of opinion
• decentralisation (drawing on local knowledge/expertise)
1. The value of diversity lies in the wide range of early options available. Diverse groups of both the skilled and the unskilled make decisions. The claim is that the value of individual expertise is in many respects overrated (or spectacularly narrow). One cannot be an expert in broad subjects (policy or decision-making) or in forecasting; expertise is only relevant in narrowly defined activities. Experts are more likely to disagree than to agree. Even if the expert who is always correct does exist, it is difficult to identify him or her. However, we still feel that averaging the results from a group is dumbing down the decision-making process.
2. The value of independence lies in two areas – it keeps the mistakes people make from being correlated, and it brings new information onto the scene. It emphasises ‘social proof’, which is slightly different from conformity. When decisions are made as cascades they may be wrong, but when made from the input of diverse opinions they are more likely to be right. Cascades are determined by a few experts. Mimicry is also important, but only when effective – otherwise it will be ignored. Fashion and style are driven by cascades. However, the more important the decision, the less likely a cascade is to take hold. People’s decisions are made sequentially rather than all at once. Invention is an individual activity – but selecting between alternative inventions is a collective one. Intelligent imitation is good for the group – slavish imitation is not. Intelligent imitation requires that many options are available, and also that some people put their heads above the parapet and make a judgement.
3. Thirdly, there is the art of decentralisation. The process of decentralisation was challenged by the intelligence community after the war, which stressed the need to reduce fiefdoms and operate centrally. This goes against the fundamentals of many research disciplines, which emphasise uncoordinated activity – the wisdom of crowds. Consensus is achieved using a mathematical truism – if enough people participate, the errors cancel themselves out. With most things, however, the average is mediocrity – with decision-making it often results in excellence.
This supports a new type of scholarly communication which is emerging – social publishing or social networks, all based around Web 2.0, where the power of the masses produces something which individual experts would not easily achieve. Wikipedia is an example of a product which has been built up using the wisdom of the crowds in a structured way. The wisdom of the crowds reaches the bedrock on which social collaboration and social networking – emerging processes in the electronic publishing field – are rooted. Even Google’s core search system is based on the wisdom of the crowds – the PageRank algorithm is based on the actions of the crowd. Amazon, eBay and similar online services look to the wisdom of the crowds to sustain their democratic approach to information service provision.
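The ‘errors cancel themselves out’ claim is simply the statistics of averaging many independent estimates, and it is easy to simulate. The numbers below (an assumed ‘true’ ox weight and the spread of individual guesses) are invented purely to illustrate the effect Galton observed.

```python
# Simulating Galton's ox: many independent, noisy guesses whose average
# lands close to the true value. All figures are illustrative assumptions.
import random

random.seed(1)
TRUE_WEIGHT = 1198          # assumed 'true' weight of the ox, in lbs

def crowd_estimate(n_guessers, spread=200):
    """Average of n independent guesses scattered around the true value."""
    guesses = [random.gauss(TRUE_WEIGHT, spread) for _ in range(n_guessers)]
    return sum(guesses) / len(guesses)

for n in (1, 10, 100, 800):
    estimate = crowd_estimate(n)
    print(f"{n:4d} guessers -> average {estimate:7.1f} "
          f"(off by {abs(estimate - TRUE_WEIGHT):5.1f} lbs)")
```

As the crowd grows from one guesser to 800, the average error shrinks markedly – provided, as the three conditions above insist, that the guesses really are diverse and independent; correlated errors do not cancel.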
23.2 The Challenge of Web 2.0
The expansion of user-generated media (UGM) into scholarly publishing – the grass-roots creation and dissemination of information without formal organisations structuring the interaction – could be the next big challenge facing the scholarly publishing community. Tim O'Reilly, the father of Web 2.0, identified seven main themes or aspects of Web 2.0 (including user-supported sales, lightweight user interfaces, etc.), of which the three most important are:
• 'control over data sources which get richer as more people use it' (such as with Amazon);
• 'create outlines from the supplier and then get the opinion of users, harnessing collective intelligence'; and
• 'invite trusted users to become co-developers in product development and workflow'.
Combine this with the wisdom of the crowds and one has a powerful social tool. Connected and communicating in the right way, populations can exhibit a kind of collective intelligence. Whilst there is a tidal wave of support among Silicon Valley start-ups for applying collaborative tools to the development of novel information services, others feel that Web 2.0 is still at an embryonic, nervous and formative stage. It is still essentially something operating in the consumer domain, and it still needs discernment. Either way, Web 2.0 and the social collaboration process are expected to have some impact on the electronic publishing environment – in fact there is evidence that some players in scholarly communication are already taking Web 2.0 to heart. Nature Publishing Group has taken the challenge head-on, and has come up with products and services which are embedded in the UGM framework. Examples include:
• Nature Network
• Nature Precedings
• Connotea
• Scintilla
• Open peer review (although this experiment was terminated soon after it was launched in 2006)
23.3 Critiques of the Web 2.0 movement
Tim O'Reilly, besides running his own advanced publication programme, organises a high-powered annual Foo Camp (Friends Of O'Reilly) which meets every summer in northern California and brings together several hundred of the great and the good of Silicon Valley's entrepreneurs. One such meeting – of what has been described as the 'greying hippies' leading the information revolution – took place in the summer of 2004 and proved an epiphany for one attendee, an epiphany which in effect suggested that Web 2.0 was a dangerous and destructive force on culture and society. Andrew Keen was so concerned at what Web 2.0 was unwittingly doing that he formulated his views into a book. He traces the movement to three strands.
The counter-cultural 1960s: this is where suspicion of all forms of authority first arose, and where the idea of 'community' was reinvented in a flatter, less hierarchical form; it is expressed eloquently in Fred Turner's book 'From Counterculture to Cyberculture'. The free-market ideology of the 1960s was then added: the Long Tail supported the notion that if one leaves the market alone it will sort itself out, and it created radical free-market institutions, distinctly un-Keynesian in their approach. Finally, the technophiles of the 1990s highlighted the ability of individuals to realise themselves; it spelt the end of alienation. Together these have created a situation where there are no rules any more, nor are there any spectators – everyone is able to engage in dialogue on the net, and there is little effective moderation.
It is a new form of Darwinism in which survival goes to the loudest, no longer the fittest. Everyone, according to Keen, is talking and no one is listening. Nor is this solely a technology-driven issue – Web 2.0 is a socio-cultural and economic challenge to the existence of traditional media in all its forms.
Combining all three processes begat Web 2.0, with its endless blogs, wikis and podcasts, and with Wikipedia (and its often very youthful contributors) dominating the reference space. The unintended consequence has been that the role of the expert has been undermined (as described in James Surowiecki's 'The Wisdom of Crowds'). There has been an implicit replacement of the mainstream expert by the un-expert, the amateur. The consequences can already be seen in the decline of mainstream media such as newspapers and the music industry, and there are indications that this will continue into the publishing sector and television. There will be less credible information in circulation – more democracy, less authority. A crisis will hit the old established media, and in its place we will see the rise of new media: digital narcissism, personalised media, blogs and self-serving listservs. We become both authors and readers – in effect a global Foo Camp 'babble'.
This has implications for the future of scholarly communications. Currently the publication system represents the selective best of what society creates and offers. The new alternative is Kelly's 'Liquid Library' (New York Times, May 2007), which has at its heart the aim of giving everything away for free. In fact Anderson, author of The Long Tail, is allegedly completing a book entitled 'Free' which makes a virtue of such universal freedom. Everyone will deposit their 'creations' into the free liquid library and everyone will be able to access it for free. The main point made by Andrew Keen in his book is that we lose the sense of 'value' and 'quality' in going down this mass-market consumerism path. Web 2.0 is under the illusion that advertising will provide the new business model which will support a new form of scholarly communication. This in itself is questionable, but in the meantime it is making the physical items more valuable. Authors will become self-promoters, needing to sell themselves to advertisers to gain recognition and sales. It will create a totally different information system, one which is less equal, less egalitarian. Access paths to information will have to change.
All this could result in the emergence of a Web 3.0 in which the role of the 'expert' is reinstated. The scenario of mass culture without quality is disturbing enough to be the catalyst for the return of those who can provide selectivity and quality to the information-disseminating scene. According to Keen we need to eschew the consequences of Web 2.0 and rebuild value into the information sector. In this respect the role of editors and publishers becomes vital again.
This is one view of Web 2.0, and one which has been oversimplified in order to make a point. But it is a point which, however meritorious, does not take into account that Web 2.0 is only part of a wide-ranging movement of Change. Taking one aspect of the Change process out of context and highlighting it as destructive and dangerous does not reflect the fact that it is part of an overall and underlying trend: the trend towards increased digitisation, newer business models, changing administrative structures, and a new generation of users with different information-gathering habits. In this broader view, Web 2.0 is only one part of a larger picture, and it is the larger jigsaw which
becomes important in providing the contextual vision of where society and culture are migrating. There may be unintended consequences from the dumbing down which Web 2.0 might create, but there are other informatic developments which benefit from the greater democratisation of the overall process and lead to new and innovative creation and delivery processes. As with the current – some say dysfunctional – scholarly communication process, there are warts on the publication system as well as warts on the Web 2.0 process. The real question is whether the worst excesses of any communication system can be treated and adapted to support a more effective scholarly dissemination system in future.
23.4 Case Study – O'Reilly
O'Reilly is a $ 60 million private company with 200 employees, based in San Francisco and Cambridge, Mass., and with six offices worldwide. The company is owned by Tim O'Reilly, best known for his coining of the term 'Web 2.0' and for the animals on the covers of the books it publishes in computer science and IT. It has a mission of 'changing the world by spreading the knowledge of innovators'. Its cash cow is books, with a few magazines, and most of its activities are performed online.
There are a number of ways in which O'Reilly monitors what is going on in the market. There is O'Reilly Radar, through which they identify and watch what the key innovators are doing. There are the O'Reilly Labs, a space where they experiment by immersing themselves with the audience. There are the cash-cow books, and the Safari imprint, a joint XML project with Addison Wesley/Pearson which now also includes Microsoft Press. They run conferences, including open source conferences such as the Emerging Technology Conference, and a Web 2.0 summit in partnership with other organisations. They run Foo Camp, where 'interesting people' are invited literally to camp out at their northern California facility and discuss any number of issues; six companies have so far been spawned by such Foo 'brain-storming'.
There has been a major structural change in the way publishing functions are performed in the area within which O'Reilly operates. A key stimulus is that children are reading in a different way. They have been brought up online; they use games the way the older generation reads printed books. The younger generation is consuming content in shorter formats and doing so in a multi-tasking way. More podcasting is coming into evidence. The retail function has been particularly hit by the structural changes: the shelf space available in bookstores has been halved, and backlists are rarely included in the physical selling space. Obscurity, not privacy, is the big challenge in the new era.
Authors are also changing. They are finding alternatives to writing books. They no longer write books for money – they do it mainly for reputation and prestige. Lulu.com and Lynda.com are a great help with authorship, delivered as print-on-demand and videos respectively. As such there is a new trend to 'publish' video books rather than just books. 'Rich media' embraces audio and video production – the question is who is good at this, publishers or producers? In either case, user-generated media (UGM) is becoming more significant as social collaboration takes hold. Nevertheless, content filtering and syndication remain very important.
Distribution is now being done through alerts and RSS or Atom feeds. Marketing support is different, with 'viral marketing' coming into vogue. There is the 'Slashdot effect': a book can move up to 4,000 places up the Amazon sales ranking if someone gives it a good review. Another feature which has become a key factor in successful book 'publishing' is search engine optimisation (SEO). Discovery of a title is more important than actually reading it – one precedes the other. As such, playing with the Google algorithm becomes an essential strategy. Getting real-time feedback from the market has become the norm; this is possible in a Web 2.0 environment. Despite this they see Google as their main competitor – even while actively implementing their SEO strategies.
So how is O'Reilly adapting to all these changes?
• They have reorganised their operations around key customer groups. The audiences need to be addressed and targeted in print, online and in person.
• They have built a central infrastructure around a repository containing all types of content. MarkLogic software is used for the repository, the so-called 'mother ship', and applications feed off it, allowing them to leverage the common infrastructure. They have plotted their various applications (products/services) as follows:
(Figure: visualisation of key activities – a quadrant chart with a Paid/Free axis and a Badged Content (high margin)/Atomised Content axis, positioning Traditional Publishing, Safari U, 'Intel inside'-style badged content, Web Publishing, Google, Advertising and 'Authors as stars' across the quadrants.)
This indicates that they do need to play with Google as they move from traditional publishing to Web publishing. As part of this they need to give away some information for free (five pages of their books), and doing so has resulted in a rise in their sales. As far as e-Books are concerned, the tipping point is nearly there – and the format is not binary PDF but XML. However, the tipping point for the hardware, the hand-held readers, is not there yet.
23.5 The Web 2.0 business model
From the above it is evident that, to participate in Web 2.0, a new business model is required – one which complements existing services and does not cannibalise them. Nor can it rely exclusively on a subscription or licensing model, as Web 2.0 is essentially part of the open access agenda. The model is instead one of creating brand loyalty: locking people into the main service and building revenue-generating opportunities on the fly.
As indicated above, some, such as Andrew Keen, feel that there is still a need for structure in the development of user-generated media. The traditional discipline of publishing needs to assimilate, but not be taken over by, the new UGM. Most of the popular blogs do not use editorially derived systems, and user-generated media still needs nurturing. There are aspects of Web 2.0 which can be adapted to the current system rather than letting Web 2.0 take over in totality. In terms of scholarly communications it would be useful, for example, to allow commentary at the end of each article, and for these comments to feed a forum.
There is also a life cycle in the development of Web 2.0 products, often referred to as the Gartner 'Hype Cycle'. This shows that Web products are at different stages of development, and all go through a period of hype and disillusionment before settling down to an even development programme. Not all Web 2.0 products have reached this equilibrium.
Whilst some products and services have reached the 'Plateau of Productivity', this is not the case for all Web 2.0 derivatives. Some may fail to achieve the aspirations held out for them and could disappear; others will become established parts of the new information landscape. In the meantime there is a massive groundswell of activity in social collaboration projects. The plethora of collaborative and networking projects can be illustrated by the following example.
Eurekster, for example, is a company based in Christchurch, New Zealand, that builds social search engines, called swickis (search plus wiki), for use on websites. Launched to the public in January 2004, Eurekster hosts around 50,000 swickis for various websites, which together handle approximately 20 million searches per month, or around 500,000 searches per day. If one adds their search engine to one's site, it will rank results based on the actions taken by the people who search from that site, and then show results that maximise relevance to the site's community. As such this is a living example of a service which challenges Andrew Keen's suggestion that Web 2.0 services are incompatible with the provision of quality or expert opinion – there is an in-built mechanism for ensuring that relevance is exposed. In January 2007 Eurekster was named one of the top 100 companies in the AlwaysOn Media 100, a selection made by focusing on 'innovation, market potential, commercialization, stakeholder value creation, and media attention or "buzz"'.
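Eurekster has not published the details of its swicki ranking, so the following is only a sketch of the general idea – boosting results that a site's own community has clicked on in the past. The URLs, scores and boost weight are illustrative assumptions, not Eurekster's implementation.

```python
from typing import Dict, List, Tuple

def rerank(results: List[Tuple[str, float]],
           community_clicks: Dict[str, int],
           boost: float = 0.1) -> List[Tuple[str, float]]:
    """Re-order search results for a particular site's community.

    `results` are (url, base_relevance) pairs from a generic engine;
    `community_clicks` counts how often this site's visitors chose each url.
    Each click adds a small boost, so locally popular results float upward.
    """
    rescored = [(url, score + boost * community_clicks.get(url, 0))
                for url, score in results]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Illustrative data only.
results = [("https://example.org/a", 1.0),
           ("https://example.org/b", 0.9),
           ("https://example.org/c", 0.8)]
clicks = {"https://example.org/c": 5}     # this community keeps choosing "c"
print(rerank(results, clicks))            # "c" now outranks "a" and "b"
```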
23.5.1 Blogs and Wikis
Unquestionably, some parts of the community are adapting to the world of blogs, podcasts and wikis. Blogs are simplified content management devices supplying syndicated XML feeds. They are not chatrooms; they are a place for genuine open comment about related issues. Information is not a standalone entity – it is the beginning of a conversation, and blogs support such discussion. People are prepared to share what they are passionate about. It is the commencement, not the ending, of authorship: authors can engage with the audience on an ongoing and interactive basis. A new weblog is being created every second – some in audio (podcasts), some in video (videoblogs) and some with a business focus (blawgs, for lawyers, for example). Whilst these are mushrooming, there is still need for improvement. Better composition tools are required so that scientists can actually take part in blogs rather than just look at them. The success of FaceBook augurs well for more active participation becoming possible through blogs.
Wikis are simplified publishing tools – they are fast, and they can generate new forms of authorship. Wikis allow a workable draft to be created, enabling quality to be added through interaction. They facilitate collaborative editing, they also generate open content, and they make blogs scaleable. Wikipedia is a prime example of where this happened – a distributed and unstructured creation of the world's most up-to-date, free online encyclopaedia through cooperation at the grass-roots level. The weakness wikis have is making people aware of their existence. There is a move from blogs to structured wikis, and venture capitalists are showing interest in such services.
RSS (Really Simple Syndication) is an example of the pull-to-push model for information delivery, where the information created through blogs and wikis can be served to an interested audience without their intervention.
23.5.2 Mash-ups
Electronic Publishing and the Web 2.0 process go beyond text-only systems. It is not just about words – other media can be linked in, and a 'mash-up' created whereby the image interplays with the commentary and with the data. An example of such a mash-up arose in the aftermath of Hurricane Katrina: at ground level, individual residents of New Orleans plotted the extent of the destruction in their own vicinity onto the freely available GoogleMaps, allowing everyone to see in which locales the worst damage had occurred. There are many such mash-ups, as information services make their APIs freely available to the community, and the innovative juices are used to produce inspired new services built on separate information content sources.
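At its simplest, a mash-up is a join of independent data sources on a shared key, such as a place name, exposed through an open API or map layer. The sketch below is purely illustrative – the neighbourhood coordinates and damage reports are invented, and no real GoogleMaps API is called – but it shows how resident reports could be merged into GeoJSON-style features that any mapping layer could plot.

```python
# Hypothetical inputs: one feed of neighbourhood coordinates, one of resident reports.
neighbourhoods = {
    "Lakeview": {"lat": 30.01, "lon": -90.11},
    "Gentilly": {"lat": 30.02, "lon": -90.07},
}
reports = [
    {"neighbourhood": "Lakeview", "damage": "severe flooding"},
    {"neighbourhood": "Gentilly", "damage": "roof damage"},
]

def to_features(reports, neighbourhoods):
    """Join the two sources on the neighbourhood name and emit GeoJSON-like features."""
    features = []
    for report in reports:
        place = neighbourhoods.get(report["neighbourhood"])
        if place is None:
            continue  # skip reports we cannot geolocate
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [place["lon"], place["lat"]]},
            "properties": {"neighbourhood": report["neighbourhood"],
                           "damage": report["damage"]},
        })
    return {"type": "FeatureCollection", "features": features}

print(to_features(reports, neighbourhoods))
```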
23.6 Drive towards Consumer-based Collaborative systems
It is difficult to assess the real impact which the current mushrooming of interest in User Generated Media (UGM) and collaborative or social publishing is having on the core electronic publishing systems. The impression is that this is a revolution waiting to happen rather than one which has already taken hold.
There is an incredible amount of individual participation in services such as FaceBook, MySpace and Flickr, and it is flourishing among the digital natives. But it is largely an unstructured communication system rather than a formal mechanism for exchanging qualified and verified research results. Nevertheless, some Web 2.0 developments could have relevance to current scholarly information developments and the current stakeholders.
One of the key areas where such relevance could be found is in 'remixability'. Web 2.0 offers potential access to a range of different publishers, information providers and websites. This is significant because no one publisher has control over all the required published information on a particular topic. Besides offering support for joint information systems to be created, Web 2.0 offers the tools with which to distribute this new combined information – RSS, OpenURL, mash-ups, etc. All of this means that 'links', rather than individual digital objects, become important. It becomes a 'Linked Economy', an intellectual pathway through the sciences. There are problems, as some sites prevent linking and others change their URLs frequently. There are many indications of non-constructive behaviour and few sophisticated linking indexes in place, but equally there are signs of more helpful collaboration using the new Web 2.0 tools. A few publishers are keen to see as many people as possible linking into their published material, freely if appropriate, in order to create a brand and image. Linking in and linking out is a basic feature of such operations, designed to get users to spend more time online.
A second main push from Web 2.0 is the creation of community participation. Despite what many new pundits claim, the scholarly community does want peer review. However, they want peer review which lies somewhere between the freedom of listservs and the formality of the formal article. Some personal commentary should be allowed, but this does not mean that fast-changing or dynamic articles are wanted.
In this respect, will folksonomies replace traditional classification and cataloguing? A folksonomy involves the community doing the tagging for each digital item made available through a database. In some areas folksonomy is not appropriate; in others it is. It operates best in those areas or disciplines where terms are evolving; where lower accuracy can be tolerated; where structure is low; and where variable search procedures can and should be employed. Folksonomies are a function of popularity – the more people who take part, the better the folksonomy, because more users and a greater frequency of interaction lead to higher quality. Flickr, for example, has 20 million users per month – its tagging of pictures of 'cirrus clouds' is better than the formal classification structure. There is a need for publishers to use both taxonomy and folksonomy, and to anticipate the greater use (and therefore quality) which comes from the latter.
Use is not the same as stickiness. In fact 'stickiness' works against the whole principle of scholarship, as scholarship requires use across sites, not staying in one domain only. As such, every effort should be made to use Web 2.0 tools to create flux between sites – to ensure that users move across sites and are not confined to one site only.
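The mechanics behind a folksonomy are straightforward: every user tags items freely, and the aggregated tag counts become the community's de facto classification. A minimal sketch with invented tag data (not Flickr's implementation) follows.

```python
from collections import Counter, defaultdict

# Each tuple is one tagging event: (user, item, tag). Invented data.
tag_events = [
    ("alice", "photo-17", "cirrus"),
    ("bob",   "photo-17", "cirrus"),
    ("carol", "photo-17", "clouds"),
    ("dave",  "photo-17", "sky"),
    ("erin",  "photo-42", "wiki"),
]

def folksonomy(events):
    """Aggregate free-form tags per item; the most frequent tags become its labels."""
    tags_per_item = defaultdict(Counter)
    for _user, item, tag in events:
        tags_per_item[item][tag] += 1
    return {item: counts.most_common(3) for item, counts in tags_per_item.items()}

print(folksonomy(tag_events))
# {'photo-17': [('cirrus', 2), ('clouds', 1), ('sky', 1)], 'photo-42': [('wiki', 1)]}
```

The more tagging events there are, the more the frequent, consensual tags dominate the idiosyncratic ones – which is why the quality of a folksonomy improves with participation.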
23.7 Communication
A feature of Web 2.0 is the facility to 'communicate' – to provide the mechanism for an interchange or dialogue between author and reader. This breaks with tradition, whereby structured systems have been developed to permit a one-way delivery of research results. Web 2.0 introduces a participative and collaborative aspect to the communication issue. It brings back communication, something which the strict refereeing system all but lost. Publishers push to move the boundary between the formal and the informal in favour of the former; Web 2.0, on the other hand, pushes the boundary the other way.
23.8 Case Study – Wikipedia and online encyclopaedias
According to Larry Sanger, co-founder of Wikipedia, publishers, if they are to survive, should experiment immediately with Web 2.0 projects. The history of online encyclopaedias in recent years has been one of how far each of them has adapted to the use of social collaboration tools. He also has a vision for the next ten years in which the 'Encyclopaedia of Life' (a $ 100 million project currently at the stage of inception) would hold centre stage, and Citizendium, with millions of articles and real, named editors and contributors (unlike Wikipedia), would also provide a source for references. Virtually all library
holdings would be digitised within ten years, as would archives of video, audio, pictures, etc., and scholarly publishers would be actively participating in Web 2.0 operational projects which access these.
Larry Sanger has described how the present situation with regard to global and free encyclopaedias arose. The first was the Stanford Encyclopaedia of Philosophy, which began in 1995 and offered free access to quality-controlled articles, but the number of articles involved was in the thousands rather than millions. There were not enough experts around to provide the quality filtering, so Nupedia was launched in 2000 to address this problem by allowing anybody to write articles for the online encyclopaedia. However, it collapsed because not enough people were willing to go through the seven steps necessary to create an item. This led to the formulation of the Wikipedia concept in 2001, based on the term 'wiki', which is Hawaiian for speedy. Wikipedia became simple, open and anonymous. It also became contemptuous of expertise, and in so doing was often of questionable reliability. The Encyclopaedia of Earth was spawned in 2005 and applied the wiki concept to the earth's environment. In this case the contributor does have to be an expert, but as a result there is little social collaboration taking place. 'Scholarpedia', developed in 2006, is an encyclopaedia specialising in the neurosciences, mathematics and computer science. It is not open content but is free to read; however, the level of the content is such that it is not for non-experts, and again little collaboration is going on. This leads to the 'Encyclopaedia of Life' (2005–7), a $ 100 million investment in editorial design, the application of mash-ups, and the input of material from a wide variety of sources which is then authenticated. Though it does not yet exist, the investment of this amount of money should produce something of note. Finally there is Citizendium, which is like Wikipedia but with editors and real, named authors. This goes back to the collaborative approach of creating items interactively by experts and non-experts. In the seven months of Citizendium's existence there have been 1,700 author contributions, 240 editors identified and 2,000 articles, which replicates Wikipedia's early rate of expansion.
All these projects have had to take into account some general principles of encyclopaedia creation. Both size and quality are important, and a 'peaceful' community is required. Other features include the need to expand the number of contributors involved and to simplify the work process; articles should not be signed; editing should be encouraged by forceful moderators; open content licences should be applied to the material; and help with other people's work should be encouraged. Sanger stressed the need for comprehensive review through a balanced approach by as wide a group as possible. Many texts need to be combined into the same outline.
There remains the issue of business models which could sustain the process. Advertising is one source of revenue, but only Scholarpedia adopts the advertising formula in its business activities. Other potential sources of revenue include pay-to-play – making the basic information free but charging for premium or value-added services – and patronage.
23.9 Wikinomics
'Wikinomics' is essentially the process of capturing at grass-roots level the power and range of a diffuse social network. Historically, individuals have occupied their own information space, but with the ubiquity of personal computers, free Internet telephony, open source software and global outsourcing platforms, a new form of communication has arisen which is increasingly impacting on STM and on the 'digital natives' (those born since the early 1980s and brought up on a diet of electronic games and interactive computer services) who are part of this constituency. It is described in detail in the book Wikinomics – How Mass Collaboration Changes Everything, by Don Tapscott and Anthony D. Williams, published by Portfolio in 2006.
Is this new social networking system a passing fad, or is it likely to revolutionise the peer review system, the bedrock on which current scholarly publishing is based? The evidence seems to be that new principles of:
• openness,
• sharing (of some, not all, of their IP),
• peering, and
• acting globally (China, India, Brazil, Russia)
are migrating from ICT areas to scholarly information. The web itself is also undergoing changes which support the above:
• The emergence of the blogosphere (the world's biggest coffeehouse?), which enables 'conversations' to take place.
• Collective intelligence being captured – the 'wisdom of the crowds' (James Surowiecki) which supports Amazon and Google. This has led to 'tagging', bookmarks (Del.icio.us) and folksonomies.
• New public squares emerging – small companies can erode large companies' market shares by using the LAMP stack (Linux software, the Apache web server, the MySQL database and the Perl scripting language).
• Serendipitous innovation, where new inventions are made first and the business model is considered later.
A further aspect is that the new generation, or NetGen, is changing habits, with networking becoming a built-in feature of their digital lives. It appears in MySpace, FaceBook, etc., and their approach reinforces the above. Pew claims that 57 % of US teenagers are 'content creators'. The norms of work for the NetGen are speed, freedom, openness, innovation, mobility, authenticity and playfulness. Most publishers (excluding Nature Publishing Group (Connotea, Nature Network), Elsevier (Scopus, MDL, Scirus) and Thomson) are too small, too traditional and too locked into the 'closed' system of profitable subscription-based publishing to see beyond the threat.
It is contended by the new school of scholarly communicators that as new platforms for collaboration emerge, as a new generation accustomed to collaborating arises, and as a new global economy develops which supports new forms of economic cooperation, the conditions are emerging for 'the perfect storm' – one which has already had a marked effect on the R&D strategy of many large companies.
The days when organisations such as IBM, Motorola, HP and even Procter & Gamble conducted their R&D efforts entirely in-house are disappearing. They now open up their R&D efforts to the community – the skilled users. A collaborative approach has developed, with these companies exposing their formerly cherished software programmes for all to use and, in using them, to improve at a fraction of the cost of developing them in-house. It also enables speedier and more innovative development of the programmes, as the power of the community exceeds the power of a few dedicated in-house researchers. Google has been a classic user of this approach, offering its APIs for other organisations to apply to other datasets and create new mash-ups. Exciting new services such as Swivel show how the traditional approach to dataset dissemination by publishers such as the OECD can be radically improved by the community refining the data (achieving a six-fold increase in usage as a result). Transparency – the disclosure of pertinent information – is a growing force in the networked economy.
'Coase's Law' suggests that companies will add new functions until the latest one becomes cheaper to outsource. The Internet has caused the costs of production to tumble, which increases the outsourcing potential for modern companies. Coase's law is thus turned on its head, and it explains the death of the gigantic conglomerates which did it all themselves (GM, Ford). The rise of China (manufacturing) and India (office services) has made it a case of globalise or die.
Some companies still protect themselves with digital rights management (DRM), but as Sony found to its cost when Cory Doctorow exposed the extent of its DRM activities on his Boing Boing blog, the problem is not piracy but obscurity. Open and free innovative services such as InnoCentive enable forward-looking companies to expose their product development problems to the 100,000 scientists around the world who participate in solving tough R&D problems, not for mercenary reasons but because it is part of an open and free social network where the benefit lies in solving challenges rather than making money. 'Ideagoras' have emerged – the systematic tapping of a global pool of highly skilled talent many times larger than is available within the organisation. 'Prosumers' have emerged, claiming the right to hack into (and improve) existing old-fashioned systems. This is another reflection of the 'wisdom of the crowds'. Platforms for participation are now being set up by some of the more forward-looking companies to invite the community, the crowds, to participate in product and service improvement.
Organisations which currently use this social networking approach – notably Amazon, but also Google, Yahoo and eBay – will need to strike a balance between achieving outstanding commercial success on the one hand and stimulating the interest, support and loyalty of their communities on the other. The business model is difficult, and in the early days of social collaboration networking new services have either adopted the advertising model to stay alive, sold themselves into larger wikinomic organisations (Skype acquired by eBay) or beaten the incumbents into submission (Wikipedia beat out Britannica; Blogger beat out CNN; MySpace beat Friendster; Craigslist beat out Monster). And all this happened in 2006. The difference was that the losers launched web sites; the winners launched vibrant communities.
Alliances and joint ventures are vestiges of the central-planning approach – according to wikinomics, what is needed instead are free market mechanisms.
23.10 Case Study: InnoCentive
InnoCentive calls the scientists who attempt the problems 'solvers' and the companies the problems come from 'seekers'. As of 2005 InnoCentive had 34 of these 'seekers' (including Procter & Gamble, Dow AgroSciences and Eli Lilly), which had posted more than 200 'challenges' in 40 scientific disciplines, of which more than 58 had been solved by a community of over 120,000 'solvers'. The problems listed are categorised as biology or chemistry problems, but use a very liberal definition of these disciplines; challenges have been posted, for example, in the areas of system network theory, manufacturing engineering, design, materials science and nanotechnology. Solutions have come from the US, Europe, Russia, China, India and Argentina; the cash awards for solving challenge problems are typically in the $ 10,000 to $ 100,000 range.
InnoCentive provides a consultancy service to enable its clients to make the most of its 'solver' network. 'Science advisers' and 'problem definers' help clients to identify a challenge appropriate for posting on its network. They then estimate an appropriate award fee by determining the complexity of the problem, the resources required to find a solution, and the value transferred to the company. InnoCentive reserves the right to reject award amounts that are deemed too low, and its experts provide a solution-vetting service to screen out ideas that do not meet the challenge criteria. InnoCentive requires its 'seeker' companies to agree to intellectual property audits so that, once a solution is provided to the company, it can guarantee that the intellectual property is not used if the company decides not to make the award. It may also require the 'seeker' company to award the solution if it deems that it meets the requirements of the challenge. Its 'science experts' provide feedback to explain the terms of the challenges as well as why submitted solutions may be deficient. It provides the logistical and legal framework for maintaining control over the intellectual property until its sale to the seeker company, and all communication and submitted solutions remain confidential. Competitors offering services similar to InnoCentive's include Yet2.com, YourEncore and NineSigma.
Inevitably it is claimed that failure to participate in this new system will result in great upheaval, distortion and danger for societies, publishers and individuals who fail to keep up with this relentless change. While the old Web was about web sites, clicks, 'stickiness' and 'eyeballs', the new Web economics – a mere five years or so on – is about communities, participation and peering. It is the latter which is frightening some of the larger commercial publishers. Without control over peer review, and without making money out of taking control of the IPR of publications, they are as nothing. And yet it is the exercise of such control which is anathema to social networking and wikinomics. Most technologists agree, however, that DRM is a lost cause (due to hacker innovativeness) as well as being bad for business. Publishers do not accept this – which is why Google, Yahoo and YouTube are driving the industry. The new business models to keep publishers in the loop need to be fleshed out (new services paying content providers something, a tax on the Internet, etc.).
It is not only the publishers who see such social collaboration as a threat. Bill Gates, chairman of Microsoft, claims that the movement to assemble a global 'creative
commons' containing large bodies of scientific and cultural content is a potential threat to the ability of knowledge-based industries, such as software, to make profits. More importantly, the telecoms companies are trying to set charges for the Internet which could be catastrophic – a war against innovation.
The web is becoming a massive playground of information bits that are shared and remixed openly. The new web is about participating, not passively receiving information. The pressure to create the new Wikinomics comes from several directions:
1. 'Peer Pioneers' – the open access innovators. These include Wikipedia. Five staff currently keep it going, but 5,000 regular editors support it, with each Wikipedia article being edited on average 20 times. Compared with Encyclopaedia Britannica, Wikipedia's errors are immediately corrected; not so for EB.
2. 'Ideagoras' – global pools of skilled talent, such as InnoCentive (90,000 scientists in 175 countries), used by Boeing, Dow, Dupont and Novartis. They come in two forms – solutions in search of questions, and vice versa. 70–90 % of ideas go unexploited unless given free rein. See yet2.com for a list of patents available for exploitation by small firms.
3. 'Prosumers' – the new hackers. Modelled on bustling agoras, those Athenian centres of culture and commerce, services such as Second Life (from Linden Labs) are emerging which support 325,000 participants. Lego uses mindstorms.lego.com to develop new products; iPod and Sony (PSP) users are involved in 'culture remixing' (Lessig) and mash-ups.
4. 'New Alexandrians' – those who improve the human lot by sharing simple but powerful ideas. Open access publishing is one such idea. It no longer pays to hoard corporate knowledge, because this creates a vacuum. We are now in the age of collaborative science (or Science 2.0). Conventional scientific publishing is slow and expensive for users. Research also involves many collaborators – 173 on a single high energy physics paper, for example. Now we see the emergence of the Large Hadron Collider (2007) and the Earth Grid System for climate and astronomy. The results from these will be vetted by hundreds of participants on the fly, not by a few anonymous referees. Blogs, wikis and shared datasets are heralding the arrival of Science 2.0. Scientific institutions will need to rethink the way they collect and manage data. The stumbling block is cultural: silos will eventually give way to networked information systems.
5. 'Platforms for Participation' – companies create an open stage on which new businesses can be built (through mash-ups, etc.). HousingMaps, ChicagoCrime and the Hurricane Katrina mash-ups are all examples. Amazon pioneered the affiliates programme. The platform may be a product (e.g. a car or iPod), a software module (Google Maps), a transaction engine (Amazon), a data set (Scorecard), etc. The key is for the platform to acquire network effects.
6. 'Global Plant Floor' – where manufacturing-intensive industries have used the new openness to improve efficiency. National companies (silos) gave rise to bloated and expensive bureaucracies that deployed inefficient, incompatible and redundant processes. BMW focuses on marketing, partnering and customer relationships, and it maintains whatever engineering expertise it deems critical – but not production: that is done by Magna International. They free up resources to focus on what they think is important and what lies within their core competences.
7. 'Wiki Workplace' – a new corporate meritocracy is being created. Earlier generations valued loyalty, security and authority; the Geek Generation supports creativity, social connectivity, fun, freedom, speed and diversity in its workplaces. There is a bottom-up approach to innovation. 16,000 people participate in Wikipedia, 250,000 on Slashdot, thousands on Linux and 140,000 application developers on Amazon – these ventures 'employ' staff who are constantly changing and in flux, not the fixed 150 said to be the limit for an efficient hierarchical company. 40 % of IBM's staff do not work in traditional offices. Consultancy could become the new working model.
23.11 Summary
These are major social developments which will have a profound effect on the future of electronic publishing in the scholarly area. How quickly they will become visible and have an impact on the current way scholarly information is produced and distributed remains unclear. But it is likely that by 2010 there will be new procedures in place, and the question is whether the existing players can make this move from their digital silos and individual collection strategies to the more open and collaborative approach, or whether they will fall into the 'valley of death'.
Chapter 24
Trust
24.1 Trust
Finally, there is the matter of trust. Trust in a publication system is essential to attract and retain users. The print publication system has evolved over generations and has become a cornerstone of the dissemination of research results. Print is a commonplace medium that has been familiar to users for generations, whether in the form of manuscripts, newspapers, books or research articles. It is only in the past decade or so that a new alternative has arisen, with a more digital construct and an audio-visual interface. Trust is crucial to new forms of publication, but the metrics of trust are ill-defined. There are some highly trusted specialists whose web sites one can rely on, and trust syndicates are forming, protected by security and authentication controls.
The so-called 'digital native', the child of the new Millennium, has emerged immersed in the new information technology that surrounds us. Instead of being brought up on a menu heavily laced with traditional print, the new generation has personal computers, television, iPods, MP3s, online games and socially interactive software with which to interact. This has an impact on Trust.
As Kieran O'Hara describes in his book 'Trust – from Socrates to Spin', trust is different in the scholarly communication area compared with the world of the Internet. He felt that there was a divide – publishers versus Web 2.0 – where neither side really understood the problems facing the other. With the Web, the issue is one of trust: there is currently no way for people to authenticate what is on the web and to assess its quality. We have to take it on trust, but there is a fear that the 'unwashed masses' could break the system. Either Web 2.0 will be regulated or it will collapse – someone will subvert the system.
This was rarely a problem in the traditional scholarly information sector. For generations there has been trust that the findings published in international scholarly books and journals were reliable, accurate and could be built on. They were a solid foundation. The publications had been sifted through the refereeing system to ensure that only those which withstood the test of quality would be published and disseminated. Occasionally this did not occur – there are classic cases of the refereeing system breaking down – but these occurrences were in the minority. On the whole the scientific community believed in the publication system. They participated in it as willing editors, referees and co-authors. The community ensured that time was not wasted checking whether each nugget of information was correct. There was Trust in the scholarly system.
With the Web having emerged on the scene only recently, there has not been sufficient time for the scholarly community to transfer their trust from the traditional scholarly system to the vast world of the Internet and Web. It is still unclear how reliable the information on the Web really is. There is no stamp of approval from institutions in which the individual scholar has confidence. Much of the information on the Web, particularly in the social networking areas, is uncorroborated. It is not possible to build on the shoulders of giants if the giants turn out to be dwarves. The following comparison, adapted from Kieran O'Hara's book, illustrates the difference between the concepts of Trust in the Scholarly and the Internet sectors.

Difference between the Trust elements of information

Local: personal acquaintances; sometimes transitory; doesn't scale.
Global: trust extended through proxy; proxy extends trust to strangers; includes systems risk.

Horizontal: amongst equals; little coercion; not enforceable.
Vertical: within hierarchies; coercion used to enforce; subject to control.

(Diagram: plotting these two dimensions against each other, Scholarly Trust occupies the Vertical/Local quadrant, while Internet Trust occupies the Horizontal/Global quadrant.)

Within Web 1.0 there have been ways to overcome the concerns about Trust:
• eBay has developed a trust metric based on feedback from buyers and sellers. Buyers and sellers try to make sure that they remain trusted participants by sticking to the rules of transaction; failure to do so appears on their eBay record.
• Amazon uses reviews sent in by book purchasers to provide more information about a title without resorting to advertising hype.
• Slashdot uses Karma (or the Wisdom of the Crowd).
• Google pioneered the PageRank system, which uses a proprietary trust metric based around the number of links to an item (a minimal sketch of the idea follows below).
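The production details of PageRank are proprietary, but the published core of the idea – a page earns trust in proportion to the trust of the pages linking to it – can be sketched as a few lines of power iteration. The link graph below is invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively distribute each page's score across its outgoing links."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue  # dangling pages simply do not redistribute their score here
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Invented three-page web: A and C both link only to B, so B accumulates the most trust.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(pagerank(links))
```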
Within Web 2.0 there are also some trust-based systems:
• Del.icio.us
• Connotea
We need to set up a mechanism to provide quality assessments for the Web; at present such mechanisms are honoured more in the breach than in the observance. Without a convergence of trust between the Internet and the Scholarly area we face a mismatch, and electronic publishing in support of the scholarly community will never be able to gain the benefits which the Internet infrastructure potentially allows.
24.2 Fraud and Plagiarism
The referee system, one of the main bulwarks of the traditional publishing system, is not without its detractors. They focus on the mistakes which have been made, when highly visible research papers which have passed through the review process have subsequently been shown to be fraudulent. In 2004 the Korean biochemist Hwang Woo-Suk announced that he had successfully created a human embryonic stem cell line from an adult cell by somatic cell nuclear transfer into a human egg. It was published in the March 2005 issue of Science. Later that year Seoul National University announced that Hwang's cell lines were a fabrication, as was the published article – this despite the established review process undertaken by Science's reviewers and in-house editorial staff. There are many similar examples. Fraudulent items have been published in the New England Journal of Medicine, where authors hid critical information about Vioxx heart attacks, and a report in The Lancet included a fabrication about drugs and oral cancer. However, over one million articles are published in peer-reviewed academic journals each year, and it would be foolish to dismember a system on the basis of a few rogue articles and authors.
Plagiarism, however, is more of a problem, even though according to the chief executive of the STM publishers' association the extent of plagiarism is less than one paper in a thousand. One might question whether this figure includes cases where large chunks, rather than the complete paper, are plagiarised. But to eradicate even the low figures indicated by STM, the CrossRef group has launched the CrossCheck initiative. Publishers buy into a software system which checks whether plagiarism can be identified in a given item, across a wide spectrum of publishers, at the point of submission. According to Mabe of STM, "Publishers are doing this because trust and authority lie at the heart of their value-added service; ensuring the integrity of what is published isn't just an ethical good, it is also good for the reputation, branding and the future submission of articles and their download and use" (listserv comment, February 1, 2008).
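CrossCheck is built on commercial matching software, so the sketch below is not its algorithm; it simply illustrates the general principle of overlap detection by comparing the sets of word 5-grams shared between a submission and a previously published text. The sentences are invented, and a real service would compare against a corpus of millions of articles and report the matching passages.

```python
def ngrams(text: str, n: int = 5) -> set:
    """Return the set of overlapping word n-grams in a text, lower-cased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(submission: str, published: str, n: int = 5) -> float:
    """Fraction of the submission's n-grams that also appear in the published text."""
    sub = ngrams(submission, n)
    if not sub:
        return 0.0
    return len(sub & ngrams(published, n)) / len(sub)

# Invented example passages.
submission = "the results show a statistically significant increase in yield across all trials"
published = "our results show a statistically significant increase in yield across all trials tested"
print(f"{overlap_score(submission, published):.0%} of the submission's 5-grams match")
```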
Chapter 25
Timeline – Emergence of Electronic Publishing
Earlier chapters have described some of the Drivers which could move electronic publishing forward. This penultimate chapter attempts to chart when, and to what degree, the various trends will impact on the evolutionary process. The overall trends can be shown as follows:

Figure 25.1 Impact of new developments on EP, 1995–2015
(The figure plots, along a 1995–2015 timeline, the relative weight of subscription-based scholarly material, Open Access, e-science/Big science and 'Social Publishing' (UGM), with events marked such as the Budapest Open Access Initiative (BOAI), the UK Select Committee Report and the Wellcome Trust/UKPMC developments.)
An alternative way of presenting the results can be illustrated as follows:
25.1 Where we come from
The early chapters identified some of the challenges which faced the scholarly communication industry sector. These basically revolved around the problems caused by the inability of the demand sector (library budgets) to keep pace with the output of scientific results (supply). This set in motion some defensive tactics by the publishing industry and innovative actions by the library and funding communities.
(Diagram: Industry Evolution – the 'Serials Crisis', 'Big Deals', the Disenfranchised and Open Access.)
25.2 Users of scholarly communication
However, there is no one-size-fits-all, and user behaviour has been influenced by a number of social trends. These have operated at the general level, as society at large has begun to cope with the Internet and web, and at specific discipline levels, as the cultures of individual disciplines and subject areas have adapted to the information trends.
(Diagram: End Users – overall behaviour patterns; subject-based behaviour patterns; external impacts (funding and technology); research studies; metrics and scientometrics.)
25.3 The Industry Structure
In addition, the industry as a whole has been challenged by electronic publishing developments. The main players, and the main electronic products underpinned by a digital infrastructure, are summarised below.
(Diagram: The Industry Players – Publishers, Librarians, Intermediaries, Users and Researchers – and the main electronic products: e-Journals and document delivery, e-Books, data and datasets, grey literature, and the e-infrastructure/cyberinfrastructure which underpins them.)
25.4 Drivers for Change
In the face of the above structures, the electronic publishing system has been creaking. A number of new 'drivers' have been causing additional changes. There are some dozen of these, which have been summarised under three main headings. These are:
• Financial and Administrative
• Technological
• Social
25.5 Separating the Drivers
Looking at each main driver in turn, the first we identify are those related to financial and administrative issues. There are two main features here – one is the move towards open access, in line with the rest of ICT, and the other is the new powers which the funding agencies are giving themselves as masters of the purse strings.
Financial and Administrative drivers:
• Open Access
• Industry and Charity Funding
• Funding Mechanisms
• Business Plans
• Research Assessment Exercise
• Advertising
• Public/National funding
The next set of drivers are those based largely on technological developments and innovation. These arise partly from changes which the existing players are making to render their current and future operations more efficient and relevant to the changing external environment, but mainly they are technological changes bearing down on an industry sector which is being buffeted by external developments over which it has no control.
Technological drivers:
• Efficiency enhancements to the current regime
• Technology and innovation
• Industry restructuring
• Mobile devices and PDAs
• Data and datasets
• Text and data mining
• e-Science
• Work-flow processes
• Standards and protocols (authentication, DOI, others)
• Semantic Web, ontologies, Web 3.0
• Archiving and preservation
Finally, there are drivers which emanate from the user community itself. The sociology of the scholarly communication industry is undergoing change, some of which stems from changes occurring within the group itself. Some of these are adaptations to the other drivers; others are adaptations to the new cultures which the internet is imposing on the research sector.
Social drivers:
• Findability and resource discovery (search engines)
• Web 2.0 and social collaboration (folksonomies; blogs and wikis; mash-ups)
• Trust
25.6 Summary
To conclude, it is a confusing web of interactions which is driving the electronic publishing industry forward. No one driver is dominant – all are interacting in different ways at different times. A schematic of the possible impact of the various Drivers over time is illustrated below.

Impact of the Drivers over time
Driver                               2008   2009   2010   2011   2012

Financial and Administrative
Open Access                           3+     2+     2+     1      1
Online advertising                    1      2+     2+     3      0
Funding Mechanisms                    2      3+     3      2      1
Metrics (scientometrics)              3      3+     2      2      1

Technological and Innovation
Efficiency enhancements               3+     2+     2      2      2
Impact of Technology                  3      3      3      3      3
Data and Datasets                     2      3+     3      2      1
e-Science and Cyberinfrastructure     2      3      3+     3+     3
Text and Data Mining                  3      3+     2      2      0
Work-Flow Processes                   2      3      3+     3      2
Mobile technology and PDAs            1      2      3+     2      1
Semantic Web (Web 3.0)                1      1      2      3+     3+
Archiving and Preservation            2      2      2      2      2

Social Drivers
Findability and Search Tools          2      3+     3      2      1
Social Collaboration (Web 2.0)        1      2      2      3+     3+
Trust                                 2      1      1      1      1
The numbers indicate the estimated amount of activity which will be undertaken in each year, and the '+' sign shows where the peak of intensity will occur. The table therefore reflects two things:
(a) when the impact of the specific Driver for Change will take effect (in which year), denoted by whether the impact will be small (0) or substantial (3); and
(b) the extent of that impact, denoted by a '+' sign where the impact is felt to be strong in that year. Without the '+' sign the Driver is present but not a dominant issue.
For the next year or so the Open Access agenda will dominate discussion within the industry. This will be counterbalanced by activities among the current stakeholders to improve the efficiency with which business and transactions are undertaken. However, the funding agencies will become as keen to measure the value of the output from their funding as to determine what format it will take. This will become heightened during 2009 as the committee structures in these agencies adopt the efficiency mantle and use the new forms of metrics as evidence. They will also take leadership in ensuring that datasets are collected, managed and curated. This will herald a totally new era for scholarly communication, with 'links' between text and data supporting new ways of collecting relevant information. Once this is achieved, the derivative processes of text and data mining will come to the fore – the collected information can be used in a new and intensive way. The two developments of e-Science and social collaboration will then come into effect: two completely different, and in some respects conflicting, drivers. The outcome of this interaction will dictate how soon the much-touted semantic web (Web 3.0) emerges from a phase of interesting innovations to one of practical operations. Throughout, there will be a number of ongoing drivers which may not have 'peaks' of activity but will help effect change. Archiving and preservation is a constant – it is a key part of the scholarly communication agenda and will adapt and change as technology drives the need for 'in perpetuity' storage of digital objects and information services.
The Timelines suggested here are variable. They are put forward merely to indicate that this is a dynamic, fluid transition from print to digital. As such, the 'revolution' will not happen overnight – it will evolve as new drivers take hold and others dissipate.
Chapter 26
Summary and Recommendations
What does all this mean? We have seen that there are several trends in the electronic publishing of scholarly material that are changing the face of information dissemination within the specialist research and professional areas. It is now time to bring these strands together and postulate a scenario of what may emerge over the next few years, particularly with respect to the main stakeholders.

The central theme of this book has been Change – change that has been, and is likely to remain, rampant throughout the scholarly communications sector. The drivers for Change come from many directions, as the earlier chapters show, and occur at different intervals. The crucial issue, therefore, is what all these changes mean – what is the present and future status of this industry sector? Is it in fact possible to identify some key overall developments for which we need to prepare ourselves?
26.1 Planning for Change
We have seen that the cornerstone of scholarly communications over the centuries has been journal and book publishing in printed format. This legacy carries great weight when considering projections into the future. Though the concept of the 'tipping point' has been highlighted, it is unlikely that there will be an overnight change to a paradigm in which digital artefacts replace print in this large and diffuse industry sector. Some parts of scholarly communication may remain largely unchanged over the next few years, but in others the change will be dramatic.

Perhaps the most immediate change will be in the way the journal publishing sector copes with the migration from print journals, to print-and-electronic, to eventually born-digital e-journals. There are indications that some 50–60 % of the use being made of journals is now in electronic form. A major change from the past is that access to electronic text has become possible because technology allows it, and users' information habits are adapting to the change.

Besides access to electronic text, another major change has been the business model being introduced by parts of the journal publishing sector. This switches the payment mechanism from libraries paying for access to authors paying for distribution. Open access, the new model, carries with it some issues which are as yet unproven – notably whether the model is sufficiently scalable, or even acceptable, to support the ongoing financial sustainability of the scholarly communication process. Also, no single demand curve fits all the STM situations.
Open access, for example, seems more suitable at the low end of demand than at the broader level of persistent demand. It is in the mid range where alternative models are available, some of which are not necessarily appropriate in the modern context. Examples include a Delayed Open Access model (with variations from 2–3 months to 6–12 months), self-archiving by authors in institutional or subject repositories, and the Big Deal (which crowds out the smaller, often society, publisher). Open Access is also not necessarily a proven model. In fact no currently touted business model is totally acceptable to all parties.

Nevertheless, open access to scientific data is becoming a highly visible topic, and in many areas of research is becoming the norm. It will increase as international collaboration on major research projects escalates and becomes a central feature of Big Science research. It will also be assisted by the plethora of standards and protocols being implemented, such as DOIs, OAI-PMH, OpenURL and XML, among many others. It is claimed that this move towards open access data will change the peer review mechanism, and new technology will provide further assistance. Presentation of data in an appropriate form for access and archiving will become a prerequisite (and will be mandated).

Data and data sharing will radically change Science. There are many obvious advantages in data being on the agenda, but there are still issues to be tackled – interoperability, adding content, provenance, security, linking, metadata provision – all as yet unresolved. Another development is the emergence of 'live documents', where there is a strong link between the text and the supporting raw data. The two build on each other and provide greater insight into the research methodology employed and the results obtained. Publishers may then be required to focus on providing 'services' around both data and textual content, such as data analyses, workflow processes, data archiving, and text and data mining. The day of Content is King may be over; the new age is about providing answers to questions, not just supplying information.

From this one could speculate that the current method of publishing academic results may be out of step with new forms of scholarly communication. Some radical reshaping of the industry may be anticipated within the next five years, as the previous chapter indicates. Related to the publishing structure of the industry are changes which may arise in the legal framework, with copyright and intellectual property rights coming under scrutiny, particularly if they pose too great a hurdle to 'legitimate' access to publicly generated research information.

Whilst the formal methods of information dissemination are undergoing change, there is an equally important development taking place in the informal methods. Blogs are expected to take on a greater role in communication. They will highlight not only successful projects but also research experiments which have failed. There will be sharing of e-Laboratory experiments and protocols, using OpenWetWare. There will be more instances of collegiate sharing of results through projects such as Connotea. User-generated media is on the rise.

Institutional Repositories are also growing in number – approximately 1,400 currently exist worldwide, supported by some 20 different software systems.
Outsell, in a November 2006 report, claims that when researcher and author behaviour changes in support of IRs, publishers could be by-passed very quickly.
Outsell Inc has also explored attitudes to advertising as an alternative source of revenue to complement or substitute for subscription and PPV sources. Recent experiments by Elsevier with their oncology information sources are moving in this direction. According to Outsell, 86 % of scholars would accept some form of advertising, with 49 % willing to accept full advertising support for their publications. But is the traffic there at present to entice the advertisers?

The growing impact of e-Science (in the UK) and cyberinfrastructure (in the US) on the research process is also part of the Change mechanism. Future research in many fields will require the collaboration of globally distributed groups of researchers needing access to distributed computing, data resources and support for remote access to expensive, multi-national specialised facilities such as accelerators, specialist data archives, etc. An important road to innovation will also come from increasing multidisciplinary research. In addition, there will be an explosion in the amount of scientific data collected during the next decade, and this will form a key part of the e-Science agenda. Robust middleware services will be widely deployed on top of the academic research networks to constitute the cyberinfrastructure necessary to provide this collaborative research environment. The amount of sharing taking place has already reached a new dimension, but it often bypasses the existing scholarly information stakeholders.

Another new trend is for R&D companies to open up their research effort to the community and invite it to apply its collective brain power to specific research activities, on the basis of there being 'wisdom in the crowds'. One example is Novartis, which is now prepared to expose its findings on diabetes research in order to enable more effective interchange and mass collaboration on research findings and inputs. Data analysis is being professionalised and in some instances democratised.

Supporting this are government agencies ploughing funds into creating powerful national IT infrastructures and, in some instances, the applications that can flow through them. In the US the National Science Foundation supports the implementation of the 'Atkins' report findings on a 'cyberinfrastructure' with substantial funding; in the UK, JISC provides a plethora of technical support actions; and in Europe there is a widespread programme of electronic information activities planned within the FP7 programme.
26.2 A vision for Scholarly Communications
The current scholarly information industry is not hampered by considerations of scale – with an annual revenue of $ 16.5 billion it is smaller than the pornography industry, and in the UK it accounts for less than is spent on the consumption of chocolate. However, elements of the industry are growing rapidly, and those elements are not the traditional products and services which dominated the scene in the last century.

The revenue generated by traditional scholarly publishing is dwarfed by the research funding that creates the research output that drives the scholarly information industry. This research funding – billions of dollars annually – can in principle be used to experiment with alternative publication forms, as long as this is seen to be valuable to the research process. Whilst the publication process involves no more than 1–2 % of the total funds going on research, a certain amount of tolerance for a legacy approach may exist.
But once the legacy system stands in the way of scientific progress, that tolerance may end. Data output has grown at a rate of over 40 % per annum, whereas journals have grown by only 7 %. In the top ten scholarly publishers, still dominated by Elsevier with a 17.5 % market share, new names which trade solely in data (e.g. WesternGeco/Schlumberger, IHS, Petroleum Geo-Services, TGS NOPEC Geophysical Company ASA and Veritas DGC) have emerged in the past few years to displace organisations such as Springer and Informa from the revenue rankings.

Although open access is limited at present, by 2012 it could be ubiquitous. Copyright will remain an issue, but by 2012 it will to some extent be overtaken by a broader and more participative licensing environment championed by organisations such as Creative Commons, Science Commons, JISC/SURF, etc. Librarians, too, will change – from being caretakers of the storehouses of printed material to being information support professionals. Publishers as we know them will also have to change, becoming providers of scientific support services. Only a few, perhaps three or four of the current players, will remain as information producers as consolidation and market changes introduce new players to the industry. The 'long tail' will be shortened as the power of economies of scale becomes apparent in an (expensive) electronic world. Vertical, personalised and context-sensitive searching will emerge. Much of the information creation will be done on a 'community' basis, using social networking.

Journal articles will not entirely disappear but will, arguably, become fewer in number as a new multimedia item appears which has text as a hub with links to a variety of related information resources, including data, audiovisual material, computational software, etc. The main focus will be on research workflow, as opposed to 'publication', as the main scholarly communication driver and output. This will be helped by the need for improved productivity in the research process, a decision-making mentality and the requirement for compliance. Compliance, particularly with authentication for premium-level services, will become a new role for the library community.

There are indications that the network economy is coming into play. In the last decade Hotmail accounts increased by 12 million within 18 months; Google, Linux and Wikipedia came of age, all heralding the arrival of new services based on Web 2.0. Google demonstrated that it is all about speed and comprehensiveness. Such major changes recognise that bold steps are needed. The inference is that the cosy world of publishing will be transformed by the impact of a number of significant changes. The academy has been typified by stable institutions, but with increasing costs being faced. Whether they can remain islands of stability within a sea of violent storms remains to be seen.

An even more futuristic, and perhaps alarming, view of scholarly communications has been offered by Prof Dr Hermann Maurer from the Technical University in Graz. He described a virtual reality system in which the researcher would receive information through spectacles that not only display data but also allow the researcher to interrogate remote databases through 'virtual' keyboards, with the results transmitted directly onto the retina.

From all of the above it can be claimed that three main forces have emerged in the 21st century.
1. The first is the Network Effect, which stipulates that any service becomes more valuable the more people use it – the so-called 'fax phenomenon'. In fact the value grows geometrically (rather than arithmetically) as the network takes hold, and growth can be rapid as 'viral marketing' kicks in (a rough numerical illustration is given below).
2. The second force is the two-sided market – the recognition that there are several ways revenues can be earned, not just one.
3. Finally, there is the 'wisdom of the crowds', which postulates a diverse, decentralised approach to getting an answer. Examples include Google's PageRank, Wikipedia, and the developing user-generated media (UGM). And we can expect more.

The following chart shows how some of the new and emerging services fit in with the research flow.
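To give a rough sense of the geometric progression described in the first of these forces, the short sketch below (written in Python, with invented user numbers) compares the linear growth of a user base with the growth in the number of possible connections between users, taking the simple pair count n(n-1)/2 as a crude proxy for the value of the network:

    # Illustrative only: users grow arithmetically, while the number of
    # possible pairwise connections (a crude proxy for network value)
    # grows with roughly the square of the user base.
    def connections(n):
        # number of distinct pairs that can be formed among n users
        return n * (n - 1) // 2

    for users in [10, 100, 1000, 10000]:
        print(users, connections(users))

    # 10 users     ->         45 connections
    # 100 users    ->      4,950 connections
    # 1,000 users  ->    499,500 connections
    # 10,000 users -> 49,995,000 connections
    # A thousand-fold rise in membership yields roughly a million-fold
    # rise in possible connections, which is why growth can appear
    # explosive once 'viral marketing' takes hold.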
In summary, this is the effect that the specific trends identified in this book will have on different sections of the electronic publishing community.

26.2.1 User Behaviour
• Despite the heavy emphasis on Change in all the above, there is nonetheless a powerful inertia and conservatism within the industry, though this is being eroded by the increased use of mandates and research assessment exercises (RAEs), which are affecting author behaviour in particular.
• Nevertheless, there is Trust in the ability of the publication system to ensure, through an accepted, recognised and transparent quality control system, that the wheat is separated from the chaff. This has a moderating influence on the impact of external changes.
• Each subject discipline has its own way of adapting culturally to change. Some are evolutionary in their approach, some revolutionary. In essence, the more specialised and high-tech focused the subject area, the more it has adopted electronic publishing systems. However, an important road to innovation will be brought about by the increasing reliance on multidisciplinary research.
• The fulltext of articles will be far easier to access by 2020, not that they will necessarily be read in their entirety by users.
• The role of the article will become more that of a 'Record of Science' than a primary communication medium.
• Subscription and service fees will survive, but researchers will rely on service provision and will avoid relying on reading content alone.
• Researchers will seek answers from a wide range of digital content and services covering text, data, video, structures, modelling, software, etc.
• 'Digital natives', or the Millennium generation, will become significant drivers towards community-type information services (though publishers claim that the young are currently even more conservative than the old – with their need for establishment recognition – but this may change as mechanisms for research assessment become more metric-based).

26.2.2 Effect of government intervention
• The importance of scientific and technical research as a major contributor to the innovative juices of the national economy has been recognised in recent years.
• This has led to more questions being asked about the efficacy and value of the STM publication process by governments in the US, the UK, Germany, Australia and the European Commission in particular.
• Private charities (such as the Wellcome Trust) and private benefactors (Soros) have taken the lead in providing funds for alternative publication models.
• Publicly funded institutions such as the Research Councils have also followed suit by demanding control over the visibility, and therefore the impact, of research results stemming from their funding.
• Research Assessment exercises will remain important.

26.2.3 New information service requirements
• How the electronic publishing industry copes with the management of data and datasets will be crucial. Huge amounts of data will be generated in some STM (and social science and humanities) areas that will require coordinated and professional management. According to Tim O'Reilly, "data will be the new Intel Inside" by 2020. Data and datasets must be greatly improved in areas such as data integration, interoperability, annotation, provenance, security and the exporting/importing of data in agreed formats (a small illustration follows at the end of this list).
• Blogs, wikis and mashups will become an increasing part of the information scene. Projects such as Connotea and others which use shared tags will be more in evidence.
• The Web 2.0 community will encroach on scholarly communication, giving rise to new 'social networking' and 'social collaboration' forms of publication. Grassroots 'communities' will emerge with their own informal communication services.
• Research will become process-driven, with support services provided at all stages of the research cycle and electronic lab notes (ELNs) emerging as a recognised and acceptable digital format.
• Collaboratories will become global as e-Science takes hold (such as in signalling gateways), with a few large centres dominating specific research or discipline fields.
• They will incorporate software which allows rapid and automated analysis of the data, and tools such as SharePoint which support collaboration within the research team.
• Meanwhile the complaints against the commercial publishing model and its high prices/profits will continue.
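As a small, purely hypothetical illustration of what 'exporting data in agreed formats', with at least minimal provenance, might look like in practice, the sketch below (in Python; the file names, field names and values are all invented) writes a toy dataset to a plain CSV file together with a machine-readable sidecar describing its origin:

    import csv
    import json
    from datetime import date

    # A toy dataset - in reality this would come from an instrument or database.
    readings = [
        {"sample_id": "S1", "temperature_c": 21.4},
        {"sample_id": "S2", "temperature_c": 22.1},
    ]

    # Export the data itself in a plain, widely readable format (CSV).
    with open("readings.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["sample_id", "temperature_c"])
        writer.writeheader()
        writer.writerows(readings)

    # Export minimal provenance as a sidecar file, so the dataset can be
    # interpreted, cited and curated independently of any article that
    # eventually describes it.
    provenance = {
        "title": "Bench temperature readings (illustrative)",
        "creator": "A. Researcher",
        "created": date.today().isoformat(),
        "source_instrument": "thermocouple rig 3 (hypothetical)",
        "licence": "CC-BY",
        "rows": len(readings),
    }
    with open("readings.metadata.json", "w") as f:
        json.dump(provenance, f, indent=2)

The point is not the particular formats chosen, but that the data and a description of where it came from travel together, in forms that repositories, harvesters and future readers can process.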
26.2.4 Market Trends
• In many respects the research article will become a 'live' document and will be assessed by different types of review mechanism, one of which could emerge from the Amazon experience – a community-based review mechanism.
• There will be increasing focus on how to measure the scientific value of research output, using combinations of citations, downloads and other metrics (one such metric, the h-index, is sketched below).
• Silos of branded information (journals) will gradually be submerged within the broader base of information products and services.
• Seeking site 'stickiness' will no longer be a positive activity in itself – seeking collaboration from the community will emphasise the need to generate 'brand loyalty' for the site, but as part of a network.
• The 'Long Tail' describes the new market environment, offering potential for new business models based on serving the occasional user of information (the currently 'disenfranchised').
• The current levels of disenfranchisement will be broken down further by more powerful search engines and micro-payment systems for required objects.
• In some disciplines a greater market for specialised information and services will emerge as open access takes hold. However, not all information will be freely accessible – authors may prevent early exposure of their research data and findings until these have been fully mined for present and future research funding.
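To make one of the metrics mentioned above concrete, the following sketch (in Python, using invented citation counts) computes an author's h-index: the largest number h such that h of the author's papers have each attracted at least h citations.

    def h_index(citations):
        # Sort citation counts in descending order and find the largest h
        # for which the h-th most cited paper has at least h citations.
        ranked = sorted(citations, reverse=True)
        h = 0
        for position, count in enumerate(ranked, start=1):
            if count >= position:
                h = position
            else:
                break
        return h

    # Invented citation counts for one author's ten papers.
    example = [42, 17, 9, 8, 5, 4, 4, 2, 1, 0]
    print(h_index(example))   # prints 5: five papers have at least 5 citations each

In practice such an indicator would be combined with download counts and other usage data, as the bullet above suggests, rather than relied on in isolation.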
26.2.5 The Information Process
• New forms of peer review will emerge, based around concepts such as the 'Faculty of 1,000' and the general trend towards social collaboration.
• Linking between sources and formats will be essential. Linking and pointing to remote services will become a core component of the new information structure (a brief illustration follows at the end of this list).
• The creation of quality metadata needs to be ensured and incorporated into the new information chain as a distinct and valuable service. It is unclear whether authors, librarians or the community at large will figure prominently in providing such support.
• Once the silos of journals are eroded, data and text mining will become ubiquitous as more and more material becomes available without restriction and free of charge.
• The research process is a lengthy undertaking – from the original conception of the idea, through identifying sources of grants, undertaking bibliographic research, downloading data and software with which to manipulate the data, communicating with peers and related research teams, producing the research results and publishing the conclusions, whilst also making some allowance for storing the specific data and datasets which underpinned the research. All these steps are information-related and become part of the integrated work/information process.
• The role of data and text preservation, curation and archiving will become ever more important and in some cases provide a basis for additional value-added services.
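By way of a brief, hypothetical sketch of the linking referred to in the list above, the Python fragment below constructs two common kinds of link for an invented article: a DOI resolver link and an OpenURL-style query aimed at an institutional link resolver (the DOI, the article details and the resolver address are all made up for illustration):

    from urllib.parse import urlencode

    # An invented article record.
    article = {
        "doi": "10.1234/example.2008.001",          # hypothetical DOI
        "atitle": "Data sharing in the life sciences",
        "jtitle": "Journal of Example Studies",
        "volume": "12",
        "spage": "45",
    }

    # 1. A DOI link: the DOI is appended to the resolver address, which
    #    redirects to wherever the publisher currently hosts the article.
    doi_link = "http://dx.doi.org/" + article["doi"]

    # 2. An OpenURL-style link: bibliographic details are encoded as a query
    #    string and sent to an institution's link resolver, which works out
    #    the 'appropriate copy' for that particular user.
    resolver = "http://resolver.example.edu/openurl"   # hypothetical resolver
    query = urlencode({
        "url_ver": "Z39.88-2004",
        "rft.atitle": article["atitle"],
        "rft.jtitle": article["jtitle"],
        "rft.volume": article["volume"],
        "rft.spage": article["spage"],
        "rft_id": "info:doi/" + article["doi"],
    })
    openurl_link = resolver + "?" + query

    print(doi_link)
    print(openurl_link)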
26.2.6 Business Models
• Despite strong advocacy, the migration towards self-archiving within Institutional Repositories has not happened yet. IRs will not catch on unless intensive mandating is undertaken. If the 'tipping point' is reached and take-off occurs, the harvesting of IR material will pose a substantial challenge to the future of publishers.
• Subscriptions to information services will take over from subscriptions to journals. The extent of the value added will be crucial in determining subscription price and market acceptability.
• Generating 'hits' for free access to full information will lead to brand recognition, eventually transforming the business model from a subscription base to one that supports advertising and potentially other revenue sources.
• Totally new ways of delivering scholarly information will emerge.
26.2.7 New Products and Services
• Though slow to develop, handheld devices for receiving information are by no means dead, particularly within Asian markets.
• Delivery of research alerts will come through ubiquitous personal tools (mobile phone derivatives) and RSS services, both of which will improve; future demand will be for relevant and customised answers, not unprocessed chunks of information (a minimal example of an RSS alert is sketched below).
• Publication of 'live documents', which involves the integration of the article with the original live data, and interactive updating, will emerge.
• Blogs and other such services will also become more prevalent, and analytical tools will be provided. Weblogs and wikis, it is claimed, will have particular relevance for sharing laboratory protocols using OpenWetWare. The use of RSS feeds to keep researchers updated, and their participation in global conferences and podcasts, is also relevant in some instances.
• Openness and freedom are proving to be the agents for change in peer review for datasets. Services such as Swivel are free, as is ManyEyes from IBM, Google's Gapminder and Metaweb's Freebase. Then there are services, such as Nature's Connotea and CSA's Illustrata, which are perhaps still looking for more conventional ways to achieve a return on their investments.
• Mashups will emerge as datasets are combined in novel ways to produce entirely new information and research findings in the scholarly arena.
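To make the idea of RSS-based alerting concrete, here is a minimal, hypothetical sketch (in Python) that fetches a journal's table-of-contents feed and prints the title and link of each new article; the feed address is invented, but any feed in the standard RSS 2.0 form would behave the same way:

    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    # Hypothetical feed address for a journal's table-of-contents alerts.
    FEED_URL = "http://journals.example.org/jes/rss.xml"

    with urlopen(FEED_URL) as response:
        tree = ET.parse(response)

    # In RSS 2.0 each article appears as an <item> with <title> and <link>.
    for item in tree.findall("./channel/item"):
        title = item.findtext("title")
        link = item.findtext("link")
        print(title, "-", link)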
26.2.8 Stakeholders
• Arguably, there will be fewer book and journal publishers by 2020. Mergers and acquisitions will continue as large commercial publishers seek to achieve optimal economies of scale.
• The new publishers will become information service providers, enveloping existing products within new value-added environments. There may be few large global players able to make such a transition. Investments will be high and content ownership low.
• Learned societies will refocus on 'community support' information services in their subject areas, establishing portals and community communication projects (blogs, wikis). They will be empowered by Web 2.0 applications.
• Librarians will become information support professionals. Libraries will be required to ensure that ethical standards are observed, privacy issues are monitored, plagiarism is avoided and information re-use is approved – all issues of compliance.
• Massive-scale computing will create a need for massive-scale librarianship. 'Participatory librarianship' needs to embrace all future forms of scholarly 'conversations'.

26.2.9 Legal issues
• 'Clickable licences' will take over, as pioneered by Creative Commons and Digital Commons.
• Challenges to the use of material which has been copyrighted by publishers will diminish as the academy becomes more involved with the open access movement, open source developments and the publicly funded origin of research material.

The above suggests that in five to ten years' time the existing paradigm will be unrecognisable, and new 'user-focused' and targeted information services will arise. Content is King will be partially dethroned, and new princes are being groomed in the wings around the concept of value-added services.
26.3 The future role of the Publisher
Currently, scholarly publishers seem to be retrenching, emphasising their claimed rights over copyright (which are being weakened by new licences) in preventing widespread distribution of the works they publish. Without the force of IPR and copyright behind them, their reliance on content to provide the high margins of the past becomes questionable.

The smaller publishers, particularly the smaller learned societies, face the biggest challenge. Their future is being compromised partly by the market reach of the large commercial publishers and partly by the emergence of open access publishing initiatives. The large commercial publishers may be able to weather the upcoming storm better than the smaller publishers because they have the investment capability to migrate into new high-tech information services. The strategies being implemented by Thomson (Web of Science and Web of Knowledge), Elsevier (Scopus, Scirus)
and Nature (Web 2.0 developments) are indicative of a different product strategy for the future.

New business models will determine how viable scholarly publishing will remain. As open access takes hold, will advertising income be attracted as a replacement? As subscriptions and Big Deals falter, will micro-payments and pay-per-view become significant? How much revenue will the premium services in which the larger publishers may be willing to invest produce, and will it fill the gap created by declining subscriptions? Is the 'Long Tail' of market demand relevant for specialised scholarly material? Will it open up new opportunities for a dumbed-down form of research publication? The net effect would seem to be that the margins currently available to the scholarly publishing industry will be eroded, and the problem publishers face is how they and their stockholders will adapt to the new situation.
26.4 The future role of Libraries
As the above suggests, the publishing landscape is likely to become more complex than ever before. There is good quality information that professionals need which is available for free, but on the Internet it is mixed in with tons of junk. Most of what researchers need is still expensive, and it is not clear how quickly that is going to change. Meanwhile, there has been an explosion of different types of resources – point-of-care tools, online textbooks, evidence-based databases – so we are no longer looking solely at online journals.

If cost-effective use of the time, energy and talents of researchers is to be achieved, there is a need for a professional librarian (perhaps in another guise) to help make sense of this increasingly complex information space. Researchers do not need somebody to manage the library – they need someone to help make sure that researchers and users have the best information available, in the right place, at the right time, in the most cost-efficient way. In this highly competitive academic and research sector, the community can ill afford to be supported only by the traditional kind of librarian.

As Bernie Sloan commented on Liblicense in late 2007, ". . . if this sort of trend continues will it gradually begin to marginalize the library, bit by bit? In other words, if more information becomes available freely will that lead people to think they need the library less?" According to T. Scott Plutchak (Director, Lister Hill Library of the Health Sciences, University of Alabama at Birmingham), of course it will. But that has been happening piecemeal for years now. People do need "the library" less, but they may need the new skillsets of librarians more than ever.

According to Plutchak, "One of my gripes with the Library 2.0 crowd is that they're not radical enough. For all of the chatter about embracing change and embracing the users and becoming more participative and making use of social software and social networks the focus is still firmly on the success of "the library". How do we make the library relevant, how do we make it a cool destination, how do we make sure that people are using those resources, etc., etc., etc. . . . If we were really focused on what the people in our communities need, we'd quit talking about "the library" altogether".
Future activity does not take place in the library building – the 'new' librarians will increasingly be in the faculty, participating in curriculum meetings, teaching in the lecture halls, holding office hours in the student lounges. That is where the new librarians belong.

However, some librarians are also seeking to bolster traditional expertise and roles in the new electronic environment. They seek to build on their traditional professional experience whilst taking into account a new set of informatics market needs. Will they be able to find a role in the migration towards Web 2.0 and the semantic web, in the development of ontologies and the creation of quality metadata to enable targeted access to the world's STM information resource? Will the body of professional training cope with this change in approach? Or will this yearning for control, order and structure over vast quantities of unstructured information be the final nail in the coffin for the traditional librarian?

If so, what is the future essential role of libraries and librarians? For libraries it involves buildings and the management of physical collections – it is tied up with physical space. For librarians it could be managing the knowledge base. This gets past the notion of their being custodians of space and physical buildings. The information network and community has become complex, and managing this has become critical to all librarians.

Within the STM world, notably within biomedicine, the gold standard has been the individual, peer-reviewed article. This has been honed to the peak of perfection, and the librarian has had a role in disseminating these articles. However, the peer-reviewed individual article is no longer so relevant. The article has been transformed. It has become part of a network, with links to other databases. The research article is often the gateway or portal into a world of simulations, data analyses, modelling, etc. Though the article has become richer in its evolution, it has become less essential as a standalone entity.

There are many other items that now compete for the attention of the researcher. In the biomedical area there is GenBank, with its online community that contributes data and findings directly into the datasets. For many parts of the community this has become the primary means of research communication; writing articles has become a secondary concern. These new data resources are being created, organised and supported often by the research community itself rather than by librarians.

Some librarians look to Institutional Repositories as providing them with a new purpose. However, so far librarians have been concerned that the rate of deposition in institutional repositories has been low (without mandates), and they see IRs as at best a limited success. IRs are good places for all the digital 'grey literature'. Applying metadata to such items could offer a new role for librarians – metadata which enables the grey literature to be captured by the search engines. This grey literature could then offer further competition to the research article.

Wikis, blogs and social publishing will also have some impact on the librarian, the extent of which is currently unclear. It would, however, be naïve to assume that they will have no role, particularly for the future researchers who are currently aged under 20.

So what will be the librarian's function, given the challenges to the current modus operandi of researchers and the changing nature of the formats for information dissemination? They will become:
• Stewards of the institution's information needs. The role will no longer be simply to buy or license information products; the traditional library funds are being used in other ways.
• Navigators through the information morass.
• Partners of faculty and students, involved with authors and faculty in a much more proactive way.
• Developers and implementers of new services to support the diverse constituency they serve.
26.5 The future role of Intermediaries
Aggregators and intermediaries have had fluctuating fortunes over the years, and have essentially been faced with disintermediation as publishers sought to bypass them and gain direct access to users. In the new Internet environment they are reinventing themselves around the tools provided by services such as aggregated mashups (mixing the APIs of different services), social bookmarks and, most important of all, new search and navigation tools. But disintermediation still looks them square in the face, as it did in the later days of the print-only paradigm. They will need the flexibility to take on functions different from those that supported their former existence.

Aggregation is no longer the important role it once was (for subscription agents) as interoperability and linking come to the fore. Subscription and licensing consolidation (again performed by subscription agents) will be overtaken by pay-per-view using micropayments styled on the iTunes or similar models. The institutional structures of publishers, libraries and intermediaries will all give way to a service-focused set of organisations, some more 'virtual' than others, but few requiring the huge commitments to buildings or staffing levels which existed in the past.

The future is challenging. It is also exciting, particularly for those with the imagination and energy to run with the new potential which is opening up. Electronic publishing is learning to find itself a new role in the new millennium, and this search for new roles is not for the faint of heart.
References
Books
Tapscott, D. and Williams, A.D. "Wikinomics – How mass collaboration changes everything", Portfolio, 2007.
Evans, P. and Wurster, T. "Blown to Bits – How the new economics of information transforms strategy", Harvard Business School Press, 2000.
Tenopir, C. and King, D.W. "Towards Electronic Journals – Realities for scientists, librarians and publishers", Special Libraries Association Publishing, 2000.
Morville, P. "Ambient Findability – what we find changes who we become", O'Reilly, 2005.
Keen, A. "The Cult of the Amateur – How today's internet is killing our culture", Doubleday/Currency, 2007.
O'Hara, K. "Trust – From Socrates to Spin", Icon Books, 2004.
Gladwell, M. "The Tipping Point – how little things can make a big difference", 2000.
Surowiecki, J. "The Wisdom of Crowds – why the many are smarter than the few", Little, Brown, 2004.
Anderson, C. "The Long Tail", Random House Business Books, 2006.
Battelle, J. "The Search – How Google and its rivals rewrote the rules of business and transformed our culture", 2005.
Articles
MacRoberts, M.H. and MacRoberts, B.R. "Problems of Citation Analysis: A Critical Review", Journal of the American Society for Information Science, 40 (5): 342–349, 1989.
Wates, E. and Campbell, R. "Author's version vs. publisher's version: an analysis of the copy-editing function", Learned Publishing, Vol 20, No 2, pp 121–129, April 2007.
Hardin, G. "The Tragedy of the Commons", Science, 1968.
Hey, T. and Trefethen, A. "The Data Deluge: An e-Science Perspective", in Grid Computing: Making the Global Infrastructure a Reality, Wiley, Chichester, 2003, pp. 809–824. http://www.rcuk.ac.uk/cmsweb/downloads/rcuk/research/esci/datadeluge.pdf
Lynch, C.A. "Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age", ARL Bimonthly Report, no. 226, 2003. http://www.arl.org/newsltr/226/ir.html
Statistics
Science and Engineering Indicators, National Science Board.
UNESCO Statistical Handbook, 2000.
Association of Research Libraries (ARL) Statistics.
UK University Library Spending on Books, Journals and e-Resources, 2007 update, The Publishers Association.
Reports
National Science Foundation Office of Cyberinfrastructure – http://www.nsf.gov/oci
Lyon, L. "Dealing with Data: Roles, Rights, Responsibilities and Relationships (Consultancy Report)", UKOLN and the Joint Information Systems Committee (JISC), 2006. http://www.jisc.ac.uk/whatwedo/programmes/programme_digital_repositories/project_dealing_with_data.aspx
Figures and Tables
Figure 2.1   US Academic R&D Expenditure and ARL Library Budgets 1982 Constant Dollars   12
Figure 2.2   Library as % of total institutional spend, 1992–2005 UK Higher Education establishments   13
Figure 2.3   Theory of the Long Tail   20
Figure 2.4   The "Disenfranchised" information sector   22
Figure 2.5   Thinkforce employees in OECD countries per thousand, 2004   23
Table 2.1    Developments in the Information Culture   27
Table 3.1    Breakdown of profiles of researchers (1991/92)   31
Table 3.2    Shortfall in UK Higher Education Library Budget Expenditure   35
Figure 3.1   Changes in motivations of authors over a 10 year period   39
Figure 3.2   Differentials in motivation change over ten years   41
Figure 5.1   Professional Information Markets   66
Table 5.1    Publishers by Size (in US $ '000)   68
Figure 5.2   Porters Competitive Forces   70
Table 6.1    Top Eight STM Performers in 2006 ($ millions)   77
Table 6.2    The Life Cycle of Scholarly Communication   80
Table 6.3    Stages in the Emergence of a published article   85
Figure 6.1   The Valley of Death   90
Figure 6.2   Proportion of library expenditure within the 'old' universities in the UK   94
Figure 7.1   Journal Growth 1665–2006   104
Figure 7.2   Article Growth 1981–2005   105
Table 7.1    Average number of articles, article pages and total pages per journal title per year: 1975, 1995, 2001   106
Table 7.2    Spending on UK journals by UK Higher Education sector (2004/5)   108
Figure 7.3   Amount of readings per Scientist   109
Figure 7.4   Time spent reading   113
Figure 7.5   Trends in ILL acquisitions at US Research Universities   123
Figure 9.1   Market share of Publications in OECD countries, 2000–2004   134
Figure 9.2   Percentage of manuscripts deposited in Australian IRs   138
Table 10.1   Expenditure on R&D in UK by source of funds 2000   141
Table 10.2   Number of articles read per annum   146
Table 10.3   Time spent reading an article   146
Table 10.4   Proportion of articles read by respondents   146
Table 10.5   Article Downloads by subject area   147
Figure 10.1  Usage of preprints versus Articles by Astronomers, 2005   148
Figure 11.1  Main forms of open access of scholarly material   162
Figure 18.1  Average authorship per article   267
Figure 25.1  Impact of new developments on EP, 1995–2015   323
Index
(C)LOCKSS 280 'X' generation 43, 287 'Y' generation 43, 287 "cloud" of computing power 295 'disenfranchised' 335 1976 Copyright Act 119 1CATE 235
AAP/PSP Statistics Programme 214 abstracts and indexes 229 Academic Press 15, 222 Academic Publishing in Europe 88, 173 Academic R&D 133 academy 26 Academy of Sciences 79 Access Grid 263 Accountants 21 Adobe Systems 217 AdSense 294 advertising 331 AdWord 294 aerospace 30 Africa 132 All Hands conference 268 Allergan 143 ALPSP 78, 83, 89, 164, 211, 216, 222 AltaVista 289 Alternative review procedures 89 Amazon 19, 29, 98, 289, 298 Amazon’s ‘Kindle’ 276 Ambient Findability 293 American Astronomical Society 111 American Institute of Physics 148 American Physical Society 79, 172 AMRC 149 Anderson, Chris 19 Andrew W. Mellon Foundation 52 Android 277 Answers.com 30 AOL 27
AP Print and Electronic Access Licence 15 APE 173 APPEAL 15 Apple 277 Appropriate Copy 234 archaeology 253 Archive 104, 115 Archiving and Preservation 279 ARL 93 ARL statistics 93 Armbruster, Chris 220 article economy 7 ArticleLinker 235 Arts and Humanities 152 Arts and Humanities Data Service 254 Arts and Humanities Research Council 152 ArXiv 84, 148, 165, 174 Asia 136 Association of American Publishers 196, 197 Association of Learned, Professional and Society Publishers 211 Association of Research Libraries 93, 108, 111 Astronomy 148, 243 Athens 231 Atkins, Dan 250, 264, 331 atmospheric science 253 Atypon 216 AT&T 277 Australia 60, 138 Australian National University 219 authentication 25 author pays 18 authors’ websites 42 Auto-Summation 237 Automated Content Access Protocol 224 Autonomy 26 avatar 227
346 Index B2B 92 B2P 92 Baldwin, Christine 78 Bali Global Warming conference 227 Banks, Peter 196 barriers to competition 191 Battelle, John 294 BBSRC 149, 186 Beckett, Chris 42 Becta 233 Bekhradnia, Bahram 57, 204 Bergstrom, Carl 60 Bergstrom, Ted 82 Beringer, Sir John 206 Berkeley Center for Studies in Higher Education 37 Berlin 162 Berners-Lee, Sir Tim 247, 271 BERR 100 Bertelsmann AG 219 Bethesda 162 Bethesda Statement on Open Access Publishing 186 bibliometrics 59 Big Deals 15, 36, 54, 82, 121, 191, 220, 330 bioinformatics 31, 243 BioMed Central 163, 171 biomedics 31 BioOne 216 BioSciences and Medicine 149 biotechnology companies 114 Biotechology and Biological Science Research Council 253 BL Direct 124 Blackberry 153 Blackwell 150 Blackwell Publishing 76, 77, 219 Blackwell Scientific 216 Blake, Julia 260 BLDSC 54 blogosphere 313 Blogs 97, 290, 309, 330, 336 BlueObelisk 273 Boersenverein des deutschen Buchhandels 125 Bollen, Jan 51, 60 Bollen, Johan 52 Book digitisation 298 born digital 107 brand 71, 74 brand identity 95 Brand recognition 221 Brazil 131 Brin, Sergey 293 British Library xix, 43, 121, 123, 124, 147, 225, 238, 264, 280, 299
British Library Document Supply Centre 54 British Library Documer Supply Centre 31 Brown, Gordon 205 browsers 38 Budapest 162 Budapest Open Access Initiative 162, 186 bulletin boards 41 Business Model 5 Butler, Declan 244 BuyerZone 221 Cambridge University 151, 272 Cambridge University Press 223 Campbell, Robert 86 Canadian Competition Bureau 221 Candover 219 carbon footprints 227 CD-ROMs 2 CDL 37 cellphone 236, 276 Centre for Strategic Economic Studies 176 CERN 99, 173, 185, 186 CERN Large Hadron Collider 263 Certification 103, 114 certification 191 Chemical Biology 153 chemists 31 China 77, 131, 132, 137 CIBER xix, 34, 37, 39, 49, 136, 146, 150, 170, 179 Cinven 219 CISTI 54, 121 Citation Analysis 47 citation analysis 36 citation data 74 citation impact factor 48 Citizendium 311 City University, London 37 Classic Scientists 31 Classified advertisements 182 Code of Practice 50 cold fusion 47 Coles, Bryan 39 Collaboratories 22, 98 Collection development 34 commercial ethic 30 common good 11 common land 11 Community-based Data Interoperability Networks 250 Competitive Enterprise Institute 196 compliance 45, 50
Index 347 Comprehensive Spending Reviews 136 computer science 41, 42 conference proceedings 56, 235 conferences 41 Conjoint Analysis 177 Connected 32 connectors 16, 32 Connotea 43, 303, 330, 334 contagious 16 copy protection 231 copyduties 120 copyright 5, 18, 45, 120, 129, 332 copyright infringement 119 copyright transfer 199 Cornell University 165 corporate research 30 Corporation for National Research Initiatives 234 Cotterill, Matthew 171 Coulthurst, Sarah 151 COUNTER 34, 50, 230 Creative Commons 5, 18, 127 Creative Commons licence 102 CrossRef 226, 230, 235 Crow, Raym 215 CrystalEye 273 crystallographic data 273 Crystallographic Information File 236 crystallography 253 cultural buckets 19 Current Awareness and Alerting 225 CWTS (Leiden) 58 Cyberinfrastructure 250 cyberinfrastructure 248, 264, 331 D-Space 128 Data 243 Data and datasets 7, 334 Data Curation 252 data deluge xvi, 249 data graveyards 243 data mining 257, 258 Data providers 45 Data Webs 247 Database of Open Access Journals 164 datasets 5, 84 Davis, Sir Crispin 123, 221 decay curve 147 deep log analysis 116 deep web 229, 247 Del.icio.us 97 Delayed Open Access 330 Demography 5 Department of Health 149 Department of Innovation, Universities and Skills 132
Department of Trade and Industry 73, 100 Department of Universities, Innovation and Skills 100 Depot 169 Derek de Solla Price 104 Derivative works 14, 259 DEST 139 deterioration of digital media 280 detriment of small publishers 220 Deutsche Forschungsgesellschaft 207 Dezenhall, Eric 195 diabetes research 331 Dialog 289 Digital asset management 236 digital footprint 34 digital grey literature 234 digital native 3 Digital natives 43, 334 Digital Object Identifier 234, 235 digital osmosis 39 digital preservation 96, 194 Digital Resource Management 231 Digital Rights Management 223 digitisation 96 DigiZeitschriften project 207 dinosaurs 1, 3, 18 disenfranchised 24, 122, 161 disenfranchised knowledge worker 110 Disenfranchised Researchers 20 disenfranchisement 83 disintermediated xvi disintermediation 340 Display advertising 182 Disruptive competitors 70 Disruptive technologies 70 Dissemination 104, 114 Distance learners 21 DIUS 136 Doblin Inc 157 doctoral theses 273 Document Delivery 14, 54, 120 document delivery service 87 Document Downloads 49 DOI 84, 230, 234, 330 DoubleClick 295 Dow AgroSciences 315 Drexel University 109 dynamic ontologies 26 Dynamic Query Modification 237 dysfunction 17 e-Book phenomenon 115 e-Depot dark archive 280 E-Ink 98 E-Ink system 276
348 Index e-journal 107 e-Journals in industry 113 e-Laboratory 330 e-Legal Deposit 14 e-legal deposit 96 E-mail 182, 290 e-Research 5 e-Science xvi, 5, 331 eBay 158 EBI 34 Ebrary 117 economics 42 economies of scale 220 eContentplus 198 EDitEUR 223 editorial processing centers 107 educational publishing market 221 Eduserv 231 Eigenfactor 60, 179 eJUSt report 35 Electronic Laboratory Notebooks 245, 247 Electronic Publishing Services Ltd 61, 73 Electronic Resource Management 51 electronic rights management 223 Elsevier 39, 44, 54, 55, 106, 125, 161, 183, 211, 230 Elsevier’s ScienceDirect 39, 78, 118 EMBL Nucleotide Sequence Database 244 Emedia 221 Emerald publishing 38 Emerging Technology Conference 305 Encarta encyclopaedia 296 Encyclopaedia of Earth 312 Encyclopaedia of Life 311 Engineers 21 Entrepreneurs 21 Environmental Science 244 EPIC2015 29 epidemics 16 EPS 76 EPS Market Monitor 76 Ericsson 143 eScholarship repository 37 ESFRI 264 Esposito, Joseph 53, 216, 220 ESRC 186 Ethernet 230 Eurekster 308 Europe 132, 136 European Bioinformatics Institute 261 European Commission 18, 95, 221 European Commission FP7 einfrastructures 188 European Digital Libraries 95
European Molecular Biology Laboratory 261 European Research Advisory Board 193 European Research Council 193 European Strategy Forum on Research Infrastructures 264 European Union 132 EV-DO phone technology 277 Evans and Wurstler 289 Evans, Philip 230 Evidence Ltd 58 exclusive licences 199 expenditure on research libraries 12 Experimenters 32 Extensible Markup Language 218 FaceBook 4, 27, 43, 158 Faculty of 1,000 191 fair dealing 122 fair use 122 falling share of the library budget 13 FAST 26 Faxon Institute 31 Federal Communications Commission 240 federated search engines 237 federated search system 258 federated searching 26 flickr 118, 236 folksonomies 102, 310 Follett Review 35 Follett, Sir Brian 35 FOO Camp 305 Forrester Research 21 frustration gap 4, 12 Funding Agencies 5, 99 Garfield, Eugene 48 Genbank 339 GeneExpression Database 244 general public 22 genome sequences 151 Geosciences 153 German Copyright Act 126 Gilder’s Law 230 Ginsberg, Paul 148 Ginsparg, Paul 165 Gladwell, Malcolm 16 Global eBook Survey 117 Globalisation of Research 131 Gold Route 18 Gold Route to open access 163 Golm 186 Google 2, 19, 24–26, 29, 52, 66, 73, 117, 119, 143, 157, 158, 180, 230, 240, 291, 292, 298
Index 349 Google Generation xv, 287 Google Print 293 Google Scholar 293 Google’s Open Handset Alliance 277 GooglePlex 296 GoogleScholar 170, 230 Googlzon 29, 297 Gordon and Betty Moore foundation 163 Gphone 277 Green Route 18 Green Route to open access 165 Greenboro University 174 grey literature 5, 25, 97, 230 Grid xvi Grokker 238 Gross national product 131 Guedon, Jean-Claude 162, 185 h-Index 61, 179 halls of residence 116 halo effect 48 handheld devices 336 Handheld or mobile devices 275 Handle technology 234 Harcourt Education 67 Hardin, Garrett 11 Harnad, Stevan 55, 91, 99, 162, 185, 193, 194 Harrington, Richard 221 Harvard problem 234 Harvesting 170 HEFCE 149 HEFI 205 Heuer, Rolf-Dieter 173 Hey, Tony 243 high energy physics 99, 172 Higher Education Funding Councils 15 Higher Education Funding Councils of England, Wales, Scotland and Northern Ireland 56 Higher Education Policy Institute 57, 204 HighWire Press 216 Hindawi Publishing 172 Hirsch, Jorge E. 61 Houghton, John 74, 138, 161, 176, 187 Hugh Look 290 hybrid journals 86, 163 hybrid system 2 i2010 digital library 95 IBM 143 Ideagoras 144 IDEAL 15 IHS, Petroleum Geo-Services
44
Impact Factors 179 Impact of Author Self-Archiving 215 Imperial College London 39 India 77, 131, 132 indicator chasing 57 Indifferent 32 Industrial R&D 133 inertia and conservatism 333 Informa 44, 219 Information Anxious 31 information communications technology (ICT) 229 Information Extraction 257 information gatekeepers 33 information overload 32 Information Retrieval 256 Information Zealots 31 InfoTrieve 121 Ingenta 216 INIST 54, 121 InnoCentive 144, 314, 315 Institute for Scientific Information 48 Institute for the Future 35 Institute of Physics Publishing 88, 148 Institutional Repositories 14, 91, 330, 336 institutional repositories 139 institutional repository 18, 60, 95, 97, 248 Institutional-based repositories 165 Intel 230 intellectual property rights 18 intelligent design 47 Interactive open access review 89 Interlibrary Loans 14 International collaboration 185 International Council for Scientific and Technical information 151 International Digital Electronic Access Library 15 International DOI Foundation 234 international framework agreement 125 International Publishers Association 224 International STM Association 38, 40, 125 International STM Publishers Association 211 International Union of Crystallography 236 Internet 8 Internet Advertising Bureau 182 Internet Archive 279 Internet2 Middleware Initiative 232 INTEROP 250 Intute 207
iPod 236 Iron Mountain 253 IRs 14 IT 30 Ithaka 79 Ithica 280 iTunes 340 IUCr 236
J-STAGE 216 Japan 132 Jarvis, John 123 JISC xix, 43, 50, 165, 169, 207, 223, 233, 264, 331 JISC and SURF’s Licence to Publish 128 JISC RepositoryNet 207 Joint Information Services Committee 100 Joint Information Services Council xix Jones, Adrian 153 journal access problems 74 Journal Citation Index 48 Journal Citation Reports 61 Journals and Research Productivity 215 JSTOR 207 Jubb, Michael 94 Kahle, Brewster 279 Kahn, Bob 234 Kelly, Kevin 118 Key Perspectives Ltd 94, 148, 169, 245 Kindle 98 King 35, 105 King Research 109 King, Donald xix, 33, 144 Kluwer Academic Publishers 219 KNEWCO 273 knowbots 258 knowledge base 96 knowledge worker 21, 23, 122, 135 Knowlets 273 Koninklijke Bibliothek 280 laboratory book notes 43 Lackluster Veterans 32 Law of the Few 16 learned society publisher 220 learned society publishers 15, 78 Lessig, Lawrence 18, 127 Library of Alexandria 279 Library of Congress 65 Life Cycle of Scholarly Communication 80 life sciences 41
Ligue des Biblioth`eques Europ´eennes de Recherche 200 Lilly, Eli 315 Linden Dollar 227 Linden Laboratory Research 226 Linking 335 Linux 26, 143, 241, 277 Lippincott, Williams and Wilkins 125 Lisbon Strategy 189 LISU 93 Livres Hebdo 67 Lloyd, William Forster 11 LOCKSS 280 Long Tail 18, 28, 104, 298 Lorcan Dempsey 31 Los Alamos 165 Los Alamos National Laboratory 34, 50, 99 Los Alamos National Library 52 Lynch, Cliff 174 Mabe, Michael xix, 40, 55, 211, 225 Macmillan/Holtzbrink 224 MacRoberts, Barbara 48 MacRoberts, Michael 48 Managers 21 mandating 336 Mandatory 168 ManyEyes 336 Market Estimates 75 market size 66 markup language 217 mash-ups 102, 119, 180, 265, 309, 340 Massachusetts Institute of Technology 128, 272 mathematics 42, 147 Maurer, Hermann 276 mavens 16 Max Planck Gesellschaft 186 Max Planck Institutes 162 Max Planck Society 207 Mayflies 57 McKiel, Allen 117 MDL 261 Medical Research Council 149, 186, 206, 247 Medicine and Health 244 Mellon, Carnegie 298 membership dues 171 Merck & Co 143 Mercury Enterprises 31 Meredith, Barbara 196 mergers and acquisitions 8, 219 Mesu 52 MESUR 34, 49, 179 MESUR/LANL 51 metadata 25, 223
Index 351 metadata standards 258 Metaverse 227 Metcalfe’s Law 231 Metcalfe, Robert 230 METIS 260 metric 56 micropayments 340 Microsoft 22, 157, 180, 238, 274, 296, 299 Microsoft Live Academic Search 22 Microsoft Vista 233 Millennium generation 3, 43, 334 Minutes of Science 42, 44, 104 Mobile Centrics 32 Mons, Barend 273 Moore’s Law 230 Moore, Gordon 230 Morville, Peter 293 Motorola 143 MSN 230 MSN Search Service 296 Multi author articles 106 multilingual 96 multimedia 45 Multimedia Search 238 multimedia searching 26 multiple resolution 236 Murray-Rust, Peter 217, 260, 272 museum of the Book 45 music 19 MySpace 4, 43, 97 NaCTeM 261 NASA 244 National Centre for Text Mining 261 National Health Service 149, 206 National Institutes of Health 99, 142, 149, 169, 186, 196 National Library of Medicine 128 national research and development budget 12 National Science Foundation 107, 109, 133, 136, 148, 250, 264, 331 Natural Environment Research Council 253 natural language processing 256 natural language processing technology 261 Nature Networks 303 Nature Precedings 303 Nature Publishing Group 87 Navigation 104 NCBI 34 Neal Stephenson 227 Negative Citations 57 Negroponte, Nicholas 241 NERC 248 NetGen 313
Network Effect 27, 333 networked economy 26 New York Times 118 newspapers 225 Nicholas, David xix, 39, 146 NIH Authors Postings Study 214 NIH’s Pubmed database 293 NineSigma 315 NISO 223 NOP 39 Novartis 143 Novataris 331 Nucleic Acids Research 150 O’Hara, Kieran 319 O’Nions, Sir Keith 132 O’Reilly, Tim 302, 305, 334 OAI-PMH 170, 330 OAISter 24 Oak Ridge National Laboratories 111 OCLC 31, 253 OECD 24, 100, 133, 137, 142 Office of Cyberinfrastructure 264 Office of Science and Innovation 264 OhioLINK 146, 150 OIASter 170 OINCs 31 Oldenburg 220 Oldenburg, Henry 40, 103 Oldenburgian functions 42 OmegaWiki 273 omnivores 32 OncologySTAT 183 One Laptop Per Child 241 one stop shops 292 ONIX 223, 224 ONIX for Books 223 ONIX for Licensing Terms 223 ONIX for Publisher Licences 223 ONIX for Serials 223 Online Advertising 182 Online submission systems 88 ontologies 96, 102, 262, 272 ontology 226 Open Access 14, 17, 161, 329, 330, 332 open access initiative, protocol for metadata harvesting 170 Open Content Alliance 180, 298, 299 Open peer review 303 Open Society Institute 162 OpenAthens 233 OpenDOAR 201 OpenID 233 OpenURL 98, 330 OpenWetWare 330 Organisation for Economic Cooperation and Development 24, 249
352 Index Orphan Works 129 orphaned books 119 orphaned works 299 OSCAR 273 Out-in-the-Cold 31 Outsell 21, 29, 44, 66, 70, 75–77, 97, 290, 330 Outsell Inc 61 Oxford Scholarship Online 116, 117 Oxford University Press 39, 117, 150, 163, 223 PA/STM 136 Page, Larry 293 PageRank 25, 294, 333 PageRank algorithm 24 Paid for listings 182 Pardua 186 Particle physics 244, 263 Partnership for Research Integrity in Science and Medicine 197 patients 22 Patriot Act 295 Pay-Per-View 122 PDF 217 Peer Evaluation 47 Peer Project 198 Peer Review 40, 88, 335 Peers 291 perfect storm 28 persistent identifier 234 Personalisation 238 Pew 313 Pew Internet and America Life Project 32 Ph D theses 234 pharmaceutical companies 149 pharmaceutical industry 113, 144 pharmaceuticals 30 pharmacology 253 Philosophical Transactions 87, 103 physicists 31 physics 41, 147 plagiarism 87 Plutchak, Scott 97 podcasts 97, 290 Portable Document Format 217 Porters Competitive Forces 70 Portico 253, 280 Postprint 84 power browsing 44, 287 power of contex 16 Preprint 83 PRISM 197 Prisoner’s Dilemma 90 Procter and Gamble 143, 315 Product/Service concept 5
Productivity Enhancers 32 professional publishing market 77 professionals 21 Project Euclid 216 Project Gutenberg 181 promiscuous users 38 ProQuest/CSA/Bowker 76 Prospect 226 psychology 31 Public Engagement 265 public library 123 Public Library of Science 162, 163 publication paradigm 14 Publisher and Library/Learning Solutions 50 Publisher Trade Associations 211 Publishers Association 93, 136 Publishers Licensing Society 223, 224 Publishers Research Consortium 214 Publishers Weekly 67 Publishing Cooperatives 215 Publishing Research Consortia 177 Publishing Research Consortium 88 Publishing Technology Ltd 216 PubMed 36, 150 PubMed Central 128, 149, 169, 186, 196, 197 Queensland University of Technology 169 R4L 252 radio astronomers 30 RAE 56 Really Simple Syndication 153, 309 Record of Science 18, 42, 86, 114, 334 Reding, Viviane 194 Reed Elsevier 15, 219, 224 Regazzi, John 89 Registration 103, 114 Registration Agents 234 remixability 310 Repository for the Laboratory 252 reproducible results xvi Research Assessment Exercise 48, 55, 75, 168, 205 research assessment exercises 5 Research Councils 57, 186 Research Councils in the UK 99, 142 Research Councils of the UK 73, 201 Research Councils UK 169 Research Information Center 238, 268 Research Information Network xix, 73, 94, 245, 264, 290 Research Libraries 92 Research Output Database 149
research process 268 Research Quality Framework 60 research workflow 332 Residents 226 Resource Discovery and Navigation 289 retro-digitisation 236 Reuters 221 RIC 268 Rightscom 290 RIN xix robotic “spiders” 224 Robots 52 Robots Exclusion Protocol 224 RoMEO 201 Rowlands, Ian xix Royal Chemical Society 226 Royal Society 206 Royal Society of Chemistry 272 Royal Society of London 103 RQF 139 RSS 97, 226, 290, 306 Russell, Ian 212 Russia 131 R&D funding 4 salesmen 16 Sanger, Larry 311 Schmidt, Eric 277 Scholar’s Copyright Addendum Engine 128 scholarly information sector 6 Scholarly Information Strategies 177 Scholarly Information Systems 42 Scholarly Publishing and Academic Resources Coalition 127, 200 Scholarpedia 312 Schroeder, Patricia 197 Science and Technology Committee of the UK House of Commons 22 Science and Technology Facilities Council 253 Science Budget for 1997–2011 132 Science Citation Index 50 Science Commons 5, 127 ScienceDirect 150, 153, 291 Scientific and Technical Committee of the House of Commons 18 scientific data 331 Scientific Disciplines 5 scientific ethic 30, 142, 168 scientific, technical, engineering and mathematical 6 Scientometrics 55 Scintilla 303 Scirus 24, 230 Scitopia 78
SCOAP3 172, 174 SCONUL 93 Search Engine Optimisation 293 Search Engines 289 SecondLife 4, 43, 226 Selective Dissemination of Information 225 Self-archiving 71 Self-citing 49 semantic journal 226 semantic web 45, 102, 236, 271, 273 Sender, John 108 serials crisis 14, 32, 78, 105 serials solutions 223 SFX 235 SGML 218 Sharepoint 335 Shepherd, Peter 52 SHERPA/OpenDOAR 201 Shibboleth 232 Shopbots 271 Shotton, David 252 shoulders of giants xix, 6 signalling gateway 99, 102 Silicon Valley 137, 303 Simba 76 Simba Information 78 Simon Inger 42 Simple Object Access Protocol 51 site licensing business model 78 Sleeping Beauties 57 Sloan, Robin 29 smart search 236 SME 21 social bookmarks 102, 340 social collaboration 143, 303, 334 social collaboration tools 5 social collaborative tools xvi social networking 45, 334 social sciences 244, 253 social tagging 180 society publishers 220 Something is Good Enough 25 Soros 334 Soros Foundation 162 Southampton 186 Southampton University 95 Sowden, Peter 93 spam 240 SPARC 200, 215 SPARC Europe 200 spectral data 273 SPIRES 174 Sponsoring Consortium for Open Access Publishing in Particle Physics 172 Springer 44, 118, 219, 223, 261
Springer Science & Business Media 163, 219 SSP 216 Standard Generalised Markup Language 218 Standardized Usage Statistics Harvesting Initiative 51 Stanford Encyclopaedia of Philosophy 312 Stanford University 18, 127 Stanford University Libraries 35, 216, 280 Stanley, Morgan 78, 280 Steele, Colin 187, 219 stickiness 16 STM 6, 18 Suber, Peter 185 Subito 124 subject-based repositories 42 Sumerian clay tablets 118 SuperBook 116 SuperBook project 116 supplementary data 88 supplementary material 5, 84 Surowiecki, James 144, 301 SUSHI 51 swickis 308 SWISS-PROT 243 Swivel 336 Synovate 44 Tapscott and Williams 143 Tapscott, Don 313 Taylor & Francis 116, 216, 219 Taylor, Graham 212 Taylor, John 263 Technical University at Graz 276 Technology 4 TEMIS 260 Tenopir 35, 105 Tenopir and King 111 Tenopir, Carol xix, 33, 53, 144 Tenopir/King research 33 tensions 3 Text and Data Mining 238 text mining 26, 255 TGS NOPEC Geophysical Company ASA 44, 77 The Bell Curve 47 The Depot 207 The European Library 96, 180 The Lonely Planet 228 The Long Tail 19 The National Archive 253 The Peer Review system 215 the pit bull 196 The Publishers Association (UK) 212
The Royal Society 39 Thinkforce 23, 24 Thomas, Eric 59 Thompson, Matt 29 Thomson Science 49 Thomson Scientific 53, 107, 179, 261 tipping point 16, 17, 169, 329 Towards Electronic Journals 33 Tracz, Vitek 123, 171 Trade Associations 211 Tragedy of the Commons 11, 220 Trefethen, Anne 243 Triangulation 55 trust 258, 319, 333 U.S. Department of Justice 221 UCL Consultants/CIBER 43 UCL’s Centre for Publishing xix UK Access Management Federation 233 UK Chancellor of the Exchequer 57 UK Data Archive 253 UK Digital Curation Centre 252 UK Office of Scientific Information’s eInfrastructure programme 249 UK Publishers Association 38, 108 UK PubMed Central 208 UK Select Committee Enquiry 123 UK Select Committee of Enquiry on scientific publications 169 UK Serials Group 51 UK university library expenditure 93 UKERNA 233 UKOLN 245 UKPMC 149 Ulrich’s International Periodicals Directory 104 Unilever PLC 274 United Kingdom 136 United Kingdom’s Office of Science and Technology 263 United States 131 universal resource locators 234 Universities UK 58, 206 University College London xix, 34, 37, 146, 153 University College London/Centre for Publishing 116 University College, London 49 University of California 37, 53, 65 University of California’s Office of Scholarly Communication 37 University of California, San Diego 61 University of Lund 164 University of Southampton 50, 272 University of Tennessee 109 University presses 79, 181
US Appropriations Act 169 USA 132 User Behaviour 5 User Generated Media 309 user-generated media 5, 302 Valley of Death 89 van de Sompel, Herbert 52 Varmus, Harold 163 venture capitalists in the City 92 Veritas DGC 44 Verity 26 Verizon 241, 277 Version control 83 Versioning 171 versioning problem 280 vertical searching 45 Victoria University 176 viral marketing 27, 293, 333 Virtual learning groups 22 virtual reality 226 Virtual Research Environment for Archaeology 269 Virtual Research Environments 269 viruses 16 visualisation 26, 265 visualisation of output 238 Vividence 294 Vivisimo 238 Vodafone 277 Voluntary 168 voluntary deposition 97 VRE 269 W3C 218 Waltham, Mary 78 Ware, Mark 88 Warwick University 35 Wates, Edward 86 Wayback Machine 279 Web 2.0 xvi, 302 Web 3.0 236 Web of Knowledge 153 Web of Science 150 Web of Science/Web of Knowledge 291
Web Science Research Initiative 272 Wellcome Trust 99, 129, 139, 149, 163, 169, 186, 208, 334 WesternGeco/Schlumberger 44 White paper on academic use of journal content 199 Wi-Fi systems 296 WiFi 240 Wijnen, Jan Willem 147 Wikinomics 143, 313 Wikipedia 26, 158, 276, 311, 333 Wikis 97, 273, 309, 336 Wiley 77, 219 Wiley Interscience 116 Wiley-Blackwell 86, 224 Williams, Anthony D. 313 Wired Magazine 19, 118, 228 wireless 275 wisdom of the crowds 27, 28, 50, 144, 301, 331, 333 Wolters Kluwer 77, 219, 221 Woods, John 264 work bench 268 work bench support 238 work flow process 267 workflow 45 World Wide Web 65 World Wide Web Consortium 51, 218 WorldWideScience.org 24 Worlock, David xix, 44 Wurster, Thomas 230 XML 218, 273, 330 XML formats 223 Y Factor 179 Yahoo 52, 157, 230 Yahoo Groups 158 Yale University 171 Yet2.com 315 Young Technologists 31 YourEncore 315 YouTube 158 Zagat 158