Public or Private Economies of Knowledge?
Public or Private Economies of Knowledge? Turbulence in the Biological Sciences
Mark Harvey Professor and Director, Centre for Research in Economic Sociology and Innovation, Department of Sociology, University of Essex, UK
Andrew McMeekin Senior Lecturer, Manchester Institute of Innovation Research, University of Manchester, UK
Edward Elgar Cheltenham, UK • Northampton, MA, USA
© Mark Harvey and Andrew McMeekin, 2007

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical or photocopying, recording, or otherwise without the prior permission of the publisher.

Published by
Edward Elgar Publishing Limited
Glensanda House
Montpellier Parade
Cheltenham
Glos GL50 1UA
UK

Edward Elgar Publishing, Inc.
William Pratt House
9 Dewey Court
Northampton
Massachusetts 01060
USA

A catalogue record for this book is available from the British Library

Library of Congress Cataloging in Publication Data
Harvey, Mark, 1943– .
Public or private economies of knowledge? Turbulence in the biological sciences / by Mark Harvey and Andrew McMeekin.
p. cm.
Includes bibliographical references and index.
1. Knowledge management. 2. Technological innovations—Economic aspects. 3. Life sciences—Economic aspects. I. McMeekin, Andrew. II. Title.
HD30.19.H37 2007
658.4'038—dc22
2007030403

ISBN 978 1 84542 096 3 (cased)

Printed and bound in Great Britain by MPG Books Ltd, Bodmin, Cornwall
Contents

List of figures
List of tables
Acknowledgements

1. Making knowledge public and private
   Turbulence in knowledge, turbulence in the economy
   Public and private knowledge: key debates
      Knowledge as an economic good: public or private
      The embodiment of knowledge: tacit or codified
      Types of knowledge: science versus technology
      States and markets: the funding and organization of knowledge production
      Knowledge and property rights: commons, enclosures and anti-commons
   Disjointed dimensions
   The 'Instituted Economic Process' (IEP) approach
   Economies of knowledge: instituting production, distribution, exchange and use
   Legal and informal institutions of appropriation
   Public and private economies of knowledge
   Analysing economies of knowledge

2. The data explosion: the emergence of a new form of bio-knowledge
   What is bio-data?
   Technologies of data production
   Competition and conflict over genomic data
   The scramble for 'keys to genes': Expressed Sequence Tags
   Developing norms for public genomic data: the Bermuda Rules
   Who picks the 'low-lying fruit'? The public appropriation of SNPs
   From national to global scales of public data: trajectories of protein databases
   Conclusion: emergent public economies of bio-data

3. Tools of the trade – trade of the tools
   Introduction
   What are bioinformatic tools?
   Part I Historical context: pioneers in knowledge and economies
      Laying the foundations: algorithms for sequence similarity and their computational implementation
      Markets for tools: early forms of private appropriation and distribution
      The tumultuous 1990s: market explosion and implosion
   Part II Tracking trajectories: two case studies of bioinformatic tools
      Lion Bioscience: evolution and extinction of an economic agent
      Tracking the economic trajectory of a major bioinformatic tool
   Unsettled frontiers

4. Collaboration and competition: the dynamics of a genome race
   Introduction
   The competitors and the competition
   Let the race begin
   The race changes course
   To the first finishing line
   Publication and the end of the race
   The twist in the tale
   The differentiation and interdependency between public and private Agrobacterium knowledge
   Postscriptome: Aspergillus niger
   Appendix

5. Evolving economies of knowledge
   At the frontiers of biological knowledge
   Instituting economies of knowledge
   Two axes: production and use/distribution and appropriation
   Evolving economies of knowledge
   Formal appropriation institutions
   Knowledge and the multi-modality of capitalist economies

Appendix: list of interviewees
Bibliography
Index
Figures

1.1 Nelson's axes of knowledge
3.1 The SRS universe, Lion Bioscience, database integration system
3.2 The life science informatics-centric view of the economic universe
4.1 The race to sequence Agrobacterium tumefaciens
Tables

1.1 Stokes's quadrants of science and technology
2.1 Main historical events in the process of making ESTs public and private
3.1 Acquisitions by Accelrys to create a 'one-stop-shop'
3.2 Programs available from UWGCG
4A.1 Chronology of the sequencing of Agrobacterium tumefaciens genome
4A.2 Aspergillus niger: collaboration between public and private organizations
Acknowledgements

We would like to thank all our colleagues in the Centre for Research in Innovation and Competition for their contribution to discussions and debates of the issues developed in this book. We gratefully acknowledge Sally Gee for all the help and enthusiasm in undertaking some of the empirical research. Above all, we express our appreciation to Stan Metcalfe for his visionary support and encouragement in intellectual terms, but also for creating a fertile and imaginative research environment that made this book possible. He engaged directly with the ideas presented here with his own distinctive perspective, pushing us in turn to develop and clarify our arguments. We cannot thank him enough. Last but not least, we thank Miriam and Rachael for their unceasing support and Jake for providing many welcome distractions.

M.H.
A.M.
1. Making knowledge public and private

TURBULENCE IN KNOWLEDGE, TURBULENCE IN THE ECONOMY

The genomic sequence we were producing and dealing with is more than a commodity. It is the essence of biological heritage, the instruction book for living things. . . . The only reasonable way of dealing with the human genome sequence is to say that it belongs to us all – it is the common heritage of humankind. (Sulston and Ferry, 2003)

Strains on basic scientific values are exemplified by the public–private competition that arose during the sequencing of the human genome. While this competition accelerated initial availability of the genome sequence, it did so at considerable cost to the health of the interface between science and society. (Olson, 2002)

Our actions will make the human genome unpatentable. We expect this primary data will be used by us and others as a starting point for additional biological studies that could identify and define new pharmaceutical and diagnostic targets. Once we have fully characterized important structures (including, for example, defining biological function), we expect to seek patent protection as appropriate. In effect, this new venture is the private sector recognition of the importance of the Human Genome Project. By working closely together, NIH, DOE and other public and private institutions can help meet the goal of having a complete map and sequence of the human genome sooner than anyone ever imagined. I hope that after this hearing you view both our announcement and the federal program for which you are responsible as not an 'either/or' proposition, but instead focus on how these two activities working in tandem can ultimately improve our lives and those of the generations to come. (Venter, 1998)

Anything that drives science forward is good. We're all better off because we have access to the [rice] genome data today instead of waiting for some 15-year public program. (Bio-IT World, 2002)
Over the past few decades, we have witnessed an outburst of debate over the acclaimed revolution in biological science (Zweiger, 2000; Gilbert, 1991; Strohman, 1997) and, at the same time, as the above quotations demonstrate, intense controversy over what kind of economic good this new biological knowledge is or should be. We do not think this is a coincidence. Upheavals in the production of knowledge (scientific–technical) generate upheavals in the economic order of knowledge too. This contributes to the unceasing 'restlessness of capitalism' (Metcalfe, 2001, 2002, 2007). Economic instability stimulates us to a fundamental questioning of what property is when it comes to knowledge. In what sense is knowledge public? How is knowledge exchanged and reproduced? What kind of knowledge constitutes tradable 'products'? We address these questions particularly in order to problematize the conceptions of 'public' and 'private' knowledge that have provoked – and confused – much recent discussion.

The most high-profile example of this coincidence between scientific revolution and turbulence in economies of knowledge has undoubtedly been the race to sequence the whole human genome (Sulston and Ferry, 2003; Cook-Deegan, 1994; Balmer, 1996), paralleled by many other genome races, including some with more complex economic processes (Harvey and McMeekin, 2004). There was certainly a race for a landmark scientific achievement (we shall ask what kind of knowledge is involved in that achievement), but there was at the same time a race to 'make public' or 'make private' some or all of the resultant knowledge. Huge resources, either from public funding or from private financing, were dedicated to the race, and the nature and governance of financial resources are clearly significant for any knowledge production. Earlier, one of the main public funders in the US, the National Institutes of Health (NIH), had vigorously promoted the patenting of those keys to genes, Expressed Sequence Tags (ESTs), and also single base mutations identified as being related to particular diseases (Single Nucleotide Polymorphisms, or SNPs/'snips'). But later the NIH reversed its position, partly in response to the outcry about the balkanization of the human genome (Doll, 1998; Dworkin, 1997). Then, as an economic intervention that defies normal categorization and many assumptions about economic behaviour, Merck, a major pharmaceutical company, in collaboration with Washington University, St Louis, created a public database for ESTs – a prime genomic resource for eventual commercial exploration – and, at great speed and expense, deposited as many ESTs in the public domain as it could, so pre-empting private property rights (Marshall, 1994, 1999; Venter and Adams, 1998; Caskey and Williamson, 1996; Goldman, 1998; Eisenberg and Nelson, 2002).

Another sign of turbulence – which we will be analysing in some detail – raised the question of what is public and private in another way, namely, as a national or global public or commercial good. A European protein database that had dual public and commercial characteristics was subject to a novel type of 'takeover' by the NIH in order to reglobalize (Americanize?) it, as distinct from re-'nationalizing' it, making the new database a solely public good, ostensibly a world-public database. At least, US firms now have access to that database on a non-commercial basis.

These are just some striking examples of how the emergence of the new science raised some equally new economic questions. It poses science policy questions about what components of the new science governments should fund, and how they should set priorities. For example, what should be the balance between supporting the production of new knowledge and supporting new infrastructures for distributing knowledge? It also poses questions for regulators. While the 1980 Bayh–Dole Act in the USA, allowing and encouraging university scientists to seek patents for their federally funded research, was not specifically directed at new developments in biology, it no doubt had an impact. But what sort of impact has it had on advancing biological knowledge and its applications? Significant questions have been asked about patent law and practices too. The new knowledge resources emerging from genomics and bioinformatics have challenged patent rules determining what can be patented and what cannot, especially in relation to the issue of utility. Furthermore, there have been questions over the desirability of allowing patents on genomic sequences on the grounds that creating restrictions to use would seriously impede progress. There are also important questions for those involved in industry. Where should firms place their bets in the emerging biology knowledge base? Should they seek patents on their knowledge, keep it secret or distribute it publicly? For which types of knowledge resource can markets be established, and what value can be placed on tradable knowledge resources?

These questions give a good indication of the importance of the issues at stake, but they imply a far more fundamental question concerning the development of capitalism. If the growth of knowledge is central to the growth of the economy, and if increasingly that growth is sustained and promoted by the growth of non-market, public knowledge (and education), what does that imply for our understanding of capitalism as essentially or predominantly a market-driven economy? The question is one of how economies are organized to advance knowledge and make use of it for economic ends, and at its core is the issue of how the public and private aspects of knowledge become instituted. This book aims to tackle the issue head on, developing a novel theoretical framework and using it to make sense of the ongoing informatics revolution in biological science and technology.

We will argue that capitalist economies create divides between public and private knowledge. The specific characteristics of public and private knowledge, and the divides and interactions between them, depend on how economies of knowledge are instituted. There is immense variety in this, both synchronically and diachronically. Furthermore, there can be significant fluidity in how the public and private parts of knowledge interact: there are tensions, some fruitful and some not. So it is a dynamic picture, where the evolution of economies of knowledge depends in large part on the co-evolution of public and private knowledge. The dynamism is within and across both the public and private domains, and it is this that we aim to capture in our analysis of multimodal economies of knowledge.

The core thesis of the book recalls and revives some old themes of political economy in a new guise of economic sociology. At its simplest, we believe that new divisions of labour are constantly being created between the production and use of knowledge. Once differentiation occurs between groups of producers and groups of users of knowledge, there is an issue of how that knowledge is distributed and exchanged between them. As with all divisions of labour, there is an interdependence between those now differentiated in their relation to knowledge as it is created and applied. At the broadest level, the division between public and private knowledge can be seen as a societal division of labour. In capitalist economies, there is a perpetual tension between different modes of distribution and appropriation of knowledge, and thereby a source of contestation concerning the place of knowledge in economy and society. We pitch our research camps at the fissure points between public and private knowledge.
PUBLIC AND PRIVATE KNOWLEDGE: KEY DEBATES

Quite a lot has been written about the public and private characteristics of knowledge. Our objective here is not to provide a comprehensive review, so much as to point out how different dimensions of knowledge have been explored to uncover directly or indirectly some of its public and private attributes. In this respect, we have identified five prominent dimensions in the literature: knowledge as an economic good, the embodiment of knowledge, different types of knowledge, institutions of knowledge, and ownership of knowledge. We discuss these five dimensions to provide some key building blocks for the development of our own analytical framework. In doing so, we want to draw attention to a common pattern that permeates the way that each of the debates has been structured. For each dimension, public and private characteristics have initially been conceptualized as contrasting and opposed binary categories. Subsequently, empirical realities have shown the sharp divides to be significantly overstated, and perhaps increasingly so. As the debates have matured, the initial dichotomies have largely been replaced by notions of hybridity, blurring and interdependence.

Our other objective in this section is to consider some consequences of the different cuts at the public and private characteristics of knowledge. The problem is that the debates often appear to be at odds with one another: the public or private of one dimension does not sit easily with the public and private of another. The terms are used in conflicting and confusing ways, sometimes referring to organizations, sometimes to sectors and sometimes to knowledge itself. The following discussion attempts to tease out some of these meanings.

Knowledge as an Economic Good: Public or Private

Over 180 years ago, Thomas Jefferson made the following observation about some special characteristics of knowledge, an observation that has been subsequently marshalled (for example, David, 1993; Stephan, 1996; Stiglitz, 1999) in support of a view that knowledge has certain inherent properties that make it a public good:

If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of everyone, and the receiver cannot dispossess himself of it. Its peculiar character, too, is that no one possesses the less, because every other possesses the whole of it. He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening mine. (Jefferson, 1967 edn, p. 433)
Formalization of this view of knowledge as a public good, developed first by Nelson (1959) and Arrow (1962), and then further explored by the likes of David, Dasgupta, Stiglitz and Foray (Dasgupta and David, 1994; Stiglitz, 1999; David, 2004; Foray, 2004), can be summarized by two inherent characteristics of knowledge:

● Knowledge is non-rivalrous in use: once produced, there is no marginal cost in another individual making use of the knowledge. Furthermore, even if exclusion was possible, it would be undesirable, since the spread of already existing knowledge is costless, at least relative to the costs of producing it again. Knowledge does not get used up in use, so there is no rivalry between different agents availing themselves of it.
● Knowledge is inherently non-exclusive: once produced, and especially when intentionally disseminated, knowledge is difficult or impossible to appropriate privately through excluding others from using it.
Establishing the public good credentials in this way has been persistently used as a 'market failure' argument for government support of knowledge production, principally on the grounds that firms will underinvest because appropriation is difficult. Markets simply do not work well, so the argument goes, in stimulating the production and circulation of knowledge. Knowledge, by virtue of these inherent properties, is public by default – unless, as historically occurred, dedicated legal institutions are created, such as patents, which define what knowledge might be appropriated. These institutions, as will become evident, are central to the creation of the public divide within capitalist economies, and they are continuously evolving. But, even under such legal protection, knowledge is only partially and insecurely captured for private property, not only or even mainly because of legal requirements for disclosure and time limits of protection. It remains a latent or quasi public good in the sense that it is 'out there', especially for non-competing uses or uses that readily evade legal enforcement of private rights.

But it is the reliance on inherent properties that has prompted the development of alternative viewpoints. We can present three distinctive criticisms of this view. The first, most forcefully argued by Callon (1994), takes issue with the idea that the public status of knowledge should be defined by looking at inherent properties: 'There is nothing in science to prevent it from being transformed into merchandise' (ibid., 402). Knowledge can always be bundled or packaged or embodied in ways that make it tradable, and enable rival uses. In this analysis, neither of the two inherent properties of knowledge holds with sufficient force even to make the claim that knowledge is a quasi and latent public good as the economic theory defines it. But Callon does want to retain the idea of science as a public good and as an enterprise that should receive government support. Public science produces public knowledge by creating new networks, and these networks are emergent entities that require defence against tendencies that might close down the creation of variety in knowledge. It is a model of interdependency between public and private science, which seems right, but, in attacking the standard economic theory of public knowledge, much of the economic conditioning of the relationship is thrown out too.

The second type of opposition to the essentialist formulation involves studies of knowledge appropriation dispelling the idea that knowledge is inherently non-excludable, and at least hinting at the idea that knowledge is rivalrous in use under certain conditions. Crucially, the attack on the non-excludability of knowledge rests on more than the observation that patents can be used as an instrument of appropriability. In these studies (Levin et al., 1987; Teece, 1986), firms in particular were shown to be quite effective in appropriating their knowledge through secrecy or strategies relating to first-mover advantages (such as the development of complementary assets), even if only for a short period of time. Some, like Pavitt (2001), have presented an even stronger view, that it is actually very difficult and costly for industrial knowledge to be transferred between different organizations, and that knowledge can prove somewhat 'sticky'. These observations, combined with the opportunity to seek formal protection of intellectual property through patents, show that knowledge can quite readily be made private, and in several different ways.

There is also surely a strong sense in which knowledge does exhibit rivalry in use. Given the widely believed role of knowledge in economic growth and in competitive advantage, it seems obvious that access to knowledge for further knowledge production, or for the production of non-knowledge goods, grants those in possession an advantage over others. Thus the competitive context of knowledge production and use, within either university or commercial settings, clearly points to rivalry. That is why great efforts are frequently made to keep knowledge secret.

The third view proposes that the more knowledge is generic, the more it approximates to the essential knowledge characteristics defined above; and the more it is specific, and embedded in particular uses and objects, the easier it is to appropriate privately. But there are no sharp lines here, so no absolute inherent properties of generic-ness or specific-ness, a view most forcefully and persistently argued by Nelson (1989, 1990). Moreover, the generic and specific are intricately interconnected, so further blurring boundaries. The question then becomes one of how the boundaries between public and private knowledge are set, given that there are no naturally occurring distinct species of knowledge. The important question becomes one of achieving a balance between establishing incentives to produce knowledge and exploit it commercially, which can rest on the opportunity to make it private, and preserving the non-exclusivity of knowledge, so important for its cumulative advance. Depending on institutional incentives and rules, the balance can tip too far either way, although Nelson himself has been increasingly concerned about the risks of growing opportunities for private appropriation impeding cumulative advance.

However, although now some distance from the earlier essentialist view, there still remains a tendency to treat knowledge as a public good unless it has been privately appropriated. It still draws on the idea that generic knowledge has inherent characteristics gravitating towards the public pole, unless efforts are undertaken to appropriate it privately. Treating public knowledge in this default way implies a view of the capitalist 'engine of progress' with the private market mode as its dynamic component, for which public knowledge is only the fuel, somehow generated outside the central economic system.
The Embodiment of Knowledge: Tacit or Codified

The notion of tacit knowledge as proposed by Michael Polanyi (1967) rested on an empirical observation that people seem to know more than they articulate or are even able to articulate. The concept was further developed to provide a foundational component of Nelson and Winter's (1982) evolutionary theory of economic change, where tacitness was extended to groups of individuals within firms. But it was Dasgupta and David (1994) who most forcefully considered how the knowledge embodiment issue connected to the public–private question, by contrasting tacit knowledge with codified knowledge. Almost by definition, tacit knowledge is private to the individual or group by virtue of experience, codes and representations not shared by others, hence incommunicable or at least untranslated for a wider constituency. Codified knowledge, on the other hand, is 'knowledge that has been reduced and converted into messages that can be easily communicated among decision agents' (Dasgupta and David, 1994, p. 493). On this basis, the related dichotomy of 'information' and 'knowledge' is posited, with the former embodied in a 'codebook' or other material or electronic substrate (books, graphics, electronic data and so on), the latter embodied in the human bearers of knowledge. There are then different potentials for these two categories, respectively as either public information resources and private informational product markets, or public and private labour markets for bearers of knowledge. Indeed, in a later contribution, Cowan et al. (2000) go much further by insisting that uncodifiable knowledge is not very interesting for the social sciences. Here, the focus is on understanding the process of codification in greater detail, promoting a view that enhancements in codification will facilitate progress in knowledge accumulation and that modern developments in Information and Communication Technologies (ICTs) represent significant opportunities in this respect.

Yet these mappings of different knowledge embodiments onto economic and organizational characteristics are rarely pursued to their logical conclusion. First, the dichotomy presumes that codified knowledge, once separated from the bearers of knowledge, is free floating in public or market space. Tacit knowledge alone is treated as residing in the bearers of knowledge, whereas codified knowledge circulates 'out there'. But if codified knowledge is both 'in there' (the heads of knowledge bearers) and 'out there' (knowledge products), then the dichotomy is far from straightforward. In particular, the tradability potential is not so straightforwardly dichotomized between knowledge markets for information and labour markets for knowledge. It is quite revealing that Cowan et al. consider the strong case for tacit knowledge as undermining the case for a 'public goods' view of science and its funding. If essential scientific understandings remain tacit and uncodifiable, they no longer have the character of non-rivalrous use, extensibility or indivisibility. Tacit means private to an individual or collection of individuals, and these authors then take some pleasure in suggesting that, if codified knowledge can be traded as marketable commodities under certain conditions, tacit knowledge becomes more readily purchased through labour markets and the hiring of its bearers. They argue also that this kind of private knowledge makes redundant the need for intellectual property rights to protect against rivalrous use, as no one other than the tacit knowledge bearers has access to their knowledge (ibid., p. 222).

Second, partisans of tacit knowledge, such as Johnson et al. (2002), develop the concept deploying different meanings of private, in terms of distinctive aspects of the interactive social situations within which tacit knowledge is generated. The tacitness of knowledge is its priviness to the social group – possibly to the individual alone. What is underdeveloped here is the potentially very different levels of tacitness or, as Nelson and Winter (1982) have argued, the fact that tacitness implicitly is a relationship between individuals and large or small groupings of people with unequal distributions of knowledge, minimally, partially, or wholly shared. There are, therefore, even within tacit knowledge, different communities – or publics – implied, contrasting with the supposedly 'universal' public of codified knowledge.

Lastly, within this same argument, the fact that codified knowledge can also be 'disembedded' or detached from the bearers of codified knowledge does not thereby imply that its bearers are voided of it in the process. They continue to be bearers of codified knowledge as much as if it were tacit, and in neither case do they part with knowledge as an object of exchange. In sum, apart from operating within a static dichotomy of tacit versus codified, there is only a partial mapping, a patchy and unsystematic articulation between epistemic and economic organization. The 'privateness' of tacit and the 'publicness' of codified are particularly ill-described by an antithetical dichotomy.

Types of Knowledge: Science versus Technology

A third dimension to the debates surrounding public and private knowledge centres on types of knowledge. Again, there has been a long-standing debate, especially amongst historians of science and technology, about the differences and relationships between types of knowledge, which preceded the consideration of how types of knowledge might map onto the public–private issue. What is important for our purposes is the idea that different types of knowledge imply a more or less natural division of knowledge-producing labour in the economy.
The key issue concerns what motivates particular knowledge-producing activities: to what ends they are oriented. Here the historic dichotomy pitted scientific knowledge, oriented towards understanding, against technological knowledge, oriented towards practical application. This 'Two World' view, particularly dominant in post-war science policy (Bush, 1945), saw scientific knowledge as the province of government support and technological knowledge as the business of firms. Implicit in this view was the idea that scientific knowledge could be translated into technological knowledge, even that the latter was derivative of the former. This underpinned the linear model of transfer of knowledge from public science institutions into useful knowledge developed by R&D departments in firms. Based on this reasoning, science was inevitably somehow public knowledge and technological knowledge was private knowledge. There was a natural and harmonious division of labour within capitalism.

As is now widely recognized, the linear model that sees science translated into technology has been thoroughly discredited. One recent and prominent critique argued that much basic science is also use-oriented, so breaking with the antithesis between fundamental understanding and knowledge applied to use (Stokes, 1997). Pasteur's scientific activity over a lifetime was held up as exemplary of this. But this argument is also often articulated with a discussion of, even policy justifications for, the resourcing of knowledge production, whether scientific or technological, whether by governments or firms. Indeed, Stokes's challenge to the prevailing paradigm of a dichotomized world of science and technology invoked a new contract between science and society, giving the justification that much basic science was use-oriented, with anticipated benefits to social welfare. This aimed to underpin a continued and expanding public resourcing of science, especially when driven by use orientation. He replaced the old dichotomy with four quadrants (Table 1.1), each presented in terms of the goal orientation of knowledge.

Table 1.1  Stokes's quadrants of science and technology (1997)

                                   Use goal: No                         Use goal: Yes
Fundamental understanding: Yes     Pure basic (Bohr)                    Use-inspired basic (Pasteur)
Fundamental understanding: No      Birdwatching, amateur naturalists    Pure applied (Edison)

Source: Stokes (1997, p. 73).
The question is whether we are better off with four worlds rather than two, given that Stokes's quadrants are as taxonomic in spirit as the dichotomy he sought to replace, as is evident from his 2x2 representation of the four goals of knowledge. Nelson (1990, 2004), in an analysis that rests on and extends Stokes's construct of a Pasteur's quadrant, implicitly offers a less rigid taxonomy, and one based on knowledge characteristics, whether scientific or technological, rather than goal-orientation. Boundaries become fuzzier and even disappear at the intersection of axes. Notably, as we have already seen, generic technologies are considered to be latent public goods, because they offer wide and various potentials for use. Indeed, even if firms attempt to appropriate knowledge privately, it will often leak out as other firms 'ferret out' the details through reverse engineering and other techniques.

Figure 1.1  Nelson's axes of knowledge (vertical axis: generic to specific; horizontal axis: far from utility to close to utility)
But Nelson essentially maps this characterization onto a dichotomized public/private world of the scientific commons (see Figure 1.1) and private intellectual property: the top left and bottom right quadrants, respectively. The main thrust of these arguments lies in establishing some fundamental distinctions that can help society decide which types of knowledge should be funded by governments and which by commercial organizations; and which should be privately appropriated and which should not.

States and Markets: The Funding and Organization of Knowledge Production

This dimension is the flip side of the previous one. It concerns similar questions, but the issue is approached from the opposite direction. Dasgupta and David (1994) captured this by capitalizing Science and Technology. This is the distinction between knowledge produced under significantly different institutional settings, whatever types of knowledge they may be. Again, there are two worlds. In the simplest formulation, there is a normative assumption: knowledge production that is funded by governments and undertaken by universities and other 'public research organizations' should be public; knowledge produced by firms, funded through private capital, would be expected to be private. Thus we are presented with the distinction, widely used in general discussion, of public versus private sectors, or public and private organizations. In explanatory terms, the organizational and institutional dimensions take priority over the particular characteristics of different types of knowledge.

But there is more to it than simply the provenance of funding and the classification of knowledge-producing organizations. There is also the question of how each of the worlds is institutionalized. The implied private world of Technology, in its ideal-type formulation, involves profit-seeking (and therefore incentivized) firms seeking ownership and control over their knowledge resources through formal property rights. By contrast, the Dasgupta and David (1994) argument sets out the distinctive (largely Mertonian) norms and rewards of the Republic of Science, especially that of reputation gained by priority to publication. These norms, it is argued, are the outcome of a fortuitous history of the exchange or bargaining relationships between patrons and scientists in sixteenth and seventeenth century Europe (David, 2001a). During this period there emerged a new process of production of knowledge, in which processes of validation, refutation and replication entailed the formation of a new community of scholars, and, indeed, new modes of communication between them. The knowledge is collectively elaborated on and improved, a critical social process of accumulation and growth of knowledge. These are the historically specific institutional norms of 'open' science, and from this perspective it is further argued that public funding is necessary to sustain them.

There might be two responses to this presentation of two worlds. The first would be that the worlds have never been as clearly dichotomized as the depiction suggests, and the historical–institutional analyses certainly demonstrate that they are contingent rather than transcendental. The second, which has been more resolutely pursued, is that there has been a progressive blurring and an increasing interdependence between the two worlds, so that the norms of the Republic of Science are now being undermined by a shift towards greater privatization. There has been a notable encroachment of private norms into Science. Firstly, the Bayh–Dole Act, 1980, shifted the legal regulation of IPR by encouraging university researchers to seek formal property rights over their knowledge, so breaking down the division between the Two Worlds (Mowery and Sampat, 2001; Nelson, 2004; Mowery et al., 2001; Mowery and Zeidonis, 2002). Secondly, this shift was reinforced by judicial decisions, notably by Madey v Duke University, 2002, that, post-Bayh–Dole, treated universities on the same footing as business enterprises with commercial interests.
But the Two Worlds were further confused from the other direction too. It has become increasingly common for firms to fund and undertake fundamental research, which falls within the norms of the Republic of Science. Firms often publish their own results in the literature and increasingly fund research undertaken in universities. The most extreme version of this even suggests that, if market actors invest in the production of basic science, then this is proof perfect of public underinvestment in science (Eisenberg and Nelson, 2002).

Knowledge and Property Rights: Commons, Enclosures and Anti-commons

An overlapping but different cut on the issue of economies of knowledge comes from the angle of property rights. The term 'commons' entered the economies of knowledge debate by a rather peculiar route: Hardin's 1968 paper in Science. That paper relates to the need for population control in the context of finite earth resources, and it is there that he adopted the term 'the tragedy of the commons' (Hardin, 1968, 1998). Couched within the Demsetzian orthodoxy of property rights, its conclusion that finite resources can best be protected by individual property rights (in this case, the right to breed) constructed a vision of public property in terms of unrestricted access to, and consumption of, a naturally given and finite resource. In the absence of any one consumer bearing the full negative externalities of unlimited consumption, the inevitable result was a 'tragedy of the commons', and exhaustion or destruction of a common resource through overuse. This closely follows Demsetz's argument that the optimal form of property rights can be found only with unequivocal, unambiguous, individual and private property rights (Demsetz, 1967). This initially created a clear dichotomy between private and public property. It is also a reason why, in this influential orthodoxy, public property is defined as a residual: the absence of exclusion, or access 'left open', or 'not enclosed'. This is coupled with the choice of land as the prototypical asset: a naturally given, unproduced good. From this starting point, it is difficult to get to a positive notion of public good, whether based on production or use of knowledge goods, by different forms of social collectivities, with different resource constraints. Indeed, it is interesting to note the asymmetry whereby private property is supported by legal rights, while any notion of public property in relation to knowledge relies, not on law, but on the considerably more fragile institutional arrangements described in the previous section.

Subsequently, an effective challenge to this view was mounted against considering knowledge as equivalent to the 'commons', so constructing other types of 'public' resources than the naturally given finite ones assumed by Hardin. For example, David (2001c) argues:

Information is not like forage, depleted by use for consumption; data-sets are not subject to being 'over-grazed' but, instead, are likely to be enriched and rendered more accurate, and more fully documented the more that researchers are allowed to comb through them. It is by means of wide and complete disclosure, and the sceptical efforts to replicate novel research findings, that scientific communities collectively build bodies of 'reliable knowledge'.
Although not developed in this direction, it is noteworthy that this argument refers to the process of production of knowledge, rather than its outcome in public goods products. Through this route, however, the argument points to there being different types of public property, but without really pursuing the issue any further. For example, as 'commons' is a commons of use-rights, it is not at all clear how different uses of knowledge involve different types of rights over it. Nonetheless, the term 'scientific commons' slipped through the back door into the discussion, more by virtue of saying why it was not like commons of public grazing rights than positively what the commons is like in terms of collective, social or public property rights. The knowledge commons is simply 'out there', to be contributed to, or accessed, at will.

A second new direction taken in this debate was the proposal of a mirror image of a 'commons', the anti-commons, also developed in relation to a very different context, the transformation of socialist into market economies (Heller, 1998; Michelman, 1982). The concept of an anti-commons mirrors that of the commons, in that there are multiple owners, but instead of shared rights to unlimited access and use of the commons, each is endowed with rights to exclude others with regard to different aspects of a finite resource, without any hierarchy or coordination among the many owners (Heller, 1998, p. 671). The result is the opposite of the tragedy of the commons, namely one of underuse of a resource rather than overuse. As with the tragedy of the commons, in its original context, Heller pursued the ineluctable Demsetzian logic, with the suggestion that unequivocal hierarchical decision-making over use can only be effectively achieved through clear formal laws of private property. But with respect to knowledge, and in collaboration with Eisenberg, the argument takes the opposite tack, namely that partial and multiple private rights over the knowledge commons create obstacles to its full public use and development (Heller and Eisenberg, 1998). It is the balkanization of science by private property that impairs the full benefits of a public commons. However, even this discussion did not lead to a more positive conception of a 'commons', merely to the point that enclosures could be detrimental.

The discussion of property rights presented here does not make enough of how different rights might or might not be bundled together (Honoré, 1961). To open this up a little, we can consider different aspects or dimensions of property, notably rights to use, access, exchange and the exclusion of others from each or all of these. These can all be bundled together – under unambiguous hierarchical decision-making powers – or they can involve different aspects of what is normally considered private or public. A patent, for example, with rules of full disclosure, gives rights of access separate from rights to use, and accords to its owner a right to exchange. Roads might be considered a typical 'commons' with attendant tragedies of overuse in Hardin's or Heller's sense, until one considers rules and restrictions on use and access, and the particular 'publics' constituted by them, such as qualified drivers, legitimate vehicle owners, and pedestrians obeying highway codes. The point here is that property rights have different dimensions, which play a key part in defining the boundary between the public and private.
DISJOINTED DIMENSIONS

The previous discussion gives us some important building blocks for thinking about the public and private character of knowledge, but it leaves us with a problem: how do these different dimensions stack up? Is it possible to layer the dimensions on top of each other to produce two multidimensional worlds of public and private knowledge? The answer is no. It might be thought that there is an easy alignment between fundamental science, public good, public resources, codified information and scientific commons, on the one side; and applied technologies, private goods, market financing, tacit knowledge and private or anti-commons property, on the other. In this case, there would be a consistency across the dimensions of types of knowledge, economic good, modes of financing, knowledge embodiment and property rights. But this consistency has not been pursued, and it already appears clear that it would be difficult to sustain.

Within each of the debates, there is also something important in the tendency to progress from depictions of binary opposites to accounts of blurring and interdependence. We think that there are good reasons why the debates have moved along in this way. The interdependencies between public and private, however they have been defined or implied in each of the dimensions, are indeed complex and variegated. The bioinformatics revolution demonstrates these complexities particularly well, indicating that turbulence in knowledge produces turbulence in the public–private relationship as defined in each of the dimensions discussed. In thinking about types of bioinformatics knowledge, we would be hard pushed to locate it as either science or technology, or even in Pasteur's quadrant. Different bioinformatic knowledge resources, bio-data or algorithms, can be all three. Indeed, we will show how, over time, specific knowledge resources migrate through quadrants, change their use orientations and even the degree to which they can be considered specific or generic. Concerning knowledge embodiment, it is not so much the distinction between tacit and codified that is important but, for example, the proliferation of 'codes' creating a Tower of Babel (Attwood, 2000) and the need to create yet new meta-languages that enable translation and communication between otherwise uncommunicative parties. In this respect, 'the naming of parts' – deciding how to designate biological entities – is as much about knowledge production as about communication (Oliver, 2002). We could make the same point about the other dimensions, and the empirical analysis in this book will do just that.

For now it should suffice to say that our own position is that the public and private characteristics of knowledge are emergent properties, irrespective of the dimension considered. The framework that we develop in this book places its primary emphasis on processes of differentiation and interdependencies between the public and private aspects of knowledge. For any specific knowledge resource it is always possible to specify distinctive public and private characteristics, but we will draw attention to the processes through which these characteristics change over time and how they do so in relation to each other. In our analysis, we attempt to incorporate the insights of these various debates, and not to lose sight of the different dimensions. But to develop a coherent and integrating framework, we choose a different starting point: an economic sociology approach to economies of knowledge.
THE ‘INSTITUTED ECONOMIC PROCESS’ (IEP) APPROACH The approach we adopt here originates from the work of Karl Polanyi, and in particular his concept of economies as instituted economic processes (Polanyi, 1957).14 Here, the framework is considerably adapted and developed to address economies of knowledge, and analyse the evolving societal divisions of labour in contemporary capitalism between public and private modes. The starting point of the analysis is the articulation between four transformational processes: transformations respectively of qualitative characteristics, location in time and space, control, and functionality (Harvey, 2007a). These transformational processes underpin the more conventional distinctions between production, distribution, appropriation (exchange) and consumption. By viewing these as historically instituted processes, with no initial defining characteristics, inspired by an anthropological stance towards empirical realities, an IEP approach is open to exploring a variety of instantiations and articulations between these processes. In
Making knowledge public and private
17
that respect, it is ideally placed to grasp formative moments and shifts in the development of capitalist economies, as well as to appreciate and recognize the distinctiveness and significance of knowledge with them. By seeing markets or different redistributive public modes of provision as outcomes of historically instituted processes, an IEP approach emphasizes variety and interconnectedness between different economic modes. There is no presumption of systemic coherence or integration, let alone equilibrium, and this in turn makes crisis and conflict, as much as stability and continuity, central to any dynamic explanation. In a preliminary way, we can indicate what we mean by public and private in relation to knowledge within the IEP framework, although this will be elaborated and deepened especially through consideration of our historical material. The terms have been used in many different ways, to the point that discussions become confused and often appear to be at cross-purposes with each other. When we use the terms ‘public’ and ‘private’, we are referring specifically to the question of how rights, opportunities and constraints to access, use and exchange knowledge become instituted. This is not just a question of formal property rights, though they are important, but also of how economies of knowledge are instituted in specific ways at particular points in time. The organization of economic processes impinges directly on how knowledge is made public or private in terms of access, use and the possibility for further exchange. To maintain a consistency in our use of the terms we will avoid descriptions of organizations, sectors or funding sources as public or private. That is not to say that the respective roles of firms and universities, or states and markets, are unimportant. They are central to our analysis, but we are developing a framework that refuses to assume any defining connection between these institutional arrangements and the public and private characteristics of knowledge. We frame our analysis of public and private knowledge by raising an important prior issue about the emergent splits between production and use that accompany advances in knowledge. These splits arise from the inevitable processes of epistemic specialization and differentiation that accompany the growth of knowledge. And the emergent divisions (of labour) between production and use are coordinated by instituting processes of knowledge distribution and exchange. This is how we define an ‘economy of knowledge’: the specific articulation of the processes of production, distribution, exchange and use of knowledge. It provides us with a lens to analyse the different types of organization involved in knowledge production and use, the variety of different ways that knowledge production and use are funded, the mechanisms by which knowledge is disseminated and the alternative modes of appropriation that play a key role in defining how knowledge is exchanged through market or non-market institutional
18
Public or private economies of knowledge?
arrangements. We use this framework to attack the question of how knowledge comes to have both public and private attributes and to explain how the balance between them shifts over time with transformations in the production–distribution–exchange–use complex. It also frames our analysis of the mechanisms that produce a myriad of different forms of public and private knowledge. The following discussion elaborates on this framework for conceptualizing the ways that turbulence in knowledge creates turbulence in the economy.
ECONOMIES OF KNOWLEDGE: INSTITUTING PRODUCTION, DISTRIBUTION, EXCHANGE AND USE

The structure and content of scientific disciplines and technological fields are subject to constant flux with the continued production of new knowledge. Over the last half-century, biology has undergone a major transformation. The elucidation of the structure of DNA and the breakthrough that connected this structure to the mechanism by which living organisms pass genetic information to their offspring proved to be the starting point of a revolution in biology that has seen it shift progressively towards becoming an information science. The invention of sequencing technologies, first for the amino acid sequences that comprise proteins and then for the nucleic acids that comprise DNA molecules, provided the second key impetus for this revolution. The sequences produced by these new technologies provided a new basis for molecular biology and a new set of epistemic practices built around novel experimental techniques, including new relationships with the hitherto only vaguely connected field of computational science.

These transformations are clearly examples of what Kuhn (1962) meant when he referred to revolutions in science and the formation of new paradigms. But more important for our analysis is how these advances added up to an emergent specialization within the field of biology and how such specialization produced a new division of labour. The question of how processes of specialization and differentiation in knowledge map onto emergent divisions of labour in the production and use of that knowledge is a key to the analysis in this book. The emergence of new differentiated types of knowledge, notably the deluge of bio-data and related software scientific instrumentation, opened up these issues in quite a dramatic way. How was this new type of knowledge to be distributed and exchanged between new groupings of producers and users? Was it to be public and, if so, what new institutions of public knowledge would evolve? Was it to be traded or private, and again, if so, what impact would this have on existing knowledge property institutions and appropriation practices?

An argument developed throughout this book is that the mapping between epistemic and economic organization is never straightforward. For example, from an epistemological perspective, recent advances in applying computational techniques to biology have been described as a new specialism in 'dry' biological science that has emerged as differentiated from traditional 'wet' biology. The extent to which this distinction between wet and dry maps onto a new social division of labour is an empirical question, and the interesting questions relate to the processes through which new organizational arrangements have been instituted in order to accommodate the new modes of biological science. For now, it should suffice to say that, on the one hand, there have been some quite novel organizational and institutional departures, with the formation of new university departments, new institutions, new journals and academic societies devoted entirely to the new dry science. On the other hand, there are many instances where the divisions of labour have been cut in different ways, so that dry and wet biological sciences coexist, with distinctions and specializations formed according to different criteria.

Here we can introduce some of the themes that follow from this IEP framing position, themes that will become the focus of our empirical analyses in later chapters and will eventually come together within the synthetic framework articulated in the concluding chapter. We start with a consideration of emergent differentiations between the production and use of knowledge, to establish a context for analysing the knowledge distribution and exchange processes that structure the divisions and interdependencies between public and private aspects of knowledge.

An overarching trend that accompanies the growth of knowledge involves the emergence of new divisions of labour leading to multiple differentiations between production and use. Within a new division, knowledge is used to produce further knowledge of the same or similar kind. But this also leads to new complementarities within the branch of science, also a knowledge-producing activity, but one that makes use of the new specialism. In turn, both these developments open up new opportunities for the use of knowledge for the production of something other than knowledge, like a pharmaceutical or new crop variety. If we take proteomics as an example of a new division of labour, new proteomic knowledge is at once a basis for further exploring the reactions that proteins or their close relatives undergo in different metabolic conditions; and provides new inputs to entirely different branches of biology, evolutionary and developmental; and can be deployed to advance understanding of how a particular protein acts on the human body and so lead to applications for therapeutic purposes.
But increasing specialization within the knowledge sphere can also lead to synergies forged between previously autonomous fields. The increasing use of computational science in biology that underpins much of the epistemic transformation at the core of this analysis is an obvious case in point. In one sense we can view the molecular biology community as simply users of computational knowledge. Pre-existing computational knowledge is used in an off-the-shelf manner to conduct in silico biological experiments. The outputs are new biological knowledge: nothing is added to computational science. In these instances, the molecular biologists need only a limited understanding of the computational techniques employed: when they should be used, how to use them and how to interpret the results – but not necessarily how they work.

Bioinformatics typifies all of these dimensions of new divisions of labour within biological knowledge, and for this reason is at the core of our empirical research presented in the book. Centrally involved in the production of new concepts and data, it has become a specialism within biology, as well as feeding into almost all of biology's sub-disciplines. At the same time, it has developed new synergies at the interface between molecular biology and computation. In this emergent discipline the outputs of knowledge production are hybrids of biological and computational knowledge. On the one hand, there are novel applications of information science to the biological realm that provide new information-based understandings of how cells and organisms work. On the other hand, bioinformaticians produce new tools for producing and analysing bio-data. These tools are the objects of knowledge production for the expert bioinformatician community, but instruments for knowledge production for the wider molecular biology community. In times of rapid advancement, new types of knowledge of this kind lead double (often multiple) lives. Again, the distinction is between epistemic practices involved with advancing knowledge related to the bioinformatic tools themselves and those involved with using the tools in the service of producing biological understanding, where detailed knowledge of how the tools work is not required.

This form of differentiation in use is perhaps quite common in the case of research tools, as the simple example of the microscope illustrates. On the one hand, there is a knowledge accumulation circuit between production and use as successive waves of invention build on (that is, use) existing microscope-related knowledge (such as optics) to improve their performance. On the other hand, microscopes can be widely used in research to study the characteristics of small entities. Here, the knowledge produced through using the microscope is not concerned with how microscopes work, but with the field to which their functionality is applied, say the study of micro-organisms. In this case, the user does not need to have a deep understanding of the properties of light or of optical
engineering, just knowledge of how to operate the microscope. In the early days of new technologies of microscopy, the institutionalization of sharp divisions between different groupings of users and producers was yet to be formed and, we shall argue, in other kinds of scientific instrumentation may never be formed on the clear-cut model of the optical microscope.

This discussion presents some quite fundamental issues relating to the way that growth in knowledge, accompanied by growing specialization, creates new divisions between the production and different uses of knowledge. There are no pre-given or 'natural' ways that new epistemic divisions of labour map onto economic ones. The historical evidence we will examine suggests that new epistemic practices and products may well have multiple uses. We call this 'polyvalence'. Furthermore, there may be no single economic organizational mapping, but competing alternative models. Indeed this is exactly what we see. Even when a clear division does emerge between a specific knowledge production and a potential use – a precondition, we shall argue, for the formation of new markets for knowledge – there is no inevitability that markets will emerge, and no pre-given code determining how such markets may be organized, and around what shape of knowledge product. On the one hand, bureaucratic or professional arrangements can be instituted to facilitate coordination between knowledge producers and users within organizations and disciplines. On the other, different possibilities exist for configuring knowledge production–use relationships for market exchanges.

If we take knowledge of a specific therapeutic target for a specific disease as an example, we can readily see at least three possibilities. First, a drug target might be identified in the R&D department of a pharmaceutical firm, and used in that firm to develop a drug, which is then manufactured for sale by that same firm. Second, the target might be identified in an academic laboratory, published in the scientific literature and available for the interested scientific community to research further or for pharmaceutical firms to use to follow the path of the first route. In the third possibility, the target is identified by a small biotechnology firm, which subsequently licenses the knowledge to a pharmaceutical firm for further research and development. The differences between these alternatives are examples of the variety of ways that knowledge can be distributed and exchanged even within a market economy. Moreover, these different market organizations may compete with each other, some proving more viable than others.

So the issue of emergent divisions between production and use raises questions of how knowledge is distributed and exchanged: these are the processes that account for the way that knowledge is transmitted from producers to users and circulated more generally within the economy. Distribution and exchange processes are closely intertwined but should be
treated as analytically distinct, and clearly inflect different dimensions of ‘publicness’ and ‘privateness’. Distribution is concerned with the instituting of processes for disseminating knowledge across space and time, between producers and users. The possibility for the distribution of knowledge presupposes the existence of a shared language, but the existence of such a language cannot be taken for granted. Indeed, one of the major challenges facing emerging epistemic domains is the creation of a language to support the circulation of new knowledge. We can recall the earlier quotation that defined the important task of the early genomic era as the ‘naming of parts’, to illustrate the point. The establishment of an international Gene Ontology Network15 was motivated by a need to create common categories in cell biology to ensure the interoperability of key knowledge resources emanating from the rapidly growing genomic and proteomic sciences. Without the existence of common standards, there were fears that much of the effort would be wasted. The extent to which a lingua franca exists in a field can influence significantly the extent to which new knowledge can be circulated between producers and users. In this respect, code formation as a dissemination process is a key aspect of distribution. Distribution also relies on the formation of communication channels between producers and users. In the simplest case, knowledge can be transmitted through conversation or through participation at academic conferences. Scholarly journals provide another very important vehicle for dissemination, more formally institutionalized, with rules governing acceptance for publication and control over what is placed in articles and what is not. Indeed, an important consideration in what is made public concerns the fact that the producers of knowledge inevitably know more than they are able to articulate in codified outputs. This is certainly not a question of tacit knowledge that is difficult to codify, but owes more to conventions governing what is appropriate and necessary to communicate and what is not. Patents also provide a vehicle for communicating knowledge, but, as we will discuss in more detail below, this form of dissemination is stimulated by the allocation of temporary legal property rights to the knowledge producer. Insofar as both official journal publication and patents entail issues of control over distribution and access, as is typically the case, they represent specific (and mutable) articulations between distribution and appropriation, which will prove central to our analysis. Another key mechanism for distribution, particularly important for our current study of genomics and bioinformatics, concerns the ways that data sets are made available. A data set represents a quite different knowledge resource compared to an academic article, for example, where it is assumed that it is valuable to potential users with various levels of analysis, varying from
‘raw’ data to quite analytically elaborated information. Users can exploit the data for their own further knowledge production activities, as well as for numerous other purposes. So there are important questions relating to what is disseminated and, as we will see in the detailed examples in this book, there are key questions regarding when knowledge is made available. The contemporaneous revolution in information and communication technologies which accompanied the emergence of bioinformatics, and in particular the widespread availability of access to the Internet, draws attention to another aspect of distribution. Significant innovations in the media of knowledge dissemination exert a major force on the instituting of economies of knowledge. The development of new ICTs has been especially important for recent advances in biology because of the new emphasis on large-scale data production. These changes to the technologies of distribution have echoed earlier transformations, such as the printing press, for their impact on knowledge production, distribution and use. Today, it would be hard to imagine the field of genomics without the Internet and certainly the existing economies of knowledge would not have been possible without it. All forms of distribution involve the simultaneous instituting of an appropriation process, which involves transformations in the ownership and control of knowledge. The formation of particular modes of appropriation govern how rights over the knowledge resources are established and how they are exchanged between the producers and users of knowledge. The importance of legal institutions, such as those relating to private property rights, are important in this respect, as we shall discuss below, but here we focus on the different forms of market and non-market exchange processes that can be instituted in ways that control the circulation of knowledge in the economy. Market appropriation of knowledge can usefully be analysed in terms of three types of exchange process. First, there are labour markets which are instituted for the recruitment of skilled knowledge workers to firms and universities. In emergent knowledge fields, these markets provide a key mechanism for the circulation of knowledge between different organizations. What is exchanged in these circumstances is not the knowledge that these skilled personnel possess per se, but rather rights to the results of their knowledge-producing efforts under the employment relation.16 In knowledge fields such as bioinformatics, recruitment of skilled scientists and technicians is a major channel for firms to acquire new knowledge capabilities, a substantial proportion of which have been developed in university training programmes. Furthermore, in such a rapidly growing field, perceived skills shortages can create significant tensions between the preservation of a sufficient critical mass of expertise within the university sector and the growing demand for the same personnel in industry. Within the university
sector too, there is competition for knowledge resources through the ‘public sector’ labour market, and varying institutions and resources condition that competition. The prevailing structures of academic and commercial labour markets in most countries dictate that these circumstances can lead to a migration from the former to the latter. Capital markets provide a second focus for looking at the market-based knowledge exchange processes. This category includes venture capital funding, stock markets, mergers and acquisitions and the creation of new firms from university research groups. In these circumstances, rights over the knowledge-producing capabilities of a coherent group are exchanged along with formally owned (that is, through patents) or collectively shared knowledge resources. In addition to these primary knowledge resources, capital market exchanges also involve rights over facilities, distribution channels and branding that might be associated with the group in question. So, while capital markets are similar to labour markets in respect of providing arrangements for the exchange of knowledge resources through the movement of knowledge bearers, they also involve a number of other dimensions. Furthermore, capital markets differ significantly from labour markets in the way that the relationship between buyers and sellers is instituted. Finally, the question of how knowledge itself can be directly exchanged as a tradable good comes sharply into focus when considering product markets. Recent advances in the genomic and bioinformatic field have been accompanied by numerous attempts to create tradable knowledge resources, including genomic and proteomic databases, sequencing technologies and bioinformatic analysis tools. In addition, the knowledge produced by using these knowledge resources has also been traded, therapeutic targets being a significant example. The focus for our analysis is on how knowledge resources are packaged for sale and the nature of control and rights over their use. There is great diversity in the ways that market exchange processes can be instituted for knowledge resources. For example, buyers and sellers can enter into long-term contractual agreements based on a growing knowledge resource, such as successive versions of a software package or releases of a database. Alternatively, the exchange might be based on a one-off transaction. Furthermore, knowledge resource exchange processes are differentiated according to whether they involve rights over access to knowledge, use of the knowledge or rights to trade the knowledge on to third parties. These are only some of the ways that knowledge resources are exchanged in the market mode, demonstrating how important the institution of exchange processes is for controlling the flows of knowledge between producers and users. But we also need to consider how rights over knowledge resources are instituted outside of the market mode and relating to some notion of public
or collective rights. Thinking about public control or rights over knowledge resources is not straightforward and the general tendency has been to assume public appropriation in the absence of private appropriation. Public rights over knowledge resources are perhaps best conceptualized through thinking about the institutions of open science, described by David and discussed earlier in this chapter. But that discussion focuses on public distribution, specifically on how knowledge is circulated and not on how rights and control over those knowledge resources become instituted. Creating access to knowledge through dissemination is not the same as establishing rights over access, use and further exchange of knowledge. In contrast to private legal rights, it is important to note that public control is not directly instituted through formal regulatory mechanisms. Indeed, as we shall see, in the field of bioinformatics there have been several notable instances when private legal property rights (patents and copyrights) have been used strategically to create public control, by defending the knowledge against third party attempts at private appropriation. It is interesting that Eisenberg (2006) draws attention to various modes of instituting public control, such as the US National Research Council’s reform of the rules for public deposition of data for all its funded research, following Celera’s breach of normal codes of journal publication (Marshall, 2002; National Research Council, 1999, 2003). But, coming from a legal perspective, this does not lead to a general consideration of institutions of public control. This confirms the somewhat fragile institutional structures for public appropriation, which are based more on presumed norms than on legal mechanisms. In addition, these presumed norms are themselves closely tied to the issue of financial resource flows. The dominant societal mechanism linking financial resources to public appropriation involves governments. Under these arrangements, knowledge production and distribution are financed through state funding, itself drawn from taxation, on behalf of actual or potential knowledge users (including specialized groups of users who are themselves involved with further knowledge production or, more broadly, to the potential benefit of society at large). These economic arrangements which, following Polanyi, constitute a redistributive resource model, govern the majority of knowledge production in university-based research. There is a presumption, variously institutionalized in different settings, that the fruits of this state-funded research should be disseminated as widely as possible, also creating a collectively owned and controlled knowledge base for society. However, it is frequently very unclear what these rights over government-sponsored research involve. Do they involve bundled rights over access, use and further exchange or can these rights be instituted separately? Who holds the rights? Is it the public at large, specific (say, national) publics, the universities, or is it governments?
These are questions that we will seek to clarify through careful analysis of empirical material, but one thing is clear from the outset: these arrangements are not set in stone and have been subject to significant transformation throughout history. As we have seen, the 1980 Bayh–Dole Act in the USA, for example, profoundly shifted the presumed connection between government funding and public rights over the knowledge produced, by encouraging university researchers to seek formal property rights over their knowledge outputs through patents (Nelson, 2006; Powell and Owen-Smith, 1998; Mowery et al., 2001; Mowery and Sampat, 2001; Mowery and Ziedonis, 2002).

But there is more to public appropriation of knowledge than the science–society contract instituted around government funding of research. We need to account for the often highly significant instances when non-governmental organizations, including firms and charities, intentionally fund the production and public appropriation of knowledge. Not only has it been common for some time for firms and charities to support university research, but also for firms to place knowledge from their own research activities under public control. We mentioned the case of Merck's gene index earlier, but that is just a high-profile example of many instances where firms actively engage in publication of results and sharing of data within research communities. Although not so obviously redistributive as when taking the form of government grants funded by taxation, the diversion of resources by companies from commodity-producing activities towards public, non-traded outputs is a form of internal bureaucratic redistribution. There is much evidence, presented later, that there are circumstances when private enterprises have strong incentives to act in this redistributive way for 'the public good'. This is too general a phenomenon to be treated as a philanthropic or commercially devious aberration. The key point here is that public rights over knowledge resources should not be assumed to follow directly from government sponsorship of research. It is not a question of two clearly defined worlds.

So, in the IEP framework we are proposing, analysis of the specific ways that knowledge distribution and exchange/appropriation processes are instituted provides an important step in explaining how knowledge is circulated within the economy, between producers and users. It is also an important first step in explaining how we understand the processes through which knowledge resources come to have public and private characteristics, defining the extent of accessibility to knowledge and the transfer of rights and control over use and further exchange between producers and users. But it should be obvious already that there is a further layer to add to this framework, namely the influence of different institutional arrangements, legal or normative, that provide another dimension to thinking about the
processes through which the public and private characteristics of knowledge are constructed, and it is to this that we now turn.
LEGAL AND INFORMAL INSTITUTIONS OF APPROPRIATION

Legal institutions such as those of contract, property rights and incorporation have recently been argued to be constitutive and enabling of economic behaviour (Swedberg, 2003; Hodgson, 2007), following a Weberian tradition of economic sociology (Weber, 1978). Legal concepts categorize and define both economic agents and their acts. However, a coevolutionary approach (Deakin and Wilkinson, 2005) suggests a more complex picture, where legal and economic institutional processes are seen to interact, each with their own specific modalities: laws are changed in ways different from economic organizations and behaviours. They can frequently become out of kilter with each other, so stimulating interdependent processes of variation and selection of legal as well as economic forms. In the sphere of intellectual property rights, the biological scientific and technological revolution has thrown up new questions about how to define utility, the naturally occurring as against the human artefactual, novelty and discovery. As Deakin and Wilkinson argue, in a context of continuous economic transformation, 'the process of legal change . . . is indeterminate and open-ended' (Deakin and Wilkinson, 2005, p. 31).

Patents and copyrights are designed to confer temporary property rights on the inventors of new technological knowledge, playing a prominent role in establishing economic incentives for the production of knowledge. At the same time, they also place an obligation on inventors to make the knowledge available for wider scrutiny. Consequently, they represent a contract between inventors and society that provides an opportunity for inventors to profit from their inventions in return for publishing the details of the technical advance they produced. They also define what is not privately appropriable, so are institutions central to defining the boundaries between the public and private domain, rather than simply establishing private property rights. These legal institutions have been designed to produce simultaneously a public and private dimension to the new knowledge.

In addition to turbulence in knowledge and economy, there has been a parallel and equally significant turbulence in legal institutions. They evolve in relation to the emergence of new knowledge and their associated economies of knowledge and they do so in two distinct ways. First, the formal rules governing what can be patented or copyrighted are subject to legislative change and this has been highly prominent in the recent period
for new biological knowledge. The second mechanism involves the shifting interpretation of existing frameworks by patent lawyers in their judgments over the precise details of claims to be covered and in any subsequent legal disputes over infringement. New knowledge can frequently cause confusion over its eligibility for formal property rights as, for example, with the case of DNA sequences. There are also difficult questions concerning how broadly the claims should be constructed. Over time, and with experience, new forms of consensus emerge in the community over these matters, but they are always subject to potential revision. Significant changes in this respect inevitably arise in situations where the current rules are perceived to compromise the contract between society and inventor, creating unstable and undesirable asymmetries of advantage in the economies of knowledge. In this sense, we can see how these legal institutions of appropriation exert a force on the ways that economies of knowledge are instituted and vice versa.

But in analysing institutions of appropriation, we mean something wider than formal legal instruments. We include under this umbrella term the range of common practices and norms that are adopted by organizations to appropriate knowledge. These include practices adopted in the use of legal institutions such as patents and copyrights. Such practices are subject to significant change regarding which knowledge resources should be patented and within strategies for making patent applications. But it has been widely documented that organizations employ other strategies for appropriating knowledge (see Levin et al., 1987, for a major survey). Secrecy is one possibility, where knowledge is retained within the organization for the production of further knowledge or for the development of tradable products. In the case of computational knowledge, this form of appropriation can be further enforced through the use of technical measures, such as encryption and firewalls. And, in other circumstances, firms make no attempt directly to restrict access to knowledge, relying instead on the competitive advantage that can come from being ahead of the rest or from the possession of complementary assets required to make profitable use of the knowledge (Teece, 1986). Even when a formal court decision is taken, notably the Madey v Duke decision, which appeared to restrict the rights of universities to use patented knowledge for experimental and academic purposes except under licence, in practice universities and patent-holding firms continued to treat the patented knowledge for experimental use essentially as a free public good (Eisenberg, 2006). When these practices become widely adopted they can usefully be thought of in terms of institutions of appropriation, contributing to co-evolutionary change. Furthermore, just as the formal and legal institutions evolve interdependently with economies of knowledge, so too do these less formalized ones.
So far, much of this discussion has centred on how the private aspects of knowledge are constructed through the interplay of economies of knowledge and institutions of appropriation, but a key concern of our analysis involves an appreciation of the mechanisms through which the public attributes of knowledge are created. We believe this to be a significant contribution to the relevant debates, in particular by providing a framework for uncovering the processes that are involved in creating public knowledge. There appears to be a strong current within existing studies that assumes the existence of public knowledge, where there has been no private appropriation. It is strongly implied that public knowledge is the residual of the stock of knowledge that has not (yet) been privately appropriated. This is an inadequate depiction of the public attributes of knowledge. Just as the private dimensions of knowledge arise out of the specifically instituted economies of knowledge, so too do the public attributes.

Furthermore, just as there are institutions of private appropriation, so too are there institutions of public appropriation. The academic journals of learned societies and professional organizations are one such institution. There are distinctive rules governing what is published, and journal criteria and peer review processes play a key role in determining the quality of knowledge that is disseminated through these channels. Databases provide another means for disseminating knowledge resources and they have become especially important and highly controversial with the emergence of large-scale genomics and proteomics. The precise form that they take and the arrangements that are instituted to control what is deposited and what can be accessed clearly influence the public characteristics of the knowledge that constitutes them. One particularly interesting development that emerged from our study of the bioinformatics field has been the use of copyright law to create a form of public appropriation. Developed initially in the wider software community, but taken up swiftly in the bioinformatic sphere, the user licences of copyright law, for example the General Public Licence (GPL), have been reformulated to institute unrestricted rights for users to access, modify and further disseminate (providing these rights remain intact) the source code of computer applications. A further example is provided by the Genetic Association Information Network, supported by the Foundation for the National Institutes of Health. Control over access is exercised through a GAIN Data Use Certificate, granted on condition that data cannot subsequently be subject to private appropriation through patenting.17 The analysis of evolving institutions of public appropriation is thus of equal importance to that of private appropriation. This analysis leads us to consider the multiple forms that public knowledge can take, supported by alternative institutions of appropriation, rather than viewing public appropriation as an absence of exclusion by private property rights.
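To make the copyleft mechanism concrete, the sketch below shows the kind of header notice a bioinformatics developer might attach to a script released under a GPL-style licence. The program and the wording of the notice are invented for illustration (the paraphrased conditions are not the licence text itself); in real releases the full text of the chosen licence governs.

```python
# seq_stats.py -- a hypothetical, trivial bioinformatics utility, used here only to
# illustrate how a copyleft notice travels with source code (illustrative paraphrase,
# not the actual GPL text).
#
# Copyright (C) 2007  A. Researcher
# This program is free software: anyone may run, study, modify and redistribute it,
# provided that redistributed or modified versions pass the same rights onward.

def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / max(len(sequence), 1)

if __name__ == "__main__":
    print(gc_content("ATGGCGTACCTTA"))
```

The point is that the conditions attached to the copyright notice, rather than the code itself, institute the public rights: anyone downstream receives the same entitlements to access, modify and redistribute.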
PUBLIC AND PRIVATE ECONOMIES OF KNOWLEDGE

We are now in a position to present the IEP framework succinctly. Economies of knowledge are constructed through specific configurations of production–distribution–exchange–use processes. A variety of different types of organization are involved in the production and use of knowledge, with alternative channels for the circulation of knowledge and mixes of market and non-market arrangements for their coordination. It is within this framework that the nature of knowledge and financial resource flows between different agents is understood. The 'publicness' and 'privateness' of knowledge are the outcome of ongoing processes of differentiation between producers and users and the coordination of knowledge and resource flows through distribution and appropriation. The evolving societal division of labour, often conflictual at the lines of divide, also entails new interdependencies. The many different dimensions of 'publicness' and 'privateness' referred to in previous debates are captured, we believe, in the IEP framework, but in a way that now articulates them more coherently with each other.

Economies of knowledge are dynamic: the perpetual tensions between the public and private aspects of knowledge exert a pressure on the organization of production, distribution, exchange and use processes, leading to their reconfiguration and to new flows of knowledge and financial resource. The public and private characteristics of economies of knowledge are further shaped by institutions of appropriation, whether formalized and legal or informal and normative. These institutions are also subject to change over time, partly in response to the dynamics of economies of knowledge. Economies of knowledge and institutions of appropriation have their own internal evolutionary dynamics, but also evolve in relation to each other.

The empirical cases described in this book highlight the significance of instituting public characteristics of knowledge. Rethinking public knowledge in this way then requires a rethinking of private knowledge, since it follows from the position we have taken that the two are mutually and simultaneously constructed in relation to each other. In this sense, the opportunity to create markets for tradable knowledge products relies significantly on whether equivalent public knowledge exists or might be expected to in the near future. As our empirical cases show very clearly, when governments are slow to invest in particular knowledge resources, new firms and markets appear; when governments do provide investment in those same knowledge resources, the markets dwindle and firms fail or reinvent themselves. A defining feature of economies of knowledge is the way that the 'fruitful tensions' (Eisenberg and Nelson, 2002), complementarities and interdependencies between their
public and private components play out. Turbulence in knowledge brings with it significant turbulence in economic organization, with multiple experiments in the configuration of production, distribution, exchange and use processes, and in the ways that the public and private attributes of knowledge are constructed. This leads us to one final but important reflection about experimentation in economies of knowledge. We track the historical emergence of multiple forms of economic organization and multiple knowledge trajectories. These are ‘economic experiments’ (Rosenberg, 1992b). The experiments relating to knowledge are more obvious: many different intellectual endeavours, relating to ideas, theories, experimental practices, technologies and products, can be pursued at any one time. Only time can tell which ones will prevail and which do not and these outcomes depend greatly on the specifically instituted competitive processes that judge the superiority of some over others. Experiments in economic organization work in a similar way. The standard Schumpeterian account of this takes firms as the featured actors. Firms lay their bets on which knowledge trajectories to pursue and how best to organize their activities. Ex ante, it is impossible to know which ones will be successful and which will not. Their fate is decided through competition in markets. In this model of the capitalist engine of progress (Nelson, 1990), the dynamics of competition between different modes of organizing commercial activities is where the real action is, supported by public (scientific) knowledge, the availability of which provides a steer for firms in their searches for economically useful knowledge. Our model of economic experimentation is somewhat different and places public knowledge within the capitalist engine itself, as a central part of the economy, a key contribution to its dynamism. The varieties in economic organization that we explore involve experiments in constructing different public attributes of knowledge as well as different forms of knowledge, and critically different modes of interdependency between the public and private. This leads us to a view of multi-modal capitalist economies of knowledge, where the growth and transformation of economic activities occurs across the public and private domains, with the turbulence in knowledge producing ever new divisions of labour, new complementarities and new rivalries.
ANALYSING ECONOMIES OF KNOWLEDGE

There is always an inevitable to-ing and fro-ing between the development of a theoretical perspective (in our case, the IEP 'economies of knowledge' approach) and the conduct of an empirical programme of research. We have already
described the major contours of our theoretical position and, in the following three chapters, we will present the empirical evidence that supports it. But first it is useful to offer an outline of the approach we employed to gather our data and to argue that this particular methodology enabled us to ask the pertinent questions relating to the public and private characteristics of knowledge in a new way.

We have taken key developments in biological knowledge or, more precisely, specific knowledge resources as our principal analytical lens. This has allowed us to ask questions about the public and private aspects of knowledge at a fine-grained level of detail, as they pertain to specific knowledge entities. Three main types of new knowledge resource are examined: databases of different biological entities, such as expressed sequence tags or protein sequences; bioinformatic tools for searching and analysing biological data and the algorithms on which they are based; and the genomes of specific organisms. Critically, it is an approach that allowed us to track the historical trajectories that mark the development of these knowledge resources over time, capturing the dynamic processes that create, dismantle and recreate the public and private characteristics and their interdependencies. Taking these objects of knowledge as our point of entry then led us to question the nature of the economies of knowledge that have been constructed around them, the processes that account for how they are produced, distributed, appropriated and used in the economy and how this changes over time. This pulled into focus the question of which organizations (firms, universities) are involved, how they become located in emerging divisions of labour and how these new divisions of labour become coordinated.

Then there was the question of scale. To capture the important international dimension to the emergence of this new knowledge field, we conducted interviews in Europe, the USA and Brazil. This allowed us to consider the role of nationally specific institutions in shaping economies of knowledge. But, more importantly, it pointed us towards a study of emergent economies of knowledge that transected national boundaries: their geographical scale and reach could be analysed as outcomes of the ways that economic processes became instituted. In doing this, we have been able to add the issue of emergent scales to our understanding of public and private knowledge, including various global scales of markets and publics.

Undertaking a programme of empirical investigation into 'economies of knowledge' posed us the considerable challenge of getting our minds into the knowledge itself. We could not aim to become competent bioinformaticians, but needed sufficient depth of understanding to grasp processes of specialization, and the alternative scientific and technological strategies being pursued. Indeed, this level of understanding proved critical for
asking questions about how economies of knowledge function. The structure and content of knowledge resources, the extent to which they can be partitioned or combined, are important factors in the way divisions of labour emerge and are coordinated. We have frequently found ourselves trading thoughts across the disciplinary boundaries between the biological and social sciences. Our own insistence that a grounded understanding of the knowledge resources is imperative for understanding the economic dimension is matched by the bioinformaticians’ questioning of the economies of knowledge that condition the environments in which they conduct their work. The bioinformatic community has been highly aware of the economic tensions accompanying the emerging field and have published their own views on access to and ownership of knowledge and arguments about where funding priorities should be focused. Furthermore, as one interviewee made perfectly clear, many discussions at bioinformatics conferences have been as likely to focus on different models of organizing economies of knowledge as on the latest scientific discovery or which science or technology strategy to pursue. So we found many eager participants in our empirical programme of research, enthusiastic to discuss past, current and future strategies for organizing the economies of knowledge in order to advance biological knowledge. We present the results of this empirical research in the next three chapters. Chapters 2 and 3 deal with bio-data and bioinformatic tools, respectively. Our rationale for splitting the field in this way is not arbitrary, nor simply for analytical convenience. Rather, it is a reflection on how we found the field to be organized or, more precisely, how historical processes have created a bifurcation between the two in terms of how the respective economies of knowledge became instituted. The split was not inevitable and could be reversed, but for now it seems the most appropriate way to cut the cake. The two chapters present general overviews of developments in data and tools and then focus in on specific knowledge resources to chart their trajectories of development and unravel the transformations in their associated economies of knowledge, that define their public and private characteristics. In Chapter 4, we present an analysis of one particularly fascinating race to sequence a genome, Agrobacterium tumefaciens, an organism of significant biotechnological importance and scientific interest. In this account we foreground the competitive processes, involving firms and universities, that produced a series of dramatic twists and turns in how the public and private characteristics of this particular knowledge were constructed. The cases that we concentrate on in these three empirical chapters have been selected because they were some of the most dramatic we encountered in terms of turbulence in knowledge and in economic organization. Picking these most turbulent cases provides the most fertile territory for exploring the dynamics of division and
interdependence between the public and private, because the fluidity and unpredictability of outcomes were most pronounced. Inevitably we are mainly dealing with historically unique and globally significant series of events. They cannot be claimed to be representative of a population of similar events: there are no comparable facts. It would be possible to take a different empirical cut on these same events, although our focus on objects of knowledge around which economies of knowledge develop has an advantage of capturing the dynamic role of knowledge accumulation. The cases demonstrate processes of differentiation and interdependency between public and private domains with emergent properties. Moreover, they strongly suggest a variety of processes with different outcomes. In this sense, when freezing an historical frame at a given point in time, ‘public’ and ‘private’ domains could be presented as broad umbrellas sheltering a complex interplay between many strands and dynamics resulting from the development of economies of knowledge. Changes in processes of knowledge production, distribution, appropriation and use result in multiple intersecting ‘publics’ and ‘private’ forms at any given point in time. Against most received wisdom, our evidence suggests the need to recognize a much more dynamic aspect of public domain formation, with new norms and institutions of knowledge. Interaction, competition, tension and rivalry between public and private domains, often triggered by developments in knowledge production, are critical for understanding processes of differentiation and asymmetries of interdependency. Markets for knowledge are formed in the context of the growth of public domain knowledge, and vice versa. This multi-modality of economies of knowledge requires an understanding of multiple and interacting dynamics between public and private domains, and this itself is a persistent source of instability. Although there may be periods of relative stability, there are apparently no permanently stable configurations – indeed, why should there be? Turbulence in knowledge can always provoke a reconfiguration of economies of knowledge and the emergence of new complexes of private and public domains of knowledge.
NOTES

1. John Sulston was the UK director of the Human Genome Project at the Sanger Centre.
2. Maynard Olson was director of the Genome Center at Washington University.
3. Craig Venter was founder of Celera and The Institute for Genomic Research. Previously he was a researcher at the National Institutes of Health.
4. The Merck Gene Index was reported in Nature, 16 February, 1995, p. 549 as processing 300 000 human gene sequences in 18 months, and depositing ESTs into the EST division of the public database, Genbank, at the rate of 4000 sequences a week.
5. 'The blurry line between generic knowledge and specific application flags attention to the fact that the division of labour between industrial labs and universities is neither sharp nor innate' (Nelson, 1990, p. 197).
6. They also map the tacit–codified dichotomy onto another fourfold taxonomy of know-why, know-what, know-how, and know-who, which in turn is partially mapped on to the antithetical dichotomy of science (know-why) and technology (know-how).
7. 'Most of science is in Pasteur's quadrant. . . . The lion's share of the research done in the US, funded by government, and undertaken in universities and public laboratories, is in fields in Pasteur's quadrant' (Stokes, p. 461).
8. Merton's republic of science can be seen to be constituted by five essential characteristics: CUDOS, or communalism, universalism, disinterestedness, originality and scepticism (Ziman, 1994).
9. 'The methods of modern science are not sufficient to create the unique cultural ethos associated with "the Republic of Science". . . . Rather than emerging and surviving as robust epiphenomena of intellectual inquiry, the institutions of open science are independent, and in some measure, fortuitous, social and political constructs' (David, 2001a, p. 17).
10. Nelson (2004): 'Open access permits many potential inventors to work with new knowledge. Privatisation closes off access to only those whom the owner allows to make use of it' (p. 457).
11. 'Important areas of science are now much more under the sway of market mechanisms than used to be the case' (ibid., p. 462).
12. 'If we need profit-seeking firms to tell us that the public domain has value, something is missing from our understanding of science [sic, as an economic activity, or knowledge?]' (Eisenberg and Nelson, 2002, p. 103).
13. The term 'anti-commons' was first coined by Michelman (1982). The case is made with respect to retail space, or domestic space within communal or shared domestic housing, a finite and given resource.
14. The IEP approach substantially modified Polanyi's original seminal concept, in relation to a broad empirical range including contemporary markets of many kinds (food, call centres, labour, mobile telephony, construction), as well as the emergence of pension systems, welfare regimes, and legal and taxation institutions as they condition market transactions (Harvey et al., 2002; Harvey et al., 2007; Clasquin et al., 2004).
15. www.geneontology.org.
16. Perhaps one of the most famous cases illustrating that was the recombinant DNA technology developed by Cohen and Boyer at Stanford University. The university held the patents to the results of their efforts, and when they left to form biotechnology companies, they had to license back the technology from Stanford (Nelson, 2006).
17. 'The goal of the GAIN Intellectual Property (IP) Policy is to maximize the public benefit of research fostered by GAIN to identify the genetic factors that contribute to common diseases, as well as molecular targets for the prevention, diagnosis and treatment of these diseases. It is the intent of the GAIN Steering Committee to promote broad freedom-to-operate for all users of GAIN data by rapidly placing data in the public domain' (http://www.fnih.org/GAIN/GAIN_home.shtml).
2. The data explosion: the emergence of a new form of bio-knowledge

WHAT IS BIO-DATA?

A central feature of the revolution in biological knowledge has been the emergence of bio-data on a massive scale. Dire predictions have been made of a science swamped by unanalysed data or, more significantly, of a proliferation of data of such complexity and variety as to outstrip the development of concepts and tools to coordinate, analyse and interpret the results of data production (Botstein, 1999; Butler, 2001; Reichardt, 1999; Harvey and McMeekin, 2002). This is well summarized in a report commissioned by the US National Institutes of Health:

The information that biomedical researchers are amassing in profuse quantities today – from the Human Genome Project, clinical trials, statistics, population genetics, and imaging and visualization research – creates enormous digital repositories of information. The scale of those databases swamps all the information collected before. They encompass multigigabyte, multivariate functions of three dimensions, of wavelength, and of time. The infrastructure needed to make them available is phenomenal: a single biomedical laboratory could produce up to 100 terabytes of information a year – about the same as the information in one million encyclopaedias. In order to be useful, the data must be indexed and stored, and the challenges for data analysis and abstraction are formidable. (Botstein, 1999)
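The orders of magnitude quoted in this passage can be made concrete with some back-of-envelope arithmetic. The figures below are our own illustrative assumptions (one byte per base, roughly 100 megabytes of text per encyclopaedia), not data from the report itself:

```python
# Back-of-envelope arithmetic for the scale claims quoted above (illustrative assumptions only).
bases_in_human_genome = 3e9      # ~3 billion base pairs
bytes_per_base = 1               # one plain-text character per base
print(f"One human genome as plain text: ~{bases_in_human_genome * bytes_per_base / 1e9:.0f} GB")

lab_output_per_year = 100e12     # the '100 terabytes a year' figure quoted from Botstein (1999)
encyclopaedia_bytes = 100e6      # assume ~100 MB of text per encyclopaedia
print(f"Encyclopaedia equivalents per year: ~{lab_output_per_year / encyclopaedia_bytes:,.0f}")
# -> roughly one million, consistent with the comparison made in the quotation
```

Even the 'raw' sequence of a single human genome is thus only a few gigabytes of text; what pushes laboratory outputs towards the terabyte scale is everything layered on top of it, such as repeated sequencing runs, quality information, imaging and annotation.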
Although similar data deluges can be seen to occur in astrophysics or the monitoring of supermarket consumption (Grindrod, 2001), nonetheless bio-data production is significantly driving the development of the next generation of super-computers. The issue, however, is not solely or primarily one of sheer volume. In this and the subsequent chapter we shall be exploring the effects of the separation of data production from both software development (algorithms, sequence similarity tools, models and so on) and, more radically, hypothesis-driven experimentation and theory building. It marks a break from established epistemic routines of (mostly ‘wet’) laboratory biology. Data production and validation as it developed was dependent on software to assemble robust data, add relevant information (annotate) and test for 37
quality. But how, even whether, a division of labour between data production and warehousing and software development would become established, within and across the public and private economies of knowledge, only became clear over the course of several decades. Given loose talk about ‘raw data’, however, it should be emphasized that data production is strongly informed and conditioned by theory. The ‘data’ of genomic or protein sequencing is the outcome of continuous scientific as well as technological experimentation. Whether for the basic building blocks of DNA nucleic acid sequence data or for complex 3-D imaging of protein structure, the past 25 years have been a period of unending development, in terms of speed and quantity, certainly, but also of quality, refinement and an expanding range of data informed by biological theory. How long strings of continuous DNA double helix sequences were divided up to be analysed, whether into blocks of identified sequence related to known chromosomal position, or into random fragments drawn from the whole genome, became as much a theoretical as a data-production issue. As we shall see, there were heated public controversies over technologies of clone-by-clone versus automated ‘shotgun’ methods of nucleic acid sequence data production in the early and mid-1990s (Sulston and Ferry, 2003; Cook-Deegan, 1994; Zweiger, 2000; Moody, 2004). There were also scientific disputes about the very constitution of a genome (Olson, 1998). Moreover, from the outset of these ‘gene wars’, sequences of acids considered as ‘data’ in this generation were the theoretical discoveries of the previous one. And that pattern of routinization, from one generation’s theoretical construction to the next generation’s data production, is typical. With this in mind, the digital information database is nonetheless a remarkable and newly prominent feature on the biological knowledge landscape. For the central question of this book, moreover, precisely the separation of databases from other forms of knowledge institution presents quite distinctive issues for public and private production, distribution, appropriation and use. The most striking feature of this new landscape is undoubtedly the emergence of major global institutions supporting these databases: GenBank in the USA; the European Bioinformatics Institute at Hinxton, in England; the National Database of Japan. To explore the processes of making data public and private, as a distinctive form of knowledge, successive turbulences surrounding genomic databases will be presented, the human genome race itself, and a key phase related to it (Expressed Sequence Tags, or EST databases). Inextricably connected to these developments, a particular turbulence concerned the development of Single Nucleotide Polymorphism (SNP) databases, because of the potential significance of genetic mutation and variation for diagnosing and understanding human disease in different populations. Finally, the case
study of the historical development of a global protein database, UniProt, demonstrates further complementary elements for an understanding of what is entailed in making knowledge public and private.

This chapter, however, is a powerful demonstration of the 'making public' of knowledge, against the prevailing understanding of the erosion of the public domain by creeping privatization, encouraged, it is argued, by governments through changes in incentives to academia, most notably the 1980 Bayh–Dole Act in the USA (Coriat and Orsi, 2002). At the early stages of this story, the future shape of public and private knowledge was uncertain, and many anticipated a wholesale privatization of genomic knowledge. Questions were asked by the scientific community about the viability of publicly supported databases, fearing for their imminent demise (Ellis and Kalumbi, 1998, 1999; Ellis and Attwood, 2001). But, at this stage of history, there is no question but that publicly resourced, publicly institutionalized and publicly accessible databases dominate the landscape.1 The challenge of this chapter is to understand that outcome, firstly, but then also to demonstrate how complex the reality of public knowledge is, including the specificity of public data.

The flood of data itself has been entirely dependent on the technological revolution in processes of data production: high-throughput sequencers, microarrays and the polymerase chain reaction (Mullis and Faloona, 1987). Although much of this has been led by private companies (for example, Affymetrix, IBS and Agilent), a moment's examination will reveal much more complex interactions between the data producers and the instrumentation developers.2 But even if this interdependency enlarges an understanding of 'making public', databases and data production present quite new dimensions within the process of making public, notably in the establishment of public rules and standardization, quality of data, interoperability of databases, the process of 'publication' through deposition of data without normal peer review, rules on the frequency and timing of putting data into the public domain, changing journals' rules for the acceptance of articles in a way that recognized the separation of data from analysis, and rules of data access and use. From an institutional point of view, there was a void of norms on how the new streams of bio-data could come under public control, determining how such data could be produced, accessed, distributed and used. The instituting and resourcing of databases was breaking radically new ground. Last, but not least, the scale of the public in key instances has been shifted from the national to the global. Contrary to easy assumptions that knowledge in the public domain is ipso facto globally 'universal' public knowledge, the piecemeal and hesitant institution of global, data-interchanging bio-databases demonstrates that this shift is made, not given, and that making 'scale' is in itself quite a complex process.
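What 'standardization' and 'interoperability' mean at the level of the data itself can be illustrated with the simplest of the shared conventions: the plain-text FASTA format in which sequence records are commonly distributed by the public databases. The record and accession number below are invented, and the parser is a minimal sketch rather than any database's own software; records of this kind can also be fetched programmatically over the Internet (for instance through NCBI's E-utilities web services), which is part of what 'publicly accessible' means in practice.

```python
# Minimal, illustrative parser for FASTA-formatted sequence records of the kind
# distributed by public databases. The record below is invented for illustration.
fasta_text = """>EX000001.1 hypothetical example sequence
ATGGCGTACCTTAGGCTA
GGATCCAAGT"""

def parse_fasta(text):
    """Return a dict mapping each header line (without '>') to its concatenated sequence."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        records[header] = "".join(chunks)
    return records

for name, seq in parse_fasta(fasta_text).items():
    print(name, len(seq), "bases")
```

Agreement on such formats, and on the identifiers and annotation vocabularies layered on top of them, is part of the institutional work of making deposited data usable by others, not merely available.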
It would be overly ambitious to survey the vast range of new bio-databases that have emerged over the past decades. We have deliberately chosen those new forms of knowledge where there were the most acute tensions and conflicts, seeing how public databases emerged out of turbulence in which radical commercial alternatives seemed equally possible. Firstly, we explore the revolution in data production consequent upon new technologies of sequencing. This is then followed by an analysis of perhaps the critical and formative moment in the mid-1990s when there was a 'gold rush' for what were seen to be 'keys to genes'. This set the context in which new public norms of data deposition and distribution were established. We then analyse how, in the following period, the emergence of university–industry consortia played a significant role in developing new public institutions for gene mutation data, in spite of their considerable commercial interest. The final empirical example is taken from the long histories of protein databases, and is explored for the critical scale-dimension of global public databases.
TECHNOLOGIES OF DATA PRODUCTION

Although not the primary focus of this chapter’s argument, technologies have revolutionized digital data production, and without them the databases as we now know them would not exist. Conversely, however, the combined scientific and medical goals of undertakings such as the Human Genome Project, almost hubristic in their ambition, drove technological development. Distinctively, the demand was for data production on a previously unheard-of industrial scale (Hunkapiller et al., 1991; Waterston and Sulston, 1998). Determining the continuous and exact sequence of 3 billion base pairs of DNA required quite new epistemic procedures and resources, at the same time economic, scientific and technological. The interdependence and interaction between technologies of scientific instrumentation and scientific knowledge have been widely demonstrated (De Solla Price, 1984; Rosenberg, 1992a). Although not in the business of mass-manufacturing instrumentation, universities have often developed original prototypes, subsequently developed and sold in markets for instrumentation with a variety of possible uses. Prototypes of high-throughput sequencers of genomic data were first developed at Caltech and the European Molecular Biology Laboratory, before being commercialized by ABI and Amersham Pharmacia respectively. Meldrum (2000a, 2000b) provides a myriad of examples of instrumentation technologies for every one of the multiple phases of genome sequencing, some originating in industry to be developed in university laboratories, and then returning to commercial
manufacture, others originating in universities and migrating backwards and forwards during their development. She emphasizes, moreover, that laboratories pursuing different strategic scientific or commercial goals assemble distinctive suites of instrumentation from academic and commercial sources, as well as adapting them within the laboratory context. The objectives of data production required new combinations of technological and disciplinary competencies (chemistry, computer science, engineering, mathematics and physics) to be wedded to the new biology (Meldrum, 1995). These interactions are constitutive of public and private scientific and technological activity: it is difficult to conceive of a market dynamic separate from a public knowledge-growth dynamic, or vice versa. The key breakthrough technologies of mass data production3 all depended on this dual dynamic for development, public use and diffusion, and markets. Having said that, two aspects of the technologies of data production can be singled out to demonstrate that this dynamic is frequently and critically one of tension and competition, not symbiotic interdependence or mutually beneficial interaction. Both relate to genomic data production: the first concerns the completion of a genome by the route of either whole-genome shotgun or clone-by-clone sequencing; the second, the relationship between hardware and software.

To simplify grossly, the challenge of completing the whole human genome required the reconstruction of a full-length double helix of DNA of 3 billion base pairs, stretching some six metres within each cell. Different techniques were developed for breaking it down into bits and then reassembling the bits. It is rather like an impossibly large jigsaw puzzle made up of monotonously similar pieces. One key technical problem in data construction was the size of the pieces, and various techniques for cloning strings of DNA of varying lengths were developed over the course of genome construction.4 Within each piece, of course, the sequence has integrity, so the size of the piece becomes important. There was also a question of whether, and how much, to rely on existing knowledge of the organization of chromosomes, and of the location of genes and strings of DNA sequence that had already been mapped. Moreover, especially in the human genome, there are many ‘repeats’, which means that, if pieces are small, two pieces could look exactly alike in reconstruction yet in fact fit into quite different places in the overall sequence. There is a problem of fit – or of contiguity – that has to be established, which is why the good fits were eventually called ‘contigs’. Unlike a normal jigsaw, the fit was assured not by the shape of the pieces but by the sequence of the four nucleotide bases, As, Cs, Ts and Gs. There is also a problem of potential gaps, arising from misfits and false identifications of fits.
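The overlap-based notion of ‘fit’ can be made concrete with a deliberately simplified sketch of our own (it is not drawn from the genome projects’ actual software): short fragments are merged wherever the end of one exactly matches the start of another, building up contigs. The toy sequences and the greedy strategy are purely illustrative; real assemblers worked statistically and at vastly greater scale, and repeats could mislead exactly this kind of merging.

```python
# Illustrative greedy assembly of overlapping DNA fragments into 'contigs'.
# Toy data invented for the example; real genome assembly used statistical
# quality measures and far more sophisticated algorithms.

def overlap(a, b, min_len=3):
    """Length of the longest suffix of 'a' matching a prefix of 'b'."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate anchor for an overlap
        if start == -1:
            return 0
        if b.startswith(a[start:]):         # suffix of a == prefix of b
            return len(a) - start
        start += 1

def greedy_assemble(fragments):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:                   # no overlaps left: separate contigs
            break
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags

# Three overlapping 'pieces of the jigsaw' reassemble into a single contig;
# a repeated motif shared by distant regions could mislead the same merge.
print(greedy_assemble(["ATGCGTAT", "GTATATCC", "ATCCGGA"]))
# -> ['ATGCGTATATCCGGA']
```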
The whole-genome shotgun sequencing method involved automation and small fragments of unknown location. But the smaller the pieces, the more randomly they were picked from the whole set of pieces of the total genome sequence, and the greater the sophistication of the software needed to check and reassemble the fragments. To overcome this, the genome had to be read several times over so as to ensure, with statistically high levels of probability, that all the pieces were in the right place (this redundancy was called ‘coverage’, with up to 10x coverage considered necessary for reliability). PCR as a technology enabled the rapid, mass copying (amplification) of relatively small ‘pieces’ of the jigsaw, or strings of DNA. Automated sequencers were capable of reading and reassembling contigs on a mass assembly-line scale. The key breakthrough technology was the laser reading of fluorescent dyes using electrophoresis on gels carrying fragments of DNA produced by PCR (Smith et al., 1986). This became the basis for technologies enabling the ‘whole genome shotgun’ approach. It revolutionized genome research, and produced genomic sequence data at an unprecedented rate.

The alternative way of reconstructing the jigsaw took larger pieces, from particular chromosomes and known regions of the overall jigsaw, combining mapping with automated sequencing. This technique (much slower, and requiring many more manual laboratory operations) was called the ‘clone-by-clone’ method. It was generally assumed that only clone-by-clone mapping could eventually fill in the whole jigsaw, eliminating repeats and filling tricky gaps in the less ‘interesting’ or murky parts of the picture. Conversely, the shotgun method might arrive at an incomplete picture, but one containing all the ‘best’ and well-defined (commercially or medically significant) bits. This would be a gene hunt, rather than a complete genome undertaking.

The ‘shotgun’ technology and its use were deeply controversial, especially in the initial stages of the human genome projects. At the peak of hostilities, the Nobel Prize winner Watson described the shotgun method as work done by ‘monkeys’. For our problematic, the controversy had a particular twist. Although there were three competing automated sequencers,5 Applied Biosystems Inc., soon to be purchased by Perkin-Elmer, held a strategic market advantage with its Caltech association, strengthened by Craig Venter joining a new company, Celera, from the NIH. From Perkin-Elmer, Celera had both financial backing and the new data-producing machines, and was hence capable of producing genome sequence data independently of mapping on a clone-by-clone approach. At the time, there was uncertainty in the scientific community as to whether the shotgun method, successful in completing bacterial genomes, would ever succeed with genomes, such as the human, where ‘repeats’ were more numerous by a factor of 25x (Green, 1997).
By and large, therefore, a divide opened up between the publicly funded laboratories of the Human Genome Project and the commercially led projects, which adopted a no-mapping, wholly shotgun approach to the production of data. In these circumstances, the HGP felt constrained to adopt the shotgun approach, abandon its linkage with clone-by-clone mapping, and deposit data into GenBank that was relatively unprocessed (Sulston and Ferry, 2003). The proof of the shotgun method was first provided by Venter with the publication in 1995 of the genome of the bacterium Haemophilus influenzae,6 the first whole genome of a free-living organism, followed up in 1999 by the genome of the classic object of genetic science, Drosophila melanogaster (Moody, 2004).7 At the same time, obviously, the HGP laboratories, under pressure to compete, were compelled to purchase the ABI 3700 sequencers,8 and so at least doubled the market for the sequencing machines of the very company that was financing their major rival. So competition forced publication of unmapped sequence data in the main global databases, GenBank, EMBL and DDBJ. We shall see later that there was great scientific merit in the availability of such data for the purposes of similarity searches and comparative genomics. Here, the issue is one of technologies of data production, and the dynamic tensions between private and public processes. Eventually these drove the construction and publication of unfinished jigsaws, the so-called ‘rough drafts’, in many ways quite a peculiar and arbitrary, politically negotiated,9 picture. In that sense, the dynamic in combination with the technology intensified the separation of ‘data’ from analysis; the data were more ‘raw’ in the sense of being incompletely processed by the standards of closure required to finalize a genome. The technologies of PCR and automated sequencers enabled the churning out of masses of sequence data processed only to a certain stage, and the competition ensured deposition in the public domain. The HGP was convinced (and for good reason, as we shall see below) that public deposition was a necessary step to block possible patenting by its commercial rivals. So this first example demonstrates both interdependence and competition between private and public, in ways that drove the development of data production, and its separation from other related epistemic activities, in specific directions. It altered the course of scientific activity. This was later enshrined in a consensus modus operandi, in which complex genomes were phased into separate shotgun and finishing stages (Waterston and Sulston, 1998; Meldrum, 2000b).

The second aspect of these new technologies of data production was whether the software to process the digital readings of the automated sequencers (or of any other digital data-producing instrument) was embedded in the commercially developed and marketed hardware, or separate from it.
Here again the issue was one of where the divide between public and private was constructed, and of the nature of the interdependency. In the next chapter we shall be exploring the issues involved in software development. But, in terms of technologies of data production, even the preliminary ‘data’ of a shotgun method involved the application of algorithms of sequence similarity mapping, and statistical methods for ensuring reliability. Ironically, one of the major opponents of the shotgun method for complex genomes, Phil Green, was responsible for developing the statistical reliability tests (‘phred’ and ‘phrap’) that eventually became the publicly accepted quality norms required for ‘publication’ by deposition of sequence data into the public databases. Techniques of similarity searching, however, were in essence an experimental activity in the new dry laboratories. Different methods could produce different results, and many scientists, particularly those working to produce scientifically robust results, required control over that critical phase of converting instrumental digital outputs into scientific data. For such data to be appropriated by the ‘public’ of the scientific community, accessible and distributed in a commonly usable format, shared norms and standards were required. The issue surfaced significantly in the genome projects. The new ABI sequencers came with their own embedded software, which was both closed to development and not interoperable with the Staden software programs used for sequence assembly (Sulston and Ferry, 2003).10 Consequently, Sulston and Staden decrypted the ABI software, reverse-engineered it and developed their own data display programs. This enabled them to retain and develop experimental control over this phase of data production.11

The Sanger-based automated sequencers of the 1980s are now being replaced by the next generation of sequencers: the 454, Solexa and Helicos Biosystems machines. These have the capability of accelerating genomic data production to a new scale. Very similar controversies are emerging (O’Rourke, 2006), with the new machines once more embedding their own closed, proprietary software data quality systems. The 454 claims to be ‘phred-compatible’ with its Quality Score software, while Solexa’s Bustard works on different principles. Nusbaum, from the public Genome Sequencing programme of the Broad Institute, and Phil Green once more join the fray, insisting on the importance of the scientific community being in control of open methods for converting instrumentally raw outputs into scientific data according to testable quality procedures.12 Once again, moreover, new hardware methods of data production have required common public norms of data deposition as a condition for data to be made ‘public’. The interdependency of hardware, software and, most importantly, the scientific use of the data once in the public domain demonstrates that new technologies only develop new markets on condition of the development of
new public norms, one aspect of which is that the norms themselves are not up for sale. In this respect, innovation of public norms is as integral to the innovation process as the new bits of kit. There is reluctance in the scientific community to go down a supermarket route, with an off-the-shelf Solexa quality brand bidding against a 454 quality brand and users unable to check the quality procedures. In terms of an ‘instituted economic process’ analysis, appropriation by the public science community meant exercising control over the scientificity of the data – standards of rigour open to scrutiny – and enabling common channels of distribution to a community of users. A Solexa brand, like a Microsoft Windows package, might ensure usability, but not scientificity.
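For readers unfamiliar with what such a quality score encodes, the phred convention can be stated in one line: the score is a logarithmic transform of the estimated probability that a base has been called wrongly. The sketch below is ours, purely for illustration; it does not reproduce the thresholds actually demanded by databases or sequencing centres.

```python
# The phred convention: quality Q = -10 * log10(P), where P is the estimated
# probability of an erroneous base call. The example values are illustrative.
import math

def phred_quality(p_error):
    """Convert an estimated base-call error probability into a phred score."""
    return -10 * math.log10(p_error)

def error_probability(q):
    """Convert a phred score back into an error probability."""
    return 10 ** (-q / 10)

print(phred_quality(0.01))    # Q20: a 1-in-100 chance the base is wrong
print(error_probability(30))  # Q30 corresponds to a 1-in-1000 error rate
```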
COMPETITION AND CONFLICT OVER GENOMIC DATA

If technologies of data production demonstrate the asymmetric interdependence, and indeed co-development, of the public and private, the making of data as a new institutional knowledge form created much greater, publicly manifest, turbulence. From the early 1990s, the idea of acquiring and demonstrating the knowledge of a whole genome, ultimately the whole human genome, became a major scientific objective. For some, a ‘whole genome’ is quite easily definable, but the idea of complete and finished knowledge with clear boundaries, given the continuously shifting grounds of science, is quite unusual. In the early stages, there was no hint of the possibility of there being a ‘rough draft’, nor even, given the then prevailing physical techniques of genetic mapping, of the probabilistic and statistical standards of completeness that eventually emerged. Reverting to the jigsaw metaphor, however, completeness attained the quite concrete meaning of being sure that all the pieces were there and in the right place. What it did not mean was that the picture was clear, or that every part of what it represented had been made understandable. Ewan Birney, a leading genomic bioinformatician at the European Bioinformatics Institute and head of the Ensembl group, preferred the metaphor of a huge and dense Russian novel. Knowing that all the pages are there, and all the sentences are in the right order, gives an immensely satisfying achievement of completeness:

Yes, there are some pages that are deeply mysterious to you, you don’t know what the author meant. . . . It’s a way of encircling the entire organism . . . The wonderful thing about genome sequencing is that it’s this kind of Russian novel that is incredibly thick and dense and difficult to understand but it is complete.
The important aspect of this concept of the genome for our purposes is that it already implies a certain level of separation between the data (the pieces, sentences, pages) and the analysis, or scientific understanding of their significance. Completion meant a complete set of the relevant data, here the total sequence of As, Cs, Gs and Ts, the millions or billions of base pairs of the genome of a given species.13 This becomes a more or less generic characteristic of biological databases, whether the sequences are of nucleic acids or of proteins. However, where boundaries begin to dissolve, and completeness as an objective of scientific knowledge begins to slip through the fingers, is in how much interpretation, and of what kind, should be attached – in the case of genomes, by a process called annotation – to the sequence data. Completeness of a genome can be extended to mean that every gene within the sequence data has been identified and located on the map. Then, every gene would be identified with a known function. This conception of completion, however, carried within it the now widely discredited linear model, in which causal connections could be directly established from every discrete sequence of every single gene to a particular protein, from protein to cellular function, and from there to the total physical phenotype of the organism, even including expressions in behaviour, manifestations of intelligence, and so on. It is clear, however, that even before the emergence of non-linear, systems biology approaches, this conception of completion included far more than a closure on all relevant empirical data, and constituted a particular, once fashionable, theoretical model of biology.

At the outset of the genome projects, there was much less consensus or clarity about what would eventually constitute a database of complete genomes, and even greater confusion and conflict over how to get there. We have already seen how the available technologies profoundly shaped the eventual routes taken and, indeed, what therefore came to be accepted as the paradigm of a biological database. But very early on, a key issue became one of making data public or private. How to make data public knowledge was problematic; how to make it private, equally so; and whether it should be one or the other was perhaps the most controversial question of all. Nowhere is this more clearly demonstrated than in the contrasting cases of Expressed Sequence Tags (ESTs) and Single Nucleotide Polymorphisms (SNPs). The patenting of data, as a type of knowledge, presented quite new challenges to existing patent law. The issue is not, as often portrayed, one of patenting nature, or naturally occurring phenomena,14 or thousands of genes or fragments of genes. The issue, as always, is patenting knowledge, and what was novel was the type of knowledge. When patenting fragments of genes, the issue was one of patenting knowledge fragments, not organic fragments, and the threat was to the coherence of knowledge, and to its development.
THE SCRAMBLE FOR ‘KEYS TO GENES’: EXPRESSED SEQUENCE TAGS

What are Expressed Sequence Tags, and why, for a time, did they assume such significance? Faced with the gargantuan task of sequencing whole genomes – in the case of the human genome, billions of base pairs of DNA – some scientists in the early 1990s thought they had discovered a short-cut, particularly when combined with the technologies described above. In the overall sequence of nucleic acid, only certain segments of sequence (exons) code for proteins, as a consequence of transcription from DNA into messenger RNA (mRNA). In the linear model, the passage from DNA to RNA to protein to cell is the royal road to biological understanding. In this framework, only those segments of DNA that express themselves in proteins are genes and, conversely, if it were possible to find the sequence of all protein-encoding DNA, the whole genome would be a jigsaw puzzle of all genes (if not of all DNA sequence). First Sydney Brenner (of the UK Medical Research Council’s Laboratory of Molecular Biology; Brenner, 1990) and then Craig Venter considered this to be the desired short-cut, even if only a staging post on the way to the whole DNA sequence of a genome. The EST provided the technical means to that end. Taking neural tissue from the protein-rich brain, extracting the mRNA expressed in it, and then reversing the transcription to produce only the complementary DNA segments (cDNA) corresponding to protein expression, would result in a tool that could mine for all segments that correspond to genes (Adams et al., 1991). Genes were defined as those DNA segments expressed in proteins. These cDNA segments could be ‘tagged’ so as eventually to build up contiguous sequences from the fragments. For that reason, Venter and colleagues called them Expressed Sequence Tags.

ESTs, however, or more particularly the cDNA sequence segments, were tiny jigsaw pieces, often only 150–300 base pairs long. It was hoped that one fragment of cDNA sequence would be sufficient to identify the whole protein-encoding exon sequence segment: an EST was the key to the discovery of a gene, through similarity searching between the fragment segments and the whole exon sequence segments. At the time, Venter was employed by the National Institutes of Health and, from early 1991 through to 1993, the NIH began patenting thousands of ESTs, the ‘gene-keys’ (Roberts, 1991a, 1991b; Cook-Deegan, 1994; Zweiger, 2000).15

The resultant storm of controversy exemplifies how turbulence is at once epistemic and economic. The scientific controversy over ESTs was expressed in starkly divergent claims as to their value, Venter and colleagues suggesting that a full EST-based description of the human genome
would result in 80–90 per cent of genes (as opposed to all DNA) being sequenced, and John Sulston, representing the Human Genome Project, countering with an estimate of only 8–9 per cent (Roberts, 1991a; Marshall, 1994; Sulston and Ferry, 2003). For some enthusiasts, ESTs were at least the royal route to the epistemic (and commercially interesting) goal of identifying the vast majority of genes. There is no doubt that this was their scientific belief. For doubters, there were two main weaknesses in the scientific method and its assumptions. The first was that genes are expressed to varying extents in different tissues and at different developmental stages, so that a data source based on the results of gene expression was inherently unreliable and uneven. A lot would be missed, and the comprehensiveness of tissue sampling was never properly addressed. The second concern, perhaps even more serious, related to the paradigm assumption about genes and the growing uncertainty surrounding it. ESTs would only tap DNA sequence segments expressed in proteins. They would miss all the gene regulation segments that switch expression on and off, or segments critical for the hierarchical functional organization of multiple genes.16 This second source of doubt went to the very heart of the question: what, biologically, is a gene (Attwood, 2000)? Both protagonists and antagonists were passionately engaged in a process where, for some years, these issues were unresolved. What, scientifically, ESTs were good for – their potential knowledge benefits – was disputed. They were symbolic of one of those critical phases of radical uncertainty involved in the development of scientific knowledge. The scientific uncertainty was necessarily enmeshed with the economic uncertainty: what was an EST worth, either as a public good or as a private good?

The NIH decision to patent ESTs created a parallel and intertwined turbulence alongside the scientific one. In the short term, it resulted in the resignation from the NIH Human Genome Project of James Watson, co-discoverer of the structure of DNA, who was vehemently opposed to patenting. Eventually, the Director of the NIH who had promoted the patenting, Bernadine Healy, also resigned, when political changes following Clinton’s election led to her replacement by Harold Varmus. He, along with Al Gore, had taken firm decisions opposed to further patenting. Given these high-profile ‘casualties’, the congressional hearings, the high-level cross-Atlantic diplomatic interchanges, and the partisan and many-sided debates within the main science journals, it is not surprising that the issues raised by EST-patenting are both complex and profound. But, as with the scientific turbulence, the drama unfolded in great confusion about what the issues were, especially in the early days between 1991 and 1993. Although patenting was strongly promoted under the technology transfer policy by Adler (Adler, 1992), who first initiated the move, there was industry uncertainty about, and opposition to, the role of a public funding body,
the NIH, in licensing its potential patent holdings (Eisenberg, 1991). There was major uncertainty as to the NIH’s purposes in patenting: whether the intention was to secure revenues for the funding bodies or to support US biotechnology and pharmaceutical companies; to facilitate or indeed to obstruct future patenting of full genes; or to protect US companies from foreign patents. Real fears were expressed that the patenting would either sabotage any publicly funded human genome project or effectively de-rail any attempt to characterize the whole DNA genome. The very possibility of international collaboration on genome projects (already proving successful for yeast) was seen to be placed in jeopardy. International patenting wars appeared on the near horizon, with threats of retaliatory action by European and Japanese institutions (Anderson and Aldhous, 1991; Gillis, 1992). In short, a whole range of fundamental institutional arrangements and assumptions were thrown into the melting pot.

This double ‘crisis’ can in retrospect be seen as an historically critical turning point, whose ultimate development was unforeseen by any of those involved, where the stakes were extremely high, and whose consequences significantly shaped subsequent changes in economies of knowledge. The schematic chronology of the main events is presented in Table 2.1, below. In deciphering the history,17 however, what needs to be stressed are the dynamic tensions and interactions between the processes of making data public and private.

The initial process of making EST data private was through the use of patents by the NIH, in a claim that was very broadly framed, so adding to the confusion and turbulence (Roberts, 1991a, 1991b; Zuhn, 2001; Eisenberg, 1991). The claim was for a patent not only on the short fragments of cDNA, the ESTs themselves, as ‘keys’ for genes, but also on the full-length DNA sequence discovered by similarity searches of public DNA databases – the gene itself; on the proteins encoded by the gene; and on the ESTs as experimental tools for amplifying and cloning genes using PCR.18 The fact that an EST could identify as yet unknown genes, coding for as yet unidentified proteins, was an immediate and central issue of contention. The NIH patent filing appeared to be a fishing exercise, broadly cast and with an indiscriminately large net. Given that known genes were patented and patentable, the real fear driving the NIH’s claim was undoubtedly that publishing rather than patenting genes, even when unidentified, would either preclude any future patenting19 or, worse, enable foreign competitors to patent them in different patent jurisdictions. ESTs promised both too much and too little, and demonstrated an inherent dilemma of being either too soon or too late in capturing knowledge, in a rapidly developing process, by a fixed definition of utility and novelty.
Table 2.1 Main historical events in the process of making ESTs public and private

1991 June – NIH files patents; Science publishes 350 ESTs
1991 December – UK Medical Research Council withholds EST data
1992 February – NIH files patents for 2735 ESTs
1992 June – NIH invokes international Patent Cooperation Treaty to protect its patents against MRC
1992 July – MRC files patents for 1200 ESTs, some the same as NIH’s
1992 July – Venter quits NIH, establishes TIGR with rights to commercialize through HGS, with $85 million
1992 September – USPTO rejects NIH’s first claim for EST patents
1992 September – NIH files patents for 4448 ESTs
1992 November – Clinton elected; Varmus replaces Healy at NIH
1993 August – USPTO rejects NIH EST patents (2421) for a second time
1993 – SKB obtains rights to first use of EST databases from TIGR, with academics subject to reach-through licensing agreements (RTLAs), for $125 million; academics boycott TIGR database
1993 November – Incyte IPO to develop its own EST database with cDNA libraries, public and in-house
1994 February – Varmus withdraws NIH patent applications for 6896 ESTs
1994 – Pfizer ($25 million) and Upjohn ($20 million) pay Incyte for non-exclusive rights of access to EST database
1994 – An EST subdivision of GenBank, dbEST, established with 50 000 ESTs from various organisms
1995 February – Merck releases its first batch of ESTs into GenBank, soon accounting for half the sequence records in GenBank
1995 July – USPTO issues new Utility Examination Guidelines, opening up possibility of patenting ESTs
1996 October – A database of non-redundant ESTs, UniGene, created within dbEST
1997 June – TIGR breaks with HGS and deposits 110 000 ESTs in GenBank without reach-through rights
1997 September – USPTO declares ESTs to be patentable, under specification limiting claim to defined partial sequence
1998 – Venter and Incyte move from EST to shotgun DNA sequencing
1999 September – USPTO issues Revised Utility Examination Patent Guidelines
2005 May – Court of Appeal (CAFC) rejects EST patent claims from Monsanto, In re Fisher
As we will see, several years down the line, the promise of ESTs was subsequently undermined by developments in genomic knowledge and technologies. The keys to genes were eventually shown to be defective in many respects, making the whole patenting exercise appear, in retrospect, woefully premature.

Before turning to the next critical phase, however, it is worth pausing to reflect on exactly what the patent entailed in terms of the dynamic of making public and private. For it was no simple assertion of private property rights, not least because, in this case, the claims were made by a pre-eminently public body, in the sense of a governmental, tax-resourced scientific funding body. Scaled up to the vaunted promise of delivering 90 per cent of all human genes, the EST patents implied a potential role for the NIH as a quasi-state-monopoly licenser to the whole commercial pharmaceutical sector. As important, however, was the fact that the patent itself enshrined a particular interdependence between the patented ‘invention’ and public domain knowledge. The epistemic value of an EST relies on the use of a public domain software algorithm (BLAST)20 searching public domain DNA sequence databases (GenBank and others). The public character of neither the software nor the DNA data banks was in any way encroached upon by patenting. The third aspect of the peculiarity of EST patenting concerns the ‘publicness’ of knowledge of the laws of nature, as against the appropriability of human artefacts. The very hybridity of ESTs in this regard, as well as the nature of the particular NIH patent claim, contributed greatly to the confusion and controversy surrounding their status. ESTs are at once clearly artefacts of a technical procedure – fragmentation of the mRNA sequence followed by reverse transcription to create complementary DNA, neither entity nor process being natural – and at the same time a recruitment of natural processes (transcription and translation into protein) and natural entities (genes). As the patent claims indicated, ESTs were at the same time a research tool and what was discovered by the research tool (the gene sequence data).

In the event, the US Patent Office rejected the NIH patent claims to ESTs on all three main grounds – non-obviousness, novelty and utility – as well as for lack of full disclosure of information or biological materials sufficient to enable replication by other scientists. Although rejection might have induced prolonged negotiations refining the scope of the claims, the political shift resulting from the Clinton election in November 1992 and the replacement of Healy by Varmus at the head of the NIH effectively put an end to the NIH project of patenting ESTs: no more applications were submitted and, eventually, a year later, all existing claims were withdrawn. As we shall see, this did not terminally block the patenting of ESTs; more importantly, ESTs continued to be a key element in stimulating the change in patent regulations, redefining the private appropriability of knowledge.
Far from resulting in a status quo ante of reinforced public domain knowledge, however, the obstacles to patenting, combined with scientific hostility to the EST short-cut to the human genome, triggered a reconfiguration of the public–private divide, and a period of further attempts to re-draw boundaries and interdependencies. Above all, this was symbolized by Venter’s move out of the NIH, taking his whole EST team with him (Hamilton, 1992), primarily because his research was blocked by the drying up of public research funding (Cook-Deegan, 1994, pp. 315–16). The move witnessed the establishment of a new and distinctive public–private arrangement. The Institute for Genomic Research (TIGR) was founded under Venter, along with his database of ESTs, as a not-for-profit organization. TIGR was funded by a $70 million, ten-year finance deal with Human Genome Sciences (HGS), a private genomics corporation.21 However, in spite of Venter’s wish to deposit ESTs in the public database, HGS had first sight, and a time slot of six months’ exclusive use, of discovered ESTs. Soon afterwards, SmithKline Beecham (SKB) made the first substantial investment by a major pharmaceutical company for exclusive rights to the genomics data, a deal worth $125 million. Moreover, even once the data were in the public domain on the TIGR database, Reach-Through Licensing Agreements were imposed (Marshall, 1994). These, in effect, restricted any future private appropriation, by patenting, of innovation using such data, except under licence. With the patent route abandoned, ESTs were held behind firewalls and released by HGS for exclusive use by SKB. Following this development, similar commercial strategies around EST databases were developed by Incyte, which based its LifeSeq Gold product on the licensed use of ESTs, this time on a non-exclusive basis, with deals worth $25 million to Pfizer and Johnson & Johnson. At this point of historical development, therefore, the prospect appeared to be one of extensive, almost exclusively private, EST databases holding the keys to most human genes.22 Extensive markets for biological data were opening up, with several major players,23 in which large pharmaceutical companies were the primary clients. Even then, however, such a prospect depended absolutely on the identification of genes through the development of public DNA sequence databases. Nonetheless, even during this phase, the wholesale commercialization of genomic knowledge and the formation of extensive knowledge markets appeared quite unstable. These new forms of private appropriation and public availability applied to genomic data provoked a reaction from university-based science, now unable to avail itself of less-than-fully-public data held under licence. The public–private divide had become sharper. In these circumstances, two main counter-currents were developing in the public domain. In the first place, there were parallel developments of public EST databases, subject to scientific scrutiny, and increasingly generating new
and more robust data. Already in 1994, a new EST database (dbEST) with distinctive protocols was being developed in GenBank by the National Center for Biotechnology Information (NCBI)24 (Boguski et al., 1994; Boguski, 1995). The data were available on the World Wide Web, published and distributed free to a global public. By 1996, this public resource had been significantly developed from a scientific standpoint with the creation of a non-redundant set of ESTs and a new database, UniGene (Schuler et al., 1996). Over 16 000 genes were identified and approximately localized on a gene map, a number then estimated to be a third of human protein-coding genes.25 The public database, however, received an enormous boost as a result of the intervention of a major pharmaceutical corporation, Merck, in a consortium with the Genome Sequencing Center at Washington University (St Louis), Columbia University and the Lawrence Livermore National Laboratory. In 1995, Merck announced a programme to deposit EST sequences into GenBank on a massive scale: 300 000 ESTs were to be placed in the public domain, at a rate of 4000 per week. The result was that, within two years, half the entries in dbEST were from this source. This was widely interpreted as a commercially driven competitive move between Merck and SKB (Marshall, 1994; Poste, 1995; Eisenberg, 2000a, 2000b).26 It has also been interpreted as a sign of the failure of government to recognize and adequately fund public domain genomic databases (Eisenberg and Nelson, 2002). But in the wider dynamic being explored here, the central issue is that, whatever the motivation of the actors or indeed the source of the funding, public knowledge resources were developing into new institutional forms, fundamentally shaping the context for knowledge market formation. In the circumstances of the time, the Merck initiative reinforced and expanded the new institutions of public biological databases such as GenBank. It was an intervention in the nature of public–private interdependency.

The second major counter-current to the wholesale commercialization of EST databases came through the development of science itself, the dynamic of knowledge growth. This was already partially evident from the UniGene project, but also from the huge number of ESTs, growing at an apparently exponential rate. There were just too many different keys to match any reasonable anticipation of the number of genes. As attempts were made to produce non-redundant sets of keys in order even to determine the number of genes,27 it gradually became evident that ESTs were never going to be able to deliver the promised short-cut to identifying the total set of protein-encoding human genes. The fragmentation of cDNA sequence, the problem of tissue sampling and, especially, the unreliability of one end of the gene sequence28 meant that ESTs lost a significant part of their value, both as scientific and as commercial instruments for gene discovery.
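The redundancy problem can be pictured with a deliberately crude sketch of our own (toy fragments, and exact overlap matching standing in for real similarity searching): overlapping ESTs cluster into the same putative gene, while fragments with no overlaps remain ‘singletons’. Projects such as UniGene did this kind of grouping with far more sophisticated clustering and quality control.

```python
# Toy illustration of clustering ESTs by mutual overlap to estimate how many
# distinct genes they might represent: clusters of size one are 'singletons'.
# Sequences are invented; UniGene-style clustering was far more sophisticated.

def share_overlap(a, b, min_len=4):
    """True if a suffix/prefix of one fragment matches the other (>= min_len)."""
    for n in range(min_len, min(len(a), len(b)) + 1):
        if a.endswith(b[:n]) or b.endswith(a[:n]):
            return True
    return False

def cluster_ests(ests):
    """Greedy single-linkage grouping of fragments that overlap one another."""
    clusters = []
    for est in ests:
        joined = [c for c in clusters if any(share_overlap(est, m) for m in c)]
        merged = [est] + [m for c in joined for m in c]
        clusters = [c for c in clusters if c not in joined] + [merged]
    return clusters

ests = ["ATGCGTAT", "GTATATCC", "CCCGGGTT", "GGTTAAAC", "TTTTTTTT"]
clusters = cluster_ests(ests)
print(clusters)                 # five fragments collapse to three clusters
print("singletons:", [c[0] for c in clusters if len(c) == 1])
```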
The numbers of ‘singletons’, EST fragments with no overlaps with other ESTs, continued to increase with sampling, suggesting an infinite number of keys to an infinite number of genes (Zweiger, 2000). By the time they became patentable, ESTs retained a continuing importance, but one considerably smaller than their original promise. Above all, it should be borne in mind that the EST scramble for genes was taking place in the context of the ongoing, determined, public science development of whole genome DNA sequencing, albeit by much slower methods. The limits of ESTs were progressively exposed by the long road to whole genome sequencing. At the same time, changes in patenting regulation progressively reached a conclusion that restricted the scope of a claim for an EST, to the point where its commercial value became significantly less interesting. Instead of a patent on a key to a gene effectively being a patent on the gene itself, the 1999 regulations meant that substantial utility was restricted to the function of the key as a key, and further required that the key be shown to open a door to a useful result (Zuhn, 2001; Auth, 1997; Flattmann and Kaplan, 2001). Patenting an EST could no longer effectively patent the gene itself, or indeed pre-empt others from patenting a relevant gene, as assumed by the original NIH strategy and claims. However, the patenting of ESTs was still the subject of legal dispute, reaching the US Court of Appeals in 2005, which again rejected substantial utility in a claim advanced by Monsanto for some maize ESTs.29

The closing phase of the turbulent history of ESTs was marked in 1998 by its major protagonists, Venter at TIGR and Incyte, moving away from EST discovery to apply the same automated, high-throughput techniques to the direct shotgun sequencing of DNA fragments. The end of the EST saga was only the opening chapter of the subsequent human genome race described above. Venter himself terminated TIGR’s arrangements with HGS and SKB, taking 110 000 ESTs and placing them in GenBank’s dbEST, without strings. They were now fully public data. This symbolically confirmed the formation of a new public resource of knowledge as the dominant feature of biological databases.

The turbulent development of ESTs as a societal knowledge resource represents a key phase in the instituting of a new form of knowledge: data as a public good. At the outset, quite the opposite seemed a distinct possibility: private appropriation by formal property rights, held by a state institution. Once that was blocked, there followed a phase in which the commercialization of bio-data in new knowledge markets, with licensed access to data held behind private corporation firewalls, also seemed a distinct possibility. The prospect once more was one of private appropriation of the vast majority of what were considered to be the crucial keys to human genes. But such developments were vulnerable to an equally dynamic development of new
forms of public institution of knowledge, on the one hand, and changes in scientific understanding, on the other. The Merck intervention pioneered the public–private consortia that have since become a widespread feature of the landscape.30 As we shall see, these are distinctive developments of new forms of private–public interdependency. The formation and expansion of genomic databases in turn exposed the weakness of the new knowledge markets, undermining the commercial value of ESTs. This was further reinforced by the development of public scientific knowledge itself, with ESTs, although still scientifically and technically interesting, falling far short of their original promise. The public EST data resource in the end established unchallenged pre-eminence within the global bio-databases. The peculiar and polyvalent character of ESTs – as tools, genetic materials and portions of protein-encoding genes – created challenges to existing patent law, especially on utility. This resulted in successive changes in the rules of formal appropriability, and the strengthening of ‘substantial utility’ as a condition. Overall, therefore, what happened to ESTs demonstrates a dynamic interaction unfolding in time: there are no absolute beginnings or endings to this account. The EST history was of critical significance for the development of new forms of public knowledge institution, on the one hand, and it conditioned future genome races, on the other.
DEVELOPING NORMS FOR PUBLIC GENOMIC DATA: THE BERMUDA RULES

Towards the end of the turbulent phase of the EST story, and no doubt conditioned by it, a new set of public rules emerged that decisively confirmed the distinctive institution of genomic data banks. As the previous discussion highlighted, time was of the essence as a form of appropriation for the rapidly expanding volume of genomic sequence data of all kinds (Joly and Mangematin, 1997). An initial meeting was held in Bermuda in February 1996, attended by the Wellcome Trust, the NIH National Center for Human Genome Research, the US Department of Energy, the Human Genome Project of Japan, the German Human Genome Project, the UK Medical Research Council and the European Commission (Bentley, 1996). It set out new rules as a precondition for international collaboration between laboratories contributing to the human genome project. Most striking was the norm that all raw sequence data of more than one kilobase (a relatively small jigsaw piece) should be deposited in a public database, such as GenBank, within 24 hours of assembly. The initial rules were subsequently revised in the following year and in 2000, to take account of some emergent implications of this radical departure from all existing norms of scientific publication (Marshall,
2001). In some ways, it could be seen as equivalent to a requirement to publish the results of laboratory experiments on a daily basis, prior to analysis and without reference to experimental hypotheses. In 1997, the norm was both relaxed a little – limiting the requirement to sequences of greater than two kilobases – and tightened. In particular, in March 1997, it was recognized that raw, unannotated and ‘unfinished’ sequence data should be filed separately from ‘finished’ sequences. Standards were imposed on the quality of the data, the reporting of experimental procedures and statistical reliability tests (phred and phrap). An acceptable error rate was quantified. In 2000, the National Human Genome Research Institute confirmed the significance of ‘making public’ raw data by allowing its use by scientists other than the original data producers for any publication or analysis other than the publication of a complete genome relevant to the sequence (http://www.genome.gov/10000910). In so doing (as will be discussed in the next chapter), the institution of raw data publication was considered essential to expanding the pool of sequence data available for comparison and similarity searching by the whole community of genomic science. The quality norms, error rates, and definitions of finished and unfinished, raw and annotated data were necessary precisely because of the novelty of this institution of public knowledge, especially the absence of peer review and the separation of data from analysis. The data had to stand up publicly as data. Scientific journals had to change their rules and accept papers for publication that were at least in part reliant on data already in the public domain, whether or not produced by the original data producers.

Adams and Venter, by this time at TIGR and engaged in EST and whole genome sequencing, protested against the breach in scientific norms (Adams and Venter, 1996). They argued that premature data release might compromise future scientific journal publication, breaking the established embargo on data release prior to paper publication. More strongly, they also argued that raw data publication breached scientific norms on the evaluation of data quality by peer review, where data quality was part and parcel of an estimation of its scientific quality and significance. It is indeed significant that the Bermuda Rules were as much, if not primarily, addressed to public domain concerns about scientific practice and establishing ground rules for international collaboration as they were to pre-empting private appropriation by patenting or secrecy. As the initial 1996 statement insisted:

It was agreed that these principles should apply for all human sequence generated by large-scale sequencing centres, funded for the public good, in order to prevent such centres establishing a privileged position in the exploitation and control of human sequence information (http://www.ornl.gov/sci/techresources/Human-Genome/research, our emphasis).
Nonetheless, the Bermuda rules are a striking historical instance of new institutional modes of appropriation and distribution emerging almost as an outcome of institutional experimentation by the community engaged in data production.
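Stated schematically, the deposition norm amounts to a small, checkable policy: a size threshold, a 24-hour clock and separate handling of unfinished data. The sketch below is purely illustrative – the function, field names and messages are ours, not an official specification.

```python
# Illustrative restatement of the deposition norm described above; wording
# and encoding are simplified, not an official specification.
from datetime import timedelta

THRESHOLD_BASES = 1000            # initial 1996 norm; raised to 2000 in 1997
DEADLINE = timedelta(hours=24)

def deposition_action(length_in_bases, time_since_assembly, finished):
    """Say what the norm expects for a newly assembled stretch of sequence."""
    if length_in_bases <= THRESHOLD_BASES:
        return "below threshold: no immediate deposition required"
    status = "finished" if finished else "unfinished (raw, unannotated)"
    if time_since_assembly > DEADLINE:
        return f"overdue: deposit now as {status} data"
    return f"deposit within 24 hours as {status} data"

print(deposition_action(2500, timedelta(hours=3), finished=False))
```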
WHO PICKS THE ‘LOW-LYING FRUIT’? THE PUBLIC APPROPRIATION OF SNPS

In many respects, it might be expected that single nucleotide polymorphisms (SNPs) would be a type of biological data even more attractive and suited to private appropriation than ESTs, whether by patent or by technical secrecy. After all, as the name suggests, SNPs are inherited (that is, conserved) genetic variations of a single nucleotide base (each variant being an allele) associated with individual phenotypic differences, amongst which proclivity to a disease is especially significant. Indeed, in some cases, particular inherited genetic disorders are directly related to such a singular base-pair variation in the genetic code.31 The identification of SNPs provides a basis for personalized medicine and potential targets for pharmaceutical therapy, and is hence also of primary interest to commercial therapeutic enterprises, pharmaceutical or other. At the time, SNPs were widely deemed to be ‘low-lying fruit’, and SNP databases, unlike much genomic data, were seen as a resource close to commercial exploitation.

SNPs came into the scientific foreground out of the EST background and context. Initially, from 1998, techniques for identifying SNPs and for associating SNPs with genetic disorders took the established institutional routes of publication of scientific papers and patenting of discoveries (Coronini et al., 2003). By far the largest percentage of patents, however, were not for the biological data as such (the sequence code and location of the SNP itself) but for techniques of identification, sequencing and arraying (55 per cent of all patents referring to SNPs). Moreover, it appears that both papers publishing the identification of SNPs and patents on SNPs reached a peak in 2000, before dropping quite rapidly (Coronini et al., 2003, Figure 2.1).

For, in 2000, a major institutional innovation occurred, initiated by commercial companies, charitable NGOs and academia, taking the Merck EST intervention as a precedent, but now on a distinctive scale, and with a new form of public appropriation – hence its interest for the current discussion. A non-profit entity, The SNP Consortium Ltd, was formed to govern a project to place SNPs in the public domain (Marshall, 1999). The research was undertaken by five of the globally pre-eminent public science genome facilities: the Whitehead Institute for Biomedical Research, Washington
University School of Medicine (St Louis), the Wellcome Trust Sanger Centre, the Stanford Human Genome Center and the Cold Spring Harbor Laboratory. The major contributing life science and pharmaceutical companies were AstraZeneca, Bayer, Bristol-Myers Squibb, Hoffmann–La Roche, GSK, Hoechst, Novartis, Pfizer and Searle. By November 2001, 1.8 million SNPs had been released into the public domain, and new web interfaces and search software packages were developed to expand and enable wider and easier ‘consumption’ (Thorisson and Stein, 2003). In contrast with the genome databases, the primary anticipated use was signalled in the dedication of the database to biomedical research, although it was clear that the SNP database was also open to other potential uses, notably the understanding of evolutionary processes of genetic variation.

The institution of the consortium was innovative both as an organization and as a form of public appropriation, as well as being sustained by commercial and governmental finance. Departing from the newly established norms for depositing genomic data in the major public databases under the ‘Bermuda rules’, the Consortium opted to release data quarterly and under a recently developed patent law instrument, the Statutory Invention Registration (SIR). By filing claims for newly discovered SNPs under an SIR, despite the resultant delay in putting the information into the public domain, once there the data were protected from being absorbed into commercial databases and from any future related patent claims. An SIR effectively establishes prior art more robustly than scientific publication (Eisenberg, 2000a). In such a way, although commercially useful to all who might be interested, it enshrines the non-exclusivity of future uses of public domain knowledge. As a consequence, SNPs assumed a novel form of public property, extending non-exclusivity to rights of use. Rather than just a pre-competitive strategy, it established a non-competitive level playing field in relation to data under an SIR, given that the SNP Consortium had no intention of patenting in the future. Moreover, by establishing public SNP databases, the Consortium institutionalized a polyvalence of use: the data were made available both to a scientific community pursuing a variety of interests and to a corporate world engaged in developing pharmaceutical and genomic therapies.

However, The SNP Consortium itself had a defined life and, once it had essentially determined the form of public appropriation, and prevented large-scale private appropriation either by secrecy or by formal IPR, it dissolved itself. After securing 1.8 million SNPs as public property by 2000, it handed over the data to GenBank and the related global bio-databases. Clearly, ‘making public’ was an irreversible process, but the knowledge torch was then passed on, and the Consortium database was complemented by fully public realm databases, notably the dbSNP located
at the National Center for Biotechnology Information, and the Human Genome Project. Furthermore, dedicated public realm software tools (such as SNPEntrez) for searching across these different SNP databases effectively integrate them for the user, blurring in distribution and use the different institutional and resource bases of the producing and warehousing organizations. In terms of our analysis of knowledge ‘consumption’, it is somewhat ironic to see that the NCBI boasts a site for ‘one-stop SNP shopping’ that facilitates scientific research in a variety of fields, ranging from population genetics and evolutionary biology to large-scale disease and drug association studies. Yet, despite the comparative advantage enjoyed by public domain SNP databases, examples abound of companies in specialized SNP knowledge markets, often operating in collaboration with public science institutions.32 Although dominant, the new public forms of SNP bio-databases co-exist with, and indeed are supplemented by, a private domain protected by technical secrecy, patents and licensing.

As with ESTs, the creation of SNP databases, and their establishment within the main bio-database warehouses (GenBank, the European Bioinformatics Institute and the DNA Data Bank of Japan), involved new norms and standards for data deposition, quality checking and formatting, including the location of SNPs within genomic sequences (Kitts and Sherry, 2002/2006). Moreover, as scientific knowledge developed, the original ‘low-lying fruit’ view of SNPs, as a short-cut to associating nucleotide sequence data with disease, became more sophisticated. Firstly, in August 2001, The SNP Consortium commercially contracted Celera to develop an SNP-linkage map for subsequent deposit in its public database. An SNP linkage map provides information on the ways in which ordered, chromosomally positioned SNPs are co-inherited from generation to generation. This reflected the developing view that multiple rather than single SNPs needed to be considered, along with processes of conservation and inheritance. As part of this strategy, the Consortium engaged the Laboratory of Computational Genetics at Rutgers University to analyse the Celera-produced SNP-linkage data. Secondly, as the nearly 3 million SNPs represented in dbSNP suggest,33 the vast number of SNPs involved data of variable quality and, as with ESTs, the need was recognized for a non-redundant set of higher-quality, more annotated and curated data than the main dbSNP database. Consequently, a new database, the Human Genetic Variation Database (HGVbase), was created in 1999, developing into a rapidly expanding resource over the succeeding years, with a unique identifier for each non-redundant SNP (Gudmundur and Stein, 2003). HGVbase emanated directly from the original consortium, with the Karolinska Institute (Sweden) collaborating with the EBI,
and sponsored by GlaxoSmithKline. This is evidence of evolving public institutions of knowledge production, appropriation and distribution for polyvalent use.
FROM NATIONAL TO GLOBAL SCALES OF PUBLIC DATA: TRAJECTORIES OF PROTEIN DATABASES

So far we have considered the institutionalization of genomic databases. Historically, however, the precursor to all sequence databases was undoubtedly Margaret Dayhoff’s protein database, manually constructed and published in book form as the Atlas of Protein Sequence and Structure. The first volume was published in 1964, as a public document, by the National Biomedical Research Foundation (Smith, 1990; Attwood and Parry-Smith, 1999; Bairoch, 2000; Moody, 2004). The eventual emergence, in 2004, of UniProt, a global integrated protein database, represents many of the elements of differentiation and interdependence between public and private forms of knowledge and institution already discussed, but brings to the foreground some new dimensions. The history is one of the intertwining of modes of financing, public and private ownership, technologies and public and private rights of access, the separation of data from software and, especially, issues of geographic scale, both of markets and also eventually of the public database. Dayhoff’s protein atlas was the foundation stone of what became the main US-based protein database, the Protein Information Resource (PIR), maintained and developed by PIR-International from 1988. This was a consortium of public science institutions, including the NBRF, but now adding important international links with the International Protein Information Database of Japan (JIPID) and the Martinsried Institute for Protein Sequences (MIPS). PIR was therefore a relatively stable, publicly funded and publicly institutionalized database although, as we shall see, it subsequently developed some interesting forms of private collaborations with a major US corporation. The emergence of a major European protein database, SWISSPROT, and its development, was much more institutionally fluid, uncertain and almost chaotic (Bairoch, 2000). Amos Bairoch, a Swiss bioinformatician, developed a software program for sequence comparison and searching, comprising both genomic and protein sequences, in 1984, as a PhD project. The protein sequence data, given the technologies of the day, were manually typed in from Dayhoff’s Protein Atlas, although 1984 was also the first year that data from PIR was distributed by computer tape. The software package, initially called COMPSEQ, attracted the interest of a small
Genevan bioinformatics firm, Genofit, which distributed it commercially, together with the protein sequence database. In the USA, Intelligenetics, a Stanford University spin-out, expanded the market for this software–protein database product, by distributing it there, and, in negotiating the deal with Genofit and Bairoch, its name was changed to PC/Gene. The aim was to make the data device independent, accessible and manipulable on most desk-top PCs. The package was continuously updated, so that, by 1988, it had expanded to 53 1.2 Mb floppy disks. Although PC/Gene continued to be sold as a software package, used by 2000 laboratories in 45 countries by 1996,34 a critical change took place in 1986 when the database was separated from its supporting software. Bairoch was both dissatisfied with the quality of the PIR data (it lacked significant annotations and additional proteomic data) and wanted to make it more amenable to software that enabled linkages between nucleic and protein sequence data. Consequently, he changed the data into the EMBL format for sequences and, with the support of EMBL in 1986, created a new, now entirely public, and freely accessible protein database, SWISSPROT. With a team of annotators, this rapidly became the world’s premier protein database, in terms of quality and richness of data, and its utility in similarity searching across nucleic and proteomic domains. So, at the same time as the software and biological database were separated, the one remained commercial, while the other became financially supported principally by public resources, and made publicly and freely available, including by Internet access, in 1993. Moreover, progressively from 1986 onwards, SWISSPROT was supported by several freely distributed software packages through EMBnet. Although royalties from the sale of PC/Gene and additional commercial support from GSK sustained additional staff in annotation and software development, SWISSPROT was becoming more differentiated as a public economy of knowledge during this phase, in terms of resourcing, production and distribution. However, in 1996, just before its tenth birthday, SWISSPROT experienced a severe funding crisis which jeopardized its continuation. In terms of resource constraints, a global public research resource was proving beyond the scale of a national research budget. Renewed Swiss public funding could only be authorized if the operation received significant supplementary European Union funding. This was not forthcoming, through a combination of national politics of funding, bureaucratic confusion and misinterpretation of what renewed funding was required for. Firstly, there was an immediate issue of scale. Not being a member of the European Union, the Swiss government refused funding for European projects and, conversely, European funding could not be diverted to support a Swiss scientific institution.
Secondly, the biological database presented and continues to present challenges to old conceptions of the public resourcing of science. Is maintenance and development of a database scientific research? Is it for the production of new knowledge, or perhaps for commercial exploitation? The novelty of the bio-database institution has no doubt been an obstacle to its public resourcing, nationally and internationally. In the absence of clear public support, two alternatives initially emerged: either the database could be passed to private firms and established as proprietary databases,35 or a non-profit association could be formed, with costs recouped through subscriptions from all users. Both alternatives would have involved significant transformations of production, appropriation and distribution. The eventual outcome was a funding model based on a combination of resource flows (similar to the Ramsey cost-recovering pricing model discussed by David, 2001c). Some EU and Swiss public money was made available, but not enough to cover the increasing costs of maintaining quality and keeping pace with the burgeoning data emanating from the scientific community. To meet this shortfall, a two-tier access arrangement was set up, whereby the database would continue to be freely available to academics, but industrial users would be required to pay an annual subscription: a partial privatization of the database. A new company, Geneva Bioinformatics (GeneBio), was set up and mandated to manage the relationships with industrial users. In effect, commercial subscriptions were partially cross-subsidizing public data production and distribution. A comparison between SWISSPROT and the proprietary Incyte and Celera databases of the same epoch reveals differences in processes of appropriation and distribution, and hence different forms of both private ‘ownership’ and public resource. The Incyte and Celera databases were wholly sustained by annual subscriptions and access to the data was controlled by technical means (passwords and firewalls), so exclusive to licence-fee payers. The SWISSPROT model was partially sustained by a quasi-market transaction on the basis of annual subscriptions from industrial users, but without any technical controls on access. GeneBio did not ‘own’ the data by establishing controls over access. Companies were trusted to make payments on arrangement with GeneBio. Therefore, there were no formally instituted means for policing and monitoring access to the database, although any very obvious misuse, if detected, would be acted on. To further strengthen the IP position, future versions of SWISSPROT would be subject to copyright, which could be mobilized for legal proceedings in the event of any unauthorized use. One of the key reasons for setting up access to SWISSPROT in this comparatively more ‘open’ way concerned the manner in which the database content had been traditionally reproduced and redistributed for extensive use by a variety of for-profit and
non-profit third party organizations, as a public resource. Under the new regime, a set of guidelines was established governing the manner in which third party organizations could continue to do this, largely in order to ensure that they could. To illustrate, if a bioinformatics firm wished to distribute SWISSPROT in CD-ROM form (to paying users of other services, for example), it would be required to provide GeneBio with a list of those that were for-profit organizations. In addition, it would be required to meet certain standards concerning the format in which the data was redistributed. This multiple form of public and private access and exchange, combined with an institutional openness to diversities of subsequent use for knowledge production, commercial and non-commercial use, displays a polyvalence typical of databases, but configured in a particular way as a consequence of resource constraints. The two-tiered model of mixed public–private funding and open, but regulated, access mitigated the funding crisis, but in fact only lasted for several years before a new economy of knowledge was formed around SWISSPROT. In what amounts to an effective ‘renationalisation’, SWISSPROT has now been placed back in the public domain, in its entirety. This move has been orchestrated by US public science policy, with associated funds and a major grant from the United States National Institutes of Health (NIH). From 2002, SWISSPROT became part of the UniProt consortium, the change in name symbolizing the outcome of a unique form of ‘merger and acquisition’. In an ironic twist, given the origins of SWISSPROT in its early COMPSEQ days involving the commercialization of PIR data, UniProt was the outcome of a grand merger (Apweiler et al., 2004; Bairoch et al., 2005). When UniProt was created in 2002, it in effect integrated the major international protein databases, to become the single, global-scale, public resource. Notably it brought together PIR, MIPS, JIPID, SWISSPROT and TrEMBL,36 an automatically annotated protein sequence database managed by the EBI. This illustrates neatly the formation of a new scale of the ‘public’ in public resources, a transition from several international resources to a single global knowledge resource. SWISSPROT, as a key component of this new global resource, had traced a history from commercialization of bio-data, to a predominantly public economy of knowledge, to a subsequent hybrid market–public institutional form at a regional geopolitical level, and then to a new form of public institution at a global political level. Shifts in the public form of economy corresponded to shifts in the scale of the economy, whether market or public, that supported its existence. The ‘acquisition’ aspect of the deal, however, has involved a certain degree of ‘nationalization’ as against ‘universalization’. The US biotechnology and pharmaceutical industry had had PIR as its doorstep public resource, but had naturally been obliged, along with all other commercial
users, to license data commercially from SWISSPROT. DuPont had established a close collaborative relationship with PIR, to the extent that it was developing collaboratively a protein classification database, using complementary assets, with Dayhoff and Wu’s biological annotation expertise, and DuPont’s compute power and bioinformatics. The collaboration itself was a distinctive form of privatization, with the output of their co-production being retained behind a shared firewall, shared, that is, between a private corporation and a public science institution. However, for DuPont, whose senior bioinformatician was also on the advisory board of SWISSPROT, the overall optimal solution was viewed as the creation of an integrated public resource.37 So, along with NIH, DuPont was partly instrumental in overcoming resistances in PIR to a merger with SWISSPROT, in the creation of the UniProt consortium. Granted the central importance of the SWISSPROT database as the highest quality component of this venture, the move by the US funding body could be seen as representing a strategic policy of removing the accessibility constraints on the database, so that it can be freely used (in both senses) by US firms in their own R&D activities. The creation of a globally united protein database was thus equally an enhancement of a public realm resource for US corporations, strengthening, it could be argued, their global competitive position. However, and not in contradiction with this, UniProt can be seen as a scientific project, and an enhancement of scientific knowledge in its own right, by virtue of integrating the diverse and institutionally separate forms of data. Nonetheless, enhancement of a global public resource of scientific knowledge and its shift to the US and NIH funding is seen by the European Bioinformatics Institute as a loss of potential control and flow of resources to European protein databases. The failure of the European Union to deliver on resourcing, first, SWISSPROT and then its potential successors, illustrates the politics of the political economy of public realm science. In this light, the formation of UniProt can be seen in part as an outcome of a public-to-public competition between Europe and the USA, and in part the consequence of the political failure within Europe to overcome obstacles to create a fully public economy of knowledge for its database. Unlike the other examples, the historical trajectory analysed here for protein databases is primarily one of emergent public institutions, and the creation of new scales. PIR had always been public, and from an early stage had been strongly internationalized. Yet, for decades, there was a lack of integration of protein databases on the global scale, equivalent to GenBank, DDBJ or EMBL-Bank for genomic sequence data. From Margaret Dayhoff’s Protein Atlas in the late 1960s to the formation of UniProt in 2002, protein databases were developing along overlapping and disparate trajectories, used by a global community both commercial and academic. The final integration
and creation of a global-scale public resource was the outcome of a complex negotiation, and then considerable further development of new norms for production, deposition and distribution of data. SWISSPROT, despite its early flirtations with market forms, as soon as a separation between software and data distribution had been consolidated, involved public knowledge institution building, even when reliant on industrial subscriptions. Although of great commercial use, protein databases never passed through a phase where wholesale commercialization and emergence of markets for protein sequence data seemed likely to become the dominant institutional form.38 Nonetheless, the tortuous history, especially of SWISSPROT, demonstrates more clearly that ‘making knowledge public’, or forming public economies of knowledge, is a continuous and ongoing process. One of the critical aspects of this, as with genomic sequence data, has been the evolution of quality norms, and the structuring and standardization of data. So, for example, as high-throughput microarray data from proteomic experiments expanded rapidly, the need to establish protocols, interoperable software and standards for annotation, became pressing. As a consequence the Microarray Gene Expression Data Society, an international collaborative organization, was established to develop new institutional norms for making such knowledge public (Brazma et al., 2000; Brazma and Parkinson, 2004; http://www.mged.org). Likewise, the protein structure database, the Macromolecular Structure Database, part of the worldwide Protein DataBank (wwPDB), exemplifies a continuing process of quality norm development as an integral part of making integrated data accessible and utilizable to various communities of experts, the new and emergent public of users (Tagari et al., 2006). In all these respects, the dynamic of public knowledge institutions is one of the most significant features of protein databases.
CONCLUSION: EMERGENT PUBLIC ECONOMIES OF BIO-DATA

There are two main conclusions that can be drawn from a review of these key examples of new epistemic institutions, the proliferating bio-databases. The first could be stated as a proposition that almost inverts conventional wisdom. Markets for bio-knowledge develop in the context of governmental failure to institute public economies of knowledge. Rather than governments intervening when markets fail, it is almost the other way round. The second conclusion is that the growth of knowledge itself involves the development of new modes of making knowledge public. The public economies of knowledge have an ongoing dynamic in terms of norms constituting
valid public knowledge, protocols for instituting norms, as well as new processes of public production of knowledge such as the pooling and integration of data, so creating searchability and comparability of data. We argue that the control over standards by the scientific community is a key feature of public appropriation. New institutions of control developed in response to the emergence of new epistemic forms, the bio-databases. New modes of distribution, establishing rights of access, facilitating ease of use to broaden the public of use, and establishing common tools, shared within a community, to search data, all contribute to this dynamic aspect of continuously remaking what is public about public knowledge. The first of these conclusions is perhaps the more controversial, the second the more overlooked. But, in a broader perspective, both conclusions reflect the interdependency and asymmetries between public and private processes of production, distribution, appropriation and use of knowledge, in a time of turbulent growth. Our case studies, limited in this chapter to one newly emergent form of knowledge institution, the bio-database, demonstrated that markets for bio-data could only be established in the context of public knowledge accumulation, in the first place, but also radically transformed conditions for production and distribution of public knowledge. Choosing ESTs and SNPs as key examples of genomic sequence data, given that these appeared to be the most market-proximate knowledge products, enabled us to draw the strong conclusion that the dynamic of public database formation, after much turbulence, restricted the scope for markets for EST or SNP data. We could have chosen the perhaps better-known examples of Incyte or Celera, companies that in the 1990s built an enterprise model around trading basic genomic sequence data drawn from public databases, but with value added by dedicated software and enhanced in-house genomic data. By the beginning of the new century, repeating the story of our case studies, however, both these companies had collapsed these markets, shifting their business strategy to drug discovery in much closer alliance with pharmaceutical companies. In the competition and rivalry that drove developments of both new public and market forms during this period, there was a clear and distinctively epistemic dimension – not a given, but an emergent property of knowledge accumulation – that was also significant. Databases become a more epistemically valuable societal asset by virtue of being open to include all available comparable data, so extending possibilities of comparison, and by virtue of new and standardized quality norms replacing those of peer review appropriate for other forms of knowledge production. By establishing a global community of producers of data, constitutive of public production, the institution of an all-inclusive bio-database of a particular category gains a competitive advantage over partial databases. The more effective this process of inclusive
bio-database growth, the less the scope for partial databases traded by commercial enterprises. Likewise with shared quality norms and the robustness of data: to the extent, and only on condition, that these are developed within a shared and critical community, the less capable brand reputation and enterprise-internal quality standards are of establishing a distinctive market for data superior in quality to public data. In short, provided that the public domain actually does develop all-inclusiveness and superior quality norms of databases, private domain data markets become restricted in scope to the margins of providing niches in areas unsupported by public resources. But dynamic growth of public institutions is not a story of growth at the expense of markets, and this is not the implication stemming from our first conclusion. Quite the contrary. Whether from the actions of major pharmaceuticals like Merck in contributing to public databases, or from the formation of consortia of both public and private institutions to accelerate the public growth of databases, public data resources are as much an advantage to commercial users of such knowledge as to the public growth of science. Put bluntly, if quality and scope of databases are enhanced by public processes of production and distribution, it is to the advantage of all potential commercial users of such data that it should be non-exclusive, rather than to gain market advantage by exclusive access to data that is of inferior quality by virtue of balkanization or less robust proving and testing. The example of ESTs is salutary. The very growth of knowledge through public experimentation and data pooling eventually subverted the commercial dreams of exclusive possession of keys to genes. But illusory advantages of exclusive rights to ESTs were then replaced by general non-exclusive benefits, commercial and scientific, of a more comprehensive understanding of genes and genomes that emerged during this period. This chapter has been about bio-databases, and the turbulence arising from this particular change in epistemic practices and institutions. The conclusions just drawn are about the specifics of these developments. We have argued that the public forms of public databases are made, not given, and the case studies amply demonstrated that there was no inevitable logic, no one set pattern, and indeed no final conclusion to this process. In the next chapter we will consider bio-software, a quite different domain of epistemic practice and institution, and that will further underscore the point that there is no essential script to knowledge as such that encodes for such outcomes.
NOTES

1. This is not to say, as will be discussed later, that corporations, notably large pharmaceutical or biotechnology companies, do not, in addition, possess significant private databases
behind their firewalls. For example, as will be seen in a later chapter, we can reasonably hazard that alongside the public genome of the laboratory strain of Agrobacterium tumefaciens, Monsanto also disposes of the genomes of the strains it uses for genetic modification, especially the genetic variation involved in virulence and transfer of genetic material from pathogen to host. 2. The latest generation of high-throughput sequencers – destined to revolutionize again the process of genome sequencing – has prompted fresh tensions and interactions between private hardware and software producers and public software protocols. This centrally concerns the quality of data generated by the new machines, and their compatibility with public databases (O’Rourke, 2006). 3. The most significant were Maxam and Gilbert’s and Sanger’s invention of sequence reactions in 1977; the technology for the mass-production fragments of DNA sequence, the ‘raw material’ for sequencing (Smith et al., 1986; Hood et al., 1987), through polymerase chain reaction (PCR) (notably by Mullis at Cetus Corporation, Mullis and Falcona, 1987); and microarray chips for protein expression data (Moore, 2000). PCR became a ubiquitously used technology, and ironically encountered deviations to both being made public and private. In spite of emanating from a private corporation, PCR was never patented as a generic technology, although later novel uses were (Zweiger, 2000); and, on discovery, its publication was refused by both Science and Nature, in 1984 (Cook-Deegan, 1994, p. 72), before eventually meriting a Nobel Prize in 1993. 4. For example, Yeast Artificial Clones and Bacterial Artificial Clones. 5. The ABI Prism 377, ALFexpress originating from EMBL (Ansorge et al., 1987) but developed by Amersham Pharmacia, and the largely ineffective DuPont Genesis 2000 (Hunkapiller et al., 1991; Sulston and Ferry, 2003). 6. It took 14 ABI sequencers, eight technicians and three months to produce 23 304 sequence fragments (Moody, 2004). 7. With the new generation ABI 3700, it now took only 12 hours of computer time for the assembly of the genome, with 10x coverage of the 1.8 million base pairs. 8. Laboratories often complemented them with the Amersham machines. Within ten years of its creation, ABI had sold over 3000 sequencers to public and commercial organizations, at approximately $110 000 per machine. 9. In spite of huge differences in quality and in the size of the DNA sampling of the Celera and HGP rough drafts, an agreement brokered by Clinton and Blair, in June 2000, settled on an arbitrary level of what counted as a finished rough draft and a date for its public deposition in 2001. The HGP went on to complete the genome, with a total mapping, in 2003. Celera never did go beyond the rough draft. 10. As Sulston later wrote: ‘I could not accept that we should be dependent on a commercial company for the handling and assembly of the data we were producing. The company even had ambitions to take control of the analysis of the sequence, which was ridiculous’ (Sulston and Ferry, 2003, pp. 93–4). 11. ‘We had complete freedom to design our own display and assembly systems. It transformed our productivity’ (ibid., p. 94). 12. Chad Nusbaum: ‘I think for any kind of quality scores to have the confidence of the community, they have to be an academic enterprise’; Phil Green: ‘I think the research community should have the opportunity to develop quality measures and, to do that, they need to get access to the raw data’ (quoted in O’Rourke, 2006). 
13. Even this notion of completion of a genome for a given species in terms of DNA sequence is potentially misleading, given intraspecies genetic variation. One of the public science reservations about Craig Venter’s Celera human genome project is that the sample DNA was taken from just five individuals, of whom he was one. The HGP used DNA from several thousand. But, given significant genetic variation between different human populations and the importance of such variation for evolution, it could be argued, as Ayala did in 1986, that completion of the human genome, even for DNA sequence, should be of the whole human population, the pool of human genomes, not the human individual (Zweiger, 2000).
14. Although not a subject to be developed here, it is worth noting that patent law embraces some quite naïve epistemological assumptions that add to this confusion. The law attempts to distinguish between patenting naturally occurring objects or phenomena and patenting artefacts created by humans. This dichotomy, however, is a false one, given that the first term is really patenting knowledge of naturally occurring objects or phenomena. The issue is all the more complex when one considers that much knowledge is not acquired by observing phenomena as they naturally occur without human intervention. Experimentation, or any instrumental intermediation, necessarily entails some human transformation, hence knowledge of artefactual phenomena, as a means of understanding a natural reality (e.g. a complete genome DNA sequence) existing independently of human interactions with it, even in the realist epistemological perspective accepted here. As we shall see, this becomes a significant consideration, particularly with ESTs as experimental artefacts.
15. The first set of 600 ESTs were filed for patent in June 1991, followed by a continuation in part for a further 2735 sequences in February 1992, and a new batch of 4448 in September of that year.
16. In these early stages, this controversy was partly manifest in ‘guestimates’ of the total numbers of genes in the human genome, ranging from 30 000 to over 100 000. The genome ‘completed’ in 2003 identified a number close to 30 000, surprising many, but strongly confirming the importance of gene organization and regulation, rather than the more linear view of sheer quantity of genes, as underpinning complexity.
17. More narrative historical accounts are provided by Cook-Deegan, Sulston and Ferry, Zweiger and Moody.
18. In the midst of the furore, these utilities were all claimed by Venter and Adler (Roberts, 1992).
19. A future patent claim on the gene might fail the test of novelty and prior art.
20. See Chapter 6 below.
21. HGS was a new company, subsidiary of Healthcare Investment Corporation, then the largest health care venture capital company in the world.
22. By 1994, HGS were claiming to have identified 35 000 unique ESTs for human genomes, extravagantly asserting that they were in reach of obtaining rights over 80 per cent of human genes (Marshall, 1994).
23. Apart from HGS and Incyte, these included Genset in France, Millennium Pharmaceuticals, Hyseq and AXYS Pharmaceuticals.
24. The NCBI itself must be noted as a significant development in the public institutionalization of the emergent science of bioinformatics.
25. By 1998, the figure had reached 43 000. But similar attempts with the TIGR EST database yielded 73 000, so beginning to indicate the inner weakness of an exclusively EST route to gene identification (Zweiger, 2000).
26. Strangely, in view of the appropriation strategies of all the main commercial EST database companies at the time, Merck was characterized as, even attacked for, pre-empting the patenting of ESTs by placing them in the public domain. Given that patenting was effectively blocked at that time, and only eventually facilitated on a limited basis in 1998, this is quite implausible. What it might well have done is to undermine the market for commercial licensing of exclusive access to databases behind company firewalls.
27. For example Fields et al. (1994).
28. Genes are identified by exons, the protein encoding section of DNA sequence, with a 3’ and 5’ end. The 5’ ends were frequently incomplete in complementary DNA libraries, resulting in false similarities between ESTs.
29. In re Fisher effectively confirmed and strengthened the judgment of Brenner v Manson requiring the demonstration of substantial utility and enablement, deeming that the claim left considerable research yet to be done to demonstrate utility effectively (Zuhn, 2005).
30. The SNP consortium is the best known, but there are many others, such as the Wheat Genome Consortium, the International E-coli consortium, the Structural Proteomics Consortium, and so on.
31. Some caution should be added to this statement, however, because association between disease and an SNP may be statistical, and inheriting a particular SNP may be a necessary but not sufficient condition for expression of a given disorder. In a systems biology perspective, many other conditions, including environmental and cellular factors, as well as gene regulatory, or multiple gene complex issues, need to be taken into account. SNPs are not necessarily in themselves causes of a disorder, and hence SNP diagnosis is not necessarily the magic formula.
32. Perlegen, for example, is collaborating in a project for SNPs related to breast cancer. ParAllele, a spin-out from Stanford, now acquired by Affymetrix, has a commercial project concerning SNPs related to bipolar disorder. Myriad is probably the biggest commercial player for this kind of market.
33. This was the figure in July 2006, from the most recent data release from the NCBI.
34. Genofit was declared bankrupt in 1991, and Intelligenetics became its sole distributor up to 1996. For the significant role played by Intelligenetics in supporting bioinformatic markets, and in joint ventures with the public sector, see Chapter 3 below, Zweiger (2000) and Moody (2004).
35. This privatization was a real and major concern, particularly for US commercial clients, and was a concern that in part stimulated the developments discussed below.
36. TrEMBL stands for translated from EMBL. It was created by the EBI to alleviate the problem that SWISSPROT had in keeping pace with manually annotating all available protein sequences. TrEMBL is a database of translated sequences from the EMBL nucleotide sequence database, using bioinformatic annotation techniques.
37. This did not preclude the continued privatizing collaborations between Georgetown and DuPont, and, indeed, the transfer of knowledge from the public realm into a shared public–private realm exemplifies different and emergent new forms of appropriation, as well as interdependency between ‘public’ and ‘private’.
38. In spite of the dominance of public efforts, there have been examples of commercial protein databases, notably the Protein Atlas developed by Confirmant, a joint venture between Marconi and Oxford GlycoScience.
3. Tools of the trade – trade of the tools

INTRODUCTION

If genomic, proteomic, SNP or EST databases are the most prominent achievements of the current biological revolution, then bioinformatic tools are the unsung heroes, for without them there would be no datasets and no way to make sense of them. The use of bioinformatic tools to analyse the strings of digital data comprising genomes, for example, has thrown into controversy the validity of the ‘Central Dogma’ and the definition of what a gene is. As the sequencing phase of the human genome projects drew to a close and attention turned to analysis, predictions of the number of human genes intensified, largely through the use of different bioinformatic tools, exhibiting wide variation from under 30 000 to over 300 000. Injecting some levity into the debate, a betting pool for predictions, Genesweep, was set up at a meeting at Cold Spring Harbor Laboratory in 2000,1 but this only serves to show the limited understanding of what a gene is. As Attwood (2000) asks: ‘Is it a heritable unit corresponding to an observable phenotype? Or is it a packet of genetic information that encodes a protein or proteins? Or perhaps one that encodes RNA? Are genes genes if they are not expressed?’ (p. 471). The development and use of new bioinformatic tools are central to these and other controversies at the heart of biological science. As with bio-databases, the emergence of these new epistemic forms was destabilizing partly because they defied established categorization, and did not fit into existing norms of scientific activities, private or public. We cannot straightforwardly equate them to scientific instruments, for example, as if the tools were like telescopes, and bio-data what is seen through them. The tools are often constitutive of the data and, at one and the same time, embody procedures for the analysis of data and their statistical evaluation. They are multifunctional, as well as being put to a variety of different uses. In this chapter, we look at the different economic arrangements that have been instituted for the production, distribution, exchange and use of bioinformatic tools. To do so, we take a number of different cuts at the issues to draw out the key points. After a brief introduction and discussion of what
bioinformatic tools are, the chapter is divided into two broad sections, the first providing the broad historical context, and the second presenting detailed case studies of two of the most dominant bioinformatic software tools. We start by locating bioinformatic tools in the wider field of molecular biology and genomics in order to show the nature of technical interdependencies between tools and other knowledge resources. This is particularly important because of the way that technical interdependencies influence the emerging organizational divisions of labour and the possibility for creating markets for tradable bioinformatic knowledge resources. In presenting some of the key functional attributes of bioinformatic tools, we draw attention to several occasions where new interdependencies resulted in frictions at the interface of public and private domains. To develop the themes in more detail, a short and schematic history of sequence similarity is presented, an area that has emerged as a dominant force in bioinformatic analysis. This trajectory of bioinformatic tool development was undertaken solely in academic laboratories, but nevertheless provides important insights into the multiple processes by which the knowledge resources were made public. This discussion is followed by a two-stage presentation of attempts by firms to develop proprietary bioinformatic software packages. In this overview of an emerging sector, we indicate the perpetual but changing dependencies of the new firms on public science activities and point to the variety of different economic experiments that were undertaken in attempts to institute stable private economies of knowledge. Then we turn to the specific and detailed examination of two exemplary cases of bioinformatic tool development. In each case, we chart the emergence and destabilization of successive economies of knowledge, drawing attention to the manifest conflicts and tensions between public and private interests and showing quite clearly how the public and private characteristics of the bioinformatic tools evolved over time in ways that defy any static definitions. But first, the basics: what are bioinformatic tools?
WHAT ARE BIOINFORMATIC TOOLS?

We can define bioinformatic tools as technologies for biological research comprising an algorithm and its computational implementation in a software program. The distinction between these two parts is technically important and, as we shall see later, it can also prove particularly significant in understanding certain dynamics and interdependencies across public and private domains. Algorithms involve a set of steps that define a computational process at an abstract level and they are implemented through the development of an appropriate computer software program (Attwood and
Parry Smith, 1999). There can be many different ways to implement computationally a particular algorithm and this is one important feature of the evolution of bioinformatic tools. So the development of tools rests on a number of knowledge fields, including information theory, novel mathematics, statistics and computation, on the one hand, and biological theory, on the other. Furthermore, in terms of how bioinformatic tools are used, we can usefully distinguish between those that contribute to the production and management of biological data and those that are involved with analysing data to produce biologically significant knowledge. In many cases the fundamental algorithms on which they are based are the same, but they are of polyvalent use within the broader bioinformatic knowledge production process. On the data production side of the process, bioinformatic tools played an important part in automating a number of key tasks initially undertaken by skilled scientists. In the previous chapter we saw how sequencing capability increased dramatically in a relatively short period of time and the way that the introduction of high throughput technologies underpinned the opportunity to undertake large-scale sequencing projects. We can now assess the role of bioinformatic tools in this. In the early era of nucleotide sequencing, the ‘trace’ output from sequencing equipment would be read manually as representing an A, C, G or T by trained scientists. Base calling, as this process was known, emerged as a major rate limiting step in the quest for high throughput capacity. The solution was to automate the step using software and, similarly, to find a way to assemble computationally the output fragments to produce contiguous DNA sequences and ultimately entire genomes. It is obvious that high throughput sequencing technologies would be useless without a way to automate the tasks of base calling and fragment assembly, but it was less obvious from where, in the overall economy of bioinformatic knowledge, the provision of this functionality would come. As we saw in the previous chapter, the sequencing equipment manufacturer, ABI, initially tried to embed base calling software within their hard technologies, but academic bioinformaticians decoupled the two, preferring to develop their own programs to ensure transparency in the process. It is a good example of how technical interdependencies are formed and in this case how attempts to provide a proprietary and blackboxed solution met with resistance in public academic laboratories. Indeed, the issue of ascribing error probabilities to base-calls would later become an important aspect of making the data public, with the inclusion of quality scores, through the use of Phil Green’s Phred and Phrap programs, enshrined in the Bermuda Rules. Phred and Phrap scores became standards for almost all nucleic acid sequence evaluation, the world over. The norms for public data were thus predicated on technical developments in bioinformatic tools.
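To make the notion of a quality score concrete: a Phred-style score expresses the estimated probability that a base has been called incorrectly on a logarithmic scale, so that a score of 20 corresponds to a 1-in-100 chance of error and a score of 30 to a 1-in-1000 chance. The following minimal Python sketch of that conversion is an illustration of the principle only, not the Phred program itself:

    import math

    def phred_quality(error_probability: float) -> float:
        # Convert an estimated probability of a miscalled base into a Phred-style score.
        return -10 * math.log10(error_probability)

    def error_probability(phred_score: float) -> float:
        # Recover the error probability implied by a given score.
        return 10 ** (-phred_score / 10)

    # A score of 20 implies a 1-in-100 chance the base call is wrong; 30 implies 1-in-1000.
    for q in (10, 20, 30):
        print(q, error_probability(q))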
Bioinformatic assembly tools were also at the heart of the scientific controversy surrounding the best approach to adopt for sequencing the human genome and specifically the use of the shotgun method (Pop et al., 2002). The key determinant of whether the whole genome shotgun (WGS) method could work lay in the ability to reassemble the genome-wide DNA fragments and this pointed the spotlight at Celera’s assembly capabilities. In the run-up to human genome sequencing, when the focus was on sequencing smaller organisms, Gene Myers of Celera developed a software package that automatically assembled the genome of the fruit fly, taking some 12 hours to assemble the 180 million base pairs. This development of a bioinformatic tool was as significant for high-throughput sequencing as the hard technology described in the previous chapter. Quick calculations revealed that scaling up for the human genome would require 30 days of computer time – arduous, but not impossible, particularly with computer processing power increasing rapidly (Moody, 2004). Celera’s triumph illustrates how competition between different scientific strategies contributed to the creation of tradable entities, in this case both of Celera as a firm and of its products. Had the WGS assembly been unsuccessful, major doubts would have emerged over the possibility that Celera could develop a tradable genomic product with demonstrable higher value than public domain alternatives. Furthermore, such a failure would have cast doubts over the viability of Celera as a sustainable business model. Following this success, at least for a short period, Celera’s fortunes were bolstered. But it is important to note that, in this case, the bioinformatic tools themselves had not been developed to be traded by Celera. Rather, they were appropriated and kept private within the company for use as a process technology to produce the data sets, the products that the company traded in. Tools were privately appropriated, but untraded. The market was for data. While the significance of bioinformatic tools for the creation of databases is clear, it is perhaps the role they play in analysis that captures the imagination more dramatically. Put bluntly, the three billion As, Cs, Gs and Ts that constitute the human genome do not amount to very much without a means to extract biological meaning from them. Bioinformatic tools play a key part in producing knowledge from the information held in these sequences. The past 20 years or so have seen a burgeoning of different algorithms, some representing improvements on previous ones, others performing different forms of analysis for different types of biological data. Many of these algorithmic tools attempt to decipher the sequences (of nucleotides or amino acids) directly, for example those that search for likely genes within sequences by searching for particular structures. Others mimic, in silico, biological processes, such as translation, by automatically translating between codons in the DNA sequence and amino acids that
make up the proteins. And the holy grail of bioinformatic analysis was the idea that protein function could be predicted from the DNA sequence that coded for it, through the following steps: DNA could be analysed to predict protein sequence; protein sequence could be analysed to predict protein structure; and structural attributes could contribute to predictions of protein function. Although this dominated the imaginations of many bioinformaticians, in fact the emergent core of much bioinformatics has been sequence similarity searching, to which we now turn.
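Before turning to sequence similarity, the in silico translation just mentioned can be made concrete with a toy sketch in Python. The codon table below is only a small fragment of the standard genetic code (the full table has 64 codons), and the function illustrates the principle rather than any production tool:

    # A fragment of the standard genetic code; '*' marks a stop codon.
    CODON_TABLE = {
        "ATG": "M", "GCT": "A", "GCC": "A", "AAA": "K", "AAG": "K",
        "GAA": "E", "GAG": "E", "TGG": "W", "TTT": "F", "TTC": "F",
        "TAA": "*", "TAG": "*", "TGA": "*",
    }

    def translate(dna: str) -> str:
        # Read the coding sequence codon by codon and stop at a stop codon.
        protein = []
        for i in range(0, len(dna) - len(dna) % 3, 3):
            amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X': codon not in this fragment
            if amino_acid == "*":
                break
            protein.append(amino_acid)
        return "".join(protein)

    print(translate("ATGGCTAAATAG"))  # prints 'MAK'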
PART I HISTORICAL CONTEXT: PIONEERS IN KNOWLEDGE AND ECONOMIES

Laying the Foundations: Algorithms for Sequence Similarity and their Computational Implementation

One of the major foundations of bioinformatics and the new molecular biology is the opportunity to analyse new experimentally derived data (for example, a new protein or nucleotide sequence), where there is little knowledge about its biological structure or function, by comparing it to other similar data, where its characteristics and role in biological systems are known. This principle of comparison, as Hodgman (2000) points out, has actually played a significant role throughout the history of biological research, including species identification for evolutionary studies or palaeontology, or tissue status determination from histological or pathological specimens. The historical roots of sequence similarity searching reach back way before the era of genome sequencing and find their origin in early efforts to compare or align amino acid sequences in protein research. These early efforts from the 1960s onwards, emanating from university and public research laboratories, are frequently considered to be scientific and technical foundations of the bioinformatic field, and the historical evolution of sequence comparison techniques is now well documented (Attwood and Parry Smith, 1999; Hodgman, 2000; Ouzounis and Valencia, 2003; Moody, 2004). Given the key role that sequence similarity plays in bioinformatics, it is useful to dwell briefly on what exactly is involved and how the techniques have developed over the past few decades. In principle, sequence similarity is a straightforward matter, based on comparisons of the strings of letters that correspond to the molecular building blocks of proteins or genomes. In the same way that the DNA sequence is composed of a string of four letters, A, C, G and T, proteins have an alphabet of 20 letters corresponding to the
20 amino acids that form their sequence. Early work on sequence similarity, as we have said, was directed at the 20-letter alphabet of proteins. The value of sequence comparison lies in the assumption that, even in instances where similarity between two proteins or genes is quite limited, it could be significant enough to indicate common evolutionary roots or structural and functional similarity. While this assumption seemed promising and worthy of further exploration in the early period, it has now become axiomatic to the new biology. The main thrust of sequence comparison is to discover the best possible alignment between two sequences allowing for the deletion or insertion of a letter (corresponding to an amino acid residue for proteins or a nucleotide base for DNA). The algorithm iteratively searches for this optimal result by adding gaps to either of the sequences and the number of these gaps produces a score to offer a metric for the extent of similarity. So the smaller the number of gaps, the greater the similarity and, with the optimal alignment established, the greater the number of matches, the higher the degree of similarity. Early contributions in wide use were the eponymous algorithms of Needleman and Wunsch (1970) and Smith and Waterman (1981), the latter a significant modification of the former. These approaches were appropriate at the time that they were introduced, but would prove too time-consuming for the new era of burgeoning databases and greater sequence lengths, despite the ongoing developments in computer power. During the 1980s a series of enhanced algorithms were developed and in each case they aimed to enhance the speed of execution rather than the sensitivity of searches. Ultimately, these developments led to the introduction of BLAST (the Basic Local Alignment Search Tool) by Altschul et al. (1990). Of course there is nothing exceptional in this type of trajectory within science. It is a commonplace for experimental techniques to be refined and modified as new theoretical advances are made, new opportunities arise from the development of complementary technologies (in this case computers) and new types of data emerge. But scratching beneath the surface of the journal articles in which these algorithms were published reveals several key dynamics relating to our concerns over how knowledge is made public and private. The first point to note is that the development of these algorithms, culminating with BLAST in 1990, is evidence of the temporal accumulation of technical knowledge. We might justifiably view the achievement of BLAST as the fruits of all those that worked on sequence analysis previously, and close reading of the key scientific papers over the two decades or so reveals the incremental and additive basis of the development of these tools – BLAST from the past (Moody, 2004). But this is saying little more than Newton when he claimed that his insights were only possible because he stood on the shoulders of giants. What is more interesting in the current case
is the persistence of strong interpersonal connections and the institutional context which allowed for such developments to occur. The dominance of only a handful of institutions throughout the period and ultimately the strong role played by the National Center for Biotechnology Information (NCBI) suggests an important relationship between the reproduction of existing capabilities and the production of new technical knowledge. Above all, in this early phase, it seemed quite normal, and conforming to existing assumptions, that these algorithms were all developed in the public research sector and at the time of their publication would have been considered significant scientific achievements. As such, scientific papers outlining the mathematical and computational principles by which they worked were published in academic journals. This leads us to an interesting consideration of what exactly is made public, and relates directly to the distinction between an algorithm and its computational implementation. The distribution of an algorithm and its theoretical justification were made available to the scientific community for scrutiny through publication. But this does not directly make the tool available for actual use, since the algorithm still requires computation. The potential user would need to be in possession of the necessary capabilities to do this prior to any biological experimentation, thereby restricting the user community significantly. By the time Wilbur and Lipman (1983) published their algorithm in 1983, this potentially exclusive aspect was being explicitly addressed: a footnote to their article states that the computer programs utilizing the algorithm were available from the authors on request. However, it was the introduction of the Internet that produced a radical transformation in the distribution of tools. By 1991, BLAST, developed at the NCBI, was made available at their host server alongside Genbank database, thereby increasing the ease of access dramatically. At this stage, neither the bio-data nor the bioinformatic tools that accompanied them presaged the turmoil of the decade that followed. While this illustrates one way that tools could become more publicly available, it does not explain why there was an increasing demand for unrestricted and free access. There are several reasons for this. First, the scientific value of similarity searching was not always obvious and during the early period there was a long-standing debate regarding whether similarities were representative of shared ancestry or merely down to chance (Doolittle, 1981). But a pivotal discovery in 1983 brought about a shift that would consolidate the rationale for assigning function on the basis of significant similarity, thereby providing a significant boost to the raison d’être of database searching and similarity search tools. The discovery, made independently by two groups (Doolittle et al., 1983; Waterfield et al., 1983) involved a match between newly sequenced proteins connected with
normal cell growth (platelet-derived growth factor) and the product of the onc gene from a cancer-causing virus in monkeys. The implication was that the onc gene could contribute to the malignant transformation of cells by inappropriately producing a product that normally stimulates cell growth (Marx, 1983), thereby providing an important insight into the possible action of the onc gene. But the wider implication was that it marked the beginning of functional assignment by similarity, rather than traditional wet biology (Hodgman, 2000). This important discovery played a key role in demonstrating the potential power of similarity searching and in silico biology more generally, and at the same time promoted the case for widely accessible sequence databases. It did so by demonstrating the utility of having analysed or unanalysed sequence data in repositories that could be checked for similarity with newly produced sequences. Withholding such data would run the risk that important similarities between sequences could be overlooked. Within a few years of this discovery, new textbooks for sequence similarity emerged. This heralded a new era, when the techniques became of significantly greater interest to the wider biology community, expanding from the relatively small community of bioinformaticians previously involved. But the expansion of use involved other changes too. The introduction of microcomputers and the increasing number of laboratories that could gain access to them meant that there was a potential boom in the user community. In 1985, Lipman and Pearson published their FASTP algorithm (another in the sequence of algorithms linking Needleman and Wunsch, and BLAST). They wrote in that paper that ‘Because of the algorithm’s efficiency on many microcomputers, sensitive protein database searches may now become a routine procedure for molecular biologists’ (Lipman and Pearson, 1985, p. 1435). The new algorithm allowed comparison of a 200 amino acid sequence against the 500 000 residues held in the National Biomedical Research Foundation library in two minutes on a minicomputer or ten minutes on the newly released IBM PC. Previous algorithms required eight hours of computer time for the same analysis. The trajectory of innovation described above resulted in tools of enhanced usability and speed, so creating new and broader constituencies of users. Initially, those that used the algorithms were also those that produced them. But the demonstration that they could yield significant biological findings invited access to a wider molecular biology community. This community was less concerned with how the algorithms worked and more with actually using them for in silico experimentation. Today, the large majority of BLAST users have limited expertise or interest in the underlying algorithms. The provision of BLAST over the World Wide Web simply requires the user to input their own sequence. Interpretation of
results requires some training, but this is now an elementary component of introductory bioinformatics courses. What is important for our analysis is the way that two specialized and differentiated scientist communities have been formed: the specialist bioinformatician concerned with the development of improved algorithms and tools and the molecular biologist as user of the tools, for all kinds of possible purposes. In our terms, this division between producers and users exemplifies the differentiation between a community of co-producers, on the one hand, and communities of specialized users, on the other. The first is inherently involved in setting standards and norms, developing new norms of public control: it is a novel specialized 'public'. The second is the much broader public, a public of users of tools made available by distributional channels, and given a key tool for access to public data. The algorithms themselves are available to anyone with access to the relevant journals, but only accessible for detailed scrutiny and further development to those with a deep knowledge of computation and statistics. On the other hand, the tools embodying the algorithms are available for use by the wider molecular biology community, provided the necessary recipe-like instructions (Pevzner, 2004) are available for guidance. Thus the tools are accessible and distributed to two novel and emergent public constituencies.

The early public development of bioinformatic tools so far has raised a number of key issues, particularly relating to the interdependency between different knowledge and technology components. Algorithms were developed for many different forms of analysis and these have been implemented in an even wider range of computer programs. Bioinformatic tools are intimately connected with the construction of databases, and so fit into the broader bioinformatic field in a number of ways. As new epistemic forms, they were the subject of experimentation in how they might be bundled together with datasets or with equipment, or developed and distributed as stand-alone packages. The tools themselves may be combined or differentiated in a variety of ways for a variety of purposes. All these early possibilities raised important questions for the organizational division of labour responsible for the production and use of different parts of the bioinformatic field and for the way that such a division of labour is coordinated. There was never any obvious or natural organizational division of labour. From the outset, tensions and interdependencies emerged between different types of key actors involved in the process, in particular between firms and universities. Furthermore, a fundamental issue arose as to how the division of labour, particularly between producers and users of bioinformatic tools, would map onto newly emergent forms of their appropriation and distribution. The turbulence of knowledge creation, the very novelty of the epistemic practices and forms, must be seen in its own right as a source of change.
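To make the second of these publics concrete: for the ordinary molecular biologist, running a similarity search today involves little more than submitting a sequence to a public server and reading back ranked matches. The following is a minimal sketch of such use, assuming the Biopython library is installed and NCBI's public BLAST web service is reachable; the query sequence is an arbitrary fragment invented for the example, not real data.

```python
# Minimal sketch of BLAST use as experienced by the broad public of tool users:
# the user supplies a sequence and reads back ranked similarities. Assumes the
# Biopython package and access to NCBI's public BLAST web service; the query
# below is a hypothetical fragment used purely for illustration.
from Bio.Blast import NCBIWWW, NCBIXML

query = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical protein fragment

# Submit the query to the BLAST server hosted alongside the public databases.
result_handle = NCBIWWW.qblast("blastp", "nr", query)
record = NCBIXML.read(result_handle)

# Report the most similar database sequences and their significance scores.
for alignment in record.alignments[:5]:
    best_hsp = alignment.hsps[0]
    print(f"{alignment.title[:60]}  E-value: {best_hsp.expect:.2e}")
```

The algorithmic machinery of indexing, scoring and statistics remains entirely hidden from this kind of user, which is precisely the point of the division between the two publics.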
The systemic interconnections between tools and other elements of the bioinformatic field point to key pressures underlying the innovation dynamics in the development of new tools. First, new types of biological data called for new algorithms for analysis and for the construction of databases. Second, enhanced computer power provided the opportunity to develop tools that could not be effectively computed before. Third, the progressive development of ever larger databases raised issues of speed and sensitivity of analysis, creating new problems to be solved by improved algorithms. Fourth, advances in biology theory provided new hypotheses to be tested by new forms of bioinformatic analysis. These dynamics create a perpetual turbulence in bioinformatic knowledge as improvements in one area pose new problems for others. Because there has never been a static division of labour between the constituent parts of the bioinformatic universe, the turbulence in knowledge repeatedly throws into disarray the prevailing organizational configurations and opens up new tensions and new forms of interdependence between the public and private.

Markets for Tools: Early Forms of Private Appropriation and Distribution

Given the early history of public bioinformatic tools, when markets for tools first arose there were no commercial templates or models to follow as to whether and how bioinformatic tools could be established as tradable products. Would a new class of economic agent emerge in the commercial division of labour, the bioinformatic software firm? How might new markets be instituted? Although bioinformatic analysis of proteins had been undertaken since the 1960s, the era of new firms dedicated to the provision of proprietary bioinformatic software products arrived with DNA sequencing in the early 1980s. In addition to the growing production of nucleotide data, this period saw increasing numbers of biology laboratories gaining access to computing facilities and this played a key role in creating a potential market for bioinformatic software tools. The pioneers of proprietary packages first appeared in 1987, when three competing commercial sequence analysis software packages, all designed to operate on microcomputers using a Microsoft DOS operating system (Cannon, 1987), opened a new market. DNAStar was distributed by the company of the same name, MicroGenie by Beckman and IBI-Pustell by IBI. Each package provided functionality in sequence mapping, protein analysis, homology analysis and database handling and all three drew on the publicly available algorithms, such as the Lipman–Pearson alignment method (based on the similarity algorithms discussed earlier). But there were significant differences between the packages in terms of the range and depth of analysis that they provided and in the usability afforded by the
user interfaces. The DNAStar package was more comprehensive than the other two, but was also more expensive and so the other two offered a viable alternative to users with narrower requirements. The other major commercial package of the early period emerged from Europe. Amos Bairoch,2 working at Geneva University, had been developing a software package for sequence analysis, particularly relating to proteins. Early versions, with 15 programs, were distributed with releases of the EMBL database. Through a circuitous route which nearly saw it combined with the IBI-Pustell package, the package was exclusively licensed to Intelligenetics in 1986, and was continually developed to incorporate more programs over the next few years (Bairoch, 2000). This early phase of commercialization is instructive in several ways and from the perspective of our framework involved much more than the simple and conventional model of spinning-out research from universities to the private domain, although that was clearly part of the story. From the outset, there was profound uncertainty over how to develop and disseminate new bioinformatic tool knowledge and functionality most effectively. The possibility of breaking out from the early public-science model, largely confined to a community of bioinformatic experts, clearly arose from the extension of the user community to include numerous groups that could benefit from the functionality of tools. The division of labour created opportunities for exchange, other than through public distribution. From the mid-1980s, therefore, there was clearly an emergent competition between public and private production, provision and distribution of bioinformatic tools. Commercial providers sought to provide added value over what was made available in the public domain, both by enhancing usability and by broadening the potential user community. This foreshadowed an emergent private–public division of labour, whereby the firms would become responsible for collecting together and implementing computationally the algorithms produced in universities. In this model, dissemination of the algorithm knowledge would continue through journal publication, and distribution of rights to access their functionality, through computer programs with a user-friendly interface, would be through newly formed markets. However, in practice the arrangements were not that clear. Certainly, not all new tools were subject to this type of division of labour. The highly prominent BLAST tool had always been made available at no cost to the user through the server at the NCBI. The ‘Wisconsin package’, developed in the University of Wisconsin, one of the case studies discussed in detail below, was also distributed by them, albeit with annual user licences to recover their costs, but not as a fully commercial activity. So during the 1980s there was an unstable, quite muted, rivalry between the new commercial activities and university-based research, without any
emergence of a clear division of labour between them, that is, a mutual dependency based on differentiation of activities. The second wave of commercialization dramatically raised the stakes, and changed the parameters of the rivalry that would ultimately result in a rearrangement of the public–private divide, and the emergence of a quite distinct and different market for bioinformatic tools.

The Tumultuous 1990s: Market Explosion and Implosion

A new era of commercialization emerged during the 1990s, differing in significant ways from the earlier period. The shift was based on new expectations over what economically might be achieved with bioinformatics, expectations that grew out of key developments in both the public and private domains. In the earlier period, the formation of commercial bioinformatic tool providers occurred in the context of a relatively small customer base, largely consisting of molecular biology departments within universities and basic research teams in some life science firms. The expansion of genome sequencing activity up to the early 1990s bolstered expectations that the new biological science would yield results that could directly contribute to product innovation in the agrifood and pharmaceutical sectors. As we saw in the last chapter, this was the time when the explosion of bio-data was combined with a prospect of its wholesale commercialization, either by patenting or by licensed use of firewall-protected databases. The watershed moment, the one event that signalled the arrival of this new environment, came with the announcement that SmithKline Beecham were paying a new company, Human Genome Sciences, $125 million for exclusive rights to their EST database, itself an exemplary case of data and proprietary bioinformatic tool capability combined. The signal was that big pharma expected genomics and bioinformatics to contribute significantly to their innovation processes and it was a signal that was registered by scientist-entrepreneurs and capital markets. Furthermore, the search to appropriate new types of knowledge privately through patent protection, especially in the context of pharmaceutical innovation, placed extra emphasis on the value of tools. Significantly, the key requirements for utility demanded by revisions to patent criteria could be strengthened by enhanced annotation, especially by ascribing functional attributes to sequences. Bioinformatic tools provided this capability, without the need for traditional biological experimentation. Combining this new functionality with user-friendly software programs in the context of the new genomic and proteomic data promised dramatic expansion of markets. This buoyant economic climate produced a number of alternative business experiments based on different models for creating tradable bioinformatic
tool products. The possibility of creating these new classes of economic agent and their associated markets was inextricably linked to what was going on in the public domain. In particular, the process ran in parallel with the economies of bio-data. Initially, we can distinguish between three main variants amongst the firms at the time: those that developed and protected software in-house in order to trade in the data produced and analysed by it; those that traded in a combined package of software tools and accompanying data, much drawn from public databases; and those that traded in the software bioinformatic tools, as stand-alone products. In the market euphoria of the early 1990s, Celera and Incyte championed the first alternative: they were seen as the exciting new big-hitters in bioscience knowledge markets. The second alternative, exemplified by Intelligenetics and PC/Gene, as we saw in the previous chapter, experimented with the model of a combined product–package of software tools and data. A third class of firm, continuing the tradition of some of the early commercial pioneers, sought to develop and trade bioinformatic tool packages. Some of these were based on comprehensive analysis technologies, while others pursued niche strategies, focusing more directly on one type of biological data, so even within this class of firm there was a high degree of differentiation. Furthermore, as we shall see later, some firms that started out as niche tool providers developed into firms offering comprehensive analysis systems, while others continued to pursue the niche strategy. In the early 1990s, given the commercial interest in genomic and proteomic data, and the possibility of its wholesale marketization, the Celera/Incyte alternative was clearly the dominant one. But, as a consequence of the developments in bio-databases discussed in the previous chapter, it is equally clear that market explosion was rapidly followed by market implosion. Effectively, the emergence of the public databases at the expense of commercially traded bio-data forced a divide between data and tools, and resulted in the disappearance of firms trading either in data or in the combined tool and data packages. We have seen that Celera and Incyte shifted their business elsewhere. In the broad picture, that essentially left the third model, the bioinformatic tool providers, as the remaining contender for the creation of markets. Before turning to our case studies that explore in depth the question of how, and under what conditions, public and private economies of bioinformatic tool knowledge developed, it is worth setting out their historical context. Establishing a viable space for the formation of bioinformatic tool markets produced a new competitive dynamic concerning the type of business model that would define the sector. In the uplift of expanding demand from pharmaceutical and biotechnology firms, the prominent trend involved strategies aimed at the rapid expansion and integration of proprietary bioinformatics.
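The annotation capability referred to above, ascribing functional attributes to sequences on the strength of significant similarity rather than wet-lab experiment, can be caricatured in a few lines. The sketch below is a hypothetical illustration of annotation transfer by similarity, not the method of any particular product; the E-value threshold and the structure of the search results are assumptions made for the example.

```python
# Hypothetical illustration of annotation transfer by similarity: a new
# sequence inherits the functional description of its best database hit,
# provided the match is statistically significant. The threshold and the
# record layout are assumptions made for this example only.
EVALUE_THRESHOLD = 1e-10  # assumed cut-off for a 'significant' similarity

def annotate_by_similarity(hits):
    """hits: list of (description, e_value) pairs from a similarity search,
    already sorted by increasing e_value (best hit first)."""
    if not hits:
        return "hypothetical protein, no significant similarity"
    best_description, best_evalue = hits[0]
    if best_evalue <= EVALUE_THRESHOLD:
        return f"putative {best_description} (by similarity, E={best_evalue:.1e})"
    return "hypothetical protein, no significant similarity"

# Example: hits as they might come back from a BLAST-style search.
hits = [("platelet-derived growth factor B chain", 3.2e-45),
        ("uncharacterised open reading frame", 0.8)]
print(annotate_by_similarity(hits))
```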
In light of this expansionary trend, the niche strategy fell somewhat into the background, though critically it did not disappear. These expansionary strategies were fuelled by expectations that the field would be subject to a high degree of concentration, perhaps leading to monopoly. There was even a vision of the Microsoft scenario, a single dominant bioinformatic product containing a fully integrated suite of tools offering a comprehensive solution to all customers. In this context, the race amongst rival firms centred on expanding market share and increasing R&D capacity to build the 'one-stop-shop' integrated informatics packages. It is worth dwelling on these drivers for a moment. The drug discovery and development process is notoriously expensive, estimated at around $800 million per drug. The business model underpinning the idea of a one-stop-shop informatics platform to facilitate this process rested on projected savings and the potential to identify new targets and new leads. A pharmaceutical company warehouses a vast array of data of varying kinds, chemical, biological and medical, often dispersed throughout its organization. An integrated bioinformatic platform promised to short-circuit the old linear, step-by-step, approach to drug discovery. This model was pursued by Lion Bioscience, one of the firms that figures prominently in the more detailed cases analysed later in this chapter, and it is revealing that von Bohlen, the CEO, drew comparisons between the product his company offered and the highly successful business process software packages provided by companies such as SAP.3 If SAP could create a growing business by providing a software product aimed at making business processes more efficient, then Lion should be able to do the same with the pharmaceutical R&D process. Furthermore, the business model reflected one of the major technical conundrums of the time, namely how to deal with the problem of interoperability between different data and different tools. The race for monopoly was predicated on a wave of high expectations in capital markets. Each of the rivals in the race sought large sums of money to fund the expansion of R&D and distribution: Lion Bioscience raised €200 million, Rosetta Inpharmatics $100 million and Genomica $122 million in 2000. Still catching the wave of high expectations, Rosetta Inpharmatics went one step further when it was acquired by Merck for $620 million the following year. These figures indicate the scale of investment in the expanding bioinformatic tool sector and we can be sure that the growing commercial R&D capacity involved significant migration of skilled bioinformaticians from the public laboratories, at the latter's expense. This period of growth was also marked by a flurry of merger and acquisition activity pointing to the operation of markets for corporate control over knowledge assets. The acquisition of one firm by another involves the market exchange of knowledge resources in terms of both the rights
over formally appropriated intellectual property (covered by patents or copyright) and rights over the capacity of skilled employees to produce further knowledge. In particular, Lion Bioscience and Accelrys pursued their aims to create integrated solutions through this strategy (McMeekin, Harvey and Gee, 2004). As we follow the Lion case in detail below, Table 3.1 illustrates how Accelrys was created by its holding company Pharmacopeia from a succession of previous acquisitions to build knowledge production capabilities (in scale and scope) and to remove, through assimilation, competitors from their market. Pharmacopeia's software division had expanded from a valuation of $35 million in 1996 to $90 million at the time of Accelrys' formation in 2001. As a consequence of acquisitions, Accelrys' capabilities include modelling of chemical behaviour, chemoinformatics, genomic sequencing and genomic database searching, data management, and an interoperability middleware platform. For a moment at the turn of the century, it seemed entirely possible that the resounding victory of the public domain over rival commercial interests in the provision of genomic data (discussed in the previous chapter) would be accompanied by the reverse in bioinformatic tools: the erosion of public provision with the emergence of a dominant and proprietary 'one-stop-shop' software platform.

Table 3.1  Acquisitions by Accelrys to create a 'one-stop-shop'

Company (founding date)   Companies acquired (founding dates)        Their prior acquisitions (founding dates)
Accelrys (2001)           Molecular Simulations Inc. (MSI) (1984)    BioSym (1984); Polygen (1984); Cambridge Molecular Design (1989); BioCAD (1991); Hare Research (1982)
                          Oxford Molecular (1989)                    GCG (1982); Intelligenetics (1980); Health Designs (1978); Chemical Design Ltd. (1983); CAChe (1986); Biostructure (1987); UniChem (1991)
                          Synopsys (1992)
                          Synomics (1997)
                          SciTegic (1999)
However, this flurry of merger and acquisition, seemingly destined for a monopoly (or near monopoly) outcome, was soon checked by countervailing trends. In just a few years, the boom abruptly halted. Many of the firms that had announced impending IPOs withdrew their plans for flotation. A wave of redundancies struck the sector, with many bioinformaticians returning to academic laboratories. And a number of high-profile firms went out of business, including one of the potential giants, featured in our case study, Lion Bioscience. No doubt part of this contraction can be attributed to the general shift in capital markets away from technology firms during 2000, but there are several other factors that are more important for our analysis. First, overly optimistic expectations about the extent and pace of impact that bioinformatics and genomics would have on the pharmaceutical R&D process were quickly revised. This in turn raised questions about the value of companies operating in the sector and the value of their products. There had been no rapid alleviation of underperforming drug development pipelines and the revolutionary potential of genomics and bioinformatics was significantly downgraded (Nightingale and Martin, 2004). There still remains a strong belief that the recent advances in biological sciences will have a major impact on associated industries, but that the impact will take much longer than initially anticipated. Second, the integrated 'one-stop-shop' business model has not proved dominant at the expense of smaller bespoke bioinformatic technologies. The major potential commercial customers, pharmaceutical firms, have resisted the opportunities to buy integrated, R&D process-wide solutions in favour of undertaking the integration themselves by purchasing a variety of tools and creating their own systems (Sender, 2000). In a field which is characterized by a rapidly shifting frontier, the dependence on one provider perhaps proved too restrictive, despite the advantages of interoperability across technologies. As a result, the niche bioinformatic tool providers, often offering just one specific type of tool and with strategies aimed at slower and steadier growth (relative to those seeking rapid expansion discussed above), have proved sustainable. One factor in this trend has been the growth of in-house bioinformatic capabilities within the large pharmaceutical and agrifood firms, able to customize and combine diverse packages to develop sufficient interoperability to meet internal requirements. Of course, there is nothing to suggest that this could not change in the future, perhaps with the development of a platform technology able to integrate a variety of compatible products. The third factor relates to what was going on in the public domain during the period and the continuing frictions between the proprietary and public provision of bioinformatic tools, rather than the emergence of
a division of labour characterized by harmonious coexistence. The opportunity to develop and maintain markets for proprietary tools appears to have been 'squeezed' from a number of directions, and we explore the shifting and blurring of boundaries between public and private economies for bioinformatic tools as a central focus of our case studies. The current state of play in bioinformatic tool provision continues to be marked by an uneasy coexistence of proprietary and public activities. For the moment, the once-anticipated opportunities for a private domain have not materialized. Some of the major players that pursued rapid expansion to create scale advantages have withered away. Other major players have radically shifted their business focus, including Incyte and Celera, which no doubt retain bioinformatic capabilities, but are no longer trading in genomic data or bioinformatics tool-based knowledge. But, equally, proprietary interests have not vanished and there remains a vibrant population of smaller firms offering specialized bioinformatic products and services.

The preceding overview of economies of bioinformatic tools raised many important issues concerning the processes through which knowledge had been made public and private in this area. The remainder of this chapter provides much more detailed accounts of specific bioinformatic tools, in order to develop a picture of the dynamic transformation of economies for specific bioinformatic knowledge resources. The two cases analysed in the next sections involved knowledge resources that originated in university laboratories and subsequently became the basis for the creation of new firms. This process of forming a new firm from university research activities has been widely discussed in the literature and has become a preoccupation of innovation policies. The various descriptions come under the headings of spinning off, technology transfer and privatization. The first two terms are rather ambivalent in respect of our interests in the public–private relationship; the third, however, directly confronts it, but does so in a way that presupposes a neat and simple linear transformation and an idea that it is obvious what was public and what would become private and in what ways. In the analysis that follows, we will be particularly interested in drawing out the processes that produced a succession of temporarily stable economies of knowledge in each of the cases and the tensions between public and private domains that lay beneath the transformation from one phase to the next. The first case charts the evolving economy of knowledge for one of the major dedicated bioinformatic tool providers, Lion Bioscience; the second takes a specific object of bioinformatic knowledge, the GCG package, as its empirical focus to consider transformations in the processes through which it has been continually redeveloped, distributed, appropriated in a variety of different ways and used in different contexts and for different purposes.
PART II  TRACKING TRAJECTORIES: TWO CASE STUDIES OF BIOINFORMATIC TOOLS

Lion Bioscience: Evolution and Extinction of an Economic Agent

The European Molecular Biology Laboratory (EMBL), based at Heidelberg, is a major public science institute and was centrally involved from the outset in the historical development of genomics and bioinformatics. Its remit was to conduct scientific work at the leading edge of molecular biology and, as the discussion will demonstrate, this basic-science orientation has been the source of frictions around key bioinformatic knowledge resources developed in the course of its research programmes. The fate of two key bioinformatic tools developed in the EMBL was decided during the 1990s.4 Both tools were developed to take advantage of burgeoning biological data to make progress in scientific understanding. At the time of their inception, they were viewed as contributing significantly to scientific programmes of work and represented in themselves significant scientific advance, drawing on information science, computation and biological theory. However, once developed, their status changed within the EMBL environment. Despite the fact that each tool could and would be further developed, they took on the mantle of technical instrumentation to be used for scientific research, but were no longer considered scientific achievements in their own right. The distinction is important, because of the basic science remit of the EMBL. At the time, the distribution and maintenance of instrumentation was viewed at the EMBL as a service activity and therefore not appropriate for the receipt of EMBL resources. This raised the possibility that the tools should be spun out of EMBL, particularly since there had been a general shift at the EC level towards promoting opportunities to commercialize outputs from the science base (arguably in response to the considerably more entrepreneurial US academic system). It was in this context that Lion Bioscience was established as a bioinformatic firm dedicated to developing and marketing sequence analysis software. While in development at EMBL, the first software tool was called GeneQuiz, renamed BioScout5 in its LION incarnation. The development of GeneQuiz was intimately bound to the ratcheting-up of genomic and post-genomic science. The production of whole chromosome or organism genomes created a significant bottleneck in the labour-intensive annotating necessary to render raw data biologically meaningful. GeneQuiz was designed to solve this problem, providing the capability of automatically and rapidly annotating protein sequences corresponding to coding regions of DNA (Andrade et al., 1999). As a new tool it distinguished itself from existing approaches by putting speed of analysis at its core, so offering what
the developers described as the optimal combination of speed and state-of-the-art annotation. The output of GeneQuiz analysis would provide the maximum possible annotation of protein sequences, including prediction of protein function. To do so, it was programmed to interrogate the maximum number of existing biological databases. It is obvious from this that the functionality of GeneQuiz was based on the incorporation of many existing public domain tools and on the existence of a public domain of biological data. Ultimately, these technological interdependencies would prove to be a source of economic vulnerability in attempts to develop markets for the tool. The 'transfer' itself involved a licence agreement between the new company and the EMBL, and two key members of the EMBL GeneQuiz group, Reinhard Schneider and Georg Casari, became founder members of the new firm. In terms of knowledge exchange, there was a transfer both of ownership of the intellectual property and of embodied knowledge, through the exodus of these two bioinformaticians. During the following years the BioScout source code was continually rewritten and became increasingly differentiated from GeneQuiz, which itself was further developed at the European Bioinformatics Institute (EBI). In the period after the spin-out, LION and EMBL held meetings every six months to discuss the licence agreement surrounding the GeneQuiz source code. In 2002, both parties agreed that the BioScout package was sufficiently distinct from the original for the licence to be terminated. Prior to this agreement, the meetings were used to determine the royalty payment due to EMBL, based on the similarity of source code between GeneQuiz and BioScout. Gradually, during this period, one knowledge tool became two, differentiated through further development, and rights over their use were split into a public model and a proprietary model. They were competitors, but the GeneQuiz version was oriented more toward academic laboratories, BioScout toward commercial users. Lion's other major technology platform was also initially developed within EMBL (and subsequently the European Bioinformatics Institute – EBI). The Sequence Retrieval System (SRS) has been a key technology within the field of bioinformatics 'providing a homogeneous interface to about 80 biological databanks for accessing and querying their contents and for navigating among them' (Etzold et al., 1996, p. 115). It was developed in response to the proliferation of new biological databases with different structures, fields and data types. SRS became a dominant technological standard among academic and commercial users for conducting multi-database searches and analyses. It is worth emphasizing the significance of the technical functionality of SRS: it was the key tool for integrating a huge and expanding range of
bio-databases. Figure 3.1 represents the 'universe' of public domain biological databases integrated by SRS, creating a uniquely articulated public domain resource from otherwise discrete sources. Rather than thinking about the indivisibility of knowledge, here the issue is one of the immense challenge, both technical and organizational, of integrating disparate and fragmented knowledge into a coherent knowledge resource. Without SRS, there would be a public domain of separate compartmentalized knowledge; with SRS, there was the opportunity to create an integrated universe of public biological data. The development of the SRS bioinformatic tool represented a step change in the scale and scope of the public domain of biological knowledge. It created a distinctive mode of knowledge distribution through interoperability and articulation of data and enabled new forms of further knowledge production, through cross-database analyses that would otherwise be unavailable. It is precisely this strategic epistemic function that made its mode of economic development, appropriation and distribution so conflictual. Significantly, SRS also emerged as a computational environment within which bioinformatic tools could be developed, including EMBL's GeneQuiz, for example. But, mirroring the fate of GeneQuiz, SRS was not considered state-of-the-art scientific research, according to the established norms of public science. Because it required significant resources for its maintenance and development, Thure Etzold, the inventor, had sought commercial backing for some time. Social connections formed through EMBL favoured the approach made by Lion Bioscience, which was particularly keen to own the SRS technology because its own tool, BioScout, had also been developed in an SRS environment. Their interest in the SRS technology demonstrated very well the significance of interdependency between different bioinformatic technologies. Developing the BioScout software as the primary business for LION potentially exposed the company to excessive dependency on whoever owned SRS. This was tolerable where the source code remained fully in the public realm, but the prospect of SRS becoming a proprietary technology of one of their competitors was of considerable concern. Critically, BioScout's ongoing development required access to the SRS source code. Given that the core analytical functionality of GeneQuiz relied on access to the maximum number of public domain databases, it is clear how strong the dependence on SRS's integrative capacity was. Owning SRS would enable LION to develop integration and application technologies with a greater degree of synergy. This points to a second and perhaps more significant motivation for LION. Owning SRS and alleviating their own vulnerabilities in application design would place them in a position where their competitors would become increasingly dependent on them.
Figure 3.1
LISTA
PIR
SeqRelated
SEQANALRABS
Genome
Metabolic
BLOCKS
Literature
Others
Transfac
Mapping
NAKAIMATRIX
LIMB
PRINTS
NRL3D
PDBSHORT
TFFACTOR NAKAI
REBASE
SWISSDOM
PRODOM
PDBFINDER
ECOCYCGENE
ECOCYCPROT
ECOCYCENZ
ECOCYCCOLI
ENZYME
TFMATRIX
TFSITE
SEQANALREF
SWISSP/FAM
PFAM
PROSITE
SWISSNEW
SBASE
SWISS-PROT
ECDC
EC2D
ECOCYCPATH
ECOCYCCOMP
ECDCONTIG
NRSUB
WORMPEP
VIRGIL
CPGISLE
SPTEMBL
PROSITEDOC
PIRALN
FLYGENE
EMBLNEW
FLYREFS
Protein Structure
SACORF
FLYCLONES
EPD
EMBL/GENBANK
DBSTS
RHDB
The SRS universe, Lion Bioscience, database integration system
Sequence
GBRNA
SACGMAP
SACLOC
SACPMAP
LISTAHON
LISTAHOP
RHPANEL
RHEXP
RHMAP
SACSEQ
DBEST
UNIGENE
UNITEST
OMIM
MIMMAP
Equally, it would allow them to commercialize the SRS technology and to create a market for it within the life science industries. Following its transfer to LION, the company was in a position to modify its business model with SRS as its cornerstone. To understand the development of the Lion Bioscience business model in the subsequent phase requires an appreciation of its orientation towards both the public academic sector and the commercial genomics and life science industries. The first key issue concerns the terms under which SRS was transferred from EMBL to Lion. The licence agreement accorded LION all commercial rights to SRS, on condition that it would always be freely available to the academic community through the EBI and other major public science servers. This clearly distinguishes the spin-off of SRS from that of GeneQuiz, where no such open-ended access to academic users was guaranteed. In the case of SRS, the terms of the agreement were less of a compromise position from the Lion perspective and owed more to the realization that 'cutting off' public domain access could be catastrophic for their strategy to create a tradable product. The new firm viewed the ongoing relationship with public science in terms of the opportunity both to develop the product jointly and to ensure that SRS became a standard tool for data integration. Following the deal, Lion immediately opened an office in Cambridge, within easy access of the European Bioinformatics Institute at the Hinxton Campus. Thure Etzold was made Managing Director of the new Lion UK, but retained a position (25 per cent of his time) at the EBI. This location, within the Cambridge cluster, was also of strategic importance to Lion, and there was an ongoing relationship between the two organizations concerning the development and use of SRS. From the Lion perspective, EMBL's and EBI's use and co-development of SRS was critical to establishing the software as an industrial standard. Academic bioinformaticians moving to industry have, in almost all cases, been trained in using SRS and the transfer of their skills is typically tied to the SRS interface. Indeed, one of the main lines of collaboration between Lion and the EBI has been in the provision of training for the use of SRS, a key component in the Lion strategy for consolidating demand for the tool in their efforts to create a market. So, in terms of maintaining public domain rights over the use of SRS, the spin-out proved relatively successful. However, subsequent experience regarding co-development of this bioinformatic knowledge resource proved significantly more chequered. The problem stemmed from what was entailed in the agreement to make SRS freely available to academics. Free access for use as a customer was unproblematic, while free access to the tool as a co-developer of the knowledge instrument was more complicated. As SRS evolved through different
commercial versions, academic access to the source code became increasingly constrained. It was still possible to develop applications within the SRS framework, and provision was made for this, but these activities were limited by the removal of any opportunity to customize directly the data integration capabilities of the core SRS engine. For co-development, the relationship between Lion and public science academics became increasingly fragile. Lion made significant efforts to strengthen SRS as a public resource by bolstering the team at the Cambridge site, specifically tasked with incorporating public domain applications (including the EMBOSS suite of tools discussed below). But, within the public science institutes, concern grew over the extent to which their own research efforts in developing new applications were becoming overly dependent on the SRS technology. There could be no certainty over future directions that Lion might pursue, or over the long-term sustainability of the company itself. The future development of a major technology for public database integration was seen to be at risk, with the academic community vulnerable to being cut off from future integration of the expanding universe of bio-data. We can now turn to an appraisal of the efforts of Lion Bioscience to create a market for its SRS-based business to commercial clients. Straddling the divide between providing informatics to academia and creating a market with commercial clients was almost akin to intentionally doing the splits. The core product traded by Lion was the SRS engine itself, exactly the same technology that was made available free to academic scientists. In 2000, it was sold to companies for approximately $50 000, including the provision of technical support, not directly offered to academic users. The market for SRS, traded under these arrangements, extended to small and medium biotechnology firms as well as larger life science companies in the pharmaceutical and agrifood sectors. But the Lion strategy was to build a larger, more integrated, bioinformatic product, with SRS at its core, but offering a range of other analytical tools and knowledge management capabilities. Furthermore, as discussed briefly above, the Lion strategy was based on the expectation that progressive concentration in the bioinformatic tool sector would result in the emergence of a small number of one-stop-shop providers. Lion thus aimed to market knowledge management capabilities integrated across the entire R&D process in life science firms. In this sense, it was similar to the Accelrys strategy already discussed, with the one key difference that Accelrys was predicated on cutting the umbilical cord with academia, becoming unequivocally oriented to market creation. As with Accelrys, Lion attempted to strengthen its market position through expansion. This expansion resulted in a number of acquisitions, and two significant alliances. The knowledge product was to be developed in two major directions: integration of information handling across the full
range of data required for drug development; and creating an integrated platform for knowledge management for client firms. On the one hand, this drew Lion towards pharmaceutical companies, focusing on the integration of data to include chemical entities for drug leads, lead validation data, toxicology testing from pre-clinical trials and, finally, clinical trials. On the other hand, it aimed to create a bioinformatic product that would integrate across different computer programs, provide analysis of data, develop tools for decision making within companies, and, finally, achieve overall management of all the informatic knowledge resources within client companies. The combination of these two ambitions was seen as the path towards creating a one-stop-shop. Clearly, large pharmaceutical, life science and agrifood companies, the giants of their respective industries, were the main potential clients. Quite early, Lion made two close market alliances and collaborations, one with Bayer, the other with Nestlé. The deal with Bayer was agreed in June 1999 and was worth an estimated $100 million over five years (Lion, 1999). The contract was organized for the provision of a comprehensive 'i-biology' service to facilitate Bayer's gene and drug discovery processes. It included an initial equity stake in LION AG as well as fees for use of Lion's existing IT systems, and research and set-up costs for a new subsidiary. This new subsidiary was a contractual obligation and led to the formation of LION Biosciences Research Inc. (LIBR) in Cambridge, Mass. LIBR contracted to deliver to Bayer 500 new target genes, 70 new annotations on existing Bayer-owned gene targets and an undisclosed number of gene expression markers and SNPs. A Bayer-wide intranet system was also developed to ensure that the resulting information was globally accessible to Bayer's scientists. The intellectual property aspects of the contract were a key factor in determining the Lion–Bayer economy of knowledge. Essentially, there was to be a relatively straightforward informatics–biology split between the two. Lion ultimately retained all rights over any informatics-based developments, to which Bayer had unlimited and non-exclusive access. In addition, Bayer agreed an option, after five years of the alliance, to acquire all the shares of LIBR to 'spin-in' the subsidiary, thereby providing them with the opportunity to wholly own an informatics capability. To retain control over the informatics component of LIBR, this part of the arrangement included provision for Lion to have pre-emptive rights to the commercial exploitation of any new informatic technologies in the event that these were in competition with Lion's own activities and if Bayer decided to market them. It is evident that, where interdependencies occur within distributed innovation processes, close attention is paid to questions concerning the ownership of knowledge and technology. This step offers an early indication of
how LION and Bayer, at that time, perceived the most appropriate division of innovative labour, and created a specific relationship of interdependency. Furthermore, it indicates one way that an incumbent life science firm positioned itself within the market for corporate control over these knowledge resources, by securing control over the option to appropriate fully the bioinformatic capabilities of LIBR, in terms of technologies and personnel, at a specified point in the future. Significantly, Bayer was the only major pharmaceutical company to establish a strong commercial relationship with Lion. The extent of the contractual interdependency, the possibility of ultimate acquisition by Bayer, constrained Lion's ability to expand its market to other major clients. Conversely, for Bayer, control over its own knowledge resources behind its firewalls was a powerful reason for it to assert this constraint over Lion, if only to protect its competitive position with other big pharma corporations. Lion was pulled more and more into developing informatics for pharmaceuticals, adding pharmacophore informatics (Lion, 2000), at the interface of biology and chemistry. At the same time, it shifted away from its traditional biological orientation around which SRS had developed. Consequently, Lion began a new phase of activity in the capital markets for key informatics capability as the company pursued a strategy of acquisition and alliances. Five key alliances and acquisitions successively extended Lion's scope throughout the pharmaceutical innovation pipeline/informatic technology space. The first, with Bayer, was the initial step in moving toward the biology–chemistry interface. The second involved a full acquisition of US firm Trega in a deal worth $35 million (Lion and Trega, 2000). The acquisition changed the shape of Lion by adding further chemistry-based capabilities to their own biology-based strengths. In particular, this included Trega's iDEA™ Predictive ADME (absorption, distribution, metabolism and excretion) Simulation System, which simulates, in silico, how drug candidates will be processed in humans. The third addition to Lion's functional repertoire expanded their universe into medical informatics and e-health solutions, as Lion took a minority stake in Gesellschaft für Medizinische Datenverarbeitung (GMD) by acquiring 16.1 per cent of GMD's shares for 8.75 million euros (Lion and GMD, 2001). Then, in July 2001, Lion announced that it would license a wide range of MDL's industry-standard chemo-informatics applications and databases in order to develop and commercialize software applications that could interface with MDL's widely deployed products (Lion and MDL, 2001). MDL had been established in the field of chemo-informatics for some time, and therefore had very well-established and quite entrenched relationships with large life science companies. The final move towards a higher-level informatics platform was then targeted with the acquisition of NetGenics,
previously a Lion competitor, for $17 million. So, as well as constituting an expansion of technological capability, this deal removed one of Lion's major competitors. NetGenics' DiscoveryCenter technology was a rival of SRS, and had the advantage of being able to integrate relational and flat-file databases. The inability of SRS to cope with relational databases had been seen as one of its major shortcomings. The NetGenics technology had been developed in cooperation with IBM (building on the latter's DiscoveryLink middleware technology). Completion of the NetGenics acquisition and gaining ownership of their integration software positioned LION, at that time, as the most comprehensive life science informatics provider globally. From its formation, through IPO and a succession of acquisitions and alliances involving other informatics providers, LION had transformed itself from a bioinformatics company to a life science informatics company. In so doing, Lion had potentially created its own market space, constituted by different market relationships, a novel private domain of knowledge resources. We can portray the ideal universe of this new life science bioinformatic world in Figure 3.2, defined by its market space, its upstream value chain, its competitors attempting to establish a differently structured market and, most significantly, its interface with the public domain, the straddling of the market and non-market galaxies. But although, for a short time around its period of rapid expansion, the universe temporarily appeared to be experiencing its 'big bang', it never materialized, at least not in this form, and not for Lion. As observed, the market/non-market interface upon which the development of the core informatics depended was increasingly under strain. Although Lion established a relationship with a single food provisioning company, Nestlé, similar to that with Bayer, that too restricted the scope for subsequent market expansion. Other life science companies, moreover, preferred to develop their own knowledge and informatics integration internally, making rather than buying this core company asset. This no doubt created market opportunities for Lion's smaller competitors, the niche companies with dedicated bioinformatic tools or small private databases. Part of the failure to create a one-stop-shop market, and undermine the competitors' niche markets, was the sheer difficulty of creating interoperability between the bio- and chemo-informatic resources. It is one thing to offer a range of discrete informatics products with different functionalities, but quite another to integrate them in such a way that the whole becomes greater than the sum of the parts. Lion and the other firms failed to engineer the necessary synergies, which called into question the advantages of the integrated product. Potential clients thus purchased smaller solutions for particular purposes from niche providers.
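The technical difficulty just described is easy to state but stubborn in practice: tools such as SRS grew up around line-prefixed flat-file databanks, whereas much corporate and later public data sat in relational stores, and an integration layer has to hide that difference behind a single query interface. The sketch below is a schematic illustration of the problem under invented record formats, field names and data; it is not the SRS or DiscoveryCenter design.

```python
# Schematic sketch of the interoperability problem: the same 'give me the entry
# for accession X' question asked of a flat-file databank and of a relational
# store. Formats, field names and data are assumptions for illustration only.
import sqlite3

# A flat-file databank in the line-prefixed style of the sequence databanks.
FLAT_FILE = """\
ID   P01234
DE   Platelet-derived growth factor (fragment)
//
ID   P09876
DE   Hypothetical protein
//
"""

def query_flat_file(accession):
    """Scan line-prefixed records separated by '//' for a matching ID."""
    for raw in FLAT_FILE.split("//"):
        fields = {}
        for line in raw.strip().splitlines():
            if len(line) > 5:
                fields[line[:2]] = line[5:].strip()
        if fields.get("ID") == accession:
            return fields.get("DE")
    return None

def query_relational(accession):
    """The same question asked of a relational table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE protein (accession TEXT, description TEXT)")
    db.execute("INSERT INTO protein VALUES ('P01234', 'PDGF (fragment)')")
    row = db.execute("SELECT description FROM protein WHERE accession = ?",
                     (accession,)).fetchone()
    return row[0] if row else None

def describe(accession):
    """A homogeneous interface hides which kind of source answered."""
    return query_flat_file(accession) or query_relational(accession)

print(describe("P01234"))
```

Multiplied across dozens of databank formats, query languages and release cycles, this is the integration burden that the one-stop-shop model promised, and largely failed, to absorb.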
[Figure 3.2 here: a diagram placing the life-science informatics firm (LION) at the centre of a potential new market spanning big pharma, big agribusiness, food provisioning and dedicated biotechnology firms, with an upstream value chain of software and big computing suppliers, dedicated bioinformatic tool and database firms competing for an alternative market, and a market/non-market interdependency with the public sector (databases, open source informatics, computation).]

Figure 3.2  The life science informatics-centric view of the economic universe

The other main competitive pressure came from the public domain that, in part, Lion itself was continuing to nourish. Commercial customers availed themselves of the increasing number of bioinformatic applications being made available by the public science bioinformatics community free of charge and often under an open source protocol. The scope for market creation was thus continuously being eroded at the edges. This was especially true for Lion, whose main revenue stream continued throughout to rely on sales of its basic SRS engine. This was economically unsustainable. In 2006, Lion sold its bioinformatics business for a rather paltry $5 million. The Lion universe imploded. But SRS survived this galactic event, and remains widely used in the academic sector. The story of Lion Bioscience, its birth, growth and ultimate demise, is an
exemplary case of an attempt to institute a novel and specific private economy of knowledge, but one that failed to establish itself in competition with public economies of knowledge in data provision and bioinformatic tool use. Originally an enabling technology developed to facilitate genomic and proteomic science by public scientists, SRS temporarily formed the basis for the creation of a major proprietary informatics platform, before returning to its origins as a public science resource. During that period Lion grew as a private economic entity, attracting major resources in capital markets and pursuing an aggressive strategy of company acquisition itself, before contracting and ultimately withering away. Its rise and fall was heavily influenced by the perpetual flux in the organization of life science innovation, involving both public and private actors, which transformed patterns of interdependency and rivalry between different classes of economic agent. As an economic experiment, Lion failed, showing quite how difficult the market formation and private appropriation of a key knowledge resource had been in this case. The vision of an economy of knowledge predicated on an integrated life science informatics platform has not yet materialized, but Lion's collapsed universe is a contingent story, with many path-dependencies. It cannot be taken as evidence that a differently structured market universe for integrated informatics will never emerge.

Tracking the Economic Trajectory of a Major Bioinformatic Tool

The second detailed case study involves the changing economies of knowledge of one of the major bioinformatic tools.6 It also represents a key moment in the emergence of bioinformatics as a new discipline in the context of expanding biological data. So, in contrast to the previous case, this analysis tracks the knowledge rather than the firm, and the successive economic transformations involved in the production, distribution, appropriation and use of a major new bioinformatic knowledge resource. The GCG software package was one of the first of its kind, and involved the harvesting of many previously invented algorithms into one computational software suite, to be used especially for analysis of the new nucleotide databases produced and distributed by Genbank. As an indication of its significance, the journal article describing the initial package in 1984 (Devereux et al., 1984) has been cited over 14 000 times – and still counting.7 The story starts in the genetics laboratory of Oliver Smithies, a prominent scientist and member of the National Academy, at the University of Wisconsin (UW). Smithies had become increasingly interested in recombinant DNA and, as a consequence, some projects in his laboratory had started to produce DNA sequences. Towards the end of the 1970s, Smithies critically realized that these sequences were going to grow in
length and number and that recent advances in computing could greatly assist in the management and analysis of the new data. It was in this context that John Devereux, who had recently dropped out of graduate school, came to work as a technician in Smithies' laboratory with responsibilities to develop computational and mathematical capabilities to support the DNA projects. This type of arrangement was typical of the birth of bioinformatics. At this stage, the term was not in wide use and its activities were primarily supportive of the biological research programmes. But, as the importance of computation and mathematics became increasingly clear within the molecular biology community at large, so the status of the research of Devereux and colleagues within the genetics department rose. It shortly became apparent that this embryonic bioinformatics activity was scaling up to a point where it was difficult to support without dedicated funds, especially in a climate during the early 1980s when science funding was contracting, including resources underpinning the projects in Smithies' laboratories. This was the context that prompted the establishment of the Genetics Computer Group (GCG) within the university, funded through an intramural grant. Ambitions were certainly not modest:

    We see the system becoming an important national facility for sequence analysis and we expect that it would be part of a constellation of resources that Madison would offer biotechnology firms that locate in Madison. It is entirely appropriate for the University to sponsor a development of this kind, because it will create a facility that everyone handling sequence data will be able to use. We expect that industries collaborating with the University will be willing to pay substantial annual user fees and give grants for the further development of this system. We are hoping that the income generated by this facility will enhance the scientific environment for all people working in the field. (GCG grant proposal, December 1981)
Over the next couple of years, GCG developed the suite of programmes that would bear its name. Its particular characteristics clearly influenced its future fate in economic and institutional terms. The GCG package announced itself to the world in an academic paper published in 1984 in the journal Nucleic Acids Research (Devereux et al., 1984), introducing itself as an integrated suite of analysis techniques for research on biological sequence data. The paper provides particularly interesting reading for our purposes, resembling a technical specification for research instrumentation and at the same time including a description of the terms under which the package could be purchased and used – a model of distribution and exchange. Table 3.2, reproduced from the original article, shows the programmes available in the GCG package. It is clear from this that a 'comprehensive set of sequence analysis programs' includes programmes for the management of data and for its analysis. It is also clear how the package was developed by using the existing algorithms already available in the scientific literature.
Table 3.2 Programs available from UWGCG (Name: function)

DotPlot+: makes a dot plot by method of Maizel and Lenk
Gap: finds optimal alignment by method of Needleman and Wunsch
BestFit: finds optimal alignment by method of Smith and Waterman
MapPlot+: shows restriction map for each enzyme graphically
MapSort: tabulates maps sorted by fragment position and size
Map: displays restriction sites and protein translations above and below the original sequence (Blattner)
Consensus: creates a consensus table from pre-aligned sequences
FitConsensus: finds sequences similar to a consensus sequence using a consensus table as a probe
Find: finds sites specified interactively
Stemloop: finds all possible stems (inverted repeats) and loops
Fold*: finds an RNA secondary structure of minimum free energy by the method of Zuker
CodonPreference+: plots the similarity between the codon choices in each reading frame and a codon frequency table
CodonFrequency: tabulates codon frequencies
Correspond: finds similar patterns of codon choice by comparing codon frequency tables (Grantham et al.)
TestCode+: finds possible coding regions by plotting the 'TestCode' statistic of Fickett
Frame+: plots rare codons and open reading frames
PlotStatistics+: plots asymmetries of composition for one strand
Composition: measures composition, di and trinucleotide frequencies
Repeat: finds repeats (direct, not inverted)
Fingerprint: shows the labelled fragments expected for an RNA fingerprint
Seqed: screen oriented sequence editor for entering, editing and checking sequences
Assemble: joins sequences together
Shuffle: randomizes a sequence maintaining composition
Reverse: reverses and/or complements a sequence
Reformat: converts a sequence file from one format to another
Translate: translates a nucleotide into a peptide sequence
BackTranslate: translates a peptide into a nucleotide sequence
Spew: sends a sequence to another computer
GetSeq: accepts a sequence from another computer
Crypt: encrypts a file for access only by password
Simplify: substitutes one of six chemically similar amino acid families for each residue in a peptide sequence
Publish: arranges sequences for publication
Poster+: plots text (for labelling figures and posters)
OverPrint: prints darkened text for figures with a daisy wheel printer
Notes: + Requires a Hewlett Packard Series 7221 terminal plotter; * Fold is distributed by Dr Michael Zuker, not UWGCG.
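The alignment programs at the head of the table rest on classic dynamic programming algorithms. By way of a purely illustrative sketch (not the GCG code itself, which is not reproduced here), the following Python fragment computes a Needleman and Wunsch style global alignment of two short sequences, with an arbitrary match/mismatch/gap scoring scheme chosen only for demonstration; the Smith and Waterman local alignment used by BestFit differs mainly in clamping cell scores at zero and tracing back from the best-scoring cell.

# Illustrative sketch only: a minimal Needleman-Wunsch global alignment,
# not the GCG implementation. Scoring values here are arbitrary.
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back to recover one optimal alignment.
    ai, bj, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            ai.append(a[i - 1]); bj.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ai.append(a[i - 1]); bj.append('-'); i -= 1
        else:
            ai.append('-'); bj.append(b[j - 1]); j -= 1
    return score[n][m], ''.join(reversed(ai)), ''.join(reversed(bj))

print(needleman_wunsch("GATTACA", "GCATGCA"))

Run on the toy inputs above, the function returns an optimal score together with one gapped alignment of the two sequences.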
One of the distinctive contributions of the package involved implementation of this set of algorithms within a single computational environment, allowing researchers to combine different methods in bespoke combinations. So the package provided significant added functional value over the existing public domain of autonomous bioinformatic algorithms. The cumulative nature of science and technology is manifest in the way the package pulls together the collective achievement of many contributions distributed over time and space, and then recycles them in a novel production. The specific approach to software design offered another key dimension of novel functional advantage. Devereux and colleagues developed a library of programming routines which could be used for multiple tools. Essentially, these routines were simple algorithms that could be called on to execute common tasks for a range of different functions. The development and distribution of such a library would become a significant resource for further bioinformatic tool development and, with the additional availability of the source code, provided a valuable resource for the expert bioinformatic community. It was this approach to making the knowledge resources accessible to the wider community that would, in the short term, account for the quick rise to popularity of the package, but would later pose significant problems in the attempts to create a private economy of knowledge. Indeed, the journal article is eccentric for its detailed discussion of several economic matters. It specified the arrangements by which other researchers could access the package: 'The intent of UW GCG is to make its software available at the lowest possible cost to as many scientists as possible' (p. 393). The initial plan was to charge users a fee for the tapes containing the software and the supporting documentation. The fee was set at $2000 for non-profit organizations and $4000 for industrial users. There would be no continuing fee, but the paper outlines how the GCG would
establish a consortium of academic and industrial members to support the project in the future, which would entitle them to periodic updates and influence over the direction of new programming. At the outset, GCG envisaged the instituting of a specific economy of knowledge, with a rather unusual model for securing its financial resources and providing access to users. Issues of ownership of knowledge were also addressed, adding a further dimension to the economy of knowledge. Although GCG owned the copyright to all of its software, it had been mandated to provide research tools to help researchers working in the field and encouraged further programming efforts on the source codes by users. To incentivize this process, copyright was removed for any programme modified by more than 25 per cent of the original. So this intellectual property arrangement was specifically designed to encourage an open economy of knowledge to facilitate progress in bioinformatic development within a wider community. Rapid adoption of the package within the user community in the USA and Europe followed its publication. The initial economic model was modified to take account of the rapid changes that were occurring in the bioinformatic field. A four-way split for obtaining financial resources was established according to whether the user was a non-profit (usually academic) or for-profit organization and whether the licence was for a one-off transaction or for a supported and renewable arrangement. In the one-off arrangement, the user would receive the latest GCG package for $2400 (non-profit) or $4800 (for-profit), with no further financial obligation. By far the more common renewable licence, at $2000 (non-profit) or $4000 (for-profit), purchased incremental updates, new releases and technical support. The dominance of the annual licence fee was recognition that versions of the package were quickly obsolete with the development of ever improved tools. This arrangement demonstrated an implicit recognition that financial support was not only to cover production and distribution of tapes and documentation, but also for the research and development work that would lead to the development of updates. GCG also became a vanguard of what would later become widely known as the open source model of software development. The software package, initially developed by the core team of the University of Wisconsin, was supported by a wider community of user–producers who were a source of significant innovations. In part, this occurred through the channel of technical support for developments to the GCG package. In providing support, communication was facilitated in both directions, in a manner consistent with other observations about the role of lead users. Expert bioinformatic users (those involved in developing analytical packages themselves rather than those simply using GCG to analyse biological data) frequently revealed
glitches in the package and made suggestions for solutions. Furthermore, the terms of the supported licence gave users ‘a voice in the determination of future GCG programming priorities’.8 The user dimension of this economy of knowledge presented significant functional advantage for expert bioinformatic users, since the basic GCG package could be customized and extended to fit particular requirements and to expand the range of analytical possibilities. In Europe, activities of this type, especially those of EMBnet (a virtual network of European molecular biology laboratories funded through the EMBL) were organized into a complementary package, EGCG (the E initially stood for European, but later represented Extended). EGCG, as a supplementary software tool, was available without licensing or charge to users and was made available through the EMBL network file server and ftp site. It was also submitted to GCG for distribution as unsupported software to all their subscription payers. The opportunity to further develop the GCG package in this way relied on the accessibility of the source code. A second, and complementary, economy of knowledge therefore grew up around EGCG, quite unproblematically as long as both remained essentially public domain resources. Even at this early stage of development, therefore, the software package was exchanged, distributed, used and developed in a complex manner. Some versions were free, there was proprietary ownership of the source code, but the source code was made freely available for scrutiny and allowed co-production of knowledge between original producers and users. Moreover, there were certainly multiple uses: fundamental genomics research, commercial and non-commercial, as well as for non-knowledge production activities such as drug target identification. It was successful economic variety creation in terms of both knowledge and financial resource flows, with the package accruing annual revenues of $750 000 from 280 institutions. It is quite striking, however, that the economy of knowledge for the UW GCG more closely resembled a private firm, albeit operating an opensource policy, than a university operation. Its very success stretched both the model and the capacity of the small UW group, and quickly led to its destabilization. The sequence of events that followed offers a dramatic insight into the characteristics of frictions at the public–private interface. By 1988, the rapid rise to prominence of the GCG package had started to pose problems for the continuing operation of its existing economy of knowledge, prompting Devereux to initiate a review of the group’s operations. Demand for the package had grown so much that immense pressure had been placed on the group’s ability to keep up. The required scaling up of these activities had been constrained by lack of space and personnel and
had also squeezed further research and development work, so that further improvements to the package had been suspended for two years. Its success in wide diffusion had constrained its further development as a knowledge resource. More significantly, its success had started to make waves in the embryonic market for bioinformatic tool provision and there were concerns within the UW group that they were running the risk of attracting criticism for unfair competition from private sector counterparts. The basis for such criticism lay in the advantages that a university-run operation might enjoy in terms of subsidy and tax exemption, enabling the software to be distributed at lower cost than private firms. A letter from Devereux (18 July 1988) expressed these concerns to Robert Bock, Dean of Faculty, and voiced the need to ‘establish that the Genetics Computer Group is doing something that really belongs within the University’ (emphasis in original). A charter for GCG to secure its expansion and viability was proposed. At the time, the issue of how university operations might prove detrimental to private sector operations was not something specific to GCG, and the University of Wisconsin had been investigating the matter throughout that year. The ‘Task Force on Competition with the Private Sector’ found that the legislative picture was muddled. New laws had been proposed, but nothing had been implemented at state level. In this confusion, the GCG review proposed that the University adopt a ‘balanced policy . . . which recognizes that private business can be adversely affected by university competition, but which does not restrict the many activities which benefit the private sector and public as well as the university’ (November, 1988). It was recognized that some degree of competition between universities and private enterprise would be unavoidable, and accepted that there could be circumstances where particular university activities might be curtailed in order to reduce some of the disadvantages of such rivalry. Even before the review had been published, however, the concerns of Devereux proved prescient: GCG came under direct attack from Intelligenetics (IG), itself a recently formed bioinformatics firm that had spun out from Stanford University. During August 1988, the pressure initially took the form of telephone requests to various UW officials that GCG stop distributing and supporting the package to avoid unfair competition with their own bioinformatics product, which was significantly more expensive. The pressure ratcheted up considerably when IG’s parent company, Amoco, phoned Bock on 30 September with threats that Amoco would withdraw all research funding into UW unless UW agreed to negotiate a licence for the distribution and support of the GCG package to their IG subsidiary. This was followed up by a letter from Michael Kelly, President
of IG, to Bock on 21 October 1988 making the positive case for a licence to IG, based on IG’s already existing reputation, size and capabilities in development, production, distribution and support. This sequence of communication, combined with the publication of the UW internal review of policy, catapulted GCG into a position where it could no longer continue to operate according to the previously successful model. In February 1989, a letter from Devereux to several prominent faculty members from the departments of animal science, biochemistry, computer science, molecular virology, oncology, pharmacy and zoology set out the basis for a new model for organizing and financing GCG. This would involve the establishment of a new university institute for the study of theoretical and computational problems of modern biology. GCG and the revenues from its software package would become an integral part of such an institute, with the prospect of forming an NIH centre for computational biology, obtaining additional funding from the Human Genome Initiative. A national advisory group would be formed with responsibility for setting appropriate rates for users governing distribution. This type of national public science facility had been instituted in other situations and not been subject to the criticisms from private sector interests. But, barely two weeks later, it was evident that the GCG team had started to consider the spin-off option (letter from Devereux to Dean Robert Bock, 17 February 1989). Soundings from colleagues across the university had revealed mixed opinions, but for his own part, Devereux revealed the uncertainties and fluidities at the public–private interface. Originally concerned that any spin-out would cut GCG off from its intellectual roots, he now believed that ‘intellectual inputs will continue to flow efficiently into GCG from the academic community, even if GCG were organized as a private company’. From this statement, we can see how Devereux was keen to retain an open science/technology arrangement, where the flows of knowledge would remain unaltered, even if financing of production, distribution and development became commercially organized. His public science impulse is further demonstrated in his desire to retain a position at the university as well as running the new company. But, here he envisages a problem: I would like to continue to work as a University researcher and to continue to run a laboratory in the Biotechnology Center where campus researchers could come to work on sequence analysis problems with state-of-the-art equipment and software. I cannot imagine, however, how research or software development in my laboratory would not suffer under an unbearable (and perhaps even illegal) conflict of interest. I would frequently, as a University researcher, be working on problems whose solution would benefit a company in which I had a strong interest.
Before the issue of which model to choose for GCG was settled, the whole matter was brought to a head by a much more direct intervention from IG president Michael Kelly, on 17 March to the President of UW. The letter refers to GCG as an activity that ‘is a serious breach of ethics of the University of Wisconsin . . . where faculty members of [the] University have organised a mini-company on campus to compete with private sector business’. Kelly argued that the software package had been extensively marketed through advertising in journals and through their presence in Digital Equipment Corporation (DEC) booths at trade fairs. Furthermore, he claimed that the package was sold at extremely low prices, compared with industrial alternatives, made possible by ‘GCG utilizing either low cost University of Wisconsin graduate student labour, funding by government grants, the University of Wisconsin tax free status, or some combination of these advantages, which are unavailable to private sector businesses’. This prompted quick action within UW and a meeting was held between Dean Brock, Devereux and Maggie Smith, another key member of the GCG team. At the meeting it was agreed that there were only two serious alternatives: to spin off GCG as a new firm or to keep GCG within the university, with funding from the National Library of Medicine or the Human Genome Office, which would establish the university as an agent of the federal government in its administration of GCG. This second option would have aimed to establish GCG as a public institution with an economy of knowledge similar to PIR and Genbank. At that meeting, the assumption had been that some or all of GCG’s research, development and teaching-related activities would remain within the university and that only distribution and support would be spun off. It appears, at this stage, that there was a major reluctance to consider the IG proposal seriously. Before decisions were reached by UW over arrangements for GCG, pressure from Amoco and IG intensified further, with a letter on 8 May to Bock directly accusing UW of unacceptable practice. The letter referred to, and enclosed, a fax ostensibly proving that GCG was being marketed as a commercial product and was seen by potential customers of IG products as a direct competitor. The fax was from Imperial College, London, enquiring about the IG products, asking for direct comparisons on cost and functionality with GCG. The fax reads: A site licence for the Univ. Wisconsin package costs about $6,000 for this you get support and the source code for the programs. What is the cost of the IntelliGenetics suite of programs? If it is considerably more could you justify the extra pricing in terms of capabilities that your programs have which the Wisconsin package does not?
This, the correspondence asserted, was clear evidence that the UWGCG activities were directly competing with IG, and that price was a major issue. In response, Bock told Amoco about the ongoing review and invited IG to make a full proposal, which they duly did. The proposal claimed that IG was the only suitable and existing firm for the worldwide marketing and distribution of the package. The proposal was for a specific organizational and financial arrangement between GCG and IG. IG would be responsible for the production, distribution and marketing activities. It would provide support, training and updates of the package. But the GCG group at the university would continue to undertake research and development in bioinformatics, including work related to the package itself. The proposed user licence fee and royalty payments to GCG were as follows. For commercial users, the licence fee would be $7800, of which GCG would receive royalties of $4000. For academic or non-profit users, the licence fee would be $5500 and royalty $2400. The proposal made a case that this arrangement would offer GCG ‘a continuous royalty stream for development of new and improved software packages for the molecular biology community without the headaches of continued customer support and production and distribution’. It also stipulated that a collaborative development programme be set up between IG and GCG. So the proposal was for a new organizational configuration, which would involve a split between the development of new bioinformatic knowledge from its distribution to users. Under this arrangement, the economic resources from user licences would still contribute to the bioinformatic knowledge, but, since technical support and marketing were to be coupled with distribution, the knowledge flows from expert users to the GCG would be considerably weaker. The later proposal from IG did little to change the minds of UW faculty and, in the event, the decision was between the spin-off and federal supported public science models. While Devereux and others in the faculty had initially been keen on a public science model to keep the activities within UW, momentum had built up around the spin-off alternative, particularly amongst those directly associated with GCG. Several significant concerns had emerged over the public science route. First of all, there were concerns that dependence on federal funds would place the future of GCG in jeopardy. Recent volatility in funding had created serious vulnerabilities for some of the key public science bioinformatic resources, such as BIONET and the PIR group at Georgetown. There would be no guarantees that the funds would continue in perpetuity. It was believed by the GCG team at the time that the federal funding route would place an obligation on them to relinquish any proprietary interest in the knowledge resources that would be developed under a public grant. This in itself was not perceived as a
problem, since GCG had operated on an open-source model since its inception. The perceived problem was that, once the knowledge resources had been committed to the public domain, there would be no opportunity to develop a proprietary model in the future, even if federal funds dried up. The combination of these risks was precipitated into a belief that their decision to spin off amounted to a one-off opportunity. It was now or never. The saga closed, at least for a while, with the spin-off of GCG as a new company. Communications with Amoco/IG concluded with a letter from Dean Bock clearly taking some delight in explaining the situation by thanking the companies for their interest and for encouraging Devereux to pursue this path by expressing their interest in the package as a tradable entity and in their ‘confidence in his ability [which had been] very helpful for obtaining the managerial and financial backers needed to become a forprofit entity’. Given the quasi-firm economy of knowledge that GCG had operated in, the spin-off was initially very straightforward and for some time the nature of the exchange relationships with academic and industrial users remained almost identical. The open source model was retained with knowledge transfers in both directions, and the new company was sustained financially by annual licence fees. Of course the user licences became more expensive, since GCG Inc. was required to pay taxes in its new economy of knowledge. But, during the 1990s, a succession of changes significantly altered the arrangements for access to knowledge resources in the economy of the GCG package. GCG Inc. changed protocols for the distribution and exchange of bioinformatic tools in two significant ways. The first shift came as a response to the changing status of EGCG, the add-on applications developed by the virtual European bioinformatics community. To understand the friction that emerged, it is important to recall that EGCG had been developed on the basis of the GCG library of routines, providing different applications, but nevertheless strongly dependent on the GCG technology. Despite this technological dependency, EGCG had been organized under a quite different economy of knowledge. The EGCG development work had been undertaken under the auspices of many different research projects in molecular biology across Europe. It had not received specific funding: bioinformaticians worked on the applications as a component of their broader funded research programmes, for example at the EMBL in Heidelberg. It had been initially distributed with the GCG package as unsupported additional programmes. Later, however, it had been made available through EMBNet, the European molecular biology intranet. It was the increase in the latter form of distribution that caused concern for GCG Inc. Originally, EGCG had only been made available to users with a GCG licence, but distribution of EGCG outside of this
arrangement had grown to a point where it had become a rival of the GCG package on which it was based. This was resolved in the first move by GCG Inc. to increase control over its intellectual property, when the company issued a new protocol insisting that applications based on the GCG libraries could only be distributed to holders of a GCG licence, effectively closing off the opportunity for EGCG developers to distribute their applications without constraint and free of charge. The previously unproblematic relationship, when both were in the public domain, became unsustainable after GCG was privatized. The rights of the EGCG group to freely distribute what they had developed were curtailed. GCG could not live a double life, market and non-market. In a climate of increasing tensions between GCG Inc. and the EGCG community, the company took matters one stage further when they took the critical decision to withhold the source-code, previously made available to users of the GCG package. The rationale for this move was based on the assumption that GCG, as a for-profit concern, required more stable control over its intellectual property to maintain the tradability of the product, or make company itself tradable in capital markets. To fully explain the rationale, however, requires an appreciation of the dynamics of the embryonic bioinformatics sector at that time. Provided it could demonstrate control over its intellectual property, a company such as GCG would be an obvious takeover target in a context where the fully integrated one-stopshop for informatics solutions was seen ultimately to be the dominant and successful commercial model. Moreover, retaining a public domain open source policy for the package ran the risk that other groups, such as EGCG, would write derivative programmes and this might lead to destabilizing intellectual property battles in the future. This risk would be greatly increased in the event of others following the same path as GCG by becoming private companies. ‘Black-boxing’ software by withholding the source-code produced a significant transformation both in the character of intellectual property and in the producer–user interactions previously involved. In effect, it simply and radically broke the knowledge feed-back loop. The opportunities for producing new knowledge through use were restricted in two ways. First, the communication between expert users and GCG developers was severely compromised in the absence of user access to the source code. Users could no longer identify bugs in the code or feed back their solutions. The transition marked the full separation of producers and users. Second, as a blackboxed product, the GCG package was of limited use for expert bioinformatic users. The product was non-customizable and difficult to use for further development. In that way, it restricted its potential for new knowledge development by the scientific community. Unlike SRS, black-boxing severed
non-market links with academia: GCG no longer had a foot in both market and non-market camps. But, arguably, the strategy served its purpose in at least one way. By 1997, GCG had been approached by several interested companies and was eventually acquired by Oxford Molecular (OM), a UK-based public company, in a deal worth $20million. However, by 2000, OM was struggling and was itself acquired by the US company Pharmacopeia, which was in the process of building an informatics subsidiary, Accelrys. We described earlier this process of building an integrated private economy of life science informatics with reference to the succession of acquisitions which led to the formation of Accelrys. Revisiting the issue now, we raise just a couple of new points. First, we can now marvel at the twists of fate that brought together erstwhile antagonists, IntelliGenetics and GCG, into the same organization. But, second, and more importantly, we can now see this succession of acquisitions in terms of the GCG package as an object of knowledge and its economic development. The Accelrys strategy, similar to and in competition with Lion Bioscience, was to create integrated life science informatics with a particular orientation towards the entire pharmaceutical R&D process. Attaining the anticipated efficiencies in that process relied on creating synergies between applications developed in different domains: biology and chemistry, for example. But, as with SRS, the GCG package had been developed before application in the pharmaceutical context had fully emerged. These synergies proved elusive, just as they had for Lion. Consequently, the GCG package was traded in a number of ways. In some cases it was sold as a stand-alone package, while in others it was sold as part of an informatics systems product, albeit without the interoperability once imagined. Indeed, as of today, GCG is sold separately as the foundation of the genomic analysis software to expert bioinformaticians, while a more basic package, with a premium on usability, is sold to the wider molecular biology community. In all its guises, Accelrys sells GCG as a black-boxed product without access to the source code, traded for its functionality in sequence analysis, rather than exchanged as a knowledge resource for further bioinformatic development. By creating a sharp break between market and non-market economies of knowledge, however, GCG Inc., subsequently Accelrys, triggered a new level of competition between them. Public science was forced to respond to GCG Inc.’s withdrawal of the source code, if only because it undermined the existence and future of EGCG. The response took the form of launching a new, independent, rival called EMBOSS. The roots of EMBOSS lay in the EGCG package described earlier. Unable to simply release, let alone further develop, the EGCG applications, the main players involved in its development sought funds to produce an alternative. Initially supported by
a Wellcome Trust grant, the European Molecular Biology Open Software Suite (EMBOSS) was written from scratch, with the aim of replicating the entirety of the EGCG offering, but this time writing a completely new library of software routines to remove dependence on the GCG libraries. Development of the package was orchestrated and largely undertaken by a small team at the Sanger Centre in the UK, with inputs from expert bioinformaticians on preliminary alpha and beta versions. The first public version was launched in 2000 (Rice et al., 2000) with the source code freely (in terms of access and cost) available, and its continuing development was funded by a combined UK research council grant. In an explicit attempt to institute an 'open' economy of knowledge, EMBOSS was protected by the General Public Licence developed and administered by the Free Software Foundation (FSF). The history of this new practice of making knowledge public has its roots in the early 1980s, when concerns over the effects of proprietary ownership of software first surfaced. The licence establishes a specific form of ownership over software programs with the objective of distributing the source code and rights to use it as widely as possible. Freedom in this respect is defined by Richard Stallman, the inventor of the new licence, in the following way:

Free software is a matter of the users' freedom to run, copy, distribute, study, change and improve the software. More precisely, it refers to four kinds of freedom, for the users of the software:

1. The freedom to run the program, for any purpose (freedom 0).
2. The freedom to study how the program works, and adapt it to your needs (freedom 1); access to the source code is a precondition for this.
3. The freedom to redistribute copies so you can help your neighbor (freedom 2).
4. The freedom to improve the program, and release your improvements to the public, so that the whole community benefits (freedom 3); access to the source code is a precondition for this. (http://www.gnu.org)
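In practice, the licence is applied by attaching a copyright statement and a standard permission notice to each source file and distributing the licence text alongside the code. The following fragment is a hedged illustration only: the file, program and author names are hypothetical and are not taken from EMBOSS, although the permission notice follows the standard wording recommended for version 2 of the GPL.

# Illustrative only: how a GPL notice is typically attached to a source file.
# The file name, program and author below are hypothetical, not from EMBOSS.
#
# seqstats.py - toy sequence composition utility
# Copyright (C) 2000  A. N. Author
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return sum(sequence.count(base) for base in "GC") / len(sequence)

if __name__ == "__main__":
    print(gc_content("GATTACA"))  # roughly 0.286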
So, this particular form of licence was designed to ensure that software can be freely shared and changed by users, and in particular that the source code is made freely available. These licences do not require that software be made available without charge, simply that all rights of ownership are relinquished, illustrating the possibility of exchange without private appropriation. However, in the case of EMBOSS, there is no charge to use it, so it is free in both senses of the word. A General Public Licence therefore is a particular way of instituting the ‘public’, by encouraging user–producer knowledge development and use, while restricting possibilities of use or adaptation for customized commercialization. In other words, it prevents any future possibility of a spin-out such as GCG: in a kind of tit-for-tat, the market extreme of black-boxing provoked the public domain extreme
response of GPL. It limits commercial developments from a public resource, so instituting a sharper divide between the public and the private. This demonstrates that the public realm knowledge resources may be differently instituted, depending on how the relation between appropriation and use is prescribed. Interestingly, and without the legal formalities of GPL, the original economy of knowledge instituted for GCG in the very early period had objectives similar to this model, and existed before Stallman started his campaign for free software. EMBOSS grew quickly in functionality after initial publication in 2000. Freed from constraints of dependency on the GCG libraries, and with their own libraries available to the wider user community, the EMBOSS package replicated the great majority of GCG applications within a few years. Indeed, comparisons between the two packages, published on the NCBI website, reveal that they became more or less functionally equivalent. There was one further development in the licensing arrangements for EMBOSS, which draws attention to the intricacies involved in trying to institute a dynamic public domain knowledge resource and the characteristics of competition between the public and proprietary interests. Initially, all EMBOSS software had been covered under the GPL, but after a couple of years a decision was taken to apply the Lesser General Public Licence (LGPL) to the EMBOSS libraries. The prefix ‘Lesser’ refers to a weakening of the licence, compared to the GPL, in terms of the ‘freedom’ associated with using, distributing and modifying the software. The main difference is that the LGPL allows proprietary (that is, non-free) software to make use of the software library covered by the licence, without the insistence that the derivative work be made fully available under GPL terms. This subtle change translated into a significant shift in the economy of knowledge for EMBOSS, by allowing firms to develop bioinformatic applications based on the EMBOSS libraries, which they would be able to sell without releasing the source code of the application itself. While the Free Software Foundation provided the LGPL, they have been at pains to advocate its use only in very specific circumstances. These conditions, and the decision of EMBOSS to pursue this route, illuminate very clearly the competitive dimension of both the FSF and EMBOSS in their rivalry with proprietary interests. So freedom of software is not something pursued in its own right, but in opposition to proprietary alternatives. The aim of the LGPL was almost to operate as a Trojan horse, maximizing the competitive advantage of open software, to destroy the market from within. This competitive objective is scarcely hidden from view: On rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this,
non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. (http://www.gnu.org/licenses/lgpl.html)
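The practical difference can be illustrated with a small sketch. Under the LGPL, a free library can be called by a closed-source application without obliging the application's authors to publish their own code; only the library itself, and any modifications to it, must remain free. The module and function names below are hypothetical, invented for illustration, and do not correspond to the real EMBOSS libraries.

# Illustrative only: hypothetical names, not the real EMBOSS libraries.

# --- seqlib.py: a free library released under the LGPL -----------------
# Its source code, and any modified versions of it, must remain available
# under LGPL terms.
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(complement.get(base, "N") for base in reversed(seq.upper()))

# --- vendor_app.py: a proprietary application --------------------------
# Under the LGPL (unlike the GPL) this program may call the library while
# keeping its own source code closed; only changes to seqlib itself would
# have to be released.
# from seqlib import reverse_complement
def find_primer_site(template: str, primer: str) -> int:
    """Proprietary logic: locate where a primer would bind on the template."""
    return template.upper().find(reverse_complement(primer))

print(find_primer_site("AAGGTTCCATGC", "GCAT"))  # hypothetical usage

In a real project the library and the application would of course live in separate files, with the proprietary program simply importing the LGPL-covered module.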
By means of the LGPL, the developers of EMBOSS clearly wished their libraries to become de facto standards and of course the EMBOSS libraries were explicitly written to provide an alternative to the non-free GCG libraries. Furthermore, use of the LGPL would potentially maximize overall use of the libraries by allowing private firms to develop novel proprietary applications, while spreading the open-software concept, expanding the public EMBOSS economy via contact with private economies of knowledge. Viewed over the two decades of its trajectory, the GCG package, and the public rival it spawned, have undergone a continuous and intertwined process of epistemic and economic transformation. What was originally a software package provided by a quasi-firm in the public domain as an open knowledge development public resource then split into two, with GCG becoming more thoroughly and traditionally privatized and EMBOSS emerging as a public realm resource that encouraged and opened up full public user–producer knowledge production. Historically, GCG started out as a public domain activity and came under attack from rival commercial interests, before becoming a private entity itself and then coming under attack from public domain interests. At the time of writing, it would be risky to predict whether both or either are sustainable in the long term, although it is clear that they are now each set in very different trajectories. Accelrys has proved the commercial viability of a black-boxed source code model for GCG, and it remains as the centrepiece of its software and informatics knowledge market. It is now part of a more powerful Discovery Studio GCG, incorporating 140 interoperable programs, with interfaces designed to be easily usable for a wide range of users, academic and commercial. This demonstrates the sharp divide between producer and user involved in the construction of a protected source code, and at the same time a broadening of the potential market. By contrast, EMBOSS has struggled to attract public development funds (the UK Medical Research Council terminated its financial support in 2005). At a critical turning point, however, it received funding until 2009 from the UK Biotechnology and Biological Science Research Council (EBI Press Release, 25 April 2006).
Public sector science has been consistently reluctant to become entirely market-dependent on a key scientific research resource. Interestingly, the new funding provided for helpdesk facilities and user-friendly interfaces, serving two distinct communities: the general user, whether public or private, constituting 80 per cent of its constituency, and expert bioinformaticians acting as producer-users, the further 20 per cent. At present, it appears that GCG and EMBOSS exist in uneasy parallel economic spaces. Whether, and on what basis, either survives, only the future will tell.

Unsettled Frontiers

Most scientific tools are commercially produced, and without contention. The market mode is the taken-for-granted norm, yet economies of bioinformatic tools disturbed this pattern, and have not settled into established patterns. No boundaries between market and non-market activities, no clear divisions of labour or stable patterns of exchange and distribution, have been instituted. This is not because bioinformatic tools are software rather than hardware, products of mental labour rather than manufacture, although that might be a small part of the story, given costs and economies of scale. The analysis of transformations in public and private economies of bioinformatic tools in this chapter reinforces some of the conclusions drawn from the discussion of bio-data and introduces some new ones. First, in support of one of the issues raised in the previous chapter, we can extend our argument concerning 'government failure'. In the cases of SRS and the GCG package especially, we saw how both these technologies sat uneasily in their respective academic organizations of origin. Once the tools had been conceived, the academics involved with their further development and distribution struggled to find public funding. In the absence of the required support, both technologies were transferred from the public laboratories to form the basis for new spin-off firms. In the case of the SRS tool, there was an explicit understanding that this was the preferred option of EC officials. Therefore, it was only in the context of no government support that it was possible to create tradable products and to institute new markets for the two technologies. But, just as private bio-data provision came under attack from the public academic community, so too did the private tool providers. From the academic perspective, relinquishing control over these knowledge resources severely constrained the opportunity to make progress in the development of analytical techniques. The arguments were different from those mobilized in the case of bio-data and it is worth dwelling on these, because they refer to technical characteristics of bioinformatic tools that played a major role in the wider structuring of public and private economies of knowledge in this area.
Underlying our analysis is the fact that, at this stage of their development, bioinformatic tools are both scientific instruments – akin to telescopes or sequencers – and integral components of scientific theory building. They have a double, indeed, polyvalent use. If they were just tools, they would easily slot into the market trading mode. If they were just components of theory development, they would as uncontentiously be seen and funded as public science. It is their continuing polyvalence that creates the turbulence, economic and epistemic. We stressed early on in this chapter that tools comprise algorithms and their implemented manifestation in computer software programs. But there is also a technical divisibility within the software component. First, we can distinguish between the actual applications and their source code. But we can also distinguish between software libraries (of sub-routines) and applications too. While these components are clearly interdependent, it has proved possible to construct economies of knowledge around them in different ways, producing different degrees of differentiation. One possibility involved the publication of algorithms in academic journals, and the production and distribution of tools based on those algorithms organized through new software firms and specific types of market. Then we saw the possibility emerge of creating a divide between access to the use of applications and access to the source code. Withholding the source code significantly altered the object of knowledge that was distributed and it was this form of private economy of knowledge that provoked a response from the academic community keen to promote a model of free access to the source code to maximize the potential for a wide community involved in further development. So, given the diversity of configurations that were covered in the chapter, it is clear again that the characteristics of particular knowledge resources do not script the formation of economies of knowledge in a deterministic fashion. However, it is equally clear that these characteristics of interdependency between knowledge components play a key role in the dynamics of transformation in economies of knowledge and have provided significant friction between rival interests, particularly across the proprietary and open-source communities. The question of how to construct mutually reinforcing public and private economies of bioinformatic tool knowledge, through a stable division of knowledge-producing labour, has so far proved elusive. And one reason why this pattern is likely to persist for some time is that the ongoing pressure to develop new tools with enhanced performance or for different purposes reignites the tensions time after time and in different ways, raising new possibilities for private appropriation and new challenges for making the knowledge public. Closely related to these issues is the question of an emergent differentiation in the use of bioinformatic tools, particularly between the expert
community involved in their further development and the wider group of molecular biologists interested solely in their functionality. The latter group required considerably less expertise in the workings of the algorithms, mainly to know how to interpret outputs reliably and how to use the tools appropriately. Consequently, they did not need access to the source code and development for these users focused on creating packages with enhanced usability and stability. But these packages were useless for the expert bioinformatic tool developers. Without access to the source code, they had no opportunity to customize or make improvements. To some extent, the outcome of the GCG/EMBOSS story reflects this differentiation, with alternative economies of knowledge oriented towards these two distinct communities of use. But it would be mistaken to assume that this is somehow a natural and stable outcome of symbiotic coexistence. For example, the recent boost in public funding for the EMBOSS package could seriously threaten the viability of the proprietary GCG package, as efforts have now been directed towards producing a high usability version for distribution to the wider molecular biology community. Equally, it would be quite possible for Accelrys to develop a commercially viable economy of knowledge based on a more thorough integration of GCG into a suite of proprietary software for high usability attractive to a broad market of commercial clients. However, even in the event of a clear differentiation between constituencies of co-producers and tool users, the degree of interdependency within such a division of labour, and the rapidity of obsolescence, may well still ensure that a wholly public economy of knowledge gains competitive advantage for both types of constituency. The field of bioinformatic tools has witnessed a wide variety of different economic experiments in arrangements for the production, distribution, exchange and use of the key knowledge resources. There have been many different types of bioinformatic tool corresponding to the wide range of experimental techniques and biological entities defining the new genomic field. The range of possibilities has expanded further because tools can be combined with each other or with biological databases to provide integrated packages, or with other related economic activities such as drug discovery and development. As is frequently the case with emerging commercial areas, the issue of how to cut the cake was resolved in a variety of different ways. The resulting diversity reveals what could be developed as a tradable product and what could not. Some varieties remained viable for a short period of time before becoming untradable. In some cases firms fail, in others they reinvent their business models and product and market orientation along new lines. Overall, the first two decades of commercial activity in bioinformatic tool provision represented a period of significant experimentation, in terms of different products, different business models
and alternative ways to institute market relationships. But the analysis has also shown how this economic experimentation extended beyond the realm of commercial firms operating in open markets. One key aspect of the experimentation concerned the ways that specific economies of knowledge transcended the public and private boundary. Furthermore, just as there has been experimentation in proprietary business, so too have there been many attempts to construct novel forms of public knowledge resource. There has been variety and intense competition both within and between the public and private domains, generating constantly developing and novel economies of knowledge.
NOTES

1. http://www.ornl.gov/sci/techresources/Human_Genome/faq/genenumber.shtml.
2. As noted above, Bairoch was one of the pioneers in creating protein databases, and SWISSPROT.
3. Indeed, in 1998, Lion appointed the SAP founder to its board of directors (Lion, 1998).
4. This account benefited in particular from interviews with Reinhard Schneider, Georg Casari and Thure Etzold, of Lion Bioscience, Peter Rice of Lion Bioscience and EBI, Iain Mattaj, Director at EMBL, and Janet Thornton, Director of the EBI.
5. GeneQuiz has continued as a public science resource and has been further developed at the EBI.
6. This analysis draws on interviews with John Devereux, Scott Khan, Peter Rice and Alan Bleasby. We are particularly grateful to John Devereux for providing us with correspondence and documentation relating to the decision to spin off GCG from the University of Wisconsin.
7. ISI Web of Knowledge, October 2006.
8. This is taken from a description of the licence written by John Devereux and given to the authors.
4. Collaboration and competition: the dynamics of a genome race

INTRODUCTION

We were in Brazil. While we were interviewing a Brazilian bioinformatician, Joao Kitajima, a story emerged concerning his colleague, Joao Setubal, then on sabbatical at the University of Washington, Seattle. The main focus of the research trip in Brazil had been a remarkable achievement: the sequencing of the first genome of a plant pathogen, Xylella fastidiosa, published in Nature as the cover-page discovery (Simpson et al., 2000). The social scientific interest had been about how this achievement had been arrived at, and more particularly what distinguished the Brazilian pathway to genomics (Harvey and McMeekin, 2004). The story of what happened to Setubal in Seattle could only be picked up properly, more than a year later, on a subsequent research visit to Brazil. He revealed one perspective on an extraordinary genome race, involving probably the most globally significant biotechnology organism, Agrobacterium tumefaciens, and its genome. It was a story that involved some major world players: the Genome Center at the University of Washington, one of the leading players, first in the genome of yeast, but then in the Human Genome Project's race with Celera; Monsanto, the world's major agrofood company involved in genetic engineering; DuPont, the life-science and materials conglomerate; and the National Science Foundation, one of the major sources of public science funding in the US. We were able to track down and interview all the main protagonists of this genome race,1 although not without difficulties inherent in eliciting commercially or 'politically' sensitive information, one-sided knowledge, conflicts and tensions between the parties engaged in a high-stakes competition. The intensity of the rivalry, moreover, was evidenced by one of its remarkable, almost unique, aspects, namely that it ended in a dead heat and a back-to-back publication of the same genome in two papers in the journal Science (Wood et al., 2001; Goodner et al., 2001). Why publish duplicate discoveries? Were they indeed duplicates? We were discovering and revealing an overall picture of the race that was invisible to everybody, including those taking part in it. By virtue of how the race was researched, as well as because of its being such a gripping and singular historical process, the
voices of the protagonists will be heard, and the account will be much more strongly narrative. Nonetheless, the story is absolutely central to the main themes of the book: the process of differentiation between public and private knowledge, their interdependence, how knowledge is made public or private property – these issues were inherent in some of the principal dynamics of the competition. The object of the race (the bug) had unique biological characteristics (as far as is known), and by sequencing its genome the boundaries between Stokes's quadrants were being constantly broken down (Stokes, 1997). The uniqueness of the bacterium is that, in natural conditions, it transfers its own DNA into another organism's genome – a plant – so as to express itself, feed and grow, as part of that other organism. The organism lives outside of itself. It produces a growth in the plant with which many people will be familiar, the oak apple or crown gall found on trees. Because of these unique properties, the bacterium was of interest at one and the same time to fundamental science about the functioning of DNA, genomes, metagenomics and the boundaries of organisms; to use-oriented fundamental science through an understanding of the evolutionary differentiation between symbiotic and pathogenic interactions between organisms related to the nature of causation of diseases; and to biotechnology by adapting the only 'natural genetic engineer' and its mechanisms to human uses, agricultural and pharmaceutical. Indeed, Agrobacterium tumefaciens is now both a ubiquitous free-living soil bacterium and a major biotechnological tool. Without these multiple dimensions, it is doubtful that the peculiar assembly of major players, both public scientific and corporate, would have become engaged in a race, or that the race itself would have had so many twists and turns, stops and starts. Finally, the significance of the story is just that it is a history: like many of our empirical cases, it is about a dynamic process, whose beginnings and ends are somewhat arbitrary. There was radical uncertainty, throughout the process, as to possible outcomes. The actual outcome, that two research teams both completed the sequencing of the same genome, and at the same time, was probably the least predictable – certainly to all the participants. Whether and what to make public and private, what bound and dissolved collaborations between public and private organizations, unfolded in a process of interaction involving many different dynamics. Understanding those dynamics and their outcomes is the objective of this chapter.
THE COMPETITORS AND THE COMPETITION

Before the race to sequence the genome of Agrobacterium tumefaciens began, the bacterium had long been an object of scientific interest, but also with a
medical orientation, because of its association with tumours in plants, discovered as early as 1907 (Zupan et al., 2000; Binns and Thomashow, 1988; Braun, 1982). The nature of the association, how tumours were caused by the bacterium, was only finally determined in the late 1970s. A team at the University of Washington,2 Seattle, including Gene Nester, one of our protagonists, demonstrated that Agrobacterium transferred a single strand of its own DNA into a host plant cell. This then produced a ‘food’ (opines) in the plant that could only be broken down by the bacterium itself into a carbon–nitrogen source. As it induced plant cell division, and produced a food that it alone benefited from, the relationship between bacterium and host was pathogenic. A further aspect, critical to the process, was that a virulence gene in the bacterium sensed a chemical signal from wounds in the plant, that triggered another virulence gene to slice the tumour-causing gene and trigger the process of its transport into the plant’s genome. The transferDNA (T-DNA) and the virulence genes were located on a plasmid, one of two chains of molecules additional to the two chromosomes contained by the bacterium. The discovery challenged fundamental assumptions about cell organization and integrity of an organism’s genome from invasion and integration by another organism’s genes (Chilton et al., 1977; Chilton, 2001). Before the advent of molecular biology, and in particular effective technologies of gene mapping, much of the biology and genetics of this unique interaction between microbe and plant remained unknown. But, the emerging molecular biology of the late 1970s offered new possibilities in this respect and, with it, a new public–private differentiation of interest in Agrobacterium. Quite soon, Mary-Dell Chilton left the Seattle Crown Gall Group for Washington University, St Louis, where, together with colleagues, she was involved in a race with her near Monsanto neighbours in the early 1980s, to harness the public scientific knowledge to a technology of transferring genes of human choice into plants, a Trojan horse suited to ‘gene-jockeying’ (Gelvin, 2003). Not long afterwards, she joined a biotechnology company, Ciba-Geigy, later absorbed into Syngenta. By the early-to-mid-1990s, patents were being filed by several corporations including Syngenta and her close competitors and neighbours at Monsanto, with claims relating to Agrobacterium as a tool for genetic engineering, specifically adaptations of the natural gene transfer mechanism. We deal at greater length later with the patent landscape surrounding Agrobacterium as it developed. Here, it is important to signal that, prior to any genome sequencing, a certain configuration between public and private knowledge involving Agrobacterium emerged with the public science discovery of the natural mechanism of gene transfer, a mechanism relatively easily adapted to genetic engineering by taking out the tumorigenic gene of the pathogen and replacing it with a gene of choice. By the time the sequencing was to
take place, therefore, Agrobacterium was already a major biotechnology tool, with substantial intellectual property stakes. Whether from the fundamental science or biotechnology point of view, however, much remained to be discovered about the biology and mechanisms involved, some of which could only be approached by whole genome sequencing, gene function identification and comparative genomics. The unusual presence of two chromosomes, one circular and one linear, had been established, but the location of the key genes related to the gene transfer and transport mechanism on the plasmids was unknown (Jumas-Bilak et al., 1995, 1998). Brad Goodner, another of the eventual protagonists, who later published a combined genetic and physical map of the two chromosomes (Goodner et al., 1999), had already signalled in 1998 the need for whole genome sequencing to resolve outstanding questions. For a race to have occurred at all, it required the emergence of at least two runners, a shared winning post, an approximately level playing field, and the resources to sustain the runners. Each of these four components was quite problematic. Both the eventual runners encountered difficulties in starting the process of sequencing the genome. Since 1979, a quite tightly-knit community of scholars, meeting once a year at the Annual Crown Gall Meetings, had become established, attended by both public institution and corporate scientists. Brad Goodner, first at the University of Richmond and then at Hiram College, had succeeded in acquiring a small National Science Foundation (NSF) grant for initiating a sequencing project, sufficient to undertake the first 10 per cent of the genome. He reported the initial results to the community in 1997. A small college scientist, he made approaches to Gene Nester, a leading figure of the community based at a prestigious university, the University of Washington, Seattle, but was rebuffed several times, and failed to seed a collaborative public science community project, whether for reasons of hubris, impracticality, or whatever else. Meanwhile, Gene Nester attempted to get public funding for his own laboratory for the whole genome project, but also failed. In view of the significance of Agrobacterium, scientifically and biotechnologically, it was puzzling both that the initiative was so late, compared with other projects, and that it lacked public science funding support. Nester certainly considered that the resistance of public funders was possibly a consequence of the low priority of plant pathogens compared with other, more directly human-health-related, projects within overall public science resource constraints. Possibly, also, the very ambiguity of the bacterium, the fact that it could not be boxed unequivocally into either a fundamental science or a biotechnology tool corner, induced uncertainty as to whether the research should be supported by commercial or public funds. Whatever the case, it is important to note that an exclusively public route to the whole genome sequence appeared blocked to all parties.
However, just at the time that Brad Goodner was initiating a small-scale sequencing project at Hiram College, so too was Monsanto, in house, and behind firewalls. In 1997, Monsanto had formed Cereon, to give organizational expression to its strategy of entering into genomics R&D. Agrobacterium was to be one of its first major projects, a 'flagship', according to Steve Slater, the Monsanto scientist who, as head of their bacterial genomics programme, was to lead their Agrobacterium sequencing project. Steve Slater attended a meeting at which Brad Goodner presented some initial findings from his sequencing project, and immediately proposed a collaboration between Monsanto (Cereon) and Hiram College. This private–public collaboration can be characterized in power terms as being between a major, global, biotechnology corporation and a solo scientist in a fairly minor US college, with a clear dominance of Monsanto in the relationship. This power aspect is significant, and played a considerable role in the unfolding drama of the race. Gene Nester, having been frustrated in attempts to run an exclusively public-funded sequencing project, approached Monsanto for a possible collaboration, but was cold-shouldered on two occasions, suggesting that this was not the collaborative relationship they were seeking, despite, or maybe because of, the high profile and standing of their public science counterpart. Eventually, Nester entered into lengthy negotiations with DuPont, involving protracted legal discussions, and a formal contract in which the terms of the collaboration were laid down and a division of labour agreed. The negotiations delayed the starting of their project by almost a year, a delay that, barring other developments, would certainly have proved fatal to any chances of winning. The final clinching of the deal with DuPont was strongly promoted by its subsidiary, Pioneer Hi-Bred, a seed company, which had a direct stake in intellectual property surrounding Agrobacterium, and a commercial interest in genetic engineering of plants. A key aspect of the patent landscape preceding the race was that Japan Tobacco held core patents to Agrobacterium-based technologies, licensed to Monsanto, who also obtained significant patents to complement those on licence. It appeared to all concerned that DuPont and Pioneer Hi-Bred's interest in the genome was to strengthen their intellectual property position in Agrobacterium. Both Monsanto and Pioneer Hi-Bred obtained patents related to the genetic engineering of soybean, in 1995 and 1996 respectively. Unlike many cases of the division of the spoils typical of patent landscapes, they were competing in the same market. So these were the two competing teams: Team 1, Monsanto leading the project with Brad Goodner at Hiram College; and Team 2, the University of Washington led by Gene Nester and the project-coordinator Derek Wood, together with DuPont and Pioneer Hi-Bred: two corporate heavyweights and one
public science heavyweight, and one significant but relative minnow public science facility. Before analysing the race itself and its dynamics, it is worth reviewing and contrasting the nature of the two public–private collaborations set in competition with each other. At the start, neither really knew that the other was in the race, and, unlike most races, neither could see their competitor: it was a race run, not so much blinkered, as blindfold. The arrangement between Monsanto and Hiram College was relatively informal, quickly established and, as both parties had already commenced sequencing, allowed them a running start almost a year before the other team. A key feature of the collaboration, almost identical with that of the other team, was a division of labour in which the corporate party undertook the capital-intensive, initial sequencing of DNA by high-throughput sequencing machines, whilst the main burden of labour-intensive annotation describing the functional and biological significance of gene sequence fell to the public science partner. For this division of labour to operate, however, a distinctive and critical condition was that the public science partner had access to the databases, genomic and preliminary automatic annotation data, behind the corporate firewalls. But this access was highly controlled. Both public science parties describe this technical facility of access to secrecy-protected knowledge with frustration, as in both cases the corporate partner restricted this access for the principal public investigator to a single dedicated computer. This created a considerable bottleneck and lack of flexibility to the relationship. It meant that the whole teams of annotators and finishers were reliant on just the one point of entry. So much was in common. But there were also significant contrasts, which were to have a profound impact on the processes of competition and differentiation between public and private knowledge. In the Monsanto– Hiram collaboration, the division of labour was less sharp and, indeed, the agreement was that both parties would be fully engaged in the process of completing the genome through to its conclusion, with Monsanto contributing to its final finishing, including depositing the complete genome sequence on GenBank, and publishing a complementary paper in a scientific journal. That was the arrangement, but one that was unsettled throughout by a persistent and radical uncertainty as to whether, in the end, Monsanto would go right to the finishing line, and indeed, go public. Much of what was commercially significant could be gained, as we have already described in Chapter 2, without genome completion. The significant genes could be identified, as could some key functionalities related to the DNA transfer and plant transformation process. The fact that, in the last analysis, Monsanto was involved throughout, and called the shots, made the public science partner especially vulnerable to a reneging of the agreement. Within this unequal power context, Brad Goodner’s
access behind the Monsanto firewalls, an access that gave him the possibility of exploring the whole range of Monsanto’s genomic databases, not restricted to C58 data, was based firmly on trust that he would not abuse his freedoms. In the University of Washington–DuPont partnership, there was never a question as to the final outcome, not least because of the terms of the contracted and formal division of labour. The contract in effect made the race in part a relay race, with DuPont at first running together with the University of Washington, contracted to deliver sequence data up to a quality standard measured by the number of samplings of the whole DNA genomic material – referred to as 6x coverage. Thereafter, the baton was handed over, and, under complementary National Science Foundation funding, the race was to be finished, both in terms of improving the sequence data quality to a coverage of 10x and of all the work of final annotation, entirely by the public science team. The involvement of National Science Foundation funding, even when explicitly promoting public–private collaborations, prescribed publication. This resourcing and contractual arrangement, giving primary responsibility to the public science partner for genome completion, therefore formalized the initial conditions of the race. Having seen why public science parties to the collaborations sought private corporate partners, one has to ask the reverse question of why the corporate parties in both collaborations were interested in engaging with public science. It is a question that cannot be given a definitive answer, particularly because corporate strategies and interests changed over time. At the beginning of the race, the question can benefit from being given a particular focus highlighting the issues at stake. The question was, why sequence the genome of the particular variety of Agrobacterium, the one that was the public-standard version used in all public experimental laboratories, C58? Why did Monsanto or Pioneer Hi-Bred become involved in sequencing a genome that was not the one they used as a biotechnology tool for genetically engineering plants, a hyper-virulent Agrobacterium biovar? These questions however also make an overall assumption that must be problematized from the outset. It was assumed (by all parties, public and private) that there was in fact not one race, but two. There certainly was a race to genome completion and publication, and the choice of C58 was the manifest of that interest. Equally certainly, there was another race being run by the corporations – or so everyone thought – to obtain formal property rights over data generated by the sequencing, genes of interest, for example. By virtue of the difference between the collaborative arrangements, the question could be rephrased in terms of whether Monsanto, for its part, would see the race through to a finishing line of publication, once it had reached an earlier finishing line of securing proprietary knowledge; and in
the case of DuPont, whether it ever had any interest in the final outcome of the race, once the baton had been handed on. Put another way, in the case of the Monsanto–Hiram collaboration the differentiation between public and private knowledge appropriation, and where the boundary fell, was undecided at the outset of the race. In the case of UW-DuPont, the division between public and private knowledge was already prefigured. However, a process account, the dynamics of the interaction between two overlapping races between two differently configured private–public collaborations, will reveal the fragility of even the best-laid plans of bugs and men. Even at this point, it is clear that both the corporate competitors, especially the leading scientists involved, Jean-François Tomb at DuPont and his counterpart Steve Slater at Monsanto, considered that their companies had an interest in contributing to the growth of public domain, fundamental scientific, knowledge. The issue was perhaps clearer with Monsanto – at the time battered by assaults on its public reputation – with its investment in Cereon, a fundamental genomics R&D facility.3 But, as we shall see, DuPont also engaged in significant open public scientific activity. Although we revisit the fundamental question posed here throughout the historical account, perhaps the most significant reason for the private engagement in public knowledge production was that, unlike sporting races, here no-one knew in advance what was to be gained by knowledge of the whole genome sequence, but everybody thought something significant would be gained. The potential value of a public resource for private uses that depended on it meant that, even though the private uses could be differentiated from the public values, they could only be realized once the public resources were in place. There was also a sense that the quicker the public resource was available, the quicker the resource-dependent private uses could be realized. Thus, in explaining the Monsanto choice of C58 and its engagement with Brad Goodner at Hiram, Steve Slater put it quite simply: 'We might be able to look through the genome and understand better how Agrobacterium works, and use that knowledge to help us to make it a better tool.' And again, emphasizing the unknown gains:

There's a feeling in the world of science, academic and business sides of science, that there's this huge treasure trove. We've got to get into this. . . . Monsanto was no different from anyone else, probably DuPont were saying the same thing. How are we really going to get value out of all of this genome sequence? Probably, if we can predict what genes do, we may not care so much if somebody already knows about that gene.
There is a further aspect, however, in the C58 choice. Steve Slater suggested that they could get everything they needed to know about their in-house
biotechnology biovar from the C58 public science, while still protecting their proprietary rights over that biovar. This expresses the mutual dependency benefits of a knowledge divided between public and private at certain borders. And finally, the collaboration was driven, from a Monsanto standpoint, by a clear differentiation of capabilities. Cereon had high-throughput sequencers, skills required for sequence assembly and for automated annotation. Goodner had key knowledge derived from his research on physical genetic mapping that would enhance and accelerate the whole genome sequencing, and a capability to deliver the final finishing annotation. Monsanto simply did not have the internal resources to go it alone, and would be unlikely to invest in acquiring those publicly available capabilities, especially if they were available essentially free of charge in a public–private collaboration.
LET THE RACE BEGIN

Or rather the two races: for the race to complete and publish the whole genome sequence and the race for intellectual property provided a meta-level of competition on top of that between the two teams. Indeed, right from the beginning, the incentives of one race operated against the incentives of the other: acquiring property rights provoked delays in publication; speeding up towards publication reduced windows of opportunity for patenting. This is critical for understanding the various phases of the race, and its significant crises. The pacing and the phasing of the race by the two teams are summarized in Table 4A.1. More schematically, the contrast between the way the two teams ran the race could be characterized as in Figure 4.1.

Figure 4.1 The race to sequence Agrobacterium tumefaciens (parallel timelines, 1998–2001: Monsanto/Hiram College – Goodner starts sequencing and seeks collaboration; quick progress to near sequence completion by November 2000; Monsanto seeks IP rights, slowing to a stop; race to deposit the genome on 14 August 2001 and then to publish. University of Washington/DuPont/Campinas – UW seeks collaboration, negotiation with DuPont and commencement of collaboration; progress halts with the break-down with DuPont; Campinas joins UW; race to deposit the genome on 15 August 2001 and then to publish.)

What is remarkable about the race to publish and complete the genome is that either team could easily have won – and by a large margin. Monsanto–Hiram had virtually finished by November 2000, a year before they eventually did so. If there had been no signals between the teams, UW-DuPont could equally well have preceded or pre-empted a Monsanto–Hiram finish, also by a matter of months. The dead heat both to sequencing (14/15 August 2001) and publication (December 2001) depended on ignorance of what the competitor team was doing at critical points, and on signals to the other team triggering a change in pace at others. The early phases of the race, on the surface for both teams, were characterized by the dominance of the competition for private appropriation and patenting. Indeed, Brad Goodner's first presentation of early sequencing in November 2000 that occasioned the collaboration with Monsanto was followed by a gap of three months, allowing Monsanto time to
consider filing patents on gene sequences that they had already completed. Thereafter, the Monsanto–Hiram team sequenced as rapidly as possible in order to identify key genes of interest, especially those involved in plant metabolism. We did hustle along to get intellectual property filed as quickly as we could, especially when we heard these guys (UW-DuPont) were also working on it . . . . We had been looking for genes that we thought would be of utility, and pulling such genes out, and in a few cases writing separate patents on these . . . The longer you have to do that before you make it public, the better chance you have of gaining as much proprietary value out of it as possible. (Steve Slater, interview)
Likewise, with the UW-DuPont team, although the pace from the start was frustratingly slow from a UW perspective for other reasons as well, there was a strong focus on obtaining property rights. But here a critical aspect of the difference in the private appropriation race for the two teams needs to be highlighted. Monsanto already had some patents, and they held exclusive licence to the key patents held by Japan Tobacco. From the outset, DuPont were looking for ways to circumvent the intellectual property centre ground held by Monsanto. We had a number of these things that were in the contract. We were interested in recalcitrant transformation, and that was obviously the main focus for both Monsanto and DuPont, to transform recalcitrant plants.
My idea was, can we define a unique system that will bypass this (Monsanto) patent . . . I sent them a number of reports on what genes biologically are relevant. Those lists that we produced went to their legal team, and they decided whether or not to file patents. (Derek Wood, interview)
These reports had been part of the UW-DuPont contract and, by August 2000, Wood and his team had delivered to them 800–900 genes with their functionalities described and annotated. Meetings with DuPont lawyers were always attended by UW lawyers as well, and UW had some kind of prospective royalty deal on the eventual patents. From the above, it is clear that both teams strongly anticipated that Agrobacterium genes, as identified in the sequencing project, would be subject to patenting. Both teams were aware of a competitive race between their corporate partners, one seeking to consolidate and advance its proprietary position, the other to circumvent it and break novel ground by patenting genes.
THE RACE CHANGES COURSE

Let us freeze the race at around the beginning of November 2000. The Monsanto–Hiram team had been going full steam ahead. They had almost completed the project and, according to both Slater and Goodner, were within a couple of months of depositing the whole genome at the public database, GenBank. At that point, a major signal was sent to the competing team at the 21st Annual Crown Gall Meeting, where an initial presentation was given. This was followed up by another presentation, this time attended not only by Derek Wood himself, but by Joanne Torrow, from the National Science Foundation, who had a grant proposal from UW on her desk for completion of the genome sequence. But, having got so close to the publication finishing line, and signalled as much, Monsanto then slowed down to a virtual stop, much to the dismay and anxiety of both Slater and Goodner, and unknown to their rivals. The assumption was that the race to acquire intellectual property was primary, and doubts grew as months passed as to whether Monsanto would ever pursue the second race to data deposition and publication to the finish. The UW-DuPont partnership was far from finishing. Progress had been painfully slow, and moreover the period of collaboration with DuPont was drawing to a close in December 2000 without DuPont having delivered their contracted 6x coverage. They had only produced just over 3x, removed key personnel from the work, and were becoming increasingly uncommunicative. Moreover, UW were able to trace how frequently, and by whom, their reports on genes were being viewed within DuPont, and they detected little or no interest.
It became clear that DuPont's strategy to circumvent Monsanto's intellectual property by taking the Agrobacterium route had gone out of favour, and that they were now concentrating on other genomes. This was an embarrassment and frustration to their head of bioinformatics, Jean-François Tomb, but it was also clear that he too was turning his interest to more generic processes of bacterial transformation, using other genomes such as E.coli. If the race had stayed on the same footing, although neither party knew this, completion and publication could well have been delayed indefinitely. But now UW, believing that Monsanto was close to the finishing line, knew that, if they were to stand any chance, the whole footing of the race had to change. It took a Brazilian turn and, at the same time, took an exclusively public course. Joao Setubal had led the project in Brazil that sequenced the world's first plant pathogen genome, that of the bacterium Xylella fastidiosa, discussed above. He had both the annotation software skills and the comparative bacterial genomic knowledge that UW lacked. Derek Wood approached him, early in 2001, and asked him whether it would be possible to acquire and then adapt the data currently embedded in DuPont's automatic annotation software. As communications between UW and DuPont had more or less broken down by January 2001, it was decided to go behind their firewall and download the whole Agrobacterium database to Unicamp bioinformatics computers in Campinas, Brazil, without DuPont's knowledge.

I started talking to Derek in January, and in March, and over the weeks, he would be saying, Oh, Monsanto is getting ahead of us. Du Pont is not answering my phone calls. He was very anxious. I was immediately aware that if we were going to switch everything from Du Pont to my lab, we would have to do it in a somewhat secretive way, because otherwise we would meet resistance . . . Then during March and April, I started writing software that was going to read everything that was in the Du Pont system, convert it to our system, test it, and then if it was OK announce to Du Pont, not to keep it secret from Du Pont all along, but just to keep it secret until we had confidence that we had everything that we needed. Because they could have said, if you are going to do it that way, we'll turn off the computer. (Setubal, interview)
Setubal converted the data into an annotation interface that was available to the whole UW team, facilitating rapid querying and adaptation on an open collaborative basis, so also breaking the bottleneck of the single computer access with DuPont. The secret download went undetected at DuPont, who were informed some time later of the fait accompli. There was some legal rumbling, and a degree of upset, but DuPont had in any event committed themselves to other routes. But before we say goodbye to DuPont as they exited the race, it should be emphasized that this did not
signify the actions of a corporate giant intent solely on commercial advantage from intellectual property rights. Tomb himself, as we have seen, was instrumental in assisting the formation of Uniprot, especially by brokering the negotiations with the Protein Information Resource. Moreover, if DuPont were abandoning Agrobacterium for a greater interest in E.coli, they were full sponsors and participators in the E.coli public–private consortium. For them, it was not a private–corporate proprietary knowledge versus fundamental public science issue. Meanwhile, in high anxiety, Setubal and Kitajima in Campinas and the UW team, now pressed on toward completion, the race having become realigned between the now exclusively public–public collaboration between UW and Unicamp and the public–private collaboration of Monsanto– Hiram. Slater and Goodner knew nothing of this change in course, either the split with DuPont, or the entry of the Brazilians. At this point, however, a further development amplified the competition between processes of making knowledge public and private. Having heard the Monsanto–Hiram presentation, the National Science Foundation needed to make a decision on whether to support the UW project, and award a grant. After all, there would be no point in doing so if it was duplicating an effort that was already far advanced. Philip Harriman, a programme director responsible for making the award, contacted Brad Goodner and asked him how far the sequencing had progressed and whether Monsanto intended to deposit the data and publish. They [University of Washington] believed, and were able to convince the NSF, that we would never put it out, or put it out in a timely manner. How else to explain what the NSF did? I found it extraordinary that NSF called me. That never happens. To ask a question – because I didn’t have an NSF grant – where are you at? And, I was very open with them. I couldn’t give them an exact date, but I told them that was the goal, and that we had every intention to publish. Obviously UW made a strong case to the NSF that wasn’t fast enough, because NSF turned right around and funded them . . . I was up front with the NSF programme director, and I said, do I know for sure that we’ll publish? Could they prevent us from publishing? Could Monsanto say no. Well, yes. Do I think that could ever happen? No. I think we will publish. I can’t tell you when. All I can say, as soon as we possibly can. (Goodner, interview)
In the event, the NSF did award UW the necessary funds to complete the sequencing, and pay Setubal's salary, amongst other things. Interestingly, however, they still presented to the world a picture, one they probably believed,4 that the achievement of the full genome sequencing was by a still-intact public–private collaboration. This was their interesting statement after the race had been fully run:
With principal support from the National Science Foundation (NSF), researchers at the University of Washington (UW) partnered with the E.I. du Pont de Nemours Company to sequence the genome. Other collaborators include researchers from The Institute for Genomic Research, SRI International and the University of Campinas in Brazil. The resulting data will be available publicly for use by other scientists. One indication of the importance attached to A. tumefaciens is that the DuPont company had completed a ‘rough draft’ of the sequence for proprietary purposes. Maryanna Henkart, director of the NSF Division of Molecular and Cellular Biosciences, said that the university researchers’ cooperation with DuPont was important to avoid duplication of effort.
The Brazilian turn and the NSF grant together re-launched the UW participation in the race. But, at least as significantly, this development had a considerable impact on the Monsanto–Hiram team, as soon as they discovered not only that UW had been awarded the grant, but also that they were now joined by the Brazilians, and, thanks to the assistance of Jean-François Tomb, also by Jonathan Eisen and Ian Paulsen from The Institute for Genomic Research (TIGR), although in a relatively minor role. Slater was able to impress on the senior Monsanto management the seriousness of the competition coming from UW, and the prize to be won if they were once more enabled to re-enter the race. The entry of the NSF into the competitive arena triggered the relaunch of the Monsanto–Hiram participation as well.
TO THE FIRST FINISHING LINE

Both teams were then frantically attempting to finish first, and extremely nervous about how close the other team were to depositing the whole genome sequence. The Monsanto–Hiram team:

I was very nervous about it. We had come this far. I really had to lobby quite a bit. One on one with all the appropriate people and also in larger groups to try to get clearance to get all the resources to finish the genome and then clearance to find a little bit of money for Brad's lab, so he had supplies and that kind of stuff, and then clearance to release the sequence. As we were getting close, and as part of the AB community so you hear rumours and the chatter was heating up about UW being close. This was helpful to me because I could say that we were sure that those guys were getting close, and we really need to get this done. I know that during this period when we slowed down, I figured that it was lost. I knew that if they had been ploughing ahead full steam, they could have beaten us easily, by six months. (Slater, interview)
The UW-Campinas team:
There was a great deal of nervousness and tension, especially near the end. Especially I guess from July onwards, it started to become an issue of this is a competition, right. Maynard’s [Olson] a very confident person. He says, we’ll certainly be able to finish this and there’s not going to be a problem as far as competition. He’s obviously a competitive guy, he’s done the human genome, and competition is in his blood. I’m not quite so confident as I think he is, so I was more, first thing I think that I considered was that the amount of work we were putting into it was phenomenal. It was a huge effort, and at some point it occurred to us that there’s also another team on the other side, behind this black wall, who’s essentially doing the same thing – that’s a problem. (Derek Wood, interview)
How the race to deposit ended in a dead heat is relatively straightforward to explain. One of the conditions of receiving the NSF grant was that UW release sequence data on a quarterly basis so that it would become available to the scientific community at large, as we discussed above. On the occasion of a poster-presentation of sequence data at the 10th International Congress of Molecular Plant Microbe Interactions, Wisconsin, 10–14 July, whether by mistake or not, Derek Wood not only published data showing how advanced the UW team was, but also announced the date when the whole genome would be deposited, 15 August. A finishing line had been announced, and the Monsanto response, given their closeness to finishing, was almost inevitable. 'I remember someone at a meeting saying that the best thing would be to publish one day before they do. We ended up releasing one day before they did. It was a dead heat' (Slater, interview). The reaction at UW, expressing a certain amount of damaged pride at the world-class centre of genomics science, was less than that of a triumphant joint winner.

I think that everybody was very frustrated that the day before we had set to release, we saw them release. Obviously, they had a reason for it. We knew that there must be some advantage for them releasing from the corporate perspective. . . . That's very frustrating for this group. Everybody in here was upset about it.
The historical peculiarity and significance of this dead heat – quite unlike the negotiated dead heat of the rough draft of the human genome – is that it was the outcome of a process in which eventually the race to make data public predominated over the race to acquire intellectual property rights. This was by no means certain at the outset, when both DuPont and Monsanto were involved, and nor was it clear when the UW-DuPont collaboration broke down. The role of the NSF in re-igniting the competition for both teams proved quite critical, but the decision to resource UW was, as we have seen, based on partial information and assumptions about Monsanto promoted by UW.5 The NSF decision unquestionably hastened the whole process, and ensured the public deposition of the genome sequence. However, it was clear
that the NSF considered that it was funding a process leading to publication by UW, not initiating a competition that resulted in Monsanto depositing as well. Moreover, there was a clear and general sentiment expressed by all those directly concerned that the competition was in one respect wasteful. There was not only a duplication of effort but, as we have noted, quite considerable resources in research time and in public and corporate funding spent on the same result, when they might better have been devoted to at least two different strains of Agrobacterium. Public money would not have been spent had Monsanto not stalled. For Monsanto, as the only remaining corporate organization significantly involved in the race, however, it was clear that throughout there were considerable internal conflicts, rather than any consistent strategy. The tension expresses the peculiar dependency of private corporations on public science. From a narrow commercial standpoint, there was no reason to pursue the genome through to the finishing line. But from a broader, still commercial, perspective this is not at all so clear.

I think there's a big morale piece to it. There are a lot of scientists within this company who are very good scientists who do a lot of work and don't have the opportunity to publish in the way that an academic lab does. We've all come up in this culture of you do science, you publish. Science is back-breaking work and often very thankless and that's the one thing you get. You put that paper out there and people recognise that you've done a good piece of work. You feel pretty good for a couple of days and then it's off to the next two years of thankless, back-breaking work. But, it was important for Monsanto as a company to establish that we are continuing to do high quality work in this new area of genomics. It's important for the people involved with the project to finish a project. In the end everyone understood that and agreed. (Slater, interview)
On the one hand, enhancing scientific reputation was part of Monsanto's market positioning. On the other, employing scientists with a culture of science and the ability to pursue it at the highest level requires some accommodation to publication as central to certain forms of knowledge production. The end result of the race to this first finishing point was clear. The genome sequence was made public, genes were identified, and biological processes were better understood. After analysing the final lap of the race, the race to publish, we will return to the issue of the public–private differentiation and interdependency of the knowledge of Agrobacterium that finally resulted.
PUBLICATION AND THE END OF THE RACE

Completion of a genome sequence is, as we have seen, a distinct process of knowledge production, yielding only the raw sequence data, without
scientific analysis or complete annotation. Deposition of data onto GenBank is then a phase in both public appropriation and distribution, but data deposition only ended one part of the race. It was still possible for one party to secure publication of a scientific paper analysing the whole genome sequence of Agrobacterium in advance of the other, so effectively preventing the other party from doing so. A precedent had occurred with another bacterial genome, that of E.coli, where several teams had been engaged in publishing sections of contiguous sequence in their race to complete the genome of the same laboratory K-12 strain. In this respect the parallels with Agrobacterium are close, except that in this case the race was entirely between the public laboratories of Blattner, in the USA, and three teams in Japan led by Yura and Ishihama, Mizobuchi, and Horiuchi and Mori (Hayashi et al., 2006; Riley et al., 2006). Interestingly, the parallel continues, as discrepancies were subsequently found in the strains of K-12 being used by different laboratories, and work is still continuing in the comparative genomics of E.coli, raising a quite fundamental question as to whether a genome is ever complete, when considering populations and evolution. But, in this case, the Blattner publication came first and stood alone. The Japanese teams diverted their efforts to following up genomic variation. The initial predisposition of the UW team was to exploit an advantage and pre-empt the possibility of a Monsanto–Hiram publication.

We're on our own, we're going to beat them. Maynard was like, oh yeah, there's no problem. He's very confident and there's a reason for it, he's very good. So he knew, in fact we would have no problem publishing before them. There would have been no problem whatsoever. We were ready, our publication was done, we gave them an extra month to finish theirs. So, that was not a problem and Steve's aware of that. We certainly would have published before them, which means that Science or Nature would not have accepted their genome, because it was the same strain. (Wood, interview)
From the point of view of the process of making knowledge public, the key point is that there was no certainty as to what would be made public, and in the end it made a significant difference that both papers were published, adding to the public – the scientific community’s – knowledge resource. Had UW acted competitively, that would not have happened. Joao Setubal fully expressed the effects of the potential for wasteful competition: Nobody wanted to lose. There were two good teams, doing a good job, and if one team published ahead of the other, it was the work of years that goes to waste. So we wondered about setting something up with them. Maynard said no, we will ignore Monsanto. If they publish ahead of us, that’s too bad. A week after that, we got a phone call from Monsanto. (Setubal, interview)
In fact, preliminary contacts had been made and interest shown by Science for the UW publication, and Nature for the Monsanto one. UW's confidence was probably merited, and if either was to make the call, it was Monsanto–Hiram's weaker position that led them to do so. UW had assembled a powerful team for finishing and publication, compared to which Steve Slater and his Monsanto collaborator Greg Hinkle, along with Brad Goodner and his innovative use of undergraduates, were minnows. Both in terms of numbers of scientists and in multidisciplinary resources, UW were in a different league. Firstly, UW had the immense advantage of the presence of Joao Setubal, recognized to be critical not only for the completion, but for the comparative genomics. It was Setubal who provided a key component of the UW paper in this respect, by recognizing the closeness of a plant symbiont, Sinorhizobium meliloti, to Agrobacterium. But they had also recruited Jonathan Eisen and Ian Paulsen from TIGR for their expertise in phylogenetics, and Peter Karp, from SRI International, for his contribution to biochemical analysis. These complemented their own considerable internal capabilities in genetics, regulatory systems and Agrobacterium biology. Considering this marshalling of expertise for the publication process, it is worth emphasizing both its disciplinary range and its organizational diversity: a public science institution, a major non-profit organization created by Venter, and a private company, which saw potential commercial advantage from early inside knowledge of the genome. Faced with this array of talent and resources, the achievement of publication by Monsanto–Hiram appeared quite heroic to all involved. When Steve Slater contacted Gene Nester suggesting coordination of the publication process, a general consensus rapidly emerged for the benefits of a back-to-back publication, with Science as the journal with which UW had an agreement. The agreement was sharp-edged, in that failure to meet a deadline released either party from the commitment, implicitly stating that, in the event of Monsanto failing to achieve the deadline, UW would go ahead alone with publication. Nonetheless, the perspective of both teams was that each would benefit from the contribution of the other, and that the scientific community as a whole would be the loser in the event of a single publication. In the event, back-to-back papers appeared on 14 December 2001. From the Monsanto perspective, however, publication would enlarge the community of researchers, thereby also benefiting companies with their particular interests.

It's important to publish so that you can get more people working on problems that maybe you are particularly interested in. It's also important to be a collaborative and responsible member of the scientific community, so that if somebody actually comes up with something that they think might be useful to Monsanto, for example, that they will think of calling us. I'm out there at
meetings. I’m a scientist just like they are and its good to have those connections. (Slater, interview)
The two papers were entitled 'The genome of the natural genetic engineer Agrobacterium tumefaciens C58' (Wood et al.) and 'Genome sequence of the plant pathogen and biotechnology agent Agrobacterium tumefaciens C58' (Goodner et al.).6 These titles suggest a different emphasis, one on the natural organism, the other on the biotechnology tool, as aspects of Agrobacterium. Monsanto did insist on prominence for the biotechnology angle as a lead to the paper, although this did not imply an overall use-oriented approach. Both papers address fundamental biological issues, with different, almost complementary emphases. Although there was some sharing of genome sequence data, no discussions were held between the two teams as to contents. However, the Science editors and reviewers had quite a significant impact on developing a contrast between the two papers. In broad terms, the core of the UW paper revolved around the phylogenetic and comparative genomics aspects, and an evolutionary account of the recent division between pathogens and symbionts. This led to a tentative suggestion of a taxonomic reclassification of Agrobacterium. The editors and reviewers had strongly steered UW in this direction. By contrast, a similar argument was effectively suppressed in the Monsanto–Hiram paper by the reviewers and editors, although their argument was much more strongly based on evidence of chromosomal organization and the plasmid origins of the linear chromosome. In the end, the Monsanto–Hiram paper had a strong focus on the structure and organization of the chromosome, and especially on the peculiarity of the linear chromosome. It also had a strong emphasis on the nature of the pathogenicity, the metabolic processes of Agrobacterium DNA once in the plant host. This certainly was a biological function with technological interest, but equally an issue of fundamental biological interest. The Monsanto–Hiram team, and especially Goodner, were able to demonstrate the peculiar hairpin bends at the ends of the linear chromosome – of interest here because, in doing so, Goodner was able to show that neither team had in fact successfully completed the sequence of the final base pairs. Technically, both published an incomplete genome, if only by a small number of non-protein-expressing base pairs. Finally, Goodner had wanted to publish information concerning the opportunistic character of Agrobacterium in relation to many hosts, including humans with impaired immune systems. This pathogenicity, however, occurred without gene transfer, and hence also without tumorigenesis. Nonetheless, Monsanto was unhappy that any reference should be made to this, given the climate of fear surrounding genetic modification.
In terms of the publication process, therefore, the result was that increasingly differentiated and complementary dimensions of Agrobacterium knowledge emerged into the public domain, through a process in which strong steering from reviewers drawn from the Agrobacterium community, the different orientations and capabilities of the two teams, and stakeholder interests all came into play. The production process – the shaping of the final public output – is an ongoing and collective activity right up to the point of distribution and dissemination.
THE TWIST IN THE TALE

Once the data from both groups had been deposited, becoming available to the wider scientific genomics community, they also became the object of scrutiny by the two groups involved in the production of the almost-complete sequence. It was discovered that there were minor differences in the sequences, base pairs of DNA in one that did not exist in the other. Given the probabilistic methods of determining sequence, there was a question of whether these differences were due to experimental artefact or error, or whether they were indeed differences in the two genomes. The strain of Agrobacterium tumefaciens used by the Monsanto–Hiram laboratory had been acquired from the American Type Culture Collection, Maryland; the UW team had acquired their strain from the RS Dickey collection. The two samples of C58 had in effect evolved in two different environments, and consequently were now displaying real genetic differences. At the time, Maynard Olson's laboratory was engaged in an extensive programme of investigation into the intraspecies variation of another bacterium, Pseudomonas aeruginosa, which evolves different genetic characteristics in the lungs of cystic fibrosis patients, each patient having their own biovar, their own lung environment. Consequently, it was decided that a new project could arise out of the conflicts and competition of the old, to study intraspecies variation of Agrobacterium. This time, the project was predominantly resourced by the NSF, and led by UW, but with the collaboration of Monsanto, Hiram and Campinas. To the two variants of C58, three other Agrobacterium biovars with very different biological characteristics were added, including Agrobacterium vitis, whose host plant is primarily the grapevine, and Agrobacterium radiobacter. A public space has been created to report the progress of the new and broader collaborative alliance (www.agrobacterium.org). There has since been some movement of personnel: Steve Slater to Arizona State University, and Joao Setubal to the Virginia Bioinformatics Institute. But Monsanto allocated considerable resources and a team led by Steve Slater's former colleague, Barry Goldman,
with a particular focus on the grapevine Agrobacterium, no doubt because of potential biotechnology gains. The tension and interdependency between public and private production, use, appropriation and distribution of knowledge are still reflected in these divisions of labour and interest within the public collaboration. The race therefore finally came to an end, and competition between groups of private and public scientists ceased. Erstwhile rivals from both public and private organizations were then jointly engaged in a public science enterprise to produce a wider range of genomic knowledge, stimulated initially by the discovery of genomic differentiation resulting from the previous competition.
THE DIFFERENTIATION AND INTERDEPENDENCY BETWEEN PUBLIC AND PRIVATE AGROBACTERIUM KNOWLEDGE

At the outset of the race, when the competitors were two public–private collaborations, it seemed clear to all concerned that the private organizations' interest in participation was to acquire some intellectual property. As we have seen, even the NSF, which makes publication of all data a condition of granting an award, assumed as much in relation to DuPont. But the shared assumption was much more specific than that. It was considered that the process of genome sequencing would identify genes, and that Agrobacterium genes would be subject to patenting. Everyone thought that was going to happen. The delays in the Monsanto–Hiram project were perceived as being caused by the corporate interest in keeping the window open for as long as possible in order to secure patents on the gene information being generated. Derek Wood made clear the contractual obligation to provide reports to DuPont, duly delivered, with more than 800 genes of potential interest identified. The more direct evidence, from within Monsanto, comes from Steve Slater:

On the private side, we had to write patents, several patents, we were mining the genome, finding genes that looked interesting. If you wanted anything done on that, you had to convince someone that it's worth their time. You can't file a patent unless you've got utility, so the other part of what I was doing was working on things we thought were interesting. You've got to get enough data to show that there's some utility in these and so I was managing that too. We spent more of our time on metabolic genes that we thought might be useful for modifying plant metabolisms in a useful way.
But no genes were ever patented by Monsanto, DuPont or Pioneer Hi-Bred. In the formal property rights domain, the only visible trace of the genome
sequencing project was a single patent filing, with Steve Slater as one of the inventors and Monsanto the applicant, relating to the covalent closure of the ends of the linear chromosome. It was first filed a week before data deposition (6 August 2001),7 but is yet to be issued and, in view of its limited utility claims, may yet be abandoned. Obviously, with such commercially sensitive issues, outside interviewers can only acquire partial information. There were several possibilities. We can assume that DuPont and Pioneer simply saw another route to circumventing Monsanto, or simply chose to pursue different objectives. But Monsanto certainly did not lose interest in Agrobacterium. One possibility is that the race to publish – the disappearance of the window to appropriate – happened too rapidly for Monsanto to undertake the extensive additional research in house to demonstrate utility. The single patent filing, of rather uncertain value, might confirm this interpretation. Instead, they joined the race to publish first, because, once they knew for certain that UW was going to finish, they were at risk of coming away with nothing. Effectively, the 'making public' that occurred on August 14/15 and then on 14 December 2001 severely restricted the scope and opportunities for patenting genes, because prior art was in the public domain, first with data deposition, then with publication. It is clear that the window would have been held open for much longer by Monsanto had UW not relaunched and accelerated their progress. Another possibility is that they were gaining significant biological knowledge for their own in-house biotechnology strain of Agrobacterium, and avoided patenting genes discovered from the public laboratory strains, preferring technical secrecy. One of the interesting characteristics of the plant transformation process is that it could not be reverse engineered. There was no opportunity for a commercial rival to obtain information on the Monsanto transformation process by examining genetically modified seed or crops. All that would be significant would be the genes that were inserted in the plant, such as the infamous one used to protect Monsanto's property rights for its herbicide, glyphosate. But these genes had nothing to do with Agrobacterium's genes: indeed, they replaced the tumorigenic ones. Given this, maybe the market value of patents on Agrobacterium genes did not merit the cost of investing in patenting them. The knowledge of Agrobacterium genes might be important for understanding the process technology, but not necessarily critical in securing competitive advantage in the commercial markets most important for genetically engineered products. It is also possible that key plant transformation process patents had already been effectively acquired, and did not depend on the additional knowledge acquired by the genome sequencing projects, and specifically did not depend on the identification of Agrobacterium genes within the whole genome. But this assumes a static finality to a technology, and this
would leave any company vulnerable to further progress of knowledge. A more nuanced view is expressed by Steve Slater, suggesting a continuing tension between the growth of public domain knowledge and the growth of potential uses of knowledge for commercial purposes.

I'd be hesitant to say that the fact that the genome was sequenced has had a huge effect on transformation technology per se. A lot of genetics has had an effect on it, but much of that was the work done before this. So, have we used it? Yes. Has it made a big effect? I wouldn't say it's had a large effect on our ability to use it as a tool at this point, largely because we just haven't put the effort into doing it. I think there's still potential there. It's one of those situations where it's already working really well. So you hope that when you publish it, academic labs will take it and advance it. (Slater, interview)
In fact the patent landscape surrounding Agrobacterium (Roa-Rodriguez and Nottenburg, 2005) is interesting both in terms of what is excluded and what is included, and in terms of the way the landscape of formal patenting is structured. A patent thicket of no less than 125 key patents had grown up around the key Agrobacterium technologies of plant transformation. Japan Tobacco held the key patent for transformation of all monocots (US 5 591 616), which include most grains (wheat, rice and so on). Washington University (St Louis), Mary-Dell Chilton's destination after leaving Nester's laboratory at the University of Washington, Seattle, filed a key patent for transforming all dicots (including, most importantly, soybean) as early as 1983, although it was only issued 17 years later, in 2000 (US 6 051 757). Syngenta (including Syngenta/Mogen), Chilton's next destination, subsequently gained significant patents in dicot transformation. Indeed, Chilton's trajectory is an interesting example of knowledge flows between universities and commercial organizations: as a knowledge bearer, she facilitated the exploitation of public domain knowledge within her new corporate context. Her participation, within universities, in the discovery of the role of the Ti-plasmids in DNA transfer was a critical platform for the development of the key transformation technologies based on replacing the tumorigenic genes of Agrobacterium with genes of interest, as described above (Chilton et al., 1977; Chilton, 2001). This was the public knowledge sufficient for the subsequently privately appropriated technologies that depended on it. However, meanwhile, Monsanto had acquired a number of soybean-related patents for Agrobacterium-mediated transformation. There was a continuing legal battle over patent interference between Syngenta and Monsanto, as the thicket of patents developed (Pray and Naseem, 2005). In broad terms, the pattern of ownership of the 126 key patents identified by CAMBIA is clear.8 Private corporations own
nearly twice as many patents as public science institutes (80 as against 43, with three not-for-profit organizations), nonetheless with a significant proportion held by public organizations. Amongst the private corporations, the main patent owners are well known: Monsanto hold 17 patents (including those gained through acquisition, notably of Calgene and Agracetus); Syngenta (including Syngenta/Mogen) hold eight; Japan Tobacco seven; and Pioneer HiBred six. Among the other private corporations, patents on genetically modified flowering plants such as carnations and roses are dominated by Florigene, with three US and two European patents. Otherwise, corporations are represented mainly by holding a single patent. Amongst public science institutions, the University of Toledo, Ohio, is outstanding, not only numerically, but in terms of strategically important patents. They hold five patents, and a further two if the individual scientists at Toledo, Goldman and Graves, are included. The Max Planck Institute holds three, and Schilperoort at Leiden University, one of the significant pioneers in discovering the DNA transfer mechanism, holds two more. Aside from this broad picture of division and concentration between and within private corporate and public science patent holdings, there is a significant cross-cutting division of markets by the crops covered by patents. We have already highlighted the broad division of the spoils between monocots (Japan Tobacco) and dicots (Washington University, Monsanto and Syngenta). Japan Tobacco clearly has a strong focus on rice, whereas Monsanto spreads its patent interest across sugarbeet, soybean, wheat, cotton and brassicas. Pioneer HiBred has its main holdings in soybean and sorghum. Florigene has a field almost to itself for flowers. In the public sector, the University of Toledo covers both monocots and dicots, but specifically holds a broad and powerful patent covering Gramineae (cereal crops). The source data for this analysis of the patent landscape come from CAMBIA, a not-for-profit organization that promotes open science, particularly with its Biological Innovation for an Open Society (BIOS) initiative. Their intention was to demonstrate the balkanization of knowledge and to warn, along the lines of the argument developed by Heller and Eisenberg (1998), against the development of an anti-commons. Apart from documenting the parcellization of knowledge, they also promoted initiatives to circumvent the patent thickets, such as that by Broothaerts et al. (2005), who demonstrated the use of alternative bacteria on which to base transformation technologies, placing them in the public domain. Chilton (2005), as a pioneer of the technology, expressed scepticism on two fronts. Firstly, there was a question of whether any alternative could effectively compete with Agrobacterium in efficiency; but, more importantly, she pointed to the fact that the Agrobacterium patents are in general not stand-alone anyway, but are
interdependent with numerous patents on other aspects of plant transformation, seed varieties, pesticides, traits and so on. Many technologies are embedded in other technologies, and patents in suites of patents. Nonetheless, in the context of this patent landscape, it was all the more surprising that genes were not patented as a consequence of the whole genome sequencing projects. The additional biological knowledge gained about genes provided new opportunities, especially given that the patents filed or issued shortly after the publication of the discovery of the gene-transfer plasmids and related biological processes had not, to any significant extent, patented genomic information. Those patents depended on the knowledge of the function and identification of those genes responsible for gene transfer, notably the virulence genes, but are overwhelmingly characterized by the securing of IPR on technologies of genetic modification, mostly in relation to specific traits in specific plant species. The very fact that the identity and functions of some key genes were known prior to genome sequencing may well have complicated the issue of gene sequence patentability, insofar as it might be necessary to develop novel and additional utility, as Slater suggested. The account of the race, however, demonstrates a dynamic in which growth of public domain knowledge is both essential for, and in tension with, a parallel growth of knowledge developed for appropriation, especially for science-led technologies such as the ones we are discussing. The argument could again be expressed in terms of markets entering where the public domain fails. The other key feature of the race is the demonstration of the continuously changing relationship between ‘fundamental scientific’ and ‘technological’ knowledge: there are no natural divisions within this growth of knowledge. Indeed, the contrast between the Ti-plasmid functions and their translation into a plant transformation technology, and the genome sequencing and its influence and impact on technologies of plant transformation in the absence (as yet) of related patents, demonstrates a continuously changing relationality. Above all, the knowledge of the genome became increasingly polyvalent, with multiple and overlapping uses. The genomic knowledge cannot be squeezed into any knowledge quadrants, but moves around and proliferates uses as knowledge develops: fundamental, use-oriented, use-dedicated, and so on. The race was one in which unpredictability of outcomes was inherent in a complex competition process in which the potential for both private and public gains contributed to its dynamics. It is exemplary of the great turbulence over the patenting of genes and the issue of utility seen in previous chapters. Moreover, concentrating on the outcomes or endpoints as if they were indeed teleologically the function of the competition can lead to a misrecognition of the dynamics of knowledge transformation. Formal–private
and formal–public knowledge are only the tips of their respective icebergs. The institutions of patent-based property rights require the construction of an appropriate form of knowledge with its own specific, legally construed and tested criteria of evaluation. Patent utility is quite different from usefulness. As we have seen, this requires further knowledge work, research and development, of a particular and dedicated kind, in order to construct a privately appropriable knowledge that is recognized as such by the appropriate authority. The construction of this formal, indeed formulaic, knowledge is a filtering process, often leading to extended negotiation, refining the scope of the claims. Irrespective of technical secrecy as an alternative form of appropriation, any formal proprietary knowledge rests on an extensive penumbra of private (in the sense of resident in privately-owned organizations), but non-tradable, knowledge. However, as we have also seen, precisely the same is the case for formally public knowledge. The distillation and filtering, editing and reviewing involved in the publication process result in a specific knowledge form evident in the appearance of the final article, behind which lies an equally extensive penumbra of public knowledge. Self-evidently, this is not a tacit–codified distinction, or a knowledge–information distinction, but an ongoing process of differentiation within processes of knowledge production and distribution, involving different collectivities, different ‘publics’, engaged in it. Moreover, between the ‘private’ penumbra and the ‘public’ penumbra the boundaries between public and private knowledge are far less discernible, often non-existent. Most scientists, wherever their organizational location and whatever their formal employment contracts, and however little they migrate between private and public organizations, inhabit this vaster hinterland behind the façade of formal public and private knowledge. The interdependence between the public and private, the dynamic of differentiation, visible only as results in the formal spaces, is undoubtedly most intense within these penumbras, and the process of differentiation is always one that creates tension across the divides, between inclusion and exclusion from the formal arena. We can conclude with what is perhaps the symbolic event of the outcome of the race to sequence Agrobacterium tumefaciens: the closure of Cereon, the Monsanto subsidiary established at considerable investment in the new genomics science, in June 2002, only a year after publication of the genome. This genome had been its flagship project at its launch, so its dissolution demonstrates the failure of an envisioned market economy of knowledge built around genome sequencing. In the ongoing tension between processes of making knowledge public and private, one possible vision of where the boundary might lie was proved – at that point in time – economically
unviable. In the end, the competition for publication outpaced the competition for private appropriation, and public knowledge of the genomic sequence for both public and private use prevailed. The public–private divide is formed over time, and always retains a dynamic interdependency. The institutions of patenting that define utility contribute to this boundary formation, not only between formally private and public knowledge, but within private knowledge between usefulness and formal utility. We have seen how the institution of formal utility itself is subject to change (the EST story was contemporaneous with the Agrobacterium race), further demonstrating that these boundaries are not something inherent in any natural division within knowledge. For a private corporation, investments may be made to accelerate making knowledge public in conditions of uncertainty as to what may be the specific nature of the gain, but when there is nonetheless confidence of a gain. There is a private interest in accelerating public knowledge, in capitalist economies of knowledge.
POSTSCRIPTOME: ASPERGILLUS NIGER9

In January 2007 (see Pel et al., 2007) a whole genome of the ‘versatile cell factory’, Aspergillus niger, was published in Nature Biotechnology. It bears some remarkable similarities to, but also significant differences from, the trajectory of Agrobacterium knowledge. Consequently, it demonstrates how the turbulence and processes of differentiation and interdependency between public and private domains remain undiminished, even though the generation of whole genomes is by now a much more routinized scientific activity. Like Agrobacterium, Aspergillus niger, a fungus, is a remarkably polyvalent organism, even though it is probably characterized even more strongly by its biotechnology interest. It has long been exploited for its biocatalytic properties in fermentation, as well as for its enzymes and organic acid products, which have many industrial uses. It is widely used in starch processing, baking, brewing, animal feed production, and the paper and pulping industries. Recently, its properties have been directed to dealing with the problem of the haze in chilled beers, or to reducing some toxins occurring in baking. And, at the forefront of the new bioeconomy, it has been recruited into the US Department of Energy’s Genomes to Life (GTL) programme. Here, it is viewed as an exciting new player in the conversion of lignocellulosic carbon into biofuels and bio-polymers, a fungus that breaks down and converts that huge potential carbon source to be found in the woody matter of plants: a veritable fungus-of-all-trades. But, as the GTL programme makes clear, it is also of fundamental scientific interest, and a model organism for the new scientific paradigm of
systems biology. Fungal genomics has developed even more slowly than the genomics of plant pathogens and bacteria, with only 120 genomes sequenced so far, as against 1400 microbial ones (Baker, 2006c). Fungal genomes are now becoming central to the understanding of carbon fixing, as well as illuminating the evolutionary history of cellular processes, chromosome replication, and apoptosis (Baker, 2006a, 2006b).10 So, Aspergillus niger shares with Agrobacterium the straddling and combination of multiple epistemic use values. It is a fungus that breaks down boxes, especially woodenly rigid ones. Although there was nothing equivalent to the drama of the Agrobacterium race, nonetheless similar tensions were clearly manifest. A major European food biotechnology company, DSM, based in the Netherlands, initiated a sequencing programme and effectively completed sequencing as long ago as 2001. In their case, the strain chosen was the primary industrial strain (CBS 513.88), directly related to their own commercial activity. A period of five years followed in which the genome was firewall-protected, and IPR was gathered, including a number of key patents.11 Another company, US-based Integrated Genomics, had also produced and held privately a much rougher draft of a different strain (ATCC 9029). This strain was later purchased (at market price) by the government-funded Pacific Northwest National Laboratory in 2004, but, in order to minimize duplication and to signal a more public science purpose, this project team decided that its main genomic activity was to sequence the wild strain in 2005, and it completed its sequencing in September 2006. It seems clear that the US DoE-financed genome project either prompted DSM to finally publish their genome, or at least accelerated the process. DSM achieved the front page cover story of a whole genome sequence in Nature Biotechnology. Although a clearly identifiable crop of patents emanating from the sequencing project already marks some significant differences from the Agrobacterium genome race, perhaps the most interesting contrast arises from the central coordinating role of DSM as a private enterprise, and the extensive division of labour spanning private and public domains, which is manifest in this current epoch of genomics. The whole of their genome project was directly financed by DSM, without any significant complementary public funding. Initial sequencing and automated annotation work was outsourced to two private companies, Qiagen and Biomax, so maintaining the project within the commercial sector at this stage. But, moving towards a publication project in September 2005, DSM developed a whole network of collaborators through the fungus-dedicated conference scene. Table 4A.2, below, demonstrates both the dominant contribution of DSM and the breadth and depth of the division of labour. There were 69 authors, spanning eight countries, most of them university- or government laboratory-based.
The paper identifies 21 distinctive expertise specialities for which these external, public science-based academics were recruited. These included annotation and sequencing; metabolism and developmental biology; microarray data production and analysis; fermentation; transcriptome analysis; bioinformatics; protein analysis; phylogeny and synteny; physical and genetic maps; life cycle and reproduction; incompatibility genes; cell wall functions; transcription factors; secretome; transport functions; glycosylation; extracellular enzyme spectrum; and secondary metabolism. One of the most interesting features of this extensive public–private division of labour and its coordination was the way DSM managed the public–private interface. They invited participation by prominent academics and, in exchange for providing them with protected access to the completed genomic sequence data under a confidentiality agreement, included them as participants in the publication. DSM made their privately appropriated knowledge conditionally available as an inducement to public science participation in the process of making the genome knowledge public. The publication network was divided into six groups, each responsible for a section of the eventual paper, and led by a DSM team leader. The publication of the genome was proclaimed as part of DSM’s ‘open innovation’ strategy. Out of some 14 000 identified genes, roughly half were still of unknown function. So, released from DSM control, they were placed in the public domain as an opportunity for further development by the public or commercial scientific community: ‘With the function of some 7500 genes still unknown, scientific researchers have plenty of challenges to deal with in the future’ (DSM press release, January 2007). And, as with Agrobacterium, there is a twist to this tale, and quite a similar one. Following the successive deposition of completed genomes to Genbank, and subsequent scientific publication, the US group and DSM are collaborating in a comparative genomics project. In this case, too, surprising variation between the sequenced species has suggested considerable scientific and application gains – in the public domain, but through public–private domain collaboration. As a postscript genome to the Agrobacterium story, Aspergillus niger knowledge creation and circulation is exemplary of both continuing variety creation and tensions in the differentiation and interdependencies between public and private economies of knowledge. Moreover, if Agrobacterium was a symbol of a particular phase of biotechnology and the knowledge-based bioeconomy, Aspergillus niger is pioneering the emergence of an entirely new stage of its development, notably as a representative of the displacement of petrochemical science and technologies with a new generation of biological entities and processes expanding their scope in economy and society.
NOTES

1. We would like to thank Joao Setubal and Joao Kitajima at the University of Campinas; Derek Wood, Eugene Nester and Rajinder Kaul at the University of Washington, Seattle; Brad Goodner at Hiram College; Steve Slater, then of Monsanto; Jean-François Tomb of DuPont; Maryanna Henkart of the National Science Foundation; and Philip Harriman, retired from the National Science Foundation.
2. Critically, the team comprised a new combination of skills and disciplines: microbiology, DNA hybridization and viral RNA biochemistry.
3. Another significant example of a major biotechnology company investing in genomic science facilities from the same historical period is that of Syngenta’s Genome Centre. This organization was the more remarkable in that it was sited on a public science campus, and operated a collaborative and open door policy with respect to scientists at the John Innes Centre (McMeekin and Harvey, 2002).
4. In correspondence, Philip Harriman confirms that he was kept in ignorance of the problems with DuPont, and believed that they had fully delivered on their part of the contract.
5. ‘If I had been sure the Monsanto sequence was going to be published I would have been hesitant to recommend the Nester award, since the info would be available and we could have used the funds for something else’ (Harriman, correspondence).
6. Of the two papers, each accompanied as required by extensive further supporting evidence available on the web, there is no doubt that the annotation and finishing of the UW paper was richer (http://www.ncbi.nlm.nih.gov/). Nonetheless, as of November 2006, their citation records were quite similar: 203 for the UW paper as against 159 for the Goodner paper.
7. A continuation in part was subsequently filed on 26 November 2001, although priority of art is established by the first date.
8. As we have seen, absolute numbers count for little, as a single patent can hold a critical bottleneck, for example for all dicots, or all monocots. Nonetheless, this numerical picture confirms the holders of the strategic patent positions on the patent landscape.
9. We are particularly grateful to Hein Stam, DSM coordinator of the genomic project, for extensive information in answer to questions, and to Scott Baker of the DoE JGI project, from Pacific Northwest National Laboratory, for information concerning their genome project.
10. ‘The community of researchers with interest in this organism are poised to make important contributions that span across basic biological research, evolutionary biology and industrial and medical mycology’ (Baker, 2006a, S20).
11. Those clearly identifiable as emanating from genome sequencing include US Patents 6 861 243, filed 2002, granted 2005; 6 558 937, filed 1999, granted 2003; and 7 052 869, filed 2003, granted 2006. DSM acknowledge six patents directly related to the genome sequencing.
APPENDIX

Table 4A.1 Chronology of the sequencing of Agrobacterium tumefaciens genome

Date | Monsanto–Hiram | UW/DuPont/Unicamp
1997 | Goodner approaches Nester to collaborate on genome sequencing |
1998 April | E-mail correspondence between Goodner and Nester about possibilities of collaborating on genome | Nester aims to sequence the genome, but has difficulty attracting funds
1998 August | | Nester approaches Monsanto at the 7th International Congress of Plant Pathology, Edinburgh; Nester approaches several people in Monsanto, without success; Nester approaches DuPont, Tomb, Chumley, with no decision
1998 Fall | | Several months gap; calls back
1998 Winter | | Pioneer and DuPont visit Seattle; contract negotiations from then take at least a year
1999 Summer | Goodner begins creating a clone bank |
1999 Summer | Goodner approaches potential collaborators and funders |
1999 September | Goodner publishes physical map |
1999 Autumn | Cereon starts shotgun sequencing of genome, October |
1999 Autumn | Goodner gets small NIH grant ($100K) and begins small-scale sequencing |
1999 November | Crown Gall Annual Conference, presentation of first paper; discovers Cereon/Monsanto had nearly completed shotgun sequencing by end November, and had a substantial clone bank | DuPont starts sequencing? Wood starts on Agro project
1999 November | Monsanto–Goodner agreement on collaboration |
1999 November–December | Monsanto delays start of collaboration for a brief time to file some patents first |
2000 January | Collaboration begins |
2000 Summer | TIGR approaches Goodner about possible collaboration on C58 |
2000 Summer | | Sequence data supposed to be handed over to UW by DuPont, prior to the commencement of the NSF grant, initially programmed for the completion stage of the genome, including going from 6x to greater coverage
2000 August | | Setubal arrives at UW
2000 Autumn | | NSF proposal being submitted by UW, Maynard Olson, principal investigator, with Nester and Gordon
2000 November | Presented almost complete sequence at the Agrobacterium Conference (21st Annual Crown Gall Meeting) – only a few gaps left to do; Slater talks to Milt Gordon, a UW collaborator, but with no response | UW sends watching brief to the Crown Gall Conference
2000 November | | DuPont slows down; DuPont withdraws key personnel from Agro project – Shiping Zhang
2000 December | | DuPont contract expires at end of year, with only 3x coverage completed at most
2001 January | Monsanto slowdown begins |
2001 Early | | UW Genome Center and Maynard Olson’s group undertake to complete the shotgun sequencing of the genome, up to 9–10x
2001 January | | No further communications between Wood and Tomb from end January to post-switch in March
2001 28–30 January (JS) | Monsanto present again nearly complete genome at ASM and TIGR conference on Microbial Genomes, Monterey, Cal. | Derek Wood attends and hears presentation by Slater/Goodner; NSF (Joanne Tornow) hears the presentation and becomes aware of the Monsanto Agrobacterium project, after receiving the grant proposal, but before awarding it
2001 February | Goodner approached by NSF (Dr Harriman) about funding proposal from UW with TIGR as partners; NSF ask about Monsanto progress and publication intentions | Wood approaches Setubal for assistance
2001 February | Goodner inquires of NSF whether UW has got the money, and soon after Monsanto discovers that they have | Gene Nester gives a talk at the Plant and Animal Genomics Conference, San Diego, on the Agrobacterium genome
2001 March | | NSF award starts 15 March; estimated expiry 31 August 2003
2001 Early March | | By time of switch, reports by DW to DuPont had identified 800–900 genes of interest, on 3x coverage
2001 March | | Setubal agrees to undertake annotation; transfer of data from behind the DuPont firewall to Unicamp
2001 March | | Tomb brokers arrangement with TIGR (Paulson and Eisen) to assist completion
2001 April | | Setubal writes annotation software
2001 April | | UW take over sequencing from DuPont at 3x, take it up to 10x
2001 | | Wood arranges collaboration with TIGR on annotation: Paulsen for transport systems, Eisen for phylogenetics and Karp for metabolic pathways, first suggested by Tomb
2001 April–May | | Setubal working flat out on annotation
2001 May | | Pat Minx from Wash U tells UW that the race is close
2001 July | Monsanto/Hiram finish assembly late July | UW presents poster setting 15 August date at the 10th International Congress of Molecular Plant Microbe Interactions, Wisconsin, 10–14 July; announces ‘the A. tumefaciens C58 genome has been sequenced’
2001 14 August | Monsanto deposit sequence to Genbank |
2001 15 August | | UW deposit sequence to Genbank
2001 August | Monsanto contacts Nature | UW contacts Science
2001 Late August | Monsanto approaches UW for joint publication, agreed on Science |
2001 End September | Deadline to submit to Science | Deadline to submit to Science
2001 November | Agrobacterium Annual Conference, Atlanta; BG and SS present | Agrobacterium Annual Conference, Atlanta; Nester’s group give first presentation
2001 December | Joint publication in Science, vol. 294, 2323–8 | Joint publication in Science, vol. 294, 2317–23
Table 4A.2 Aspergillus niger: collaboration between public and private organizations

Organization | Sector | Country | No. of contributors
DSM Food Specialties | Com | N | 17
Kluyver Center for Genomics of Industrial Fermentation, Delft University | U | N | 1
School of Biology, Nottingham University | U | UK | 2
Centre for Microbial Biotechnology and Centre for Biological Sequence Analysis, Technical University of Denmark | U | DK | 3
Microbiology Laboratory, Fungal Genomics, Wageningen University | U | N | 3
Laboratory of Genetics, Wageningen University | U | N | 1
Department of Molecular Biology and Biotechnology, University of Sheffield | U | UK | 1
Microbiology, Utrecht University | U | N | 2
Biomax Informatics | Com | G | 2
CLC Bio | Com | DK | 1
Biological Sciences, Liverpool University | U | UK | 2
Molecular Biomedical Research, Ghent University | U | B | 3
Centre for Analysis of Biological Complexity, University of Manchester | U | UK | 2
Architecture and Function of Biological Macromolecules, University Aix-Marseilles | U | F | 3
Department of Microbiology, University of Groningen | U | N | 4
Centre for Carbohydrate Reprocessing, TNO RuG / Fungal biology and pathogenicity, INRA | Govt | F | 1
Swammerdam Institute for Life Sciences, University of Amsterdam | U | N | 2
Microbiology Laboratory, Univ. d’Angers | U | F | 1
Institute of Biology, Leiden University | U | N | 4
CMBI, Radboud University Medical Centre | U | N | 1
Institute of Chemical Engineering, Technical University of Vienna | U | A | 2
Qiagen Genomics Services | Com | G | 2
Helmholtz Centre for Infection Research | Govt | G | 3
Department of Microbiology, University of Szeged | U | Hungary | 2
CBS Fungal Biodiversity Centre | Govt | N | 1
Fungal Genetics and Technology Consultancy | Com | N | 1
5. Evolving economies of knowledge
AT THE FRONTIERS OF BIOLOGICAL KNOWLEDGE

At critical historic conjunctures, the public or commercial future of biological knowledge hung in the balance. Both extremes, wholesale and comprehensive commercialization on the one hand or the dominance of a new public resource on the other, appeared possible across a wide span of the emergent information-based science. It could have gone either way. Indeed, over the course of the last two decades of the twentieth century, boundaries between public and private knowledge shifted quite dramatically – and the future remains indeterminate. Economic flux occurred in each of three fields of novel epistemic activity: the emergence of bio-data as a distinct category of knowledge; the development of informatic experimental tools; and the achievement of a complete genome, in our case, that of a bacterium of unique scientific and technological interest. The fundamental question we asked ourselves was not how the growth of knowledge contributes to the growth of the economy, but how the economic process of the growth of knowledge occurs within capitalist economies. The production and use of knowledge is seen as an integral part of the economy, and an outcome of a societal division of labour. Its growth is no less an economic phenomenon than any other aspect of the general economy of which it forms part. It may be, as Mokyr and others have forcefully argued, that there is something special about the contribution of the growth of the epistemic base to growth of other parts of the economy. But we have also to say that other spheres – agriculture and food production and use, communications or transport systems – are also broadly transformative through the specific contributions they make within an overall societal division of labour. Consequently, our analysis has focused on scientific and technological knowledge seen as economic phenomena in their own right, even if they are also epistemic and cultural. In this respect, we are addressing a rather different question from that raised by Nelson’s (1989) critique of Schumpeter, a critique still driven by the underlying assumption of how scientific or technical knowledge is recruited to an ‘engine of progress’ for the capitalist production of market products and services. We are coming in on a different angle, and not retreading that ground. By asking why, how and under what conditions the
division occurs between marketable and non-marketable, public and private knowledge, we are hoping to demonstrate that the multi-modal characteristic of capitalism is not just a modification or mitigation of a market economy, let alone an optional addition, but a constitutive aspect of its political economy. In other political economies, institutional differentiation between knowledge and product/services producing activities may arise, but these are not necessarily mapped onto private–public, market–non-market divisions, at least certainly not in the same way.1 New divisions of labour create interdependencies – not separation – and asymmetric relationships between public and private knowledge. Only when divisions of labour in knowledge production arise do issues of its exchange, distribution and use develop on a societal scale. Or perhaps it would be more accurate to say that issues of use by others, exchange or distribution to others, and their modalities, arise pari passu with the differentiation in production of new types of knowledge. The emergence of new divisions of labour within the production of biological knowledge has demonstrated this point. The novel discipline of bioinformatics, the differentiation between data production and data analytic tools that we have traced, not only created new specialist domains, but at the same time witnessed the emergence of wider communities of users, themselves often quite specialized. In turn, how that new knowledge, and what aspects of it, flowed between producers and users witnessed the emergence of new modes of distribution and access and, in many cases, new forms of commercial exchange or public rights. We elucidate our conclusions in greater detail below, but here the general point to be made is that the emergence of new divisions of labour within biology concurrently witnessed new processes of production, exchange, use and distribution. Through understanding these questions, we can then ask how some biological knowledge is made public, some private, and in what sense. We look at public research institutions such as universities as a necessary part of a societal division of labour, but also as an expression of the underlying dynamic creating asymmetric interdependencies between public and private domain knowledge that do not neatly or exclusively coincide with institutional or organizational divisions. We think this dynamic is essential to understanding why we have witnessed so much corporate investment by market actors in the growth of public domain knowledge, on the one hand, and why markets for knowledge have been so difficult to sustain in the face of the growth of public domain knowledge, on the other. Above all, we wish to draw attention to the emergence of new forms and institutions of public knowledge, so highlighting its dynamic character, and not treating the public domain as a static space simply to be filled by more and more of the knowledge stuff.
In short, the growth of public knowledge, in whatever its institutional form or locus, is an economic enterprise at the centre of the capitalist engine of growth. Mokyr’s account of the Industrial Revolution is profoundly misleading if projected into the contemporary epoch, partly because institutions of, and investments in, public science were relatively so fragmentary and tentative. From the seventeenth to the early nineteenth century, new forms of public domain (encyclopedias, public experimental demonstrations, development of shared mathematical languages across Europe, etc.) foreshadowed, but only embryonically, what was to become a major societal division of labour by the twentieth century. Whereas it may just be possible to characterize the early public epistemic base as the activity of a relatively small elite2 (even a handful of individuals), in retrospect these early forms of instituting public domain knowledge now appear as harbingers of an historical transformation of the nature of capitalist economic growth (Stewart, 1992; Golinski, 1989, 1992; Shapin and Schaffer, 1985). These were the early and tentative origins of a differentiation between public and private knowledge that only emerged as a major societal division with the advance of industrial capitalism. The production of knowledge has developed into a major public enterprise, diversely located, supported and resourced, a core component of advanced industrial capitalism. Coming from this direction, our approach takes a step back and asks how, with respect to knowledge creation, non-market or public economic modes are differentiated from market or commercially trading economic modes, rather than taking that division for granted. The question of how the non-market sector contributes to the development of the market sector is the wrong place to start from. Instead of taking for granted the identification of formal organizations with public and private sectors – universities or incorporated companies, for example – we have delved deeper to reveal processes of differentiation between public and private modes wherever they occur. The political economy of capitalist knowledge creation does not simply divide into two camps, organizationally or in terms of resource flows. We are looking at the formation of a meta-division of knowledge labour between public and private economic modes, and then the dynamics of their mutual interdependency and forms of ‘exchange’. Likewise we take a step back from dichotomies of pure versus use-oriented, scientific versus technological, to explore how (indeed also whether) differentiation occurs in the functionality of biological knowledge, assuming that the idea of knowledge without any definable function can be safely put to one side. To explore these questions, we looked at historical processes in the three chosen areas of novel epistemic activity. To answer the question of how
public and private modes are differentiated and interdependent, only an historical approach can provide the answers. If a differentiation remains stable over a long period, this demands historical explanation as much as if there are rapid and structural changes. Periods of change, however, reveal processes of differentiation to empirical analysis more readily than when organizations and institutions form part of a taken-for-granted landscape. For similar reasons, it was our choice to take the creation of new biological knowledge directly as the object of our analysis, rather than enter these issues through the gateway of organizations as such. Although we think we are providing interesting insights about how organizational forms change, we are coming at that issue from a more fundamental level of economic processes. The three areas of bio-data, bioinformatic tools and genome completion were all characterized by great fluidity in terms of economic process. Although at the very centre of the revolution in biological science, they manifested, more acutely than many other areas of biology, extreme disruption of established, institutionalized patterns. We wanted to see how differentiation occurred, how new boundaries were formed through periods of turbulence, in ways that are not readily examinable once patterns have become fixed, relatively stabilized and institutionally embedded. In doing so we are drawing attention to the fact that ‘capitalism is restless because knowledge is restless’ (Metcalfe, 2002). So we are not claiming that all developments of (biological) knowledge create such turbulence, or that the revolution in biological science and technology is typical. We do think that these minihistories of biological knowledge creation provide some unique insights.
INSTITUTING ECONOMIES OF KNOWLEDGE

Before entering into more detailed discussion, there are some broad points to be made concerning the ‘instituted economic process’ theoretical approach in relation to the empirical methods just noted. The opening point is a straightforward recognition that the revolution in biological knowledge is as much one of the scale of biological knowledge production as of changes in epistemic practices or concepts. The genome projects and global databases may not match space exploration or experiments of submolecular physics for the mobilization of societal resources, human, physical and financial, but, compared with previous epochs of biological knowledge, there has been a step change. There is no easy statistical measure, in terms of aggregates of state, corporate and non-profit contributions, but the extent of political controversy over the re-scaling of public resources alone is powerful evidence of the historical shift. An even trickier question, that of quantifying the contribution of the growth of biological
knowledge to the economy at large, is one we have scrupulously avoided. Nonetheless, whether with respect to health care or to emergent bio-based economies for fuels and materials creating new platforms for economic growth, the ‘biologizing’ of the wider economy is clearly a significant reorientation. Our focus, however, has been on the growth of new biological knowledge as an economic process of contemporary capitalisms in its own right, the phenomenal development of biological knowledge economies. Economies of knowledge are constituted by articulations between processes of production, use, distribution and exchange or appropriation. Each of these processes involves a distinctive form of transformation, respectively of qualitative characteristics, functionality, location in place and time, and control. Taken independently, none of these processes is inherently economic, and they only become so through their articulation with each other. How these processes are articulated and disarticulated changes over time, so ‘an economy’ is instituted only when a given articulation is dynamically reproduced over a given period of time, achieving a certain stability (Harvey, 2007b; Harvey et al., 2002). Such historical changes, the continuous reconfigurations, are transformations with a specifically economic character. In tracing the histories of biological knowledge, we have demonstrated just how temporary and transient some articulations transpire to be. A business model built around combining public domain data with internally produced firm data, trading genomic sequence data to pharmaceutical clients, and a database of protein sequences based on combined public and commercial subscription for access, provided two examples of transitory articulations. Tracking aspects of development of the revolution in biological science, it was possible to demonstrate modes of articulation between the four transformational processes at any point in their trajectories. For example, a biological database had rules for frequency, quality, structure of production as a condition for publication, modes of Internet access, rights of access, costs of access and rules for further use. Likewise, at a point in time, a software package was open-source, modifiable by a community of practitioners, distributed by snail-mail, exchanged under a differential licence for academic and commercial users. Often, competing alternative modes of articulation coexisted for shorter or longer periods of time. This was demonstrated by the major competitions for various types of genomic data between publicly funded laboratories and corporations, or between public and private alternative packages of bioinformatic tools. The genome races, and especially the race for the rough draft of the human genome, demonstrated complex articulations between public and private modes of funding, production, distribution and exchange of data through licence or by rights to public access on the Internet.
The coexistence of alternative modes, as well as rivalry and competition between them, is the strongest evidence that outcomes were unknown ex-ante. Above all, however, the differentiation between public and private knowledge can be seen as an outcome of how these four processes are articulated at any point in time. This analysis has two major consequences. Firstly, there is no static, pre-given or ‘natural’ division between public and private knowledge, and no permanent or universal defining characteristics distinguishing public from private knowledge. Second, the public or private modes of knowledge are not captured either by how knowledge is produced collectively and openly or behind closed doors; or by how knowledge is distributed through public or private channels; or by how knowledge is controlled or appropriated, and by which social groups or organizations; or by how knowledge is used and to what purposes within any organizational setting, such as firm or university; but only by a complex and variable combination of all four of these processes of making knowledge public or private. In short, how knowledge is made public or private is subject to innovation whether in production and process, in modes of distribution and use, or, and most importantly, in the institutions or ‘social technologies’ (Nelson and Sampat, 2001) of appropriation: new rules for control over public distribution, access and use, or redefining utility for private appropriation by patent, for example. The historical differentiation between public and private knowledge is thus one that needs to be viewed integrally, across and within each of these four interdependent processes. As such, issues raised by debates concerning types of knowledge (science versus technology), knowledge as an economic good, institutions and resourcing of knowledge, property rights and knowledge embodiment (tacit and codified), are significant, but only provide partial accounts of a much broader phenomenon. To give but two examples, the first is illustrative of the knowledge embodiment distinction between tacit and codified knowledge, the second is related to the (private) appropriability of knowledge. The creation of a new software suite of bioinformatic algorithms involves the emergence of a new division of biological knowledge labour within which a codified language is shared by some, but relatively closed to others.3 However, others may use these packages without needing to understand how they work. This division of labour, however, is more than a social division between one grouping possessing a new formalized language, and others without it, more than a question of who talks to whom. It is also an innovation in language, a newly produced language, that, in turn, in our examples, may or may not be appropriated privately by commercial enterprises or publicly by a community of scientists; may or may not be distributed without restriction; may or may not be used for many diverse purposes, whether public or private.
In short, although a highly significant aspect of differentiation between public and private knowledge, communicability or stickiness of knowledge with respect to knowledge bearers is only part of the story, one that is better told within the broader and more integrated context. The second example is the treatment of certain types of knowledge, generic or fundamental knowledge in particular, as being inherently difficult to appropriate commercially or make private. Again, there are significant aspects to making knowledge private or public raised by these arguments. But there is much more to making even generic or fundamental knowledge public than the difficulties there may be of securing exclusivity through legal or technical means. As we have seen from our many examples, the public nature of the production of knowledge, even knowledge tools, is part and parcel of the accumulation of knowledge – in the case of bio-databases, their scientific robustness and value is ever enhanced by continuing and open-ended additions. To make a rather simple but critical point, the market frenzy around ESTs, the golden ‘keys to genes’, was at least as much undermined by the growth of new public knowledge that downgraded their significance as by the loss of appropriability resulting from deposition of ESTs in public databases. But there is more to it than that. Quality rules, statistical reliability, temporalities of depositing data all contribute distinctive aspects to the ‘publicness’ of such databases. Moreover, access to the data is not just a given, nor the construction of Internet access an accidental or insignificant aspect of distributing knowledge. Tools for accessing such databases are themselves the result of innovation, essential to the public distribution process. The specialized academic publics of various sub-disciplines of biology and medicine may use the data for further public purposes, just as private corporations, like pharmaceutical firms, will use them for drug target identification. So appropriability is but one of a number of critical dimensions for making knowledge public or private. And, last but not least, we have demonstrated that there was nothing inherent in the type or nature of the knowledge, at any given point in time, making it commercially inappropriable, more the active process of making it public in innovative ways across the four economic processes. Drawing out some further implications from our empirical cases, two main axes to the four processes have emerged as particularly significant: the differentiation and articulation between production and use, on the one hand, and the articulation of distribution and exchange or appropriation, on the other. Analysing phenomena in terms of these two axes allows us to address directly the interrelationship between the organizing principles of an economy: the evolution of divisions of labour, and varieties of modes of coordination and control.
TWO AXES: PRODUCTION AND USE/DISTRIBUTION AND APPROPRIATION

One of the main features of the revolution in biological knowledge has been the emergence of new types of knowledge, of which three are particularly significant: bio-databases, genomes and analytical tools. In economic literature, much has been asserted concerning the indivisibility of knowledge (for example, Dasgupta and David, 1994): many people can have half-baked theories, but one person cannot have one half of a theory, another the second half. The knowledge objects we have explored have manifested multiple possible divisions, and perhaps the most significant has been the separation of various types of bio-data, and their institutionalization, from theories of gene expression, evolution, cell function, metabolic processes, and so on. Moreover, historically bioinformatic tools were at first conjoined with the databases, and they only gradually and partially became differentiated and detached from one another. As we observed, the tools and the data remain interdependent, but equally, they can be distributed separately and amongst different users. Knowledge is made divisible, and different specialisms and uses emerge. Many of the tools are modular in structure, amenable to the decoupling and recombination of components. The fears, quite justifiable, over the balkanization of genomes rested on their very divisibilities, the scattering of genes, even fragments of genes, between different producers and users of knowledge. Whole genomes, as we have seen, are quite ambiguous in their ‘wholeness’: on the one hand, although it might be possible to determine the whole genome of a single individual organism, the whole genome of a population of a species is far more problematic. On the other hand, a genome of an individual organism may only be a component of the metagenomics of an entirely interdependent community of organisms. The human genome itself is not organically self-sufficient, and does not contain all the genes it needs to function. The human mouth alone is host to a community of over 400 bacterial species, with complementary genes: we need combinations of all those co-evolved genes. What is the ‘wholeness’ of a genomic organization? It is a question that opens up rather than closes down. The separation of data, genomes and tools as distinct epistemic forms has in turn stimulated new divisions of labour to emerge between producers and users of these knowledge forms. For bioinformatic software tools, whether open source or not, a division between tool developers and tool users became increasingly pronounced. Moreover, a significant part of tool development came to focus on increasing ‘user-friendliness’, even when communities of users remained quite small. In the terms of Cohen and Levinthal (1990), this aspect of differentiation diminishes the demands on the absorptive
capacity of users with respect to scientific tools. The tools are constructed in such a way that they increasingly reduce the required knowledge capacities of their users. At the same time, increasing the use of tools across different databases – interoperability – also broadened the range of uses and consequently the potential communities of users. A standard package for accessing and comparing genomic data, BLAST, now has a broad community of users, academic and corporate (see the illustrative sketch following the list below). Likewise with bio-data of various kinds and complexities, specialists in data production and management have emerged, as distinct from a much broader community of users. Adding to and qualitatively refining and standardizing databases is a continuing process engaging specialized data producers. The epistemic value of any existing data is enhanced as data is added, or as rules for assessing data are developed. The case of ESTs is a case in point. The original EST databases were constructed from fragments of nucleic acid sequence, related to known or unknown genes. As more and more were added, it became clear that ESTs had been insufficiently understood, so that continued data accumulation had not only diminishing returns, but increasing redundancy. This led to the construction of a non-redundant database of ESTs, where the inputted data were also redefined, and subject to new epistemic regulation and standardization. This remains an ongoing process. In terms of genomic databases, only a small proportion of organisms have been characterized, and new genes are constantly being added to the pool, so retrospectively enhancing existing deposited data available to an expanding community of users of such data. Their activities may be directed towards different ends, drug development, medical diagnosis or whatever. With the Agrobacterium story it was shown how the momentary closure on publication of the genome immediately initiated further comparative genomics of variation and speciation, so revising and enhancing the published genome. The spectrum of uses of one genome is broadened by the addition of others to the pool of genomes in databases. Broadly, three modes of differentiation between producers and users can be distinguished:

a. The production of knowledge outputs for further similar knowledge and its continuing growth and accumulation.
b. The production of knowledge for use in dissimilar, often complementary, knowledge production.
c. The production of knowledge for multifarious uses involved in the creation of non-knowledge goods and services.
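How little a package such as BLAST now demands of its users can be made concrete with a brief illustration, which is not drawn from the original study. The sketch below assumes the freely available Biopython library and access to NCBI’s public BLAST service; the short nucleotide fragment is invented purely for the purpose of illustration.

    # Minimal sketch: a remote BLAST similarity search using Biopython.
    # Assumes Biopython is installed and NCBI's public qblast service is reachable;
    # the query sequence is a made-up fragment, used only for illustration.
    from Bio.Blast import NCBIWWW, NCBIXML

    query = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # hypothetical fragment

    # Submit the query to NCBI's servers against the public 'nt' nucleotide database.
    result_handle = NCBIWWW.qblast("blastn", "nt", query)

    # Parse the returned report and print the five best-scoring alignments.
    record = NCBIXML.read(result_handle)
    for alignment in record.alignments[:5]:
        best_hsp = alignment.hsps[0]
        print(alignment.title, best_hsp.expect)

The user of such a script needs no knowledge of the alignment algorithms, database construction or quality rules that lie behind the result: the division of labour between producers of tools and data, on the one hand, and their users, on the other, is embedded in the package itself.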
The peculiarity of the forms of knowledge whose trajectories we have followed is that they have polyvalent uses. Bio-data can be used to enhance
existing bio-databases or for the development of further similar knowledge, as well as being used by others for quite varied purposes. The same is the case for the Agrobacterium (and many other) genomes. The knowledge of genomes can be used to develop evolutionary theory in combination with other genomes, for example, or for a wide variety of biotechnology tools. For many of these new forms of biological knowledge, it is not possible to pin them down to any one type of knowledge, fundamental or applied, use-oriented or curiosity-driven. No doubt, this is a major reason why they have been sites of tension, and objects of conflict, for economies of knowledge. Moreover, during their trajectories, many of them develop additional uses, and change direction. Bio-data, genomes and bioinformatic tools may be particularly open to polyvalence of use, although they are certainly by no means unique in this respect. Furthermore, polyvalence does not imply openness to any use: knowledge of a genome, for example, is of little use in engineering a motorcar, however much the motorcar is designed for an organism with certain mental and physical capacities. Our research has not aimed to demonstrate the changing use-sets of particular knowledges, but we can be sure that they remain quite open to varied uses. Who knows what uses the genome of a thermophilic extremophile may be put to over the next few decades? But the critical polyvalence for economies of knowledge is the dual use of knowledge for the further production of similar knowledge, and knowledge for other purposes, such as drug discovery, identification of genes for diseases, and so on. We locate this at the core of the turbulence in economies of knowledge. For polyvalence of use, alternative (complementary or competing) divisions of labour at the same time open up alternative or competing possibilities of distribution and appropriation. This is the point of necessary articulation between the two axes. Alternative processes of distribution and exchange, one market, the other non-market, are brought into potential conflict. On the one hand, the division between producers and users of certain knowledge outputs creates the possibility – and no more – of market exchange. On the other, if those same outputs are at the same time inputs for further similar knowledge production, a different regime of public distribution and appropriation may – no more than a possibility – emerge. For, as a condition for further knowledge production, openness of source codes for software and continued and unrestricted expansion of bio-data and the population of sequenced genomes require distribution of, and access to, similar knowledge amongst the specialized publics of co-producers. We have demonstrated that these processes of public distribution and exchange are by no means automatic and, indeed, in all our cases, there has been substantial innovation in the forms of distribution and exchange. That is, they are not given as manna from heaven,
or expressions of a time-transcendental, scientific–methodological code of behaviour, nor can they be derived from Mertonian quasi-markets for public reputation. We will return to the point, widely ignored, of the significance of innovation in making knowledge public through the instituting of new modes of distribution and exchange. But here we focus on the fact that polyvalent use gives rise to the possibility of competing and alternative modes of distribution and exchange for the same knowledge outputs. We want to signal the potential for conflict that arises from the way in which divisions between production and use are articulated with distribution and exchange. It is for this reason that these are the two main axes of instituted economic process within economies of knowledge in capitalist political economies. Indeed, in each of our three empirical foci, this potential for conflict was evident, and defined the nature of the conflicts and turbulence that surrounded them. The uncertainty of outcomes at the points where we picked up on the stories, and the state of play at the point where we have left them, are evidence of a persistent tension which we believe to be at the heart of knowledge production, use, distribution and exchange within capitalism. It is this tension, and how it is dynamically played out, that is at the core of the shifting boundaries between public and private knowledge. So, having signalled the necessity of their articulation, we now turn to consider the other axis to the IEP framework, distribution and appropriation, as conditioned and enabled by divisions of labour between producers and users. It has often been remarked that markets expand as division of labour increases (North, 2005a, 2005b): transactions proliferate if producers specialize in certain products previously also produced by others. Although we have ordered this discussion by taking production and use first, it could equally well be argued that distribution and exchange drive the emergence of divisions of labour. One can approach the analysis equally well either way round. As a preliminary remark, there can be no doubt that the development of the Internet revolutionized the distribution of knowledge generally, and biological knowledge in particular. There is one specific consequence of this new mode of distribution to which we have already alluded. Internet distribution facilitated the separation of the distribution of data from that of software analytic programs. At the outset, whatever the mode of exchange, market or non-market, software circulated with data on magnetic tapes delivered by mail. But there was no technological determinism involved here. For, in the first place, that separation was also driven by developments of digitized biology. The scale and number of bio-databases exploded, and software packages developed accordingly. Increasingly, however, their mutual development occurred on twin, relatively separate, tracks. More fundamentally,
digitized and computerized biology was a development that required a significant transformation within biological methods, irrespective of modes of distribution, even though benefiting from them. Biology was not digitized because of the Internet; rather, digitization took place in the Internet's benign context, and took every advantage of it. The uneven process of digitization and computerization of some aspects of biology demonstrates the point: a benign environment for digitization does not mean that all biology will be digitized and computerized, or that it occurs immediately as a consequence. In the second place, Internet distribution was neutral as to whether knowledge was exchanged in market or non-market modes. Reduction in costs or acceleration in speed of distribution did not of itself either create new knowledge markets or facilitate the institution of a public resource. From an economic point of view, therefore, the articulation between distribution and exchange is critical to understanding what is involved in making knowledge public, an understanding that requires a much more nuanced and developed concept of 'public' and 'private', market and non-market. We have argued that, at the core of property rights, public or private, is the issue of control, how, and by what social entity. As will be discussed further below, control may or may not be founded on formal legal rights, such as IPR, itself an historically evolving institution. In much discussion of this issue, however, the public character of knowledge is defined negatively as the absence of control: non-exclusive rights to access or disposal, or non-rivalrous uses (David and Dasgupta, 1994; Demsetz, 1967; Kaul and Mendoza, 2003; Heller, 1998). If something is not under the exclusive control of an agent with individual property rights, then it is public, open to anyone and everything. By contrast, we have argued for a positive account of control, whether for private or public goods, including control over distribution, access and use. Different types of public goods involve historically changing forms of control and variously organized controlling agents, sometimes governmental, but by no means always. In relation to the public character of public science, as it is instituted through exchange and distribution, a core feature involves the control over standards, the establishment of norms of validation and the development of supranational norms by a global scientific community.4 To illustrate the force of this approach, we might contrast the contents of a journal such as Science or Nature, and a blogsite of an individual blogger. For the latter, it is clear that there is non-exclusive, non-rivalrous access and use, and dissemination is free, costless to the 'user'. For the former, the vehicle of knowledge distribution (as opposed to the content itself) may be controlled by a commercial agent, with exclusive rights to distribution by subscription. But few would contest that the latter could contain a wide variety of heterogeneous, idiosyncratic and personal
content. Although freely and openly distributed, this content is not a public good, as controversies on what is and is not permitted or even publicly controllable in these distribution channels illustrate. By contrast, publication in a journal under specific control conditions makes the resultant knowledge a public good. This is the case even though scientific journal production and distribution is a multi-million dollar industry. Again, this is an historically changing institution: the PLOS5 initiative (Malakoff, 2003), where authors pay for their publications, and 'consumers' have free access, suggests a different articulation between distribution and exchange. However, such a re-articulation has immediately stimulated controversy over its consequences for quality regulation and control, so supporting the argument for positive accounts of control. What is involved is a regime of reciprocation governed by shared norms and standards amongst communities of specialists, to enable outputs to be inputs in further knowledge production, and reciprocal recognition through citation of participation in the generation of public knowledge. There are mutual obligations to abide by the rules of the game by those who participate in production of scientific knowledge as a public good. As rules change, and as social organizations sustaining them develop, new forms of public good emerge too. Before exploring the issues in greater detail with respect to our three empirical fields, a broad and surprising conclusion from the evidence is that, in the event of competition, and provided human and material resources are made available, growth of public knowledge undermined markets for knowledge of a similar kind. This was famously so in relation to the rough draft of the human genome – and even more conclusively so for the completed human genome, by which time commercial competitors had retreated from the competition. With the active development of non-market distribution and exchange institutions of genomic and other bio-databases, new public forms of knowledge eventually emerged across a very broad range of bio-data, and the same period witnessed the disappearance of nascent commercial markets for such data. A more mixed picture is evident for bioinformatic tools, but even here, dominant public domain tools have emerged, and commercial markets have only been established for tools where the producer–user division of labour has been sharpened in such a way as to prevent or restrict dual use of the kind described above. However controversial, this supports an argument that markets for knowledge develop only in the context of limitations or restrictions on the necessary human and material resources to support non-market production, exchange and distribution of public knowledge. We have seen many instances of significant provision of such resources from the corporate sector, complementing governmental resourcing of university-based research, whether by support of their own research scientists, of university research or through
consortia combining corporate, non-profit and governmental resourcing. For polyvalent knowledge growth, where new knowledge outputs are used as inputs in the production of further knowledge, the potential set of other uses expands, both in quality and in range. As Mokyr has suggested, the epistemic base of society as a shared and public resource is critical to the development of non-epistemic uses of knowledge. What he emphasized less were the implications this has for an understanding of capitalism, especially that it places the growth of public domain, non-market, economies of knowledge at its core. The epistemic base depends on public modes of distribution and exchange, as well as on a continuing expansion of resources to support its growth. That the growth of public domain resources dedicated to the production, distribution and exchange of knowledge is critical to the expansion of the set of potential commercial and market uses of knowledge makes it less surprising that commercial enterprises invest in it. Their potential for innovation, hence their existence, depends on it. But that is only a broad conclusion, which will benefit from further refinement after the different trajectories marking the emergent divisions between public and private economies of knowledge are considered, in the light of the above analysis of the articulation of economic processes, production and use, distribution and exchange. Each of the three empirical areas, bio-databases, bioinformatic tools and the specific case of a key bacterial genome, was characterized by extreme fluidity in terms of the emergence of a differentiation and interdependence between public and private economies of knowledge. For bio-data, in the context of the major political, economic and scientific battles over the human genome, we examined a number of trajectories of bio-databases, ESTs, SNPs, genes, genomes and proteins. In each case there were nascent markets and, in each case, various forms of public domain database emerged to dominate their economic organization. In the case of bioinformatic tools, a quite different pattern emerged, with a clearer divide between public and private bioinformatic tools surviving economically in parallel, although, it must be emphasized, with a dynamic interdependence not only between the tools themselves, but also in relation to the bio-databases. At certain stages, it might have appeared possible that fully commercial markets created by the major players such as Incyte, DoubleTwist and Celera, combining trade and licensing of both comprehensive bio-databases and the relevant associated bioinformatic tools, would dominate the landscape, along with the market provision of the hardware of sequencers, microarray technologies, and the like. And then, in the more focused study of the genome race for Agrobacterium tumefaciens, again both an outcome of multiple genes under DuPont or Monsanto patents and an unpublished whole genome protected by technical secrecy appeared as distinct possibilities.
In this case, as a result of dynamic competition and interaction between different modes of economic organization, a double publication arose, to be followed by a university–corporate firm collaboration to expand the Agrobacterium genome database through species comparative genomics. But, here too, it is quite clear that private corporations produced and retained genomic knowledge internally, as well as establishing competitive IPR positions over different aspects of the bacterium’s biotechnology applications. This was even more pronounced in DSM’s creation of a boundary of public and private knowledge of the Aspergillus niger genome.
EVOLVING ECONOMIES OF KNOWLEDGE

Comparing the mini-histories of these three empirical fields, we will focus on our two main axes of economic processes, the producer–user differentiation including the formation of polyvalent uses, and the distribution–exchange articulation. Let us first look at the emergence of producer–user divisions. If we consider nucleic acid sequence data, there was a proliferation of different types of databases, depending on the length and level of integration of strings of sequence, and associated functionalities. The high-throughput and shotgun methods of producing data for assembly enhanced the range and composition of databases. A distinction emerged (and was contested) between a rough draft and a final complete genome. There were also important discriminations between assembled and annotated data. Many of these differentiations can be seen in terms of differentiations of use, and emergence of polyvalence. Both rough drafts and Expressed Sequence Tags promised rapid identification of genes and, hence, over a certain period, appeared to pharmaceutical companies as a type of knowledge output that was closely related to potential therapeutic technologies. A rough draft of a genome, with enough statistical reliability to identify disease-related genes, or ESTs which provided the promise of a quick route to gene identification, made the production of such knowledge a commercial proposition. The case of Single Nucleotide Polymorphisms was even clearer: by definition, a variation in a single base pair related to a phenotypical disease expression was the kind of data that appeared close to market utility. The uses to which these knowledges might be put, for drug targeting or gene therapy, for example, attracted a community of pharmaceutical companies with major investment potential, and with a golden promise of accelerating the drug-development pipeline. But, at the same time, genomic, EST or SNP data was of critical use for the production of further knowledge. A rough draft was a rough draft, only
a phase in a process of knowledge accumulation. It should be remembered that, at the rough draft stage, there were still controversies over the anticipated number of human genes in the genome ranging from 30 000 to 100 000 plus. The passage from a rough to a complete draft was not a trivial matter, especially given the still unresolved issues over the identification and understanding of genes. Likewise with ESTs and SNPs: they too contributed to ongoing processes of data accumulation and improvement. As we saw, the construction of a non-redundant set of ESTs proved a qualitative step change, and a theoretical re-analysis of their significance. With SNPs, multiple and associated base-pair mutations progressively enriched and complicated the understanding of genetic determination of disease, as well as contributing to understandings of processes of mutation and genetic variation as an evolutionary process. During certain phases which were highlighted in the historical accounts, it was clear that there were tensions and controversies surrounding alternative uses. Different interests, related to different uses, competed for mobilization of resources for production of such knowledge. Eventually, however, it became clear that, for genomic, EST, SNP and protein databases, the informational value of any sub-set of the data was enhanced by open and continuous additions of data, under shared rules and quality standards. The historic case of ESTs clearly had a formative influence: the gold rush came to an end when acquired assets were discovered to be relatively valueless. Private production of a sub-set of an EST database was of diminishing returns, particularly as the use of ESTs for unique gene identification became progressively eroded. The balkanization of data was proving counterproductive, even for the balkanizing agents. What ceased, however, was antagonistic competition between alternative uses. What emerged, and this is the significant point, was a relatively stable polyvalence of use. The production of data, whatever its provenance in terms of resourcing or organization, benefited both scientific and commercial use more when it was pooled, unified and regulated. It was better for drug target identification or disease-related gene identification uses when it was better for scientific analysis uses. This did not eliminate all markets for such data: specialized niches for SNPs and ESTs, with dedicated data related to particular diseases or gene traits, persisted. But, in the broad picture, even these were in a context of much producer–user differentiation based on non-competitive, but multiple, uses. The historical development of bioinformatic tools presents a strong contrast, but one that can equally well be analysed in terms of producer–user differentiation. In the early stages, these tools, especially of sequence alignment and similarity searching, involved the production of algorithms and their operationalization in computer programs. The testing and validation
of algorithms, especially their theoretical validity for purposes such as univocal identification of genes or protein sequences, still required their operationalization into computer programs before they could be put to use as tools. Effectively, without ready-made computer programs, there was a small community of algorithm and program developers, with little division of labour between producers and users. Eventually, as computer programs incorporating algorithms became available, a clear differentiation occurred between tool producers and a wide community of molecular biologists, which broadened further with multiple uses made of digitized bio-data. A gateway bioinformatic tool, such as BLAST, was open to polyvalent use, providing first access to GenBank, the EBI and other genomic databases. At the time, it appears, there was little question that this key resource, especially once available on the Internet, would ever be other than public. Any matching of genetic sequence, for any purpose, employed BLAST as a first resource: it was the universal, standard screwdriver in the toolbag. But our two case studies of the trajectories of SRS and the GCG package demonstrate that differentiation of producers and users, or, as it were, between knowledge useful to further similar science and knowledge useful to other purposes, is made and not given. As one example, persistent tensions marked the development of SRS, either as framework software for undertaking scientific research or as embedded in software that was oriented towards drug target validation, lead drug discovery and integration of chemo-informatic, pre-clinical and clinical trial databases. On the one hand, SRS was being continuously developed by a community of user–producers, whilst, on the other, differentiation was being created between an in-house software group and a pharmaceutical clientèle. The evidence suggests that precisely the irresolution of this tension led to the eventual collapse of the second alternative, and the stagnation of the first. The contrasting case of GCG confirms this interpretation. The software package itself established a sharp division between producers and users quite early in its development, notably by closing off the development of the source code to all but a small group of company developers. As it evolved, moreover, this division became more entrenched and broadened, as web-interface-friendly versions expanded the range of possible uses and the communities adopting them. In parallel, and indeed to an extent in competition with this, EMBOSS, an alternative software package, more highly dedicated to its continued open development for scientific uses, reinforced the boundaries of a producer–user community. As we suggested, under conditions of competition, it remains an open question as to whether parallel development is an economically sustainable form of polyvalence of use.
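As an aside on what 'operationalizing an algorithm in a computer program' amounted to for early tool producers, the following is a minimal sketch of global sequence alignment by dynamic programming, in the family of methods (Needleman-Wunsch, Smith-Waterman) from which later similarity-search tools descend. It is an illustrative toy with assumed scoring values (match +1, mismatch -1, gap -1), not the code of any actual package such as BLAST or GCG.

```python
# Toy global-alignment score, Needleman-Wunsch style dynamic programming.
# Scoring assumptions for illustration only: match +1, mismatch -1, gap -1.
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

if __name__ == "__main__":
    # Two short nucleotide fragments that differ by one substitution
    # and one insertion.
    print(alignment_score("GATTACA", "GATTGCAA"))
```

The sketch is there only to underline the producer–user point made above: once such a routine is wrapped in a ready-made program, its users need know nothing of the dynamic programming inside it.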
Finally, the genome for Agrobacterium tumefaciens was a battleground from which the race to publish finally emerged victorious; by whom the race was to be won is, in this account, less significant. The completion of the genome then became a knowledge input to further development of genomic knowledge, comparing evolutionary divergence between laboratory strains, and between these and other strains of the bacterium. This knowledge in turn was a general epistemic base that enhanced the understanding of biotechnology uses of super-virulent strains used for plant transformation by genetic engineering. The division and interdependency between biotechnological utility and scientific use for knowledge development was restored, and the potential conflict between the two uses, so clearly manifest at the outset of our story, was resolved. As a consequence of the whole genome sequencing projects, many new genes were identified related to DNA transfer processes and their regulation, knowledge specifically useful to the biotechnology of genetic engineering, but at the same time of use to further scientific knowledge accumulation, and the development of more generic understanding of transfer processes. But if resources for sustaining this genome completion project had not been forthcoming – from Monsanto or the NSF – quite a different outcome remained a possibility towards the very end of our history. Moreover, it was always also possible that two parallel forms of knowledge production, one dedicated to understanding biotechnology engineering problems, the other to the evolution of pathogenicity from symbiosis, could have developed, without speaking to each other. Taking an overall view of these different empirical examples, it is clear that the producer–user division of labour, and indeed the differentiation and interdependence between uses of particular types of knowledge production, is an historical process. Outcomes depend on allocation of resources, and are not inscribed in any transcendental essence of knowledge production. Moreover, it is clear that, for some knowledge production at least, there is no clear-cut division between use of knowledge for further knowledge production and that for non-knowledge production activities. There is always a complex intertwining between uses of knowledge 'products'. The concept of polyvalence of use embraces not only this multi-orientation potential of knowledge production, but also the wide range of possible differentiations and interdependencies, as well as separation, competition and disjunction between uses. The historical analysis of the emergence of divisions of labour is thus a core component of understanding the development of knowledge in capitalist economies. A core component, but not the only one. Using this approach, we can now turn to analyse the evolving and competing articulations between exchange and distribution. At the height
of the turbulence surrounding bio-data of various descriptions, both market and non-market modes appeared to be equal contestants, with dire predictions as to the consequence of the thoroughgoing marketization of knowledge. A striking feature of the early phases of the formation of new bio-databases was the relative absence of quality norms and standardization. The shotgun technique raised a storm of controversy as to whether it would ever result in reliable data, eliminating gaps and repeats in DNA sequence. Tissue sampling, and variable expression of DNA in development, raised similar issues for ESTs: how reliable was the technique of reverse engineering DNA sequences from expressed RNA in brain tissues? In almost every area of digitized bio-data production, similar issues of reliability, comparability and ontology emerged. In the big picture, however, all these new forms of bio-database raised a challenge of a new kind to the pre-existing public norms of science. Quality standards for microarray data have been regulated by the Microarray Gene Expression Data Society; the Macromolecular Structure Database, part of the worldwide Protein DataBank (wwPDB), has been engaged in a continuous process of data regulation for deposition and publication (Tagari et al., 2006). Although a simplification, the standard format of a biology science publication prior to the data revolution would involve evidence (data) being presented in the context of reporting experimentation, subject to peer review, in the contemporary paradigm of scientific publication. Data as such, independently of experimental hypothesis testing, presented an entirely new issue for public scientific validation. The norms simply were not there. The key point to make is that norms are made, and, if new forms of knowledge arise, old norms often prove inappropriate, so requiring the development of new norms. Peer reviewing of data deposition was out of the question, especially given the quantity and speed at which new data was being produced by the new technologies. The making of norms often involves laborious construction, deliberative and strategic international negotiation within groupings of data producers. Institutional clout of leading organizations (the European Bioinformatics Institute, Genbank and so on) contributes to the politics of instituting new norms of appropriation. Our study of the unification of US-based and European-based protein databases, PIR and SWISSPROT, demonstrated how a global scale of public appropriation was instituted. At the same time, the merger of these two databases created a single global data standard, and unrestricted Internet access, with free exchange replacing the European two-tier charging institution. One of the most remarkable innovations in the institution of public knowledge concerned the regulation of the temporalities of data deposition (Joly and Mangematin, 1997). The Bermuda rules required raw DNA
sequence data above a low threshold of continuous sequence to be deposited in GenBank on a 24-hourly basis in the human genome project (Bentley, 1996). It is questionable how far such public control extended to genome sequencing in general, as it quite clearly has not applied in many genome projects not involving extensive international collaboration. Nonetheless, it exemplified how the international community of scientists involved in such collaborations actively controlled how data was made public, to ensure that neither commercial interests nor academic reputational advantages interfered with the completion of an international public project. Mertonian rules for public science had proved inadequate for the process of both collaborative production and reputational incentives appropriable by priority of publication. Moreover, as issues of quality arising from instantaneous raw data deposition and of use of such data by scientists other than those directly engaged in the international human genome project surfaced in the course of the following years, by 2000 the National Human Genome Research Institute had considerably revised the conditions, quality norms and temporality requirements (http://www.genome.gov/10000910). A further example of variation of public control over temporalities was provided by the SNP Consortium, protecting information from being privately appropriated, by imposing a short delay of publication while further work was undertaken. It adopted an instrument designed for private appropriation, namely, the instrument for claiming prior art (the Statutory Invention Registrations), for the purpose of public appropriation, ensuring that SNPs would form part of a public database even before they were eventually published. We know that, at the same time as new institutions of public control were emerging for publicly accessible databases, freely distributed by Internet, alternative commercial and market forms of appropriation and distribution were competing – and antagonistically. There was a question as to whether databases would be predominantly publicly or privately appropriated and distributed, and hence also how they would be resourced, by a redistributive mode through taxation, or via exchanges in knowledge markets. Various forms of commercial appropriation were applied to a range of bio-data, patents on genes, ESTs and SNPs figuring prominently among them. Secrecy was also a dominant mode of appropriation, combined with exchange and distribution through licensing, either exclusively to a particular customer, or to a limited range of customers. A conspicuous new feature of the commercial appropriation landscape was that major governmental players, such as the NIH, were actively acquiring private property rights, as were university laboratories, stimulated no doubt by the Bayh–Dole Act. Additionally, major companies were expanding their own in-house 'epistemic base' by the internal development of mirror databases,
without any necessary prospect of trading in bio-data. These ‘knowledge assets’ were privately appropriated, under the control of the firm, and protected by firewalls, yet without any anticipation of market valuation, in product, capital or labour markets. A central feature of this period in the 1990s, typified by Celera’s strategy during the human genome race, was the absorption of public databases and their integration with internal data, enhancing their value for commercial licensing. But the term ‘antagonistic competition’ describes a competitive process where private appropriation takes place at the expense of public appropriation, or vice versa, rather than a process which results in a differentiation and interdependency between knowledge objects under private and public control. Patenting of ESTs, SNPs or genes during the genome races, or any kind of bio-data, entailed placing in the public domain bio-data under private control, whatever the nature of the organization exercising that control. This was the critical edge of antagonistic competition, as it pre-empted public appropriation of data in the public domain. Private mirror-databases presented no such threat to the creation of public domain databases. Data could be kept in-house, but thereby did not prevent eventual publication if similar data was developed in the public domain, under public control. Yet, as our empirical analysis of the evolution of bio-databases has demonstrated, by and large, public appropriation and distribution emerged as the dominant institutional economy of knowledge. Certainly, many genes have been patented across many species, but not as widely as anticipated. As the examples of Agrobacterium tumefaciens or Aspergillus niger demonstrated, even those genomes most evidently of commercial interest were published under norms of public science, and distributed through free access genomic databases. The number of patented genes – let alone ESTs or SNPs – is comparatively minuscule compared to publicly controlled genomic data, and identified genes. Although we will return to this point after considering the evidence on bioinformatic tools, the emergence of new forms of public control of biodata provides a key to the explanation of this broad outcome. The establishment of norms of data quality, the regulation of the temporalities of data deposition, ensured above all that the knowledge accumulation process was continuous: the outputs of data producers became the inputs of further data production in the formation of all the public databases. This has two major consequences. Bio-databases are thereby continuously enhanced in their quality, as norms and standards are refined and developed. Polyvalence of use – securing the use of that data for further data production and for non-knowledge production uses – broadens its basis and scope, so benefiting both commercial and public science organizations.
Second, bio-databases are open to continuous revision and addition: the more that is inputted, the more epistemic value is added to that which is already in the database. A bio-database in the public domain is always unfinished business: as we have seen, even the concept of a complete final draft of a genome is only the marking of a phase, albeit an important one. There is one further aspect to consider in the competition between private and public modes of appropriation and distribution of databases. In many cases, early bio-databases struggled to obtain public resourcing, and this resulted in the examples of variable charging by university-based units (quasi-market exchanges) for data to academic and commercial organizations. Database development was not universally considered to be properly a public domain form of scientific activity. As public domain modes of appropriation progressively expanded the scope of bio-databases, undermining the possibility of commercially marketed and resourced growth of such databases, increasingly redistributive resourcing grew to be the norm, so ensuring the stability and growth of the new bio-database institutions. In other words, the differentiation between domains that are publicly resourced through redistributive taxation, or through allocation by corporations of resources for production of knowledge for which they obtain no direct, or even calculable, commercial returns, occurs pari passu with the emergence of new forms of publicly or privately appropriated knowledge. Eventually, the legitimacy – even epistemic superiority and polyvalency – of bio-databases resulted in the growth of redistributive resourcing. Although this is somewhat of a chicken and egg issue, as with chickens and eggs, the explanation lies in evolutionary differentiation: redistributive resourcing and public knowledge appropriation co-evolved. New forms of public domain appropriation resulted in the expansion primarily of resourcing by state taxation, even if there are also many significant examples of redistributive resourcing by commercial organizations, from the giant Merck to the relative minnow DSM, and the institutionally innovative development of consortia. Similar instabilities and tensions around the nature of resourcing strongly marked the early stages of the major bioinformatic software tool instruments. To an extent, the issue has persisted longer, reflecting the continuing tensions and instabilities around the producer–user differentiation discussed above. Part of the tension resides in the nature of the epistemic objects: both GCG and SRS had a dual aspect of instruments for analysing and manipulating existing data and instruments whose design is continuously being modified as a part of new data creation, so developing new standards, norms and ontologies of bio-data. This tension, to an extent, is ongoing, although we argued that, towards the end of the trajectory, the differentiation between a commercial GCG package and a scientific
research tool, EMBOSS, may result in a parallel track development, each with distinctive utilities, modes of appropriation and distribution. From a state taxation resourcing perspective, bioinformatic tool development did not fit into the normal pattern of research funding, and continues to confront issues of legitimacy. Part of the turbulence, therefore, is that new epistemic forms present challenges to existing norms of resourcing, appropriation and distribution. Although not necessarily peculiar to bioinformatic tools, there has been considerable institutional innovation, both formal and informal, in modes of appropriation and distribution, both private and public. As with most innovation processes, some innovations endure, others disappear. There was almost continuous experimentation in modes of appropriation and distribution. In the case of sequence analysis technologies, for example, GCG experimented with open source development, where users were enrolled in continuous co-development both under a cost-recovery charging system from a university-based organization and, when first trading, on a for-profit basis after becoming an incorporated company. Black-boxing of source code eventually terminated this, which can be seen as a moment that was critical both in redefining producer–user divisions and in instituting a mode of private appropriation for its bioinformatic tools common to many software markets. As the historical evidence demonstrated, however, this private appropriation can only be seen in the context of emergent and innovative forms of public appropriation, as an equivalent bioinformatic tool, EMBOSS, was developed under General Public Licence, with public controls put in place that actively prevented subsequent uses, even newly developed uses, from being placed under private commercial control. This novel use of a formal instrument designed for copyright purposes thereby established a quite new extent of public control, effectively restricting opportunities for future commercial software innovation on the back of software development in the public domain. Under this appropriation regime, open sourcing of EMBOSS's source code ensured that the producer–user model of software instruments provided a developing research instrument for data creation within the bioinformatic scientific community, and an ongoing, unfinished process of knowledge accumulation under public control.6 The twin-tracking of different producer–user relations developed in parallel with new modes of appropriation and distribution.
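To illustrate concretely how a copyright instrument is turned into a vehicle of public appropriation, the following shows the kind of licence notice that open-source bioinformatic tools distributed under the GNU General Public Licence typically carry at the head of each source file. The wording follows the standard notice recommended in the licence itself (version 2, the version current for most of the period discussed here); the file name, program and copyright holder are hypothetical, and this is not a reproduction of any actual EMBOSS source file.

```python
# seqtool.py -- hypothetical sequence-analysis utility, shown for illustration
# Copyright (C) 2001  A. N. Author
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
```

The licence to which the notice points contains the copyleft condition: redistributed or modified versions must themselves be licensed under the GPL, which is precisely the restriction on subsequent private commercial appropriation of derivatives discussed above.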
FORMAL APPROPRIATION INSTITUTIONS

Our analysis so far has largely been formulated with respect to how the public–private divide is instituted as an economic process, in terms of the
dynamics of producer–user differentiation, and modes of distribution and appropriation. Formal institutions, specifically the role of law in intellectual property rights, have been present in this discussion but only in the background. Recent debates in economic sociology, however, have rightly insisted on the need to include law directly in economic analysis, rather than as a secondary adjunct. Some see norms and laws as emergent from economic processes and agents (Greif, 2000; Sugden, 1989), whereas others have argued that law is directly constitutive of economic phenomena (Swedberg, 2003; Nee and Swedberg, 2005; Hodgson, 2007). Laws of contract, property rights, inheritance and incorporation of firms are seen as foundational to capitalist political economies (Weber, 1978), and, in that sense, they formally institutionalize central features of these economies. Critically, legal processes of institutionalization, how laws are passed and enforced, have their own distinctive characteristics: they interact with economic phenomena, but are not directly economic phenomena themselves. Put bluntly, laws may formally institute money as legal tender, but the making of laws itself is not primarily a money-making activity, at least in principle, and in most modern democratic systems. There are laws for markets, not markets for laws. In this perspective, the formal laws of intellectual property rights – patent and copyright law – would be seen as centrally constitutive of the public–private divide in relation to markets for knowledge. Just as has been argued for any other type of market transaction (Demsetz, 1967; Heller, 1998), clarity and precision in property rights is essential for market functioning, and this is argued also to be the case for any economy aiming to maximize the value of its knowledge capital (OECD, 2003, 2006). Bringing legal institutionalization now into the foreground of our analysis, two main points can be made, stating them before exploring them in greater depth. First point: the striking feature of capitalist economies of knowledge is the imbalance between the relatively informal institutionalization of public appropriation and the formal institution of private intellectual property rights. We have already referred to a broad tendency in the literature that public goods are defined negatively as the absence of private control, exclusion or rivalry. This is also a feature of law: the boundary between public and private is legally defined by what is appropriable privately. Although IPR legal frameworks are oriented to defining private appropriation, ipso facto, they thereby define knowledge as a public good in terms of what is not privately appropriable. Part of the whole conceptual problem around public knowledge is entangled in a legal concept of utility, for example, characterizing public as opposed to private knowledge as lacking precise utility. In the legal, as well as the economic theory sphere, the public domain is constituted as the non-private, non-utility, domain.
Second point: legal institutions change continuously in interaction with modes of economic institutionalization. To say that law is partially constitutive of economic processes is one thing, foundational quite another. We will here highlight the turbulence in formal legal institutions arising from the turbulence in biological economies of knowledge. Law is not a prior condition of economic activity, but is constantly evolving in adaptation to new economic forms of appropriation and distribution.7 So, law is significantly but only partially constitutive, not prior-foundational. We need therefore to retain the distinction between modes of legal and economic institutionalization, whilst insisting on their co-evolution. In our analysis of public appropriation, it is clear that there are well-established formal rules, but without legal force. In scientific journal protocols, perhaps the most striking example of interaction and adaptation of these rules was the separation of data deposition (including methodological tests of validity) from journal article publication, both temporal and locational. Data was made public in databases prior to, and independently of, publication of journal articles based on that data. Negotiations took place between database producers and journals, whether for protein or DNA sequence, or other bio-data. These changes in protocols were therefore adapted to new modes of knowledge production, analysed above. Likewise, the Bermuda rules were essentially instituted by a self-governing collaborative group of international scientists. Similar rules were adopted for other public science genomic projects, but they were far from binding, and certainly did not become universal for all genome projects – the Agrobacterium story is one tiny piece of evidence of that. Infringements of norms of public appropriation and distribution become the focus of scandals and controversies, but not of litigation, so conspicuous in the recent history of IPR over bioinformatic information, instruments and technologies.8 Disputes over priority and recognition are a consequence,9 and no doubt form part of the moral regime necessary for the enforcement of rules. The negotiated resolution of the Agrobacterium tumefaciens race, and joint publication of back-to-back articles in Science, is a model of conflict avoidance, pre-empting the possibility of subsequent disputes over public rights. However, perhaps the most striking evidence of the domination of private over public formal rights can be seen in the NIH patenting of ESTs, the use of Statutory Invention Registration, General Public Licensing and GAIN certificates. In each of these cases, legal instruments designed to institute private property rights were diverted to the alternative purpose of formalizing public property rights. Although this is less clear-cut in the case of the NIH filing of patent applications – it remained unclear what NIH would have done had the applications been successful – one possible
outcome would have been equivalent to a public ‘patent buy-out’ (Kremer, 1998), making the ESTs publicly and freely available, and pre-empting commercially motivated patenting, especially by non-US organizations. The use of the patent law instrument of SIRs by the SNP Consortium in order to ensure public legal rights over priority over all newly discovered SNPs, however, effectively hi-jacked IPR to the purposes of preventing commercial or private IPR. Similarly, as copyright legislation is more typically used for acquiring IPR over software, the GPL ‘adapted’ this legislation by inserting clauses into a copyright provision restricting opportunities for subsequent commercial copyrighting of all software derivatives. GPL is effectively a public copyright, owned by the Free Software Foundation. These examples of inventive re-routing of private IPR for public IPR are powerful evidence of the absence of formal legal instruments of public knowledge appropriation. The other major form of public appropriation involves ‘bureaucratic’ regulatory controls, rather than legally formalized codes. Many governmental research funding organizations (NIH, NSF, the US National Research Council, UK Research Councils and so on) develop rules and attach conditions to the provision of resources for knowledge production. These may stipulate appropriate forms of distribution and publications, even specifying quality standards and sites or modes of distribution. There has been a constant evolution of these bureaucratic controls and institutions. They exert a changing public control over knowledge production, its outputs and distribution. Enforcement of these bureaucratic modes of public appropriation, however, is quite another matter, relatively underdeveloped in contrast to legal responses to infringement of private property rights. Much relies on good will and trust within research communities financed by public monies. The revolution in biological science and technology stimulated experimentation in forms of public appropriation in the vacuum of formal and legal institutional norms, where one might expect more degrees of freedom. Nonetheless the same is true of the increasing challenge to the relatively densely filled space of patent legislation and formal private IPR. Interestingly, Eisenberg, who has tracked the patenting controversies surrounding genomic information over two decades (Eisenberg, 1989, 1991, 2000a, 2000b, 2002, 2003, 2006; Eisenberg and Nelson, 2002), has recently argued that patenting of genes only became legally controversial during the period that we have identified as the emergence of polyvalence of use for bio-data in the 1990s. She speaks of the ‘mixed mission’ of genes used as research tools and diagnostic tests, and the shift that occurred when ‘patenting genes started to look less like patenting end products, and more like patenting scientific information’ (Eisenberg, 2002, pp. 1382, 1383). In the
early phases, before automated DNA sequencing, and before the mass production of data for bio-databases became a differentiated scientific activity, technologies of gene cloning related to specific therapeutic proteins of defined utility. However one regards patenting of genes in retrospect (Thomas et al., 2002; Andrews, 2002), during this early phase, there appeared to be no challenge to established patent law, which simply treated these claims as it would claims on chemical molecules. As the first test to challenge established legal institutions of patenting, ESTs have exemplified how the critical criteria of utility and non-obviousness were destabilized by the revolution in the production of bio-data with polyvalent use. With ESTs, it was seen how the initial NIH patent claims were broad, and rejected on all five counts of patentability (utility, non-obviousness, disclosure, replicability and novelty). Increasingly, however, the focus of the challenge was narrowed down to utility and, in particular, how the Brenner v Manson decision taken in a previous epoch,10 enshrining 'substantial and specific' utility, would be applied to the newly emergent bio-data. As EST patent filings multiplied, it became clear that they stretched legal interpretations of utility to breaking point. In reaction, Revised Utility Examination Patent Guidelines were issued by the US Patent Office in 1999, and consolidated by the implementation of new guidelines in 2001. The matter did not rest there, as test cases were pursued in courts, culminating in In re Fisher (May 2005), when the United States Court of Appeals for the Federal Circuit upheld a decision against Monsanto's patent claims of ESTs derived from maize tissue, made on the grounds of their utility as research tools only (Zuhn, 2005; Davis et al., 2005). The Appeals Court rejected the claim, arguing that it was not raising the bar of utility, but confirming the requirement for substantial and specific use. ESTs were deemed by law to be inputs for further research, rather than instruments of research. The process can be seen as one of continuous legal redefinition of the concept of utility. If this core feature of patent law was continuously tested and redefined during this period, so too was a critical exemption from the force of private appropriation by patent, the freedom to use information published in patents for 'experimental use' for further knowledge production (Eisenberg, 1989). In effect this exemption legally established a utility differentiation between use for further knowledge production and use of knowledge for commercial purposes for products or services. The assumption behind this exemption was that non-profit organizations, especially universities, could use patented knowledge or techniques or copied equipment, provided they did so in the course of research. The Bayh–Dole Act, however, destabilized the exemption inasmuch as it had been seen to attach to the organization rather than the activity. Post Bayh–Dole, university
research was now strongly reoriented to commercial exploitation and securing income from patenting university research (Eisenberg, 1989, 2003; Walsh et al., 2003, 2005). Two quite contrary legal developments around the experimental use exemption demonstrate unfinished and still contested grounds for legal institutionalization. On the one hand, the notorious Madey v Duke decision in 2002 characterized any research undertaken by universities as ‘legitimate business objectives’, so removing exemption for experimental use. In effect, the decision blurred the differentiation between use for further knowledge production and use for production of commercial goods and services. The contrary decision was taken in the Merck v Integra Lifescience Supreme Court decision,11 where Merck, a commercial enterprise, was deemed to have used patented knowledge for experimental use in preparing clinical trials for FDA approval, a not directly commercial end. Even after 20 years, therefore, as Eisenberg has remarked, far from achieving a relatively mature stage of readjustment and stability, ‘the patent system is struggling to clarify the ground rules for patenting DNA sequences, while years’ worth of patent applications accumulate at the PTO’ (Eisenberg, 2000b, p. 784). These legal institutional developments, critical as they are for setting the boundaries between privately appropriable and non-appropriable knowledge, suggest quite a complex interaction with economically instituted processes of appropriation. It must always be emphasized that legal appropriation accompanies and complements technical appropriation and subsequent licensing and, as we have highlighted, the competition with public appropriation and distribution. For those that argue that law is constitutive of economic activity (Swedberg, 2003; Hodgson, 2007), it is important to recognize that law is not prescriptive of economic behaviour. The existence of private intellectual property rights does not impel actors to avail themselves of them. Nobody was obliged or constrained (even stimulated or incentivized) to patent ESTs as such, as against other forms of genomic data. How and why the practice of patenting ESTs arose are additional questions relating to the selection of practices in complex environments. Suddenly, there was an explosion of EST patent filing in the mid-1990s, almost as an experimental testing of the limits of patent law, rather than an automatic consequence of patent law. As Walsh, Cho and Cohen’s survey suggests, the Madey v Duke decision did not result in commercial firms pursuing scientists in universities for breaching patents through experimental use. In practice, the exemption for experimental use persisted unchallenged. Scientists’ behaviour on the ground – in universities – continued much as before (Walsh et al., 2005). These historical examples suggest that, as new kinds of knowledge object emerged, different actors often pursuing conflicting interests availed
themselves of existing legal instruments. By extending their application, laws were stretched to the point where they were seen to conflict with, and contradict, some of their own normative codes and principles. Rules were then adjusted and, given their formal legal character, court cases also followed, gravitating upwards to courts of final appeal. A specific process of legal change and institutionalization ensued, once more framing economic activities of appropriation and distribution. And so the legal institutionalization process advances, passing through periods of relative stability and change. There is a continuous to-ing and fro-ing between the non-formal instituting of economic practices and their formal institutionalization in legal framing instruments. The dynamic of legal reform has a specificity, a different scansion and temporality to the dynamic of economically instituting economies of knowledge. Markets collapse, with financial impact and force. Laws are rescinded or modified, precedents are established or overlaid by court decisions, through judicial and political processes. Both dynamics, in interaction, contribute to the division between public and private knowledge. Laws do not script economic processes, and economic processes do not determine laws. Surveying the processes of appropriation (economically instituted and formal–legal) and distribution, perhaps the most important conclusion is that, whether speaking of private property or of public good, the institutions and nature of commercial property and public appropriation are continuously changing. Secondly, with regard to knowledge, its modes of appropriation and distribution are quite specific, following their own historical trajectories. Different spheres of economic activity develop their own modes of appropriation and distribution, economically and legally instituted. Capital assets and labour are strongly contrasted in their historical evolution in this regard, and within the provision of goods and services also newly emergent modes of appropriation and distribution are widespread and differentiated. We need to abandon standardized, universal or static conceptions of public good and private property.
KNOWLEDGE AND THE MULTI-MODALITY OF CAPITALIST ECONOMIES

An IEP approach, therefore, although focusing centrally on changing configurations of economic processes, also thereby invites complementary analyses of legal, political and other instituted processes. Staying largely within the specifically economic domain, however, four main theses have been advanced in this book. Firstly, if growth of knowledge is in part itself economic growth and in part contributory to general economic growth in
most societies, capitalist political economies create a divide between public and private economies of knowledge. It is a persistently unstable 'internal frontier' of capitalism. Second, the boundaries of that societal division are constantly shifting and developing and, as Eisenberg and Nelson (2002) have argued, are a source of 'fruitful tension'. We have pushed this argument a bit further by suggesting that this tension is itself a central dynamic in the growth of economies of knowledge, and the economy more broadly. We have seen competition and conflict, frontier wars, antagonistic creation and destruction of market spaces and public domains, as expressions of this 'fruitful tension'. Third, if capitalist knowledge economies are in significant part non-market, public modes of production, distribution, appropriation and use, then capitalist economies must be seen as multi-modal, driven by interacting dynamisms, market and non-market. Capitalism is not driven by a single engine of progress, nor can it be characterized in terms of a market motor set in motion by combustion of a 'knowledge fuel'. A central feature of its constitution is its non-market dynamism, with its own historical force, albeit in interaction with market dynamisms. And so, fourth, turbulence in knowledge, the emergence of new forms of knowledge and the creation of new divisions of knowledge labour all create turbulence at the frontiers of the economic divide between public and private knowledge. This book has been focusing on the shifting and contested boundaries, rather than those activities, epistemic and economic, that are more institutionally separated and distanced. By looking at emergent new divisions of labour, and new types of knowledge, we analysed how economies of knowledge around particular knowledge objects were instituted. At the boundaries, there was a kind of no-man's-land where either public or private forms of production, appropriation, distribution and use could invade and dominate. When exploring the producer–user division, with respect to different types of knowledge, there was evidence of competing and conflicting uses, knowledges with settled polyvalent use, and parallel developments of knowledge for different uses, the twin- or multi-track phenomenon. At the same time, innovations in modes of distribution and in the institutions of appropriation, public and private, also resulted in conflicting alternatives, sometimes eventually stabilizing in a division of the spoils, at others in domination by one or the other. Distinctive economies of knowledge were instituted around objects of knowledge when they continuously reproduced themselves over time, historical configurations articulating the four processes in a coherent manner, regeneratively resourced either by market transactions or by redistributive channels. Particular emphasis has been placed on the emergence of new public economies of knowledge, around databases, genomes and software, partly because so little attention
has been paid to a positive analysis of the public domain, and partly because, against prevailing wisdom and indeed against political and legislative intention, public economies of knowledge have grown and developed new forms. But, of course, given that the 'great divide' lies within capitalist political economies, however acute the frontier frictions between public and private economies at times of turbulence, there is also interdependency between them. Polyvalence of use of the public resource, in Mokyr's terms the 'epistemic base', defines the specific character of the public within capitalist economies. There is a meta-division of labour, articulated by different economic modes, without superordinate coordination.

Where to forge the divide? It seems strange that, in the thick of great conflicts, at the pinnacle of human knowledge, the US Court of Appeals debated whether a fragment of DNA was similar or dissimilar to a microscope, and hence whether or not it was open to private appropriation. This historically persistent rift constantly manifests itself in tremors with novel seismic signatures. Turbulence and conflict arose precisely because these bits of bio-data were both at once research tools and scientific data, the microscope and what is seen through the microscope. Polyvalence of use of knowledge points to the fact that there is no natural divide and, more importantly, no immanent binary opposition inscribed in knowledge. The division between public and private knowledge is an ever-changing historical artefact of capitalism. Public economies of knowledge, once differentiated and instituted, are resources equally for the further expansion of the public economy of knowledge, for use in technical knowledge creation, and for the production of non-knowledge products and services in markets. Strings of DNA, analysed and compared by software tools, can be both public-for-the-public economies and public-for-the-market economies. Above all, this is an historical account of how the public–private divide emerges and is continuously subject to change and variation. An instituted economic process analysis grasps this creation and destruction of the institutional forms of production, distribution, appropriation and use that underpin what makes knowledge private and public, market and non-market. These are not fixed categories or static explanations, whether applied to epistemic or to economic forms, to propositions or to property. Rather, processes of division and interdependency between public and private economies of knowledge are continuously generating new forms, new experimentations with epistemic practice and economic organization. Past decades have witnessed a rich variety of forms in which turbulence in knowledge creation was simultaneously turbulence in the economic organization of capitalism.
NOTES

1. We can speculate that the differentiation between knowledge and non-knowledge-producing activities occurs differently as a consequence of the presence or absence of such mapping, but this is a question beyond the scope of this book.
2. Mokyr (2002) frequently emphasizes the small handful of individuals creating the epistemic base (ibid., ch. 2, especially, pp. 66, 291) and the diminishing costs of accessing it for a wider community of 'fabricants', a division of labour that was surely being transformed even by the middle of the nineteenth century.
3. We here develop the point made by Nelson and Winter (1982), and their argument that tacitness of knowledge is always a matter of degree reflecting the extent to which languages and representations are shared by social groupings, rather than notions of tacitness that suggest intrinsically incommunicable knowledge.
4. The early history of this process of creating new forms of control by variously constituted publics from the seventeenth century has been explored: the exhibition experiment, the publication of notes of experimentation, and, with the growth of chemistry, the narrowing down of the 'public' at large to the community of the discipline and professional norms of validation (Law and Williams, 1982; Schaffer, 1983; Shapin and Schaffer, 1985; Collins, 1988; Stewart, 1992; Golinski, 1992. See also Callon, 1994.)
5. The Public Library of Science: the first journal to appear in this arrangement was PLoS Biology.
6. As already noted, however, and described in a previous chapter, prevailing public norms of not resourcing software development as a research activity have obstructed its financing under a redistributive regime, and EMBOSS has struggled in its ambition to secure a stable resource stream.
7. In the very different sphere of the evolution of labour law, Deakin and Wilkinson (2005; Harvey, 2007b) have demonstrated how legal codes co-evolve with changes in economic organization of labour markets and firm organization, also insisting that legal processes and dynamics have their own specific modalities.
8. Eisenberg cites eight major cases between 1990 and 1996, many involving Genentech (Eisenberg, 2000b).
9. The non-recognition of Rosalind Franklin in the discovery of the helical structure of DNA is a classic example (Maddox, 2003). The race between Guillemin and Schally in founding cerebral endocrinology (Wade, 1978), or the competing claims for discovery of the HIV virus, demonstrate the ferocity of claims for recognition and priority in publication.
10. 383 US 519 (1966).
11. 545 US (2005), Supreme Court, 13 June.
Appendix: list of interviewees

The organizational affiliations are those at the time of interview

Paolo Arruda, Unicamp, Campinas, Brazil
Peter Artymiuk, Sheffield University, UK
Michael Ashburner, European Bioinformatics Institute, Hinxton, UK
Terry Attwood, University of Manchester, UK
Scott Baker, Northwestern Pacific National Laboratories, USA
Richard Baldock, Medical Research Council, Human Genetics Unit, UK
Ewan Birney, European Bioinformatics Institute, Hinxton, UK
Walter Blackstock, Cellzome London, UK
Alan Bleasby, Human Genome Mapping Project, Hinxton, UK
Erich Bornberg-Bauer, University of Manchester, UK
Andy Brass, University of Manchester, UK
Simon Brocklehurst, Cambridge Antibody Technology Ltd, UK
David Broomhead, University of Manchester, UK
Georg Casari, Cellzome, Heidelberg, Germany
Michael Cole, Amersham, UK
John Devereux, Genetics Computer Group, Madison, USA
Patrick Devitt, GlaxoSmithKline, Stevenage, UK
Alan Doyle, Wellcome Trust, UK
Will Dracup, NonLinear Dynamics, UK
Richard Durbin, Sanger, UK
Thure Etzold, LION Bioscience and European Bioinformatics Institute, Cambridge, UK
Sergio Salles Filho, FINEP, Rio de Janeiro, Brazil
Tom Flores, Accelrys, Cambridge, UK
Norman Freshney, Office of Science and Technology, DTI, UK
John Garrison, Accelrys, San Diego, USA
Carol Goble, University of Manchester, UK
Nick Goldman, European Bioinformatics Institute, Hinxton, UK
Brad Goodner, Hiram College, USA
Martin Gouldstone, Confirmant, Abingdon, UK
Philip Harriman, Microbial Genomics Programme, National Science Foundation, USA
Maryanna Henkart, Microbial Genomics Programme, National Science Foundation, USA
Kim Henrick, European Bioinformatics Institute, Hinxton, UK
David Hodgkinson, Quintessa, UK
Charlie Hodgman, GlaxoSmithKline, Stevenage, UK
Leroy Hood, Institute for Systems Biology, Seattle, USA
David Ingram, CHIME, UK
Scott Kahn, Accelrys, San Diego, USA
Rajinder Kaul, University of Washington Genome Center, Seattle, USA
Angus King, Ace Biosciences, Odense, Denmark
Joao Kitajima, Unicamp, Campinas, Brazil
Andrew Lyall, Oxford GlycoScience, Abingdon, UK
Paul Mathews, European Bioinformatics Institute, Hinxton, UK
Iain Mattaj, European Molecular Biology Laboratory, Heidelberg, Germany
Phil McHale, MDL Informatics, San Leandro, USA
Diane McLaren, Medical Research Council, UK
Joao Meidanis, Unicamp, Campinas, Brazil
Colin Miles, Biotechnology and Biological Sciences Research Council, UK
Steve Muggleton, Imperial College, London
Gene Nester, University of Washington, Seattle
Steve Oliver, University of Manchester, UK
Helen Parkinson, European Bioinformatics Institute, Hinxton, UK
David Parry-Smith, Biofocus plc, UK
Jose Fernando Perez, FAPESP, Sao Paulo, Brazil
Chris Ponting, MRC Functional Genetics Unit, Oxford
Chris Rawlings, Oxagen Ltd, UK
Fernando Ribeiro, FINEP, Rio de Janeiro, Brazil
Peter Rice, European Bioinformatics Institute, LION Bioscience, Cambridge, UK
Anaclaudia Rosera, University of Sao Paulo, Brazil
Reinhard Schneider, LION Bioscience, Heidelberg, Germany
Joao Setubal, Unicamp, Campinas, Brazil
Malcolm Skingle, GlaxoSmithKline, Stevenage, UK
Steve Slater, Monsanto (Cereon), Cleveland, USA
Hein Stam, DSM, Netherlands
Halldor Stefansson, European Molecular Biology Laboratory, Heidelberg, Germany
Susie Stephens, Sun Microsystems, UK
Mark Swindells, Inpharmatica, London, UK
Lesley Thompson, Engineering and Physical Sciences Research Council, UK
Janet Thornton, European Bioinformatics Institute, Hinxton, UK
Jean-François Tomb, DuPont, Wilmington, USA
David Willshaw, University of Edinburgh, UK
Olaf Wolkenhauer, University of Manchester, UK
Derek Wood, University of Washington, Seattle, USA
Alisdair Wotherspoon, Biotechnology Directorate, DTI, UK
David Zirl, IBM, San Jose, USA
Bibliography Adams, M. and C. Venter (1996), ‘Should non-peer-reviewed raw DNA sequence data release be forced on the scientific community?’, Science, 274 (5287), 534–6. Adams, M.D., J.M. Kelley, J.D. Gocayne, M. Dubnik, M.H. Polymeropoulos, Xiao Hong, C.R. Merril, A. Wu, B. Olde, R.F. Moreno, A.R. Kerlavage, W.R. McCombie and J.C. Venter (1991), ‘Complementary DNA sequencing: expressed sequence tags and Human Genome Project’, Science, 252, 1651–6. Adler, R.D. (1992), ‘Genome research: fulfilling the public’s expectations for knowledge and commercialization’, Science, 257, 908–14. Altschul, S.F., W. Gish, W. Miller, E.W. Myers and D.J. Lipman (1990), ‘Basic local alignment search tool’, Journal of Molecular Biology, 215, 403–10. Anderson, C. and P. Aldhous (1991), ‘Secrecy and the bottom line’, Nature, 354, 96. Andrade, M.A., N.P. Brown, C. Leroy, S. Hoersch, A. de Daruvar, C. Reich, A. Franchini, J. Tamames, A. Valencia, C. Ouzounis and C. Sander (1999), ‘Automated genome sequence analysis and annotation’, Bioinformatics, 15, 391–412. Andrews, L.B. (2002), ‘Genes and patent policy: rethinking intellectual property rights’, Nature Review Genetics, 3, 804. Ansorge, W., B. Sproat, J. Stegemann, C. Schwager and M. Zenke (1987)‚ ‘Automated DNA sequencing: ultrasensitive detection of fluorescent bands during electrophoresis’, Nucleic Acids Research, 15 (11), 4593–602. Apweiler, R., A. Bairoch, C.H. Wu, W.C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang, R. Lopz, M. Magrane, M.J. Martin, D.A. Natale, C. O’Donovan, N. Redaschi and L.-S. Yeh (2004), ‘UniProt: the universal protein knowledgebase’, Nucleic Acids Research, 32, 115–19. Arrow, K.J. (1962), ‘Economic welfare and the allocation of resources for invention’, in R.R. Nelson (ed.), The Rule and Direction of Inventive Activity, Princeton: Princeton University Press. Attwood, T.K. (2000), ‘The Babel of bioinformatics’, Science, 290 (5491), 471–3. Attwood, T.K. and D.J. Parry-Smith (1999), Introduction to Bioinformatics, Harlow: Prentice-Hall. 191
Auth, D.R. (1997), ‘Are ESTs patentable?’, Nature Biotechnology, 15, 911–12. Bairoch, A. (2000), ‘Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times!’, Bioinformatics, 16 (1), 48–64. Bairoch, A., R. Apweiler, C.H. Wu, W.C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang, R. Lopz, M. Magrane, M.J. Martin, D.A. Natale, C. O’Donovan, N. Redaschi and L.-S. Yeh (2005), ‘The universal protein resource (UniProt)’, Nucleic Acids Research, 33, 154–9. Baker, S.E. (2006a), ‘Aspergillus niger genomics: past, present and into the future’, Medical Mycology, 44, S17–S21. Baker, S.E. (2006b), ‘What’s new in the Aspergillus niger genome?’, Presentation to the Advances Against Aspergillosis Conference, Athens, Greece, 25 February. Baker, S.E. (2006c), ‘An accelerated fungal genome sequencing program for energy security and bioenergy’, Fungal Genomics Steering Committee. Balmer, B. (1996), ‘Managing mapping in the Human Genome Project’, Social Studies of Science, 26 (3), 531–73. Bentley, D. (1996), ‘Genomic sequence should be released immediately and freely in the public domain’, Science, 274 (5287), 533–4. Binns, A.N. and M.F. Thomashow (1988), ‘Cell biology of Agrobacterium infection and transformation of plants’, Annual Review of Microbiology, 42, 575–606. Bio-IT World (2002), Interview, ‘John Craig Venter Unvarnished’, 12 November. Boguski, M.S. (1995), ‘The turning point in genome research’, Trends in Biochemical Science, 20, 295–6. Boguski, M.S., C.M. Tolstoshev and D.E. Bassett (1994), ‘Gene discovery in dbEST’, Science, 265, 1993–4. Botstein, D. (1999), The Biomedical Information Science and Technology Initiative, National Institute of Health Report, June, http://www.nih. gov/about/director/060399.htm. Braun, A. (1982), ‘The history of the crown gall problem’, in G. Kahl and J. Schell (eds), Molecular Biology of Plant Tumors, New York: Academic Press, pp. 155–210. Brazma, A. and H. Parkinson (2004), A Quick Introduction to Elements of Biology – Cells, Molecules, Genes, Functional Genomics, Microarrays, European Bioinformatics Institute. Brazma, A., R. Robinson, G. Cameron and M. Ashburner (2000), ‘Onestop shop for microarray data’, Nature, 403, 699–700. Brenner, S. (1990), ‘The Human Genome: the nature of the enterprise’, Human Genetic Information: Science, Law and Ethics, Ciba Foundation Symposium Papers, 149, 6–12.
Broothaerts, W. et al. (2005), ‘Gene transfer to plants by diverse species of bacteria’, Nature, 433, 629–33. Bush, V. (1945), Science: The Endless Frontier, Washington, DC: Government Printing Office. Butler, D. (2001), ‘Are you ready for the revolution?’, Nature, 409, 758–60. Callon, M. (1994), ‘Is science a public good?’, Science, Technology and Human Values, 19 (4), 395–424. Cannon, G.C. (1987), ‘Sequence analysis on microcomputers’, Science, 238, 97–104. Caskey, C.T. and A.R. Williamson (1996), ‘Merck, SmithKline and patents’, Nature, 381, 360. Chilton, M.-D. (2001), ‘Agrobacterium. A memoir’, Plant Physiology, 125, 9–14 January. Chilton, M.-D. (2005), ‘Adding diversity to plant transformation’, Nature Biotechnology, 23 (3), 309–10. Chilton, M.-D., M.H. Drummond, D.J. Merlo, D. Sciaky, A.L. Montoya, M.P. Gordon and E.W. Nester (1977), ‘Stable incorporation of plasmid DNA into higher plant cells: the molecular basis of crown gall tumorigenisis’, Cell, 11, 263–71. Clasquin, B., N. Moncel, B. Friot and M. Harvey (eds) (2004), Wage and Welfare. New Perspectives for the Analysis of Changes in Employment and Social Rights in Europe, Brussels: Peter Lang, reprinted 2006. Cohen, W.M. and D.A. Levinthal (1990), ‘Absorptive capacity: a new perspective on learning and innovation’, Administrative Science Quarterly, 35, 128–52. Collins, H.M. (1988), ‘Public experiments and displays of virtuosity, the core-set revisited’, Social Studies of Science, 17, 689–713. Cook-Deegan, R. (1994), The Gene Wars. Science, Politics and the Human Genome, New York: Norton. Coriat, B. and F. Orsi (2002), ‘Establishing a new intellectual property rights regime in the United States. Origins, content and problems’, Research Policy, 31, 1491–507. Coronini, R., M.-A. de Looze, P. Puget, G. Bley and S.V. Ramani (2003), ‘Decoding the literature on genetic variation’, Nature Biotechnology, 21, 21–9. Cowan, R., P.A. David and D. Foray (2000), ‘The explicit economics of knowledge codification and tacitness’, Industrial and Corporate Change, 9 (2), 211–53. Dasgupta, P. and P.A. David (1994), ‘Toward a new economics of science’, Research Policy, 23, 487–521. David, P.A. (1993), ‘Intellectual property institutions and the panda’s thumb: patents, copyrights, and trade secrets in economic theory and
history’, in M. Wallerstein, M. Mogee and R. Schoen (eds), Global Dimensions of Intellectual Property Rights in Science and Technology, Washington, DC: National Academy Press. David, P.A. (2001a), ‘From keeping “nature’s secrets” to the institutionalisation of “open science” ’, Discussion Papers in Economic and Social History, 23, July, University of Oxford, pp. 1–23. David, P.A. (2001b), ‘Tragedy of the public knowledge “commons”?’, Global Science, Intellectual Property and the Digital Technology Boomerang, MERIT. David, P.A. (2004), ‘Can “open science” be protected from the evolving regime of IPR protections?’, Journal of Institutional and Theoretical Economics, 160 (1), 9–34. Davis, P.K., J.J. Kelly, S.P. Caltrider and S.J. Heinig (2005), ‘ESTs stumble at the utility threshold’, Nature Biotechnology, 23 (10), 1227–9. Deakin, S. and F. Wilkinson (2005), The Law of the Labour Market. Industrialisation, Employment and Legal Evolution, Oxford: Oxford University Press. De Solla Price, D. (1984), ‘The science/technology relationship, the craft of experimental science, and policy for the improvement of high technology innovation’, Research Policy, 13, 3–20. Demsetz, H. (1967), ‘Toward a theory of property rights’, American Economic Review, 57 (2), 347–59. Devereux, J., P. Haeberli and O. Smithies (1984), ‘A comprehensive set of sequence analysis programs for the VAX’, Nucleic Acids Research, 12, 387–95. Doll, J.J. (1998), ‘The patenting of DNA’, Science, 280, 689–90. Doolittle, R.F. (1981), ‘Similar amino acid sequences: chance or common ancestry’, Science, 214, 149–59. Doolittle, R.F., M.W. Hunkapiller, L.E. Hood, S.G. Devare, K.C. Robbins, S.A. Aaronson and H.N. Antoniades (1983), ‘Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet derived growth factor’, Science, 221, 275–7. Dworkin, G. (1997), ‘Should there by property rights in genes?’, Philosophical Transactions, 352 (1357), 1077–86. Eisenberg, R.S. (1989), ‘Patents and the progress of science: exclusive rights and experimental use’, University of Chicago Law Review, 56 (3), 1017–86. Eisenberg, R.S. (1991), ‘Genes, patents, and product development’, Science, 257 (5072), 903–8. Eisenberg, R.S. (2000a), ‘The public domain in genomics’, unpublished manuscript available at www.law.nyu.edu/ili/conferences/freeinfo2000/ abstracts/eisengberg.html.
Eisenberg, R.S. (2000b), ‘Re-examining the role of patents in appropriating the value of DNA sequences’, Emory Law Journal, 49, 783–800. Eisenberg, R.S. (2002), ‘Why the gene patenting controversy persists’, Academic Medicine, 77 (12), 1381–7. Eisenberg, R.S. (2003), ‘Patent swords and shields’, Science, 299, 1018–19. Eisenberg, R.S. (2006), ‘Patents and data-sharing in public science’, Industrial and Corporate Change, 15 (6), 1013–31. Eisenberg, R.S. and R.R. Nelson (2002), ‘Public vs. proprietary science: a fruitful tension?’, Daedalus, 131 (2), 89–101. Ellis, L.B.M. and T.K. Attwood (2001), ‘Molecular biology databases: today and tomorrow’, Drug Discovery Today, 6 (10), 509–13. Ellis, L.B.M. and D. Kalumbi (1998), ‘The demise of public data on the web’, Nature Biotechnology, 16, 1323–4. Ellis, L.B.M. and D. Kalumbi (1999), ‘Financing a future for public biological data’, Bioinformatics, 15 (9), 717–22. Etzold, T., A. Ulyanov and P. Argos (1996), ‘SRS: information retrieval system for molecular biology data banks’, Methods in Enzymology, 266, 114–28. Fields, C., M.D. Adams, O. White and J.C. Venter (1994), ‘How many genes in the human genome?’, Nature Genetics, 7, 345–6. Flattmann, G.J. and J.M. Kaplan (2001), ‘Patenting expressed sequence tags and single nucleotide polymorphisms’, Nature Biotechnology, 19, 683–4. Foray, D. (2004), The Economics of Knowledge, Cambridge, MA: MIT Press. Gelvin, S.B. (2003), ‘Agrobacterium-mediated plant transformation: the biology behind the “gene-jockeying” tool’, Microbiology and Molecular Biology Reviews, 67 (1), 16–37. Gilbert, W. (1991), ‘Towards a paradigm shift in biology’, Nature, 349, 99. Gillis, A.M. (1992), ‘The patent question of the year’, Bioscience, 42, 5. Goldman, B. (1998), ‘SNPs: patent la différence’, Signals, 19 (5), http:// www.recap.com/signalsmag.nsf. Golinski, J. (1989), ‘A noble spectacle. Phosphorus and the public culture of science in the early Royal Society’, Isis, 80, 11–39. Golinski, J. (1992), Science as Public Culture: Chemistry and Enlightenment in Britain, 1760–1820, Cambridge: Cambridge University Press. Goodner, B. et al. (1999), ‘Combined genetic and physical map of the complex genome of Agrobacterium tumefaciens’, Journal of Bacteriology, 5160–66. Goodner, B. et al. (2001), ‘Genome sequence of the plant pathogen and biotechnology agent Agrobacterium tumefaciens C58’, Science, 294, 2323–8.
Green, P. (1997), ‘Against a whole-genome shotgun’, Genome Research, 7, 410–17. Greif, A. (2000), ‘The fundamental problem of exchange: a research agenda in historical institutional analysis’, European Journal of Economic History, 4, 251–84. Grindrod, P. (2001), UK Bioinformatics for Functional Genomics: Watching the Detectives, Numbercraft. Gudmundur, A.T. and L.D. Stein (2003), ‘The SNP Consortium website: past, present and future’, Nucleic Acids Research, 31 (1), 124–7. Hamilton, D.P. (1992), ‘Venter to leave NIH for greener pastures’, Science, 257, 151. Hardin, G. (1968), ‘The tragedy of the commons’, Science, 162 (3859), 1243–8. Hardin, G. (1998), ‘Extensions of the “Tragedy of the Commons” ’, Science, 280 (5364), 682–3. Harvey, M. (2007a), ‘Instituting economic processes in society’, in M. Harvey, R. Ramlogan and S. Randles (eds), Karl Polanyi: New Perspectives on the Place of the Economy in Society, Manchester: Manchester University Press. Harvey, M. (2007b), ‘Instituted or embedded? Legal, fiscal and economic institutionalisation of markets’, CRIC Discussion Paper 82. Harvey, M. and A. McMeekin (2002), UK Bioinformatics: Current Landscapes and Future Horizons, DTI Report, London: DTI. Harvey, M. and A. McMeekin (2004), ‘Public–private collaborations and the race to sequence Agrobacterium tumefaciens’, Nature Biotechnology, 22 (7), 807–10. Harvey, M. and S. Randles (2002), ‘Market exchanges and “instituted economic process”: an analytical perspective’, Revue d’Economie Industrielle, 101, December, 11–30. Harvey, M., S. Quilley and H. Beynon (2002), Exploring the Tomato: Transformations in Nature, Economy and Society, Cheltenham, UK and Northampton, MA, USA: Edward Elgar. Harvey, M., R. Ramlogan and S. Randles (eds) (2007), Karl Polanyi: New Perspectives on the Place of the Economy in Society, Manchester: Manchester University Press. Hayashi, I. et al. (2006), ‘Highly accurate genome sequences of Escherichia coli K-12 strains MG1655 and W311’, Molecular Systems Biology, 2006.2007 doi:10.1038/msb4100049. Heller, M.A. (1998), ‘The tragedy of the anticommons: property in the transition from Marx to markets’, Harvard Law Review, 111 (3), 621–88. Heller, M.A. and R.S. Eisenberg (1998), ‘Can patents deter innovation? The anticommons in biomedical research’, Science, 280 (5364), 698–701.
Hodgman, T.C. (2000), ‘A historical perspective on gene/protein functional assignment’, Bioinformatics, 16, 10–15. Hodgson, G. (2007), ‘The enforcement of contracts and property rights: constitutive versus epiphenomenal conceptions of law’, in M. Harvey, R. Ramlogan and S. Randles (eds), Karl Polanyi: New Perspectives on the Place of the Economy in Society, Manchester: Manchester University Press, pp. 95–129. Honoré, A.M. (1961), ‘Ownership’, in A.G. Guest (ed.), Oxford Essays in Jurisprudence: a Collaborative Work, Oxford: Oxford University Press. Hood, L.E., M.W. Hunkapiller and L.M. Smith (1987), ‘Automated DNA sequencing and analysis of the human genome’, Genomics, 1, 201–12. Hunkapiller, T., R.J. Kaiser, B.F. Koop and L. Hood (1991), ‘Large-scale and automated DNA sequence determination’, Science, 254 (5028), 59–67. Jefferson, T. (1967), The Jefferson Cyclopedia, vol.1, ed. J.P. Foley, New York: Russel and Russel. Johnson, B., E. Lorenz and B.-A. Lundvall (2002), ‘Why all this fuss about codified and tacit knowledge?’, Industrial and Corporate Change, 11 (2), 245–62. Joly, P.B. and V. Mangematin (1997), ‘How long cooperation in genomics is sustainable’, in P. Wheale et al. (eds), Science Technology and Human Values, Aldershot: Avebury. Jumas-Bilak, E. et al. (1995), ‘Study of the organisation of the genomes of Escherichia coli, Brucella miletensis, and Agrobacterium tumefaciens, by insertion of a unique restriction site’, Microbiology, 141, 2425–32. Jumas-Bilak, E. et al. (1998), ‘Unconventional genomic organisation of the alpha subgroup of the protobacteria’, Journal of Bacteriology, 180, 2749–55. Kaul, I. and R.U. Mendoza (2003), ‘Advancing the concept of public goods’, in I. Kaul, P. Conceicao, K. Le Goulven and R.U. Mendoza (eds), Providing Global Public Goods, Oxford: Oxford University Press. Kitts, A. and S. Sherry (2002/2006), ‘The single nucleotide polymorphism database (dbSNP) of nucleotide sequence variation’, National Center for Biotechnology Information Handbook, http://www.ncbi.nlm.nih.gov. Kremer, M. (1998), ‘Patent buyouts: a mechanism for encouraging innovation’, Quarterly Journal of Economics, 10 (1), 1–29. Kuhn, T.S. (1962), The Structure of Scientific Revolutions, Chicago: University of Chicago Press. Law, J. and R.J. Williams (1982), ‘Putting the facts together: a study in scientific persuasion’, Social Studies of Science, 12, 535–58. Levin, R.C., A. Klevorick, R.R. Nelson and S.G. Winter (1987), ‘Appropriating the returns from industrial research and development’, Brookings Papers on Economic Activity, vol. 3.
Lion Bioscience (1998), ‘LION Bioscience AG appoints SAP founder and managing director from Boston consulting group for its board of directors’, Press release, December, Heidelberg/Berlin. Lion Bioscience (1999), ‘LION Bioscience and Bayer enter US$100M research alliance’, Press Release, Heidelberg (Germany) 24 June. Lion Bioscience (2000), ‘LION Bioscience and Bayer pioneer linkage between chemistry & genomics to speed life science discovery research’, Press Release, Heidelberg (Germany), 16 October. Lion Bioscience and GMD (2001), ‘LION Bioscience supports expansion of Gesellschaft für Medizinische Datenverarbeitung’, Press Release, Heidelberg (Germany) and Munich (Germany), 21 February. Lion and MDL (2001), ‘LION Bioscience AG and MDL Information Systems, Inc. Sign Software and Reseller Agreements’, Press Release, Heidelberg (Germany) and San Leandro (USA), 19 July. Lion Bioscience and Trega (2000), ‘LION Bioscience to acquire Trega Biosciences’, Press Release, Heidelberg (Germany) and San Diego (USA), 27 December. Lipman, D.J. and W.R. Pearson (1985), ‘Rapid and sensitive protein similarity searches’, Science, 227, 1435–41. Maddox, B. (2003), Rosalind Franklin: The Dark Lady of DNA, New York: Harper Collins. Malakoff, D. (2003), ‘Opening the books on open access’, Science, 302, 550–54. Marshall, E. (1994), ‘A showdown over gene fragments’, Science, 266 (5183), 208–10. Marshall, E. (1999), ‘Drug firms to create public database of genetic mutations’, Science, 284, 406. Marshall, E. (2001), ‘Bermuda rules: community spirit with teeth’, Science, 291 (5507), 1192–3. Marshall, E. (2002), ‘Clear-cut publication rules prove elusive’, Science, 295, 1625. Marx, J. (1983), ‘onc gene related to growth factor gene’, Science, 221, 248. McMeekin, A. and M. Harvey (2002), ‘The formation of bioinformatic knowledge markets: an “economies of knowledge” approach’, Revue D’Economie Industrielle, 101 (4), 47–64. McMeekin, A., M. Harvey and S. Gee (2004), ‘Emergent bioinformatics and newly distributed innovation processes’, chapter 11 in M. McKelvey, J. Laage-Hellman and A. Rickne (eds), The Economic Dynamics of Modern Biotechnologies: European and Global Trends, Cheltenham, UK and Northampton, MA, USA: Edward Elgar. Meldrum, D. (1995), ‘The interdisciplinary nature of genomics’, IEEE Engineering in Medicine and Biology, July/August, 443–8.
Meldrum, D. (2000a), ‘Automation for genomics, part one: preparation for sequencing’, Genome Research, 10, 1081–92. Meldrum, D. (2000b), ‘Automation for genomics, part two: sequencers, microarrays, and future trends’, Genome Research, 10, 1288–303. Metcalfe, J.S. (2001), ‘Institutions and progress’, Industrial and Corporate Change, 1 (3), 561–86. Metcalfe, J.S. (2002), ‘Knowledge of growth and the growth of knowledge’, Journal of Evolutionary Economics, 12, 3–15. Metcalfe, J.S. (2007), ‘Innovation systems, innovation policy and restless capitalism’, in F. Malerba and S. Brussoni (eds), Perspectives on Innovation, Cambridge: Cambridge University Press. Michelman, F.I. (1982), ‘Ethics, economics, and the law of property’, in J.R. Pennock and J.W. Chapman (eds), Nomos, 24, 3–40. Mokyr, J. (2002), The Gifts of Athena. Historical Origins of the Knowledge Economy, Princeton: Princeton University Press. Moody, G. (2004), Digital Code of Life: How Bioinformatics is Revolutionizing Science, Medicine and Business, New York: Wiley. Moore, S.K. (2001), ‘Making chips to probe genes’, IEEE Spectrum, March. Mowery, D.C. and B.N. Sampat (2001), ‘University patents and patent policy debates in the USA, 1925–1980’, Industrial and Corporate Change, 10 (3), 781–814. Mowery, D.C. and A.A. Zeidonis (2002), ‘Academic patent quality and quantity before and after the Bayh–Dole Act in the United States’, Research Policy, 31, 399–418. Mowery, D.C., R.R. Nelson, B.N. Sampat and A.A. Ziedonis (2001), ‘The growth of patenting and licensing by US universities: an assessment of the effects of the Bayh–Dole Act of 1980’, Research Policy, 30 (1), 99–119. Mullis, K.B. and F.A. Falcona (1987), ‘Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction’, Meth. Enzymology, 155, 335–50. National Research Council (1999), Finding the Path: Issues of Access to Research Resources, Washington, DC: National Academic Press. National Research Council (2003), Sharing Publication-related Data and Materials: Responsibilities of Authorship in the Life Sciences, Washington, DC: National Academic Press. Nee, V. and R. Swedberg (eds) (2005), The Economic Sociology of Capitalism, Princeton: Princeton University Press. Needleman, S.B. and C.D. Wunsch (1970), ‘A general method applicable to the search for similarities in the amino acid sequence of two proteins’, Journal of Molecular Biology, 48 (3), 443–53.
Nelson, R.R. (1959), ‘The simple economics of basic scientific research’, The Journal of Political Economy, 67 (3), 297–306. Nelson, R.R. (1989), ‘What is private and what is public about technology?’, Science, Technology and Human Values, 14 (3), 229–41. Nelson, R.R. (1990), ‘Capitalism as an engine of progress’, Research Policy, 19, 193–214. Nelson, R.R. (2004), ‘The market economy and the scientific commons’, Research Policy, 33 (3), 455–71. Nelson, R.R. (2006), ‘Reflections on “The Simple Economics of Basic Scientific Research”: looking backward and looking forward’, Industrial and Corporate Change, 15 (6), 903–17. Nelson, R.R. and B. Sampat (2001), ‘Making sense of institutions as a factor shaping economic performance’, Journal of Economic Behaviour and Organization, 44 (1), 31–54. Nelson, R. and S. Winter (1982), An Evolutionary Theory of Economic Change, Cambridge: Harvard University Press. Nightingale, P. and P. Martin (2004), ‘The myth of the biotech revolution’, Trends in Biotechnology, 22 (11), 564–9. North, D.C. (2005a), ‘Capitalism and Economic Growth’, in V. Nee and R. Swedberg (eds), The Economic Sociology of Capitalism, Princeton: Princeton University Press. North, D.C. (2005b), Understanding the Process of Economic Change, Princeton: Princeton University Press. O’Rourke, K. (2006), ‘A Phred for every platform? Next generation sequencers assess quality of data’, GenomeWeb News. OECD (2003), Patents, Innovation and Economic Performance. Conference Proceedings, Paris: OECD. OECD (2006), Creating Value from Intellectual Assets, Paris: OECD. Oliver, S. (2002), ‘To-day, we have naming of parts . . .’, Nature Biotechnology, 20, 27–8. Olson, M. (1998), Testimony of Maynard V. Olson, House Committee on Science, Subcommittee on Energy and Environment, 17 June, http:// house.gov/science/olsen_06-17.htm. Olson, M.V. (2002), ‘The Human Genome Project: a player’s perspective’, Journal of Molecular Biology, 319, 931–2. Ouzounis, C.A. and A. Valencia (2003), ‘Early bioinformatics: the birth of a discipline – a personal view’, Bioinformatics, 19, 2176–90. Pavitt, K. (2001), ‘Public policies to support basic research: what can the rest of the world learn from US theory and practice? (and what they should not learn)’, Industrial and Corporate Change, 10 (3), 761–79. Pel, H.J. et al. (2007), ‘Genome sequencing and analysis of the versatile cell factory Aspergillus niger CBS 513.88.’
Pevzner, P.A. (2004), ‘Educating biologists in the 21st century: bioinformatics scientists versus bioinformatics technicians’, Bioinformatics, 20 (14), 2159–61. Polanyi, K. (1957), ‘The economy as instituted process’, in K. Polanyi, C.M. Arensberg and H.W. Pearson (eds), Trade and Market in the Early Empires, New York: The Free Press. Polanyi, M. (1967), The Tacit Dimension, New York: Doubleday. Pop, M., S.L. Salzberg and M. Shumway (2002), ‘Genome sequence assembly: algorithms and issues’, IEEE Computer, 35 (7), 47–54. Poste, G. (1995), ‘The case for genomic patenting’, Nature, 378. Powell, W.W. and J. Owen-Smith (1998), ‘Universities and the market for intellectual property in the life sciences’, Journal of Policy Analysis and Management, 17 (2), 253–77. Pray, C.E. and A. Naseem (2005), ‘Intellectual property rights on research tools: incentives or barriers to innovation? Case studies of rice genomics and plant transformation technologies’, AgBioForum, 8 (2&3), 108–17. Reichardt, T. (1999), ‘It’s sink or swim as the tidal wave of data approaches’, Nature, 399, 517–20. Rice, P., I. Longden and A. Bleasby (2000), ‘EMBOSS: The European Molecular Biology Open Software Suite’, Trends in Genetics, 16 (6), 276–7. Riley, M. et al. (2006), ‘Escherichia coli K-12: a cooperatively developed annotation snapshot – 2005’, Nucleic Acids Research, 34 (1), 1–9. Roa-Rodriguez, C. and C. Nottenburg (2005), ‘Agrobacterium-mediated transformation of plants’, CAMBIA Technology Landscape Paper (http://www.bios.net/Agrobacterium). Roberts, L. (1991a), ‘Gambling on a shortcut to genome sequencing’, Science, 252, 1618–19. Roberts, L. (1991b), ‘Genome patent fight erupts’, Science, 254 (5029), 184–6. Roberts, L. (1992), ‘NIH gene patents, part two’, Science, 255, 912–13. Rosenberg, N. (1992a), ‘Scientific instrumentation and university research’, Research Policy, 21, 381–90. Rosenberg, N. (1992b), ‘Economic experiments’, Industrial and Corporate Change, 1, 181–203. Schaffer, S. (1983), ‘Natural philosophy and public spectacle in the eighteenth century’, History of Science, 21, 1–43. Schuler, G.D. et al. (1996), ‘A gene map of the human genome’, Science, 274 (5287), 540–46. Sender, A.J. (2000), ‘Bioinformatics boom: gone bust’, Genome Technology, 18, 36–42. Shapin, S. and S. Schaffer (1985), Leviathan and the Air-Pump. Hobbes, Boyle and the Experimental Life, Princeton: Princeton University Press.
Simpson, A.J.G. et al. (2000), ‘The genome sequence of the plant pathogen Xylella fastidiosa’, Nature, 406, 151–7. Smith, L.M., J.Z. Sanders, R.J. Kaiser, P. Hughes, C. Dodd, C.R. Connell, C. Heiner, S.B.H. Kent and L.E. Hood (1986), ‘Fluorescence detection in automated DNA sequence analysis’, Nature, 321, 674–9. Smith, T.F. (1990), ‘The history of genetic databases’, Genomics, 6, 701–7. Smith, T.F. and Michael S. Waterman (1981), ‘Identification of common molecular subsequences’, Journal of Molecular Biology, 147, 195–7. Stephan, P.E. (1996), ‘The economics of science’, Journal of Economic Literature, 34, 1199–235. Stewart, L. (1992), The Rise of Public Science: Rhetoric, Technology and Natural Philosophy in Newtonian Britain, 1660–1750, Cambridge: Cambridge University Press. Stiglitz, J. (1999), ‘Knowledge as a global public good’, in I. Kaul, I Grunberg and M. Stern (eds), Blobal Public Goods – Interntional Cooperation in the 21st Century, Oxford: Oxford University Press. Stokes, D. (1997), Pasteur’s Quadrant: Basic Science and Technological Innovation, Washington, DC: Brookings. Strohman, R.C. (1997), ‘The coming Kuhnian revolution in biology’, Nature Biotechnology, 15, 194–200. Sugden, R. (1989), ‘Spontaneous order’, Journal of Economic Perspectives, 3 (4), 85–97. Sulston, J. and G. Ferry (2003), The Common Thread. Science, Politics, Ethics and the Human Genome, London: Corgi Books. Swedberg, R. (2003), ‘The case for an economic sociology of law’, Theory and Society, 32, 1–37. Tagari, M., J. Tate, G.J. Swaminathan, R. Newman, A. Naim, W. Vranken, A. Kapopoulou, A. Hussain, K. Henrick and S. Velankar (2006), ‘E-MSD: improving data deposition and structure quality’, Nucleic Acids Research, 34, 287–90. Teece, D.J. (1986), ‘Profiting from technological innovation: implications for integration, collaboration, licensing, and public policy’, Research Policy, 15, 285–305. Thomas, S.M., M.M. Hopkins and M. Brady (2002), ‘Shares in the human genome – the future of patenting DNA’, Nature Biotechnology, 20, 1185–8. Thorisson, G.A. and L.D. Stein (2003), ‘The SNP Consortium: past, present and future’, Nucleic Acids Research, 31 (1), 124–7. Venter, J.C. (1998), Prepared Statement to US House of Representatives Committee on Science. Venter, J.C. and M.D. Adams (1998), ‘Shotgun sequencing of the human genome’, Science, 280 (5369), 1540–44.
Wade, N. (1978), ‘Guillemin and Schally: the years in the wilderness’, Science, 200, 279–82; 411–13; 510–12. Walsh, J.P., A. Arora and W.M. Cohen (2003), ‘Working through the patent problem’, Science, 299, 1021. Walsh, J.P., C. Cho and W.M. Cohen (2005), ‘View from the bench: patents and material transfers’, Science, 309, 2002–3. Waterfield, M.D., G.T. Scrace, N. Whittle, P. Stroobant, A. Johnsson, A. Wasteson, B. Westermark, C.H. Heldin, J.S. Huang and T.F. Deuel (1983), ‘Platelet-derived growth factor is structurally related to the putative transforming protein p 28sis of simian sarcoma virus’, Nature, 304 (5921), 35–9. Waterston, R. and J.E. Sulston (1998), ‘The human genome project: reaching the finishing line’, Science, 282 (5386), 53–4. Weber, M. (1978), Economy and Society: An Outline of Interpretive Sociology, ed. G. Roth and C. Wittich, Berkeley: University of California. Wilbur, W.J. and D.J. Lipman (1983), ‘Rapid similarity searches of nucleic acid and protein data banks’, Proceedings of the National Academy of Science, 80, 726–30. Wood, D. et al. (2001), ‘The genome of the natural genetic engineer Agrobacterium tumefaciens C58’, Science, 294, 2317–23. Ziman, J.M. (1994), Prometheus Bound: Science in a Dynamic Steady State, Cambridge: Cambridge University Press. Zuhn, D.L. (2001), ‘DNA patentability: shutting the door to the utility requirement’, Marshall Law Review, 34, 973–99. Zuhn, D.L. (2005), ‘Case questions fundamental questions of patentability requirements of nucleic acid molecules’, Patently-O, http://patentlaw. typepad.com/patent/2005/05/post.htm. Zupan, J., T.R. Muth, O. Draper and P. Zambryski (2000), ‘The transfer of DNA from Agrobacterium tumefaciens into plants: a feast of fundamental insights’, The Plant Journal, 23 (1), 11–28. Zweiger, G. (2000), Transducing the Genome. Information, Anarchy, and the Revolution in the Biomedical Sciences, New York: McGraw-Hill.
Index ABI 40, 43, 44, 73 Accelrys 85, 93, 110, 113 Adler, R.D. 48, 69 Affymetrix 39, 70 Agilent 39 Agracetus 142 Agrobacterium tumefaciens 119–20 Campinas University, Brazil 128, 131, 132–3, 138 competition 120–45 DuPont 64, 68, 70, 119, 123–33, 139, 140, 149–52, 168 genetic sequencing 121–2, 123–5, 126–39, 143–5, 149–52 intellectual property rights 123, 125, 127, 128, 129–31, 139–40, 143 knowledge protection 124–5, 126–7, 135 labour, division of 122, 123, 124 Monsanto see Monsanto patents 121, 123, 127–9, 139–44 Pioneer Hi-bred 123–4, 125, 139, 140, 142, 149 public/private knowledge 121, 122, 123–4, 125–45 publication 119, 124–5, 127, 129, 130, 133–9, 145, 152, 163, 172, 175 R&D funding 125, 131, 132, 133, 134, 172 race to complete 127–39, 172 sequence differences 138–9 Washington University Genome Center 119, 121–38, 140, 150, 151, 152 Altschul, S.F. 76 Amersham Pharmacia 40, 68 Amoco 104, 106, 107, 108 Andrade, M.A. 88 Andrews, L.B. 181 Ansorge, W. 68 Applied Biosystems Inc. 42
Apweiler, R. 63 Aspergillus niger 145–7, 153–4, 175 Astra Zeneca 58 Attwood, T.K. 16, 48, 60, 71, 72–3, 75 Auth, D.R. 54 AXYS Pharmaceuticals 69 Bairoch, A. 60, 61, 63, 81 Baker, S.E. 146 Balmer, B. 2 Basic Local Alignment Search Tool (BLAST) 76, 77, 78–9, 81, 163, 171 division between producers and users 78–9 early development 76–7 at the NCBI 77, 81 polyvalent use 171 similarity searching 76, 77 Bayer 58, 94–5, 96 Bentley, D. 55, 174 Bermuda rules 55–7, 58, 73, 173–4 Binns, A.N. and M.F. Thomashow 121 bio-data and analysis, separation from 43, 163 common public norms of data deposition 44–5 and competition 41, 43, 45–6, 167 data production in bioinformatics 22–3, 24, 73, 75–80, 162 definition 37–40 DNA structure see DNA structure dual public/private dynamic 40–41, 43–4 economies 83 in genomics see genomics globalization of 38, 39, 173 hardware and software, relationship between 41, 43–4 mirror databases 175 205
private instrumentation developers 39 protein databases see protein databases public domain and bioinformatics 90–93, 97–8, 101, 103, 109, 111–14, 168 publication see publication quality procedures 44–5, 174, 175 shared norms and standards 44 Single Nucleotide Polymorphism (SNP) see Single Nucleotide Polymorphism (SNP) and software development 37–8 specificity of 39 standards control 66 technology 39, 40–45 biofuels 145 bioinformatics BLAST (Basic Local Alignment Search Tool) see Basic Local Alignment Search Tool (BLAST) case studies 88–117 competition in 74, 90–92, 95–8, 104, 106–7, 110, 112–13, 167 comprehensive analysis systems (one-stop-shop) 83–6, 90, 93–5, 96, 109, 110 copyright law 29 and data production 22–3, 24, 73, 75–80, 162 digitization of 165–6, 173 economies of knowledge 33 EMBOSS tools 93, 110–13, 116, 171, 177 Free Software Foundation (FSF), General Public Licence 111–13, 179–80 Genetics Computer Group (GCG) see Genetics Computer Group (GCG) in-house capabilities 86 institutional context 53, 77, 177 intellectual property rights 85, 86, 94, 96, 102, 109 knowledge ownership 101–4, 111–12 knowledge transfer 76–7, 89, 90, 92, 105, 107–8, 164–5
labour, division of 20, 79–80, 81–2, 94–5, 156, 162–3 Lion Bioscience see Lion Bioscience mergers and acquisitions 84–6, 93–4, 95–6 and microcomputers 78–9, 80–81 National Center for Biotechnology Information (NCBI) 53, 59, 77, 81 and nucleotide sequencing 73, 74, 75, 98 onc gene discovery 78 one-stop-shop 83–6, 90, 93–5, 96, 109, 110 open source model of software development 102–3, 107–8, 109 and patent protection 3, 82 private legal property rights 25 protein sequencing 75–6, 77–8, 80, 81, 88–9 public domain databases 90–93, 97–8, 101, 103, 109, 111–14, 168 R&D funding 84, 86, 99, 102, 104, 106, 107–9, 113–14 research algorithms 72, 73, 74–80, 81, 98, 99–101, 160, 171 resources 15–16 sector redundancies 86 sequence analysis programs 99–101 Sequence Retrieval System (SRS) see Sequence Retrieval System (SRS) sequence similarity searching 75–6, 77, 80, 81 software libraries and applications, distinguishing between 115 software programs 72, 77, 78–81, 83, 88–114, 160, 162, 165–6, 171 source-code, withholding (blackboxing) 109–10, 111–12, 113, 115–16, 177 and technical knowledge accumulation 76–7 tradable databases 24 user licences 101–2, 103, 105, 107, 108–9, 110 bioinformatics tools commercialization of 80–87, 93–5, 96, 97–8, 103–17 definition of 72–5 economic trajectory of 98–117
historical context 75–87, 159, 170–71 market domination 84–6, 93–4 privately appropriated 74 programming routines library 101 public development of 78–9 public/private boundaries 86–7, 103–13, 115, 162–3 biological science legislative changes in patent and copyright 27–8 ‘naming of parts’ 16, 22 see also genomics Biomax 146 BIONET 107 Bioscout 88–9, 90 bipolar disorder 70 Birney, Ewan 45 Blattner 100, 135 Boguski, M.S. 53 Braun, A. 121 Brazma, A. 65 breast cancer 70 Brenner, S. 47 Brenner v Manson 69, 181 Bristol–Mayer Squib 58 Broothaerts, W. 142
competition 31, 90 Agrobacterium tumefaciens 120–45 and bio-data 41, 43, 45–6, 167 in bioinformatics 74, 90–92, 95–8, 104, 106–7, 110, 112–13, 167 and public/private knowledge 167, 168–9, 176 COMPSEQ (later PC/Gene) software 60–61, 63, 83 computational knowledge encryption 28, 44 firewalls 28, 52, 175 see also software computational science 18, 19, 20 Confirmant Protein Atlas 70 Cook-Deegan, R. 2, 38, 52, 68, 69 copyright GAIN Data Use Certificate 29, 179 General Public Licence (GPL) 29 Genetic Association Information Network 29 legislative change 27–8 SWISSPROT 62–3 see also intellectual property rights; patents Coronini, R. 57
Calgene 142 Caltech 40, 42 CAMBIA 141, 142 Campinas University, Brazil 128, 131, 132–3, 138 Cannon, G.C. 80 capitalism 2, 17 and knowledge exchange 24, 31 multi-modal economies of knowledge 31, 183–5 and public and private knowledge division 3–4, 7, 10, 16, 156–9, 165, 168 Casari, Georg 89, 117 Caskey, C.T. and A.R. Williamson 2 Celera 25, 41, 62, 66, 68, 74, 83, 87, 168, 175 Human Genome Project 119 and SNP Consortium 59 Cereon 123, 126, 127, 144, 149 chemico-informatics 95, 96 Chilton, Mary-Dell 121, 141, 142 Cold Spring Harbor Laboratory 58
Dasgupta, P. and P.A. David 5, 8, 11, 12, 162, 166 data production see bio-data David, P.A. 5, 12, 13–14, 25, 62, 166 Davis, P.K. 181 Dayhoff, M. 60, 64 Devereux, J. 98, 99, 101, 103, 104, 105, 106, 107, 108, 117 DiscoveryLink technology 96 DNA structure amino acids in 74–5, 75–6, 78 cDNA (Expressed Sequence Tags) 47–55 codons in 74–5, 88, 100 data production 38 double helix reconstruction 41 electrophoresis 42 and mRNA 47 polymerase chain reaction (PCR) 68 property rights eligibility 28, 185 sequencing 68, 75–6, 98–9, 182 Ti-plasmids 141, 143 see also genomics
208
Public or private economies of knowledge?
DNAStar 80, 81 Doll, J.J. 2 Doolittle, R.F. 77 DSM 146–7, 154 DuPont 64, 68, 70, 119, 123–33, 139, 140, 149–52, 168 Dworkin, G. 2 E.coli 130, 131, 135 economic organization experiments 31 Eisen, Jonathan 132, 136, 151 Eisenberg, R.S. 2, 13, 14, 25, 28, 49, 53, 58, 142, 180, 181, 182, 184 Ellis, L.B.M. 39 EMBNet 103, 108 EMBOSS tools 93, 110–13, 116, 171, 177 Etzold, T. 89, 90, 92, 117 Europe protein database takeover 2–3 SWISSPROT see SWISSPROT European Bioinformatics Institute (EBI) 59, 60, 89, 92, 171, 173 European Commission 55 European Molecular Biology Laboratory (EMBL) 40, 43, 81, 88, 89, 90–92, 103 European Molecular Biology Open Software Suite (EMBOSS) 93, 110–13, 116, 171, 177 Expressed Sequence Tags (ESTs) 2, 38, 46, 47–55, 57, 82, 161, 163, 169–70, 174 data restrictions (firewalls) 52 dbEST 53, 54 knowledge growth dynamic 53–5 maize claim 54 patents 2, 47, 48–54, 179–80, 181, 182 public domain software algorithm (BLAST) 51 public–private divide 47–55, 67 as research tool 51 singletons 54 substantial utility restrictions 54, 55 Unigene database 50, 53 weaknesses in scientific research 48, 51, 52, 170
Fields, C. 69 firms competitive advantage and knowledge profitability 28 incentives for knowledge ownership 12, 30 NGO funding 26 university research funding 13 see also organizational dimensions Flattmann, G.J. and J.M. Kaplan 54 Florigene 142 Free Software Foundation (FSF), General Public Licence 111–13, 179–80 fruit fly 74 fungal genomics 145–7, 153–4 GAIN Data Use Certificate 29, 179 GCG software 98, 171, 176–7 Gelvin, S.B. 121 GenBank 38, 43, 106, 173, 174 Agrobacterium tumefaciens 124, 129, 135, 147, 152 EST database 50, 53, 54, 59 nucleotide database 98 SNPs 58 Gene Ontology Network 22 Genequiz (Bioscout) 88–9, 90, 92 Genesweep 71 Genetic Association Information Network 29 Genetics Computer Group (GCG) 81, 98–116 acquisition by Accelrys 110 commercialization 106–7 copyright 102 differential fees 101, 102 EGCG 103, 108–9, 110–11 EMBOSS 111–14 first publication 99 GCG software 98, 171, 176–7 and Intelligenetics (IG) 104, 106–7 open source 102 Geneva Bioinformatics (GeneBio) 62, 63 Genofit 61 Genome Sequencing Programme, Broad Institute 44 Genomica 84
Index genomics ABI sequencers 40, 43, 44 Agrobacterium tumefaciens see Agrobacterium tumefaciens annotation 46 Aspergillus niger 145–7, 153–4, 175 balkanization of 2, 162, 170 Bermuda rules 55–7, 58, 73, 173–4 clone-by-clone sequencing 41, 42, 43, 181 collaboration and competition 119–54, 174–5 comparative 43 data competition and conflict 45–6, 49, 50, 52, 167, 168–9, 170 data quality standards 56, 66–7 database of complete genomes 46 databases 22–3, 29, 37, 38, 40–42, 43, 44, 49, 50, 52–3, 55 DNA see DNA structure Drosophola melanogaster 43 E.coli 130, 131, 135 economic uncertainty 48–9, 51, 53 Expressed Sequence Tags (ESTs) see Expressed Sequence Tags (ESTs) 454 sequencers 44 fruit fly 74 fungal 145–7, 153–4 gene cloning 49 gene constitution disputes 38 genetic mutation 38 high-throughput sequencers 39, 40, 68 Human Genetic Variation Database (HGVbase) 59–60 Human Genome Initiative 105, 106 Human Genome Project 37, 40, 43, 48, 49, 59, 119, 174 Human Genome Sciences (HGS) 50, 52, 54, 82 influenza virus genome, Haemophilus influenzae 43 Institute of Genomic Research (TIGR) 50, 52, 54, 56, 132, 136, 150, 151, 152 instrumentation technologies 40–42 intellectual property rights 120, 146 international cooperation 49, 55–6 and molecular biology 121
209
National Human Genome Research Institute 56, 174 nucleic acid sequence data production 38, 40, 41–2, 43, 44, 169 onc gene 78 patents 2, 3, 46, 146, 175, 180, 181 PCR technology 42 peer review, absence of 56 Pioneer Hi-bred 123–4, 125, 139, 140, 142, 149 polymerase chain reaction (PCR) 39, 42 raw data publication 56 scientific knowledge uncertainty 48, 51, 52, 53, 54 sequencers, new generation 44 Single Nucleotide Polymorphism (SNP) see Single Nucleotide Polymorphism (SNP) statistical reliability tests (‘phred’ and ‘phrap’) 44, 56, 73 tradable databases 24 ‘unfinished’ sequence data, separate filing of 56 Unicamp 130, 131 Washington University Genome Center 119, 121–38, 140, 150, 151, 152 whole genome sequencing 54, 122 whole human population 68 whole shotgun method (WGS) 41, 42–3, 44, 74, 169, 173 Xylella fastidiosa plant pathogen 119, 130 yeast projects 49, 119 see also biological science Genset 69 Germany, Human Genome Project 55 Gillis, A.M. 49 GlaxoSmithKline 60 GMD 95 Goldman, B. 2, 138, 142 Goodner, B. 119, 122, 123, 125, 126, 127, 129, 131, 136, 137, 149, 150, 151 government funding 11–12, 25–6, 30, 51, 52, 58, 65, 107–8, 114, 146, 176, 180
210
Public or private economies of knowledge?
Green, P. 42, 44, 68, 73 Grindrod, P. 37 GSK 52, 58, 61 Gudmunder, A.T. and L.D. Stein 59
universities 12, 26, 28 see also copyright; patents IntelliGenetics 61, 81, 83, 104–5, 106, 107, 108, 110
Harvey, M. and A. McMeekin 2, 37, 119 Helicos Biosystems 44 Heller, M.A. and R.S. Eisenberg 14, 142 Hodgman, T.C. 75, 78 Hoechst 58 Hoffman–La Roche 58 Hunkapiller, T. 40, 68 Hyseq 69
Japan DNA Databank 59 E.coli genetic sequencing 135 Human Genome Project 55 International Protein Information Database (JIPID) 60, 63 National Database 38 Japan Tobacco 123, 128, 141, 142 Johnson & Johnson 52 Joly, P.B. and V. Mangematin 55, 173 Jumas-Bilak, E. 122
IBI-Pustell 80, 81 IBM 96 IBS 39 iDEA Predictive ADME Simulation System 95 In re Fisher 50, 69, 181 Incyte 50, 52, 54, 62, 66, 83, 87, 168 influenza virus genome, Haemophilus influenzae 43 Institute of Genomic Research (TIGR) 50, 52, 54, 56, 132, 136, 150, 151, 152 instituted economic process (IEP) 16–19, 21–5, 29, 45 bioinformatics 53, 77, 177 public and private economies of knowledge 29, 30–34, 156, 157, 158–61, 165, 183–4 transformational process 16–17 Integrated Genomics 146 intellectual property rights 9, 12, 13–15, 27 Agrobacterium tumefaciens 123, 125, 127, 128, 129–31, 139–40, 143 anti-commons 14 bioinformatics 85, 86, 94, 96, 102, 109 commons 13–14 decision-making powers 14, 15 DNA structure 28, 185 genomics 120, 146 legal infringements 27–8 and public/private knowledge 166–7, 178–83
Karp, Peter 136 Kitajima, Joao 131 Kitts, A. and S. Sherry 59 knowledge anti-commons 14, 15, 142 appropriation strategies 6–7 availability timing 23 codified 8–9, 15, 16, 22 collective sharing 24 commons 13–14 differentiation 18 distribution 3, 17, 18–19, 21–3 economies, evolution of 5–7, 31–4, 155–85 market appropriation of 6, 21, 23–4, 30 ownership and bioinformatics 101–4, 111–12 production 3, 11–13, 19, 20–22 protection, Agrobacterium tumefaciens 124–5, 126–7, 135 redistributive resource model 25 science versus technology 9–11 tacit 8–9, 16, 22 turbulence in 1–35 knowledge transfer 6–7, 10, 17–18 bioinformatics 76–7, 89, 90, 92, 105, 107–8, 164–5 and capitalism 24, 31 labour, division of 4, 10, 16, 17, 19, 32–3
Index Agrobacterium tumefaciens 122, 123, 124 bioinformatics 20, 79–80, 81–2, 94–5, 156, 162–3 and data production 38 and public/private knowledge 156, 157, 160, 161, 162–72 and specialization 18 language, shared (code formation) 16, 22, 44, 160 legal institutions 25, 27–9 institutions of appropriation 28 turbulence in 27–8 see also instituted economic process (IEP); patents Leiden University 142 Levin, R.C. 6, 28 licences, bioinformatics 101–2, 103, 105, 107, 108–9, 110 life science informatics 96, 97, 98, 110 Lion Bioscience 84, 85, 86, 88–98, 110 acquisition strategy 93, 95 Bayer alliance 94 dedicated bioinformatics firm 88, 94 failure 96–8 integrated software platform 95, 95 one-stop-shop 93, 96 see also Sequence Retrieval System (SRS) Lipman, D.J. and W.R. Pearson 78, 80 Madey v Duke University 12, 28, 182 Marshall, E. 2, 25, 48, 52, 53, 55–6, 57 Martinsreid Institute for Protein Sequences (MIPS) 60, 63 Marx, J. 78 Max Planck Institute 142 MDL 95 Meldrum, D. 40, 41, 43 Merck 2, 26, 50, 53, 55, 57, 67, 69, 84 Merck v Integra Lifescience 182 Microarray Gene Expression Data Society 65, 173 MicroGenie 80 Micromolecular Structure Database 65 Millennium Pharmaceuticals 69 Monsanto 50, 54, 68, 119, 121, 123 Agrobacterium tumefaciens 119, 121, 123–31, 133–42, 149–52, 168, 172, 181
211
with Hiram College 123–31, 132, 135–7, 138–9, 152 Moody, G. 38, 43, 60, 68, 69, 70, 74, 75, 76 Mowery, D.C. 12, 26 MRC 50 Mullis, K.B. and F.A. Falcona 39, 68 Myers, Gene 74 National Biomedical Research Foundation (NBRF) 60 National Center for Biotechnology Information (NCBI) 53, 59, 77, 81, 112 National Institute of Health 2–3, 29, 37, 42, 48–52, 54, 55, 63, 64, 105, 174, 179–81 National Science Foundation (NSF) 119, 129, 131, 132, 133–4, 138, 139, 150, 151, 172 NDJB 43 Needleman, S.B. and C.D. Wunsch 76, 99–101 Nelson, R.R. 2, 5, 7, 8, 9, 11, 12, 13, 26, 31, 53, 155, 160, 180, 184 Nester, Gene 121, 122, 123, 136, 141, 149, 152 Nestlé 94, 96 NetGenics 95–6 NGO funding 26 Nightingale, P. and P. Martin 86 Novartis 58 Nusbaum 44, 68 Olson, Maynard 1, 38, 133, 138, 150, 151 onc gene 78 organizational dimensions 12, 17, 19, 21 knowledge appropriation 28, 30–31 see also firms O’Rourke, K. 44, 68 Ouzounis, C.A. and A. Valencia 75 Oxford Molecular (OM) 110 Patent Cooperation Treaty 50 patents Agrobacterium tumefaciens 121, 123, 127–9, 139–44 and bioinformatics 3, 82
212
Public or private economies of knowledge?
buy-out 180 data 6, 46 and economic incentives 27 ESTs 47, 48–54 Expressed Sequence Tags (ESTs) 2, 47, 48–54, 179–80, 181, 182 GAIN Data Use Certificate 29, 179 genomics 2, 3, 46, 146, 175, 180, 181 infringement disputes 28 international laws 49 and knowledge communication 22, 24, 27 legislative change 3, 27–8 natural and artificial knowledge, distinguishing between 69 Reach Through Licensing Agreements 52 rules of full disclosure 15 SNPs 57, 58, 59 Statutory Invention Registry (SIR) 58, 179, 180 universities 26, 28 see also copyright; intellectual property rights Paulsen, Ian 132, 136, 151, 152 Pavitt, K. 7 Pevzner, P.A. 79 Pfizer 50, 52, 58 Pharmacopeia 110 Pioneer Hi-bred 123–4, 125, 139, 140, 142, 149 Polanyi, Karl 8, 16, 25 polymerase chain reaction (PCR) 39, 42 Poste, G. 53 Powell, W.W. and J. Owen-Smith 26 Pray, C.E. and A. Naseem 141 private knowledge see public/private knowledge property rights see intellectual property rights Protein Databank (wwPDB) 65, 173 protein databases 19, 22, 24, 29 COMPSEQ (later PC/Gene) software 60–61, 63, 83 EMBnet support 61 global 39, 60, 63, 64–5, 131 International Protein Information Database of Japan (JIPID) 60, 63
  Martinsried Institute for Protein Sequences (MIPS) 60, 63
  microarray chips 68
  partial cross-subsidization 62–3
  Protein Information Resource (PIR) 60–61, 63, 64, 106, 107, 173
  public/private access 61, 62–3, 64–5, 67
  quality norm development 65, 66–7
  sequencing 38, 75–6, 77–8, 80, 81, 88–9
  software and biological, separation of 61, 65
  SWISSPROT see SWISSPROT
  TrEMBL database 63
  UniProt 39, 60, 63, 64, 131
  US competitive advantage 64
public/private knowledge
  Agrobacterium tumefaciens 121, 122, 123–4, 125–45
  bio-data 40–41, 43–4
  biologizing of economy 159
  boundaries of bioinformatics tools 86–7, 103–13, 115, 162–3
  and capitalism 3–4, 7, 10, 16, 156–9, 165, 168
  and competition 167, 168–9, 176
  copyright law 29–30
  database access 161
  distinguishing characteristics, lack of 160
  distribution and appropriation 162–9, 172–7
  Expressed Sequence Tags (ESTs) 47–55, 67
  formal appropriation institutions 177–83
  historical differentiation 159–61
  and innovation 165, 168
  instituted economic process (IEP) 29, 30–34, 156, 157, 158–61, 165, 183–4
  see also instituted economic process (IEP)
  and intellectual property rights 166–7, 178–83
  key debates 4–15
  knowledge economies, evolution of 31–4, 155–85
  and labour, division of 156, 157, 160, 161, 162–72
  polyvalence of 21, 164, 165, 168, 169, 171, 175, 180, 185
  production and use 162–72
  protein databases 61, 62–3, 64–5, 67
  public rights over knowledge resources 25
  see also knowledge
publication 12, 13, 22
  Agrobacterium tumefaciens 119, 124–5, 127, 129, 130, 133–9, 145, 152, 163, 172, 175
  Aspergillus niger genome 147
  Celera breach of normal codes 25
  databases 44
  journal publication rules 29, 39, 167
  peer review 29, 39, 173
  and premature data release 56
  research algorithms 77, 81, 98
Qiagen 146
Rice, P. 111, 117
Roberts, L. 48, 49, 69
Rosetta Inpharmatics 84
Rutgers University 59
Sanger 44, 68
SAP 84
Schneider, Reinhard 89, 117
science
  balkanization of 14
  privatization of 12
  quadrants (Stokes) 10, 11, 16
scientific commons 14, 15, 25
Searle 58
Sender, A.J. 86
Sequence Retrieval System (SRS) 89–93, 95, 96, 97, 171, 176
  academic and commercial access 92–3
  acquisition by LION 90–92
  European Bioinformatics Institute (EBI) 92
  European Molecular Biology Laboratory (EMBL) 88–9
  one-stop-shop 93
  origins 88–9
  producer–user differentiation 171
Setubal, Joao 119, 130, 135, 136, 138, 150, 151, 152
Simpson, A.J.G. 119
Single Nucleotide Polymorphism (SNP) 2, 38, 46, 58–9, 169–70, 174
  non-exclusivity of rights of use 58
  patents 57, 58, 59
  public appropriation of database 57–60
  sequencing 73, 74, 75, 98
  SNP Consortium Ltd 57–9, 180
  SNPEntrez 59
SKB 50, 53, 54
Slater, Steve 123, 126, 129, 131–4, 136–7, 138–41, 143, 150
Smith, L.M. 42, 68
Smith, T.F. and M.S. Waterman 76, 101
Smithies, O. 98–9
SmithKline Beecham 52, 82
software
  bioinformatics 72, 77, 78–81, 83, 88–114, 160, 162, 165–6, 171
  development and bio-data 37–8
  encryption 28, 44
  firewalls 28, 52, 175
  and hardware, relationship between 41, 43–4
  libraries and applications, distinguishing between 115
  open source model development 102–3, 107–8, 109
  source-code, withholding (blackboxing) 109–10, 111–12, 113, 115–16, 177
Solexa Bustard sequencer 44
SRI International 136
Staden 44
Stallman, Richard 111
Stanford Human Genome Centre 58, 70
Sulston, J. 1, 2, 38, 40, 43, 44, 48, 68, 69
Sweden, Karolinska Institute 59–60
SWISSPROT 60, 61, 91, 173
  copyright 62–3
  funding problems 61–2, 64
  public domain, return to 63, 64, 65
Syngenta 121, 141, 142
Tagari, M. 65, 173
technology quadrants (Stokes) 10, 11, 16
Teece, D.J. 6, 28
Thomas, S.M. 181
Thorisson, G.A. and L.D. Stein 58
Tomb, J.-F. 126, 130, 131, 132, 149, 151
Trega 95
UK
  Biotechnology and Biological Sciences Research Council 113
  European Informatics Institute 38
  Medical Research Council 55, 113
Unicamp 130, 131
Unigene 50, 53
UniProt 39, 60, 63, 64, 131
United States
  Bayh–Dole Act (1980) 3, 12, 26, 39, 174, 181
  GenBank see GenBank
  Genes to Life (GTL) programme 145–6
  Madey v Duke University 12, 28, 182
  National Institutes of Health 2–3, 29, 37, 42, 48–52, 54, 55, 63, 64, 105, 174, 179–81
  National Research Council 25
  National Science Foundation 119
  patent guidelines, revised 181
  Protein Information Resource (PIR) 60–61, 63, 64, 106, 107, 173
  public science policy 63
  University of Toledo 142
  Wisconsin Genetics Computer Group (GCG) 81, 98–116
universities
  access rights 25–6
  Campinas University, Brazil 128, 131, 132–3, 138
  data production prototypes 40–41
  dry science 19
  government funding 25–6
  intellectual property rights 12, 26, 28
  knowledge resource competition 23–4
  Leiden University 142
  Madey v Duke University 12, 28, 182
  NGO funding 26
  patents 26, 28
  research funding 13, 181–2
  research groups, new firms from 24
  Rutgers University 59
  Toledo University 142
  Washington University Genome Center 119, 121–38, 140, 150, 151, 152
  Washington University School of Medicine (St Louis) 57–8
  see also individual research and universities
Upjohn 50
USPTO 50
Varmus, Harold 48, 50, 51
Venter, J.C. 1, 2, 42, 43, 47–8, 50, 52, 54, 56, 68, 69, 136
venture capital funding 24
Walsh, J.P. 182
Washington University Genome Center 119, 121–38, 140, 150, 151, 152
Washington University School of Medicine (St Louis) 57–8
Waterfield, M.D. 77
Waterston, R. and J.E. Sulston 40, 43
Watson, James 42, 48
Wellcome Trust 55, 111
Whitehead Institute for Biomedical Research 57
Wilbur, W.J. and D.J. Lipman 77
Wood, Derek 119, 129, 130, 133, 135, 137, 139, 149, 151, 152
Xylella fastidiosa plant pathogen 119, 130
yeast projects 49, 119
Zuhn, D.L. 49, 54, 69, 181
Zupan, J. 121
Zweiger, G. 1, 38, 54, 68, 69, 70