This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Topics in Australasian Library and Information Studies Series editors: Dr Ross Harvey and Dr Stuart Ferguson This series provides detailed, formally refereed studies of a wide range of topics and issues relevant to professionals and para-professionals in the library and information industry and to students of library and information studies. All titles are written from an Australasian perspective, drawing on professional research and experience in Australia, New Zealand and the wider PaciÞc region. Proposals for publications should be addressed to the series editors ([email protected]; [email protected]). Number 20 Research methods for students, academics and professionals: Information management and systems. 2nd edition Kirsty Williamson et al. Number 19 Collection management: A concise introduction John Kennedy Number 17 Australian library supervision and management. Abridged student edition Roy Sanders Number 15 Organising knowledge in Australia: Principles and practice in libraries and information centres Ross Harvey Number 14 The other 51 weeks: Marketing for Australian libraries Lee Welch Number 13 A most delicate monster: The one-professional special library Jean Dartnall Number 12 Disaster recovery for archives, libraries and records management systems in Australia and New Zealand Judith Doig Number 9 Libraries in Australia Peter Biskup, with Doreen Goodman
INFORMATION MANAGEMENT A consolidation of operations, analysis and strategy
Michael Middleton
Topics in Australasian Library and Information Studies, Number 18
Centre for Information Studies Charles Sturt University Wagga Wagga New South Wales
National Library of Australia cataloguing-in-publication data _____________________________________________________________ Middleton, Michael R. (Michael Robert), 1947- . Information management : a consolidation of operations, analysis and strategy.
Bibliography. Includes index. ISBN 1 876938 36 6. 1. Information science. I. Charles Sturt University. Centre for Information Studies. II. Title.
Copy editor: S. Ferguson, Charles Sturt University Layout: B. Martin, HoneyBee Graphic Design M. Taylor, Reprographics Unit, Charles Sturt University Cover design: M. Turner Text processing: J. Sims
Centre for Information Studies Locked Bag 660 Wagga Wagga NSW 2678 Fax: (from Australia) 02 6933 2733 (international) 612 6933 2733 Email: [email protected] http://www.csu.edu.au/cis
Contents
............................................................... List of figures ................................................................................................................................. i Preface ...........................................................................................................................................v Acknowledgments ........................................................................................................... vii
Examples of information organisation Information management tasks associated with records Contemporary information management applications Levels of information management Proportions of U.S. information professionals Areas of information work Organisations that employ information professionals Websites of international information associations IS’97 body of knowledge IRMA/DAMA model curriculum Data-wisdom continuum Data plus context = Information Representation of Shannon communication model Adapted communication model Signs of the times Records life cycle Information transfer life cycle Information management cycle Leavens’ ‘econometric papers tabulation’ Groos curve Pareto curve Retrieval contingency table
4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9
Communication contingencies Organisational intelligence needs Enterprise responsibility for information Environmental scanning Scanning modes Dialog Blue Sheets - part of introduction Thomas Register Online World Translations Index SearchEzee search engine site
Developments in publication A cascaded compound document A business form Part of a screen-based form Copy prooÞng Numbering a document Font examples Illustration examples for print documents ScientiÞc publication cycle A position for electronic publishing Document produced for compact disk Network publishing Electronic journal Procedural markup Document form and structure Descriptive markup SGML declaration example Document Type DeÞnition example Website with source script
The posthorn symbol postmark for a postage stamp Wired communication Wireless communication Open Systems Interconnection Layers with TCP/IP correspondence Connection mechanisms Character set from ISO 646 Latin component of UNICODE Extract from newsgroup display EDIFACT Bibliographic interchange format Data Þelds within ISO 2709 based upon CCF MARC example Example OAI record A record describing a compact disk Naming of the same subject in different data libraries Type classiÞcation method used by Philips Extract from List of Permitted AspectTerms File organisation Sequential organisation Indexed sequential organisation Direct organisation Flat Þle representation of software description Database organisation from network model First normal form of ßat Þle Second normal form Third normal form An OODM with attributes, methods, and exclusive subclasses Examples of ISBD for cartographic materials Standard class description elements for archives DTD for a communication called Note Examples of UNTDED Tags USMARC computer Þle record Dublin Core elements and qualiÞers
Extract from AGLS implementation of Dublin Core using RDF syntax Cyrillic transliteration Word processor indexing Presentation graphics points & images Page makeup screen Project management screen Personal manager Personal referencing software Spreadsheet software Database software
8.7
IRDS levels Active repository and database system IRDS Features Example of dictionary deÞnition procedure Authority record for Shakespeare based upon ABN authority format Authority record from USMARC record for authority data Australian National Archives records
9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 9.10
Index extracts from cookbooks Index entries The indexing process Cindex™ cross reference veriÞcation screen Image indexing ICONOCLASS indexing example Extract from Dialog Bluesheet for ERIC Extract from Dialog Bluesheet for SCI Custom paint query screen from QBIC Abstracts
10.1 10.2 10.3
Organising animals Taxonomic classiÞcation for koala Extract from 1997 U.S. NAICS Codes and Titles Records and information management classiÞcation scheme for Þling Extract from U.S. Occupational Safety & Health Administration classiÞcation LC ClassiÞcation Scheme outline of main classes
LC ClassiÞcation extracts Extract from Dutch Electronic Subject Service site using classiÞed arrangement ERIC thesaurus record for the term ‘computation’ Example from ISO Standard 2788-1986 Extract from MeSH vocabulary as mounted on OVID Partial term display from online MeSH Schematic for rule-based system The camera in a frame Manual card retrieval systems Information retrieval schematic The search process Ways of expressing logical operations Database guide extract from Dialog Web for Conference Papers Index Inverted Þle for ‘Paroo River’ Use of the Dialog ‘expand’ facility with the ERIC thesaurus Soundex transformations IBM’s QBIC image query interface Relational database tables for employees with languages spoken Example of ORACLE teletext Example of videotext page numbering layout Sample search engine comparisons Arabic calligraphy Fonts Output format options for the SCI database from Dialog Spreadsheet representation The Minard map as redrawn by Schneider Schematic example (signal delay effect) Example of organisation chart KWIC extract from Biological abstracts
12.9 12.10 12.11
Extract from Physics abstracts subject index A permuted index display from OVID’s MeSH Comparison of different sequencing rules for same data
13.1 13.2 13.3 13.4 13.5 13.6 13.7
Command driven interface Question and answer interface Pull down menu Form completion via an Internet site Association applied in an encyclopaedia VISIMap screen Preferred colour combinations
14.1 14.2 14.3 14.4
Digital resource data types Digital information stakeholders Appraisal criteria examples Extract from Texas State Records Retention Schedule Extract from Conspectus Report Extract from retention schedule for a petroleum exploration company IESC: ‘BeneÞts of the application of electronic ofÞce technologies’. IESC: ‘Essential features of a corporate electronic document’
14.5 14.6 14.7 14.8
15.1 15.2 15.3
15.4 15.5 15.6 15.7
Organisational structure and networking Characteristics of scientists and engineers related to information use How the features of information characteristics vary for different management levels Possible critical success factors information requirements Extract from survey form for a loans service point Librarian mediation levels Extract from AskERIC Q&A digital reference request form
iv INFORMATION MANAGEMENT
16.1 16.2
17.5 17.6 17.7 17.8 17.9 17.10
ER modelling associations Flowcharts Decision tables Data Flow Diagrams Soft systems methodology Object relationship diagram
18.1 18.2 18.3
IRDS evaluation schema Checklist for HCI evaluation Website evaluation
18.4
Data management evaluation
20.1
Management pyramid
16.6 16.7 16.8 16.9 16.10
Knowledge growth stages and forms Summary of some Þndings in Information as an asset Highly rated non-Þnancial factors contributing to share price Example application from Taylor’s valueadded model Examples of IREs that may be used by different divisions IRE inventory example Information quality dimensions IRE evaluation Audits for intellectual resources Impact analysis of information risks
17.1 17.2 17.3
A general view of a system System Development Life Cycle outcomes A rich picture application from SSM
Planning matrix for information policy implementation
16.3 16.4 16.5
Preface ............................................................... Information management is a term that has been used by a number of professions to describe all or some of the procedures in their respective disciplines. Its interpretation differs because of the varied circumstances, practices and levels of application in these disciplines. It has become further confused because of the emergence of knowledge management as a business concept. In the business environment, the analysis of speciÞc business processes as information systems has led to a wider recognition of information as a resource. This view has been promoted in some quarters of management education. There has been a reorientation of teaching approach from management information systems technology, to management of information in general, so that business practice reßects this as information management. This accepts that information though enhanced by technology, may be considered separately from it. Enterprises demonstrate this philosophy by building information plans into their strategic planning processes, and by establishing organisational structures to manage information. The separation of information from technology has been interpolated to include recognition of the intellectual capital in enterprises, and coining of the term knowledge management to embrace both unexpressed knowledge and recorded information. In the computing environment, the roles of database administration and data administration have been differentiated for some time. As software development has turned increasingly to maintenance and integration of existing systems, the proÞle of data administration has become more prominent. Information technology managers have increasingly had to account for technology implementation within the wider information planning framework that data administration can support. Identifying an enterprise’s information resources and deÞning them within an information repository, is a key information management practice that is a basis for quality systems. In the library environment, librarians have designed and used computer systems to manage and disseminate print-based information, the so-called library housekeeping subsystems. As the amount of information in computer-based form has multiplied, and access to it, particularly through the Internet, has improved, they have initiated systems that deal with organisation and integration of access to information sources inside and outside libraries. They call this process information management. Records managers and archivists are facing similar challenges in controlling information that is created and used predominantly within enterprises. Computer-produced records have intensiÞed the need for a managed framework within which creation, distribution, retention and retrieval of business records takes place. The process is information management.
vi INFORMATION MANAGEMENT
Distinctive systems for organisation and retrieval of information have long been applied in disciplines that have emphasised the subject content of what is being managed. Systems include those that manage geographic information, health records, and legal research. Each is a speciÞc application of information management. The divisions between each of these environments have been blurred by the application of information technology to their processes, and by the scope and utility of the resulting information systems.
Organisation of the material This work is organised in four parts that emphasise what the environments have in common in order to consolidate the province of information management. PART A:
Factors that shape the meaning of information management - the people who work in the area follow similar principles that are being enunciated by a ßedgling science of information, and Þnd that these principles are strongly inßuenced by the culture of an enterprise in which they are applied, and the customs of the distinctive user groups who seek information and knowledge.
PART B:
An operational approach – in applying principles, information professionals must all concern themselves with information about information - the organising structures, Þnding aids, classiÞcation and retrieval systems, that make their respective information systems useable.
PART C:
An analytical approach – in determining enterprise and user information and learning requirements, information professionals must apply techniques for assessing available information, and systematically providing information services.
PART D:
A strategic approach – in order to foster effective utilisation of information and knowledge resources, a planning framework must be fostered that aligns information services with that of an enterprise’s objectives and resourcing, and works effectively within constraints imposed by the broader regulatory and business environment.
A Glossary explains information management terminology. The bibliography integrates references given throughout the text with suggested further readings listed at the end of chapters.
How to make use of this text The work brings together both through its structure and examples, the principles that are common to the different disciplines that manage information. In so doing, it encourages a more extensive outlook on the Þeld, and aims to enlighten both tyros and information specialists, by showing the commonality of the practices and challenges across the range of interpretations of information management. It provides a compendium of the Þeld, for the beginning student entering the practice or related disciplines, who needs an introduction to the concepts and applications of information management. At the same time, through examples and suggested readings, it endeavours to open the door to further study in the area. As a consequence, the work may also support more advanced courses that are able to use its structure and content as a guide and preparation for expounding further on subject matter introduced in individual chapters.
Preface vii
In this respect, it should also beneÞt practicing professionals who regard themselves as information or knowledge managers, by showing the relationship of their own work to converging disciplines in the same area. It may thus help with a fresh understanding of the Þeld, showing the concepts and practices applicable to a range of disciplines. Undergraduate students If you are commencing study of information management you will Þnd this work useful for establishing concepts and giving an overview of the Þeld as it has evolved in recent years. It provides a lead in to the principles and practice and introduces you to specialised areas that you may subsequently pursue in depth. If you are taking a course in business or in information technology and studying some information management as part of it, you will obtain an overview of the Þeld that enables you to understand the issues of concern to information professionals with whom you will work on entering employment. In either case, the information resources and methods that are described in the text should provide a good springboard to improving the information skills that will assist you with continued learning throughout you career, or in other disciplines that you may pursue. This means that the work also has the objective of developing your information literacy. Graduate students and researchers The interdisciplinary nature of information management means that it is necessary to have a comprehension of principles and practice outside your immediate skills. Certain chapters in the work will provide this for you, and the bibliographic reviews at the end of sections should provide a lead in and support for research work in the area. The integrative nature of the work should beneÞt your perspective of information management, and the work is useful as a reference for speciÞc aspects of the Þeld. Instructors The work may be used as a supporting text for an introductory course in information management from either a business perspective or an information technology perspective. It is seen as primarily for use in introducing material that leads to a major in the area by establishing concepts, and demonstrating the main principles and practice. Material from individual chapters may be developed further in more focussed modules later in a course with the help of specialised texts. There it may be retained for reference to concepts and complementary readings. It may be used to impart a substantial overview of the scope of information management to those who are not majoring in the area, but are subsequently expected to have some working contact with information professionals. It can therefore also be used as a support text in service modules dealing with information management required by another discipline, providing that it is supplemented by cases relevant to that discipline. Practitioners and Information Professionals Some parts of this text should already be working knowledge for you. However, other parts, outside your immediate area of specialisation should introduce you to a wider range of information methods than you are presently applying. You should therefore see information problems from a wider interdisciplinary perspective.
viii INFORMATION MANAGEMENT
Standards The nature of the Þeld is such that much standards work has been undertaken to enable data sharing between enterprises. Reference to many of the relevant standards is made. In many cases these standards have been developed nationally, but if an international standard exists, and if it has been adopted to some extent, then it has been used for examples, assuming that a national equivalent can sought out by reference to a local concordance.
References The work contains a lengthy bibliography. Much of the material in it is referred to more than once within different chapters. Each chapter concludes with a section on further reading. Material referred to in these sections is given full reference in the bibliography. An indication is given in the list of where each reference has been made within the text. Reference is made to both print and electronic material. In the case of electronic material that is referenced on the Internet, every effort has been made to refer to locations that are viable. However, it is the nature of the Internet that some of the material may no longer be accessible at the time of reading. In cases like this, the reader will need to use Internet resource discovery facilities to update links to the appropriate material.
Acknowledgements I am most grateful to a number of associates and friends who assisted with encouragement and review of this work in progress. Thanks are due to colleagues from a number of Schools where I was able to spend periods of time whilst preparing the manuscript: University of Technology Sydney Department of Information Studies; University of Pittsburgh School of Information Sciences; University of British Columbia, School of Library, Archival and Information Studies; Loughborough University Department of Information Science; and University of Strathclyde Department of Information Science. In each case, I beneÞted from the support and interest of many members of staff and their students. I am also appreciative of staff from Charles Sturt University who supported my aspirations for this work, as well as giving considerable assistance in its preparation. My thanks too, are due to the referees for their perceptive and constructive comments. Particular thanks are due to the School of Information Systems at Queensland University of Technology, which provided me with the resources to complete this work, and to the staff of the School, and QUT’s Centre for Information Technology Innovation for their interest, input and support.
Michael Middleton Brisbane, Australia June, 2002
For Frank and Paddy
Part
A
............................................................... PART A of the book works on the basis that the scope of information management remains indistinctly defined, and therefore introduces some factors that are instrumental in shaping its meanings: Chapter 1:
Introduction
Chapter 2:
The people who work in the field
Chapter 3:
Their research and study areas
Chapter 4:
The organisations in which information is managed
CHAPTER 1
1
Introduction ............................................................... Technology is so much fun but we can drown in our technology. The fog of information can drive out knowledge
Daniel J Boorstin, NY Times 8 Jul 1983
1.1
A context
When we communicate with each other, the state of our knowledge changes. This learning process may be from our ad hoc experiences, or in more formal environments, where we like to record in some manner the information that is being communicated. So much gets written down, in so many different ways, that we need to set up formal ways of managing it. If those ways are effective, then information helps, rather than hinders knowledge creation. If we analyse what we do when we communicate, then one way of looking at the information transfer that occurs is to consider the form of transferring medium. What agent is carrying the information? The communication may be either direct as in personal discussions or broadcasts or telephone conversations, or it may be indirect via a medium of record such as a letter, a book, a tape recording or a computer disk. The direct forms may always be used indirectly too, by being recorded for later re-use. Recorded communication works better if the document is structured in a manner familiar to users. Books have chapters; computer disks have Þles with standard extension names. Even email has some structure, if only from a title and sender name. By way of contrast, the direct form will normally have much less structure, and may not require organisation for use by individuals. Nevertheless, in order to be put to most effective use in an institutional setting such as a business, both the direct and recorded forms need to be managed. In this book information management is taken to be the organisation of the institutional processes necessary for use of information, as well as organisation of the information itself for effective communication - whether directly or in recorded form. Therefore, management deals both with the processes for planning and implementing the provision and use of information resources, as well as the techniques for conÞguring information in its many recorded forms. This is in search of outcomes such as improved decision making, knowledge gathering, education and cultural support.
4 PART A Overview
On a daily basis, we encounter information management being used to simplify communication. We are able to consult a telephone directory that has been organised into alphabetical order of names, or make sense of signs in a shopping centre where icons have been used according to a convention of symbols, or read a bus timetable, or select items from a menu when it has been arranged in a systematic fashion whether in a restaurant or on a computer screen. In each case, the organisation is a consequence of information management. Consider for example Figure 1.1, which illustrates some prominent historical examples of organisation of information. Figure 1.1: Examples of information organisation Figure 1.1a: The Rosetta stone, named after the town of Rashid (Rosetta to the English) was located by French troops near the western arm of the Nile during Napoleon’s Egyptian campaign of 1799. The broken black basalt stone became a spoil of the British in 1801 and made its way to the British Museum. It is inscribed with an honorific of 196 BC to the Pharaoh Ptolemy V that concludes with the resolution that it be inscribed in hard stone in the sacred (hieroglyphs), native (demotic) script and Greek letters. Because this was done, twenty centuries later hieroglyphics were decipherable (ironically by a Frenchman, Champollion, in 1822). This image is of a replica at http: //www.usask.ca/antiquities/Collection/Rosetta_Stone_1.JPG by permission of the Museum of Antiquities, University of Saskatchewan
Figure 1.1b: The Pioneer plaques designed by Carl Sagan were carried aboard the US space probes Pioneer 10 and 11. These were the first earth-launched vehicles to go beyond the asteroid belt to the outer solar system. Information was organised in an attempt to provide an indication of where the earth is, and the appearance of humans on it (in case anything out there were interested). (image made available by NASA at http:// grin.hq.nasa.gov/IMAGES/MEDIUM/GPN-2000001623.jpg) Figure 1.1c: A 1612 world map by Ortelius. Among the impressive early ‘geographic information systems’ were the maps produced by European cartographers. (reproduced with permission from http: //www.heritageantiquema ps.com)
Figure1.1d: A schematic of the Washington DC transport system, the ‘Metro’, condensing a complex system into a representation that must be understood by many (reproduced with permission from Washington Metropolitan Area Transit Authority)
CHAPTER 1 5 Introduction
These are examples of information management that we take for granted, perhaps unconscious of the extent to which the information is organised for interpretation. It can get more complicated. The syntax of a programming language, the notation of a musical composition, the prescription of a drug or the codes for plays in a football game are more specialised forms of information organisation for interpretation by specialists trained in the respective Þelds. However the specialists often take information for granted too, and regularly to our cost. This was examined on a broad scale some time ago in a collection of ‘information disasters’. The severity of consequences such as the Three Mile Island nuclear meltdown, the cultural disintegration of an Australian aboriginal tribe, the German invasion of the Soviet Union in 1941 and the Stock Exchange crash of 1987 may have been alleviated to an extent, if the information available had been managed more appropriately. Horton and Lewis (1991) drew our attention to this by soliciting and reviewing a number of analyses of the situations described. They decided in many cases the protagonists were either uninformed, misinformed, disinformed or, if they were informed, then not able to Þt the information into preconceived stereotype, value systems, belief systems or attitudes. Their examples are on a grand scale. However, similar examples of un-, mis-, and dis- information, are repeated continually in microcosm, perhaps during a dispute between neighbours over the siting of a boundary, or the lack of substantiating references in an essay. The communication process is often impaired by recourse to incomplete or incorrect information. Because we live in a world where we are ‘collapsing the information ßoat’,1 dealing with and making effective decisions based upon the large amounts of information that we have at our disposal is a pressing problem. A key Þeld of study is the one that can Þnd ways of using effective information organisation and management processes to limit information ßow to an amount that is relevant and can be digested. Contributions to the study of such processes have been made in many Þelds of endeavour. The study of direct communication has been the province of linguists, psychologists, educators and others. The study of the indirect or recorded form of communication through documentation has often been more dependent upon context: records management for Þles and records in ofÞces, archives administration for stored historical records, librarianship for repositories of published documents, museology for description of museum collections and more recently data administration for computer records and scientometrics for scientiÞc publishing. These studies have improved our understanding of information transfer processes, and have lately been given more urgency by the growing movement towards recognition Þrst of information, then knowledge, as a resource in business enterprises. This has occurred concurrently with the diminution of information processing differences in separate environments of application. There is little difference in the maintenance and use of databases for knowledge creation, whether they be databases on human resources, machine components or library books. The organisation of the recorded form in any one of these contexts may be regarded as a component of information management in a business or institutional environment. An enterprise will regularly expect its management and staff to make use of information systems such as: -
1
Inventory control Records management Human resources and personnel Production control Publishing Sales and marketing
This term was used in Naisbitt’s book Megatrends (Futura, 1984) to describe the issue of communications technology markedly reducing the time a message spends on a channel between a sender and a receiver.
6 PART A Overview
-
Library and online and news services Marketing and sales performance Geographic and demographic analysis.
Each of these systems should be designed so that the information communicated through them is created, disseminated and presented to the users in an optimal manner for the beneÞt of the enterprise. The information gathering and maintenance processes that produce databases to support the procedures have regularly been factored into budgets as overheads. However, now that long established processes have attained greater prominence by implementation using information technology, the databases and the services that they are based upon are increasingly being regarded as resources. Business writers have recognised the qualities of information and knowledge as resources. They have increasingly espoused the need for management of these resources as a necessary element of the administrative process. They have regularly done so without reference to substantive techniques other than data analysis of business processes, and with minimal reference to operational methodology. Conversely, these operational techniques have regularly been implemented by analysts, data administrators, librarians and records managers, but often without emphasis on the value and substance that they provide for business practice – the quality in quality systems. Management may therefore be unimpressed by the extent of overheads that may be showing no obvious beneÞts for a business. The different disciplines, often working independently, have developed their own jargon and principles for comprehending similar procedures. However, many of these may be consolidated as a result of the convergence of processes induced by the developments in information technology. This book endeavours to bring together these various contexts for the mutual beneÞt of their practitioners and draws together ideas on the process of information management that have been articulated in different disciplines. It introduces a Þeld of endeavour, but at the same time may be used by those who are working in the Þeld to act as a companion, should they wish to place their understanding in a broader context. Each chapter may therefore be regarded as providing guidance on history and principles, which may be extended by reference to associated readings.
1.2
On the record
It could be said that information management is only required because we have a habit of recording what we do in many ways, be it on a clay tablet or a CD. For the recorded information to be useful, it often needs to have been organised. We should be cautious however, about the extent to which the recording preserves the veracity of the information, as was the Þctional Dr Braithwaite: ‘What happened to the truth is not recorded’.2 Disciplines have developed to deal with various types and applications of documentation. There are records managers for corporate memory exempliÞed in policies and decisions on paper, data administrators for information repositories, cataloguers for libraries, and curators for museum objects as historical records and educational items. Rayward (1996) has suggested that each of these disciplines has differentiated itself as a profession with a distinct character based on historically determined commitments to different technologies, media of communication and record, and primary client groups.
2
Julian Barnes, Flaubert’s Parrot. Picador, London, 1985, p. 65
CHAPTER 1 7 Introduction
The various professions have certainly carved out their respective niches. However we should recollect that early examples of information management were archives that did not distinguish internal corporate information from published information, and that did not have to worry about different client groups. The collections of antiquity whether they were on the clay tablets of Assyrians, or the bamboo and paper of the Chinese in what we now recall as libraries, were repositories of both administrative and expository information, and documented ideas. There was then no distinction between an archive and a library. These antecedents have been succeeded by a range of contemporary tasks that may also be described as information management, but in terms of job titles are usually called anything but information management. Figure 1.2 itemises some examples of information management tasks of today.
Figure 1.2: Information management tasks associated with records.
8 PART A Overview
Management is regularly said to consist of the mechanisms of planning, organising, coordinating, commanding and controlling. Is information management a matter of applying these processes to information? It certainly involves planning, organising and coordinating information, assuming that these may be interpreted to include establishing corporate information policy, analysing for user needs and arranging operational information tasks. The commanding and controlling are part of information management too, but as management processes relating to information management personnel, rather than the information itself.
1.3
Precursors
Managing of information has been happening for years without our calling it information management. Consider some of the institutional environments in which this has formally been taking place.
1.3.1
In the beginning...
Two institutions that have always Þgured prominently in managing information are the State and the Military. The State has always had a need to manage information. Governance by monarchies has been associated with archives since antiquity. An early archive, well known for the extent of source information that it provided scholars is that of the Assyrian king, Assurbanipal. The clay tablets inscribed during the 7th century BCE contain a wealth of organised information. Administrative records, deeds, correspondence, religious tracts and the like have proved a rich resource for scholars in later centuries from the site at Ninevah.3 Four centuries later, the early Ptolemys took a wide enough outlook to establish both an archive and directives for a collection of all Hellenistic literature. This was accomplished by a variety of means including copying, conÞscating scroll cargoes and purloining borrowed copies (setting some precedents that have been followed to this day). This was at the great library at Alexandria.4 Papyrus records from the Þrst and second century preserve the term biblifela meaning keeper of the archive. The Han dynasty in China during the Þrst century BCE was known for its organisation of administrative material, but collecting and organising material already had a long history in China by then. Inscribed bones from two millennia BCE unearthed in recent times may have been from archival collections, in which they accompanied bamboo and wood records that have long since perished. Not so the stone libraries of Buddhist text from the seventh century BCE – Lao-Tse, founder of Taoism is possibly the earliest recorded archivist, serving during the Chou dynasty around 600 BCE.5 Collection and organisation of records was followed periodically by their destruction in both East and West. Alexandria was sacked on more than one occasion, and the Ch’in emperor ordered the burning of books adverse to the regime. 3
Now across the river from Mosul, Iraq.
4
Recreated in an international effort as the Bibliotheca Alexandrina . Perhaps this is following the sentiments of the Pharaoh, Ozymandius (Rameses II) who had, according to Greek historian Diodorus Siculus, inscribed at the portals of his library at Thebes between Thoth the god of wisdom and Shesheta the scribe goddess, ‘Dispensary of the soul’.
5
Among Lao-Tse’s aphorisms from The Way of Lao-Tse are: ‘People are difÞcult to govern because they have too much knowledge’ and ‘To know what you do not know is best. To pretend to know when you do not know is a disease’ Perhaps these thoughts originated with his experience as an information manager.
CHAPTER 1 9 Introduction
Information management has been within the province of the military ever since the procedures of military strategy were formalised. Among the earliest known tracts on military strategy is that of a contemporary of Confucius, Wu Sun Tzu. His Ping Fa (the Art of War) has since the 5th century BCE been extended and modiÞed by succeeding generations of warrior-scholars in China. It remains inßuential in both Eastern and Western military strategy. Ping Fa pays signiÞcant attention to the need for military intelligence. While some of this is covert, most is commonly available information that requires collection, organisation and analysis - just as is the case with defence intelligence and business competitor intelligence systems today.
1.3.2
More recently...
Today’s writers in the different disciplines, have examined the more immediate antecedents of information management, and its derivative, knowledge management, at length. Usually, they are trying to establish deÞnitions and to come to terms with information management in the contemporary environment of government and business. Contributions of a number of recent writers in this area are mentioned at the end of the chapter. Most make reference in some way to the inßuence of activities that are seen to be seminal to information management as we know it at present. These are as follows: •
Management theory Since the advent of computer systems in business, management theorists have attempted to absorb the information system into business models. Initially the technology was treated simply as a tool for carrying out business processes such as accounting and inventory control. With the technology permitting integration of the processes, there has been a much greater focus on the information carried by the technology, its rationalisation, and strategies for using it in ways beyond existing processes. Information is seen as a resource that needs to be managed like labour, capital and property. More recently, attention has been paid, not only to the information codiÞed in documents and in computers, but to the knowledge within the personnel of enterprises and their understanding of processes. It is questionable whether this knowledge can be managed in situ. However, human resources management is concerned to get the knowledge from where it is, and utilising information management, to disseminate it for organisational learning.
•
Records management The management of internal records such as correspondence and accounts and policy documents for organisations entered the computer age with the development of Þnding aids that replaced manually produced registers of these documents. With ofÞce automation, the documents themselves are now produced in digital form. Records managers and archivists who are the ultimate custodians of the information are presented with the considerable challenge of storing and retrieving integrated repositories of paper, optical digital and magnetic digital information. These integrated document management systems are being implemented as an exercise in information management.
•
Librarianship Understanding of internal library processes has now advanced to the design of systems that integrate acquisitions, cataloguing information retrieval and circulation subsystems. Online retrieval is now routine. However, there continues the signiÞcant information management challenge of providing effective and coordinated retrieval from large numbers of databases both internal to libraries and
10 PART A Overview
available through networks. Libraries are now faced with the challenge of managing a window to the Internet from their own Þnding aids, along with a window to the libraries from the Internet. •
Information systems management As transaction processing systems have been extended to provide management information, analysts have had to come to terms with the complexity of providing simplicity! That is, the simplicity of information sought by management. The need for data administration in order to coordinate an enterprise’s information description has become more prominent as a requirement for underpinning decision support and executive information.
•
Technological convergence Convergence is often used to describe the removal of the division between computer and telecommunications technology, increasingly referred to as information and communications technology (ICT). It is also used to describe the way in which digital technology has turned what were formerly distinct communication processes into ones that share the same channel. A simple example is the use of the telephone system for both direct communication by voice and message sending of records by telefacsimile. Convergence has had signiÞcant effect on the performance of work in enterprises. For example, where typing, internal mailing and scheduling of meetings were once handled by separate staff on behalf of their supervisor, they will often now all be handled on a desktop workstation by that supervisor (hopefully leaving some time for what the supervisor is actually employed to do!). Convergence has also meant that what were formerly types of document distinctive to data administrators, archivists, records managers and librarians, are in some cases becoming common types of document, leading to convergence of their roles in managing these documents.
•
Legislation Governments have been grappling with the regulatory environment appropriate for digital information. They have been trying to develop and maintain the principles relating to such matters as protection of intellectual property, freedom of access to government information, privacy of personal records held by enterprises, requirements for retention of documents by enterprises, and the transfer of information across national borders. Consolidating legislation and making it effective is a complex exercise for information management. Its application in organisations requires a clear understanding of the regulatory obligations of both public and private sector organisations within the business community.
•
Information Science Whether there is a science of information, remains contentious. What is certain is that there are many researchers in diverse disciplines ranging from psychology to engineering, from computing to sociology, all trying to further the understanding of information transfer processes. This endeavour may produce some fundamental principles that can be applied to communication (explored in Chapter 3). As this understanding improves, so will its application to the practice of information management.
The advent of the Internet provides us with a ready window to many information management tools and applications. These are explored in some detail in later chapters. Figure 1.3 shows some examples of contemporary applications on the World Wide Web (WWW, which henceforth will be referred to as the Web).
CHAPTER 1 11 Introduction
Figure 1.3: Contemporary information management applications
Figure 1.3a: Street directory lookup http://www.whereis.com.au/; image reproduced with permission of Pacific Access Pty Ltd.
Figure 1.3b: Community information service front page http://www.escis.org.uk/
Figure 1.3d: Consolidated access to reference material http://www.xrefer.com/
CHAPTER 1 13 Introduction
1.4
Levels of information management
DeÞnitions of information management given in the literature vary according to context. For example, Taylor and Farrell (1992) talk in terms of existential, operational and hybrid-manager deÞnitions. Information management can be viewed simply as: The process of managing the information needs of an organisation.
Or from an epistemological viewpoint advanced by Cronin and Davenport (1991) as: The utilisation of codified knowledge (symbols, patterns, algorithms) to produce formal representations of information entities, which allow the automation of transaction processing, decision making and information retrieval.
Much of the writing that endeavours to deÞne information management in recent years has either confused or not identiÞed the different levels of business process at which information management takes place. The same ambiguities have arisen with the knowledge management movement, initially because of a lack of distinction between data, information and knowledge, and then because of the situation in which it is to be managed. In other words, are we talking about the details of operational procedures, investigation and structuring of an enterprise’s knowledge framework, or planning for utilisation of knowledge as a resource? It seems that if knowledge is to be regarded as something that is manageable, then handling it requires cognisance of organisational culture and practice, and that sharing, codiÞcation, learning and applying knowledge must be understood within a contextual business model that requires management of information and human resources for knowledge creation. Management of business processes is often described as being at operational, tactical and strategic levels. Diener (1992), while not exactly following this characterisation, delineates technical, analytical and strategic domains of information management. These may alternatively be described as the procedural, assessment, and administrative aspects. In the Technical or narrow operational sense, the following descriptions may be used: -
The organisation of personal or corporate records
-
Procedures such as indexing, classiÞcation, Þling and cataloguing, that are used to provide access to collections of documents or to other recorded forms of information ranging from historical archives to digital imagery
-
Control of the description of an organisation’s data through use of a data dictionary
-
Use of techniques such as collocation and abstracting and of tools such as software packages for storage and retrieval of collected information
-
DeÞnition and maintenance of databases that support business analysis
-
Selection, organisation, control, analysis and dissemination of information by an intermediary for an end user
-
Analysis and reduction of information into surrogate form, and organisation and presentation of this form for re-interpretation
-
Structuring and indexing a Þle of lessons-learned to support knowledge transfer
-
Design and maintenance of an enterprise information portal on an intranet.
In each of the preceding deÞnitions, the emphasis is on technique, methodology and procedure. They have in common a requirement for metainformation - the information about information that helps to organise the information that is of concern to the person who will ultimately use it. For example, in a database dealing with description of property for a geographical information system, the information of concern to the ultimate or end user is the description of the size of a property, its value and so on. The metainformation
14 PART A Overview
is the names and deÞnition of the data elements that contain the property information, and the search protocols necessary for retrieval of that information. In the Analytical sense, the emphasis is on assessment and evaluation, for example: -
Studies of information needs and use by particular groups
-
Production of information resource inventories
-
Determining the requirements of information services and systems
-
Conducting a knowledge audit to determine where knowledge resides in an enterprise and how it may be transferred.
These processes have in common the fact that they are not carrying out operational information management, but are identifying what needs to be carried out, how and why it should be carried out, and to what end with particular reference to those who are going to use it. If one approaches the concept from a wider business-oriented framework, one Þnds that the operational and analytic approaches are addressed, but that emphasis is more on planning, management and administration. To take a Strategic approach: -
The administration of all manual and automated data, and of all methods used for the communication, manipulation and presentation of information used in the course of doing business
-
Establishing a learning culture based upon effective recording and communication of knowledge assets, and associating these with external information sources
-
A fundamental managerial discipline founded on the conviction that both public and private sector organisations must treat information as a resource, in a manner similar to Þnancial, physical, human and natural resources
-
Development of strategy and policy for information handling
-
A means of promoting organisational effectiveness by enhancing the capabilities of the organisation to cope with the demands of its internal and external environments in dynamic, as well as stable conditions; this includes two dimensions: i. Managing the information process so that the knowledge resources of the organisation, are utilised effectively for organisational decision making ii. Ensuring that the various types of data an organisation uses and the various ways that data are handled and processed can support the needs and demands of the information process.
The British Government’s Central Computer and Telecommunications Agency provides an example that takes into account this delineation by levels. In addressing the role of information management in government departments, it characterises the underlying questions to be addressed by the tasks of information management (CCTA 1990). These have been adapted and included in a table in Figure 1.4 to illustrate the correlation with the identiÞed levels. TASK
LEVEL
Determining a department's business aims and objectives
Strategic
Determining information needed to support those aims
Analytic
Identifying information available in a department
Analytic
Establishing differences between needs and provision
Analytic
Ensuring processes that match needs with provision
Technical
Identifying best means of provision
Analytic, Technical
Considering means of further exploitation of information
Strategic
Figure 1.4: Levels of Information Management
CHAPTER 1 15 Introduction
1.5
Further reading
Beginning in the 1980s there has been a lot of writing that tries to deÞne what information management is, and what has led to it. Horton and Marchand have tried to explain it in detail, coming from a North American perspective of dealing with information as a resource in both government and commercial enterprises. Among their many writings, Horton (1985), Marchand and Horton (1986), and Marchand (1985) provide overviews. A comparable approach, but with an English perspective, is presented by Wiggins (Wiggins 1988), who conceptualises information management using diagrammatic representations of relationships within an organisation and tabulates the contribution of specialists to particular activities. Cronin has collected and published much seminal material on what constitutes information management (Cronin 1992). He has also written extensively and inßuentially on the subject himself. In a relatively recent integrative work (Cronin & Davenport 1991), information management is seen to rely on codiÞed knowledge to produce formal representations of information entities that facilitate information processes. Taylor and Farrell (1992) consolidate this framework, and claim that there is a growing perception that information management identiÞes, coordinates and exploits information entities in an organisation for the purpose of using the characteristics of that information to achieve greater value from existing information resources and to gain competitive advantage. The terminology used to conceptualise the Þeld has been examined in some depth (Boaden & Lockett 1991; Trauth 1989), and it has been explained as the application of information science (Greer 1987; Diener 1992). Davis (1995) has considered business information systems and adopted a framework similar to that of this book, in that he considers them within the framework of what he terms operational, tactical and strategic levels of management. However his emphasis is more on systems and their support for business processes, rather than dealing with stages of information transfer and the metainformation that supports them. The book is presented in the context of an employee progressively working up through tasks at the different levels. It gives examples of applications of productivity tools such as spreadsheet software to the management process. There are many writings, such as English (1996), that promote information management in terms of utilising a business resource. A Þlm that does the same thing, but which is bolstered by substantial analysis of the processes necessary for doing this is Information resource management (1990). More recently, business has found knowledge to be a more in-vogue resource. The intellectual capital of an enterprise is seen to comprise both what is recorded and what is tacit. Understanding the management of this intellectual capital, has occupied a great many authors, among the more inßuential being Boisot (1998), Choo (1998b), Davenport and Prusak (1998) and Liebowitz (1999). A compilation by Srikantaiah and Koenig (2000) also helps to spell out alternative approaches to managing knowledge as a resource. Websites that provide links to detailed material in the area include American Productivity and Quality Center (2002), Brint (2002b), and David Skyrme Associates (2002).
CHAPTER 2
2
The information professions ............................................................... Since the information society concept took hold, and now that so many people are said to be knowledge workers, we could say that that a large proportion of the working population is doing information management for at least part of their work. They always were. Now, with the pervasiveness of information technology, a signiÞcant proportion of the workforce undertakes computer-assisted information management, and has the capacity to use tools such as email, databases and search engines in support of knowledge use. However, the concern here is to focus on the people whose work is principally information management. In the introductory chapter, several references were made to the different disciplines that are involved. This chapter explores how the people who are working in the disciplines have deÞned themselves into professions, what these professions stand for, and how they inßuence the body of knowledge with which they work. The territories that are mapped out by the disciplines overlap markedly. Typical of approaches to deÞning a discipline in broader terms, is that of Cook (1993) who sees records management as a discipline that aims: a. To understand and control the information collected or generated by an organisation so that b. All appropriate information required for the conduct of business is acquired, made available to the people who need it, and recorded in suitable systems, and c. The most valuable core of the resulting records is available in the long term. Cook (1993, p. 27) is at pains to distinguish the agents that carry information from the information itself, whilst maintaining that it is the concern of the records manager both to control the ways in which information is stored and to control the ways in which the documents themselves are disposed or stored. His perspective is the practice of records management; however similar generic descriptions have been employed by those amplifying on information management from other disciplinary bases such as systems analysis, librarianship and knowledge management.
2.1
The work of information professionals
The work of the information professions is cause for continued re-examination. It is yet to be articulated in a form that the different groups might endorse as one. However there are areas of interest common to all of the groups, and it is these that tend to deÞne the employment environment.
CHAPTER 2 17 The information professions
A seminal work of analysis of this environment was conducted some years ago. It involved an extensive survey of U.S. information professions (Debons et al. 1981). At the time, it estimated there were about 1.64 million information professionals working in the USA. Although the Þgures are now dated, it is worthwhile examining the categories deÞned and the proportion falling within each, as shown in Figure 2.1. In this table the data have been restructured to provide a tabulation showing proportion by rank. The estimated percentage of the total information professionals as deÞned in the survey is listed beside each heading.
Figure 2.1: Proportions of U.S. information professionals (based upon data of Debons et al. 1981)
18 PART A Overview
The remaining proportion of information professionals was either performing other information functions or could not be categorised from the responses. These deÞnitions formed the basis for a survey to determine employment in these areas in the United States. The descriptions of these areas have been modiÞed somewhat in following years by researchers attempting to reÞne the description of the professions. No doubt the proportions employed in respective areas have changed since that time and will vary from country to country. However the headings adopted still provide a useful overall description of what information professionals do.
2.1.1
Jobs, jobs, jobs
At around the same time of the Debons and other early studies, there were attempts by professional associations in the area to elucidate the range of information work that was within their domain. For example the American Society of Information Science and Technology (ASIST), at the time known as ASIS, identiÞed a range of information work and organisations that employed information professionals (Spivack 1982). Some examples of such analysis sought to include a wide purview that includes such employment as public relations and advertising. Others have endeavoured to focus more speciÞcally on employment that is management of information involving organisation and provision of access to the information for use by those who require it. There are job names that occur frequently for information professionals, but often they are very generic and do not convey the range of roles and duties that are required, for example: -
Computing services ofÞcer
-
Information manager
-
Information ofÞcer
-
Business analyst
-
Information architect
-
Information services ofÞcer
-
Knowledge manager
-
Librarian
-
Professional ofÞcer.
An extraordinary range of labels has been applied within the employing organisations to the many roles in information work. Everyone works with information to some extent, but many jobs concentrate on information utilisation. To adopt a relatively narrow focus, one may exclude those whose focus is the interpretation of information such as journalists and lawyers, in order to concentrate upon those who act as information intermediaries. These intermediaries take existing data and knowledge and organise it in a form to be used by others. Their titles may be any combination of: data
administrator
document
analyst
information
With
manager
knowledge
ofÞcer
records
specialist
CHAPTER 2 19 The information professions
Area of work
Examples of tasks
Collection management
• • • •
Accounts and subscription budgeting and maintenance Digitisation programs Identification and acquisition of information materials Physical repository management and maintenance
Education & training
• • • •
Course design Development of guidance instructions for software Information literacy programs User instruction
Evaluation
• • •
Cost- benefit determination of information resources Interface effectiveness determination System efficiency study
Information control
• • • •
Data administration of dictionaries and repositories Data quality monitoring Websites maintenance Portal structuring, design and development
Information dissemination
• • • •
Current awareness programs Environmental scanning Facilitation of knowledge sharing Gatekeeping and mentoring
Information policy and planning
• • •
Corporate objective setting Public policy formulation Strategic resource planning
Information retrieval
• •
Database searching Resource discovery
Knowledge interpretation
• • • •
Abstracting Annotation Business reporting Translating
Knowledge organisation and representation
• • • • • •
Cataloguing Classification Indexing Information architecture for Websites Metadata design and assignment Spreadsheet design and manipulation
Management
• • • •
Information centre management Information and knowledge resource mapping Project management Regulatory administration – copyright, privacy, standards
Marketing
• • • •
Focus group analysis Market research Promotion Software demonstration
Presentation
• • • •
Editing and markup Forms creation Interface & report output design Publishing
Product development
• • • •
Commissioning information products Manual preparation Online help facilitation Repackaging disparate sources
Reference work
• • • •
Digital reference Help desks Interviewing Reference source searching
Authority file maintenance Quality identification and assurance Thesaurus maintenance
Systems analysis and design
• • • • •
Analytical tools application Computer-assisted analysis & design cycle Distributed systems integration Interviewing System maintenance & testing
User need analysis
• • •
Community of practice understanding Ergonomics Use studies
Figure 2.2: Areas of information work
20 PART A Overview
The range of requirements of any one of these may vary markedly. It is necessary to examine some of the tasks that may be deÞned within the role. Examples of these are shown in Figure 2.2. Any one of the above titles may require a mix of a number of these duties. Most enterprises, once they go beyond the size of a small business, require information professionals. Many small businesses in the information broking or Website design areas consist mostly of information professionals. Figure 2.3 tabulates examples of industries that employ signiÞcant numbers of information workers.
Figure 2.3: Organisations that employ information professionals
It can be said that the areas of work lack focus. Many subsequent studies have conÞrmed the diffuseness of the employment sector for such work, sometimes in colourful language – the ‘heartland’ (traditional jobs in established institutions), the ‘hinterland’ (information work utilising traditional skills, but outside the traditional institutions, or requiring adaptation) and the ‘horizon’ (software engineers, telecommunications managers, and the like) (Cronin, Stifßer & Day 1993). The term multimodal is sometimes used to describe the tasks carried out, and one description that has gained some currency during this period is that of the ‘hybrid’ information worker. This is to convey the idea of a person who has had education in both information management and a subject discipline such as biology or psychology and who is an information specialist focusing in the subject discipline.
2.2
Professionalism and professional societies
A professional association normally is created by a group of individuals who are concerned about the effective application of a deÞned Þeld of expertise and in particular with: -
The development of expertise in the knowledge of the Þeld through publication and meetings
-
The maintenance of a standard of conduct by members of the profession including monitoring of acceptable practice
-
Education, for those entering the profession in terms of appropriate practice within the prevailing paradigm and for the general public with respect to acceptable practices
-
Representation of the interests of the profession in the wider community including political forums and the press.
CHAPTER 2 21 The information professions
2.2.1
Professional responsibility
These concerns are usually spelled out in detailed policy statements relating to professional responsibilities and ethics. For example, among the criteria of the Association for Computing Machinery (1992) are: -
General moral imperatives including avoiding harm to others, being honest and trustworthy, honouring property rights and conÞdentiality, and respecting privacy
-
Specific professional responsibilities such as:
-
·
Striving to achieve the highest quality in process and products
·
Acquiring and maintaining professional competence
·
Knowing and respecting laws pertaining to professional work
·
Honouring contracts, agreements, and assigned responsibilities
·
Improving public understanding of computing and its consequences
·
Accessing computing and communication resources only with authorisation.
Articulation of social responsibilities of members of organisational units
·
Managing personnel and resources to design and build information systems that enhance the quality, effectiveness and dignity of working life
·
Acknowledging and supporting proper and authorised uses of an organisation’s computing and communication resources
·
Articulating and supporting policies that protect the dignity of users and others affected by a computing system.
Some other Associations have broadly similar aims but have an orientation more towards like-minded enterprises than individuals, in which case they delineate the responsibilities of institutions that are part of the Association. In many cases, the Associations express their criteria with respect to a member’s responsibilities towards the general public, clients, and other members of the profession. They may have a code of ethics or a statement of professional responsibilities. Typical of the responsibilities required of information professionals are: -
That opinions expressed on professional subjects should be based upon honest belief and knowledge
-
That conÞdentiality of personal data should be respected, that privacy be maintained, and professional skills not be used to intrude on the rights of members of the public
-
That conÞdentiality of information obtained in the course of professional services without the consent of the person or parties concerned be maintained unless there is a legal requirement to divulge it
-
That the intellectual property of others be respected, and proper acknowledgment made if referring to it for the purposes of establishing ideas
-
That the property of others obtained or developed by the member during employment, contract or other means not be used for commercial gain without reference to the other parties
-
That information providers have the right to know what will become of the supplied information - the principle of informed consent
22 PART A Overview
-
That any personal Þnancial interests be shown when providing evaluation, statements or commentary about information processing or management, or recommendations about supply of goods and services
-
That any business interests that may inßuence professional judgment be divulged to client or employer, and in general, conßicts of interest be avoided
-
That professional skills be used diligently, competently and carefully
-
That acquisition of material for collections of information or access to material through interfaces used by the general public be provided with due regard for community values
-
That promotion of goods or services be carried out without deception
-
That the terms of service to a client and the fee basis be made clear to a client before a service is undertaken, and that an endeavour is made to match client expectations
-
That new work for personal beneÞt be undertaken with proper regard for the interests of existing clients or employers
-
That client’s information resources be managed effectively so that information is transmitted in the right form to the people at the right time in a cost-effective manner
-
That the good standing of the profession be maintained, and that views representing the group be expressed only with its authority
-
That laws, contracts and licence agreements be observed.
The last mentioned point, adherence to laws, normally includes statutory requirements in relation to matters that are not speciÞc to information management, for example contracts or company law. Expectations in areas that are more speciÞc to information such as privacy, and intellectual property, must also be formulated with reference to any existing legislation. The principles underlying legislation are considered in Chapter 21.2.
2.2.1
International societies
The number of professional societies that are active in the information professions is evidence of the diversity of approaches to information work. At an international level, many of the societies have come into existence since the midpoint of the twentieth century - an indication of the inßuence of information technology on the Þeld. It is of interest to look at what they see as their scope of operation. Considered here are a number of international groups, many of which have national and local sections. Their policies and promotional materials broadly indicate the ßavour of the range and scope of their interests. They are also representative of the many organisations operating at national level, particularly in the computing and library spheres, that have similar aims. These examples in the main exclude speciÞc subject areas such as chemical information or health information for which there may be interest groups within the organisations listed, or in some cases, independent organisations. •
American Society for Information Science and Technology The American Documentation Institute was created in 1937 and became ASIS in 1968 and ASIST in 2000. Although based in North America there is wide international afÞliation among its 4,000 or so members.
CHAPTER 2 23 The information professions
It is interdisciplinary in intent and endeavours to reduce the gap between disciplines and to link the research that drives, and the practices that sustain, new developments. Its membership includes specialists from such Þelds as computer science, linguistics, management, librarianship, engineering, law, medicine, chemistry, and education. These members share a common interest in improving the ways society stores, retrieves, analyses, manages, archives and disseminates information, coming together for mutual beneÞt. Its mission is to advance the information sciences and related applications of information technology by providing focus, opportunity, and support to information professionals and organisations, with a vision expressed as establishing a new information professionalism in a world where information is of central importance to personal, social, political, and economic progress by:
•
-
Advancing knowledge about information, its creation, properties, and use
-
Providing analysis of ideas, practices, and technologies
-
Valuing theory, research, applications, and service
-
Nurturing new perspectives, interests, and ideas
-
Increasing public awareness of the information sciences and technologies and their beneÞts to society.
Aslib: the Association for Information Management Aslib evolved from the Association of Special Librarians and Information Bureaux and is based in London, but has about 2,000 corporate members worldwide who use its information services and consultancy. It also has individuals as afÞliates in special interest groups (SIGs) in speciÞc subject areas such as materials, biosciences and technical translations. It helps and advises organisations ranging from small business to large corporations and government on information management issues and problems through consultancy, publications, training and recruitment, and promotes best practice in use of information resources, including:
•
-
Stimulating awareness of the beneÞts of good management of information resources and its value
-
Representing and lobbying for the interests of the information sector on matters and networks which are of national and international import
-
Providing a range of information related products and services to meet the needs of the information society
Association for Computing Machinery The ACM was founded in 1947 as a society of the computing community. It has about 80,000 members internationally. It dedicates itself to the development of information processing as a discipline, and to the responsible use of computers in their diversity of applications for advancing the arts, science, and engineering. It tries to serve both professional and public interests by fostering the open interchange of information, by promoting high professional and ethical standards, and curricula for education. The ACM has many SIGs that promote these aspirations through conferences, publications and other communications. Particularly relevant in the information management area are SIGs concerned with computer-human interaction, systems documentation, information retrieval, knowledge discovery in data, management information systems, management of data, and hypertext and hypermedia. For example the SIG for management information systems, together with groups such as the Society for
24 PART A Overview
Information Management and the IFIP, founded ISWorld Net, which provides information services for this professional community through the Internet. The ACM maintains a digital library of its many publications for the beneÞt of members. •
AIIM International This began as the National MicroÞlm Association in the USA in 1943. It focuses on document management systems and technologies, their interoperability and standards development. It produces several publications, and continues to help users connect with suppliers who can help them apply document and content technologies to improve their internal processes.
•
Association for Information Systems The AIS was founded in 1994 as a global organisation for those working in academia who specialise in information systems. It endeavours to create an identity for IS academics, to provide a voice to speak for the Þeld, and to develop a vision and enhance communication among members. It is one of the Þve professional societies that support IS WorldNet and supports the International Conference on Information Systems.
•
Association of Independent Information Professionals This is an international association of owners of information businesses. Members provide such information services as online database searching, market and industry surveys, document delivery, library services, general research services, public records research, thesaurus building, indexing and abstracting services, digital library development, competitive intelligence, and specialised research in speciÞc subject areas. AIIP was founded in 1987 and has about 750 members internationally. Its objectives are:
•
-
To advance the knowledge and understanding of the information profession
-
To promote and maintain high professional and ethical standards among its members
-
To encourage independent information professionals to assemble to discuss common issues
-
To promote the interchange of information among independent information professionals and various organisations
-
To keep the public informed of the profession and of the responsibilities of the information professional.
ARMA: Association of Information Management Professionals ARMA was formed in 1955 and is an international association of over 10,000 records and information management professionals. Its mission includes the advancement of records and information management as a discipline and a profession; the organisation and promotion of programs of research, education, training and networking; support of the enhancement of professionalism of members and promotion of cooperative endeavours with related professional groups.
•
Data Management Association International DAMA International is an association of technical and business professionals dedicated to advancing the concepts and practices of data and information resource management. It promotes the understanding, development and practice of managing information and data as a key enterprise asset.
CHAPTER 2 25 The information professions
Its objective is to help practitioners become more knowledgeable and skilled in their profession by deÞning and clarifying the roles of information and data resource management, educating corporate management by demonstrating how information and data asset management affects corporate performance, co-sponsoring regional and international conferences and symposia dealing with data practices and theories, providing a focal point for addressing issues relating to information and data resource practices, and establishing academic and professional certiÞcation programs for the DRM/ IRM professional. •
Information Resources Management Association IRMA’s primary objective is to assist organisations and professionals in enhancing the overall knowledge and understanding of effective information resources management speciÞcally by:
•
-
Promotion and encouragement of the association among individuals with an interest in the Þeld of management of information resources
-
Provision of resources, assistance, encouragement and incentives to individuals planning to be or already engaged in IRM in order to enhance professional knowledge on IRM issues and trends.
-
Promotion and publication of professional and scholarly journals such as Information Resources Management, Journal of End User Computing, conference proceedings and other IRM publications
-
Presentation of seminars, conventions, and other educational opportunities for association members and individuals or organisations interested in IRM
-
Provision of professional and educational services to IT management personnel.
International Federation for Information and Documentation (FID, the Fédération Internationale d’Information et de Documentation) Although at the time of writing, FID appears to be moribund, it is mentioned here because of its long history in the Þeld. The FID came into being as the Institut International de Bibliographie (IIB), as a result of a resolution of the Conference Internationale de Bibliographie, assembled by Otlet and La Fontaine in 1895. The two major interests of the founders of the IIB were the development of the Decimal Classification and the Universal Bibliographic Repertory. The two were developed interdependently, and the former resulted in the subsequent development of the Universal Decimal Classification (UDC). Until 2001, it was an association of institutions and individuals who were developing, producing, researching and using information products, information systems and methods, and were directly or indirectly involved in the management of information. FID programme activities aimed to promote, through international cooperation:
•
-
research and development in information science
-
information management and documentation
-
improvement of all the various processes covering the entire life cycle of data, information and knowledge.
International Federation for Information Processing IFIP is a multinational federation of professional and technical organisations (or national groupings of such organisations) concerned with information processing, IFIP was founded in 1960 under
26 PART A Overview
the auspices of UNESCO. From any one country, one organisation that must be representative of the national activities in the Þeld of information processing can be admitted as a full member. In addition, a regional group of developing countries can be admitted as a full member. There are about sixty organisations that are members of the Federation. Technical work, which is the core of IFIP’s activity, is managed by a series of Technical Committees, which on an international basis in liaison with participating national authorities foster cooperative action, collaborative research and information exchange. Committees have a number of working groups. A committee that has particular information management interests is for example TC 8: Information Systems. In the case of TC 8 the working groups are: -
WG 8.1 Design and Evaluation of Information Systems
-
WG 8.2 Interaction of Information Systems and the Organisation
-
WG 8.3 Decision Support Systems
-
WG 8.4 OfÞce Information Systems
-
WG 8.5 Information Systems in Public Administration
-
WG 8.6 Transfer Smart cards.
Among the other committees, those that often have information management interests include TC 3: Education, TC 9: Relationship between Computers and Society, and TC 13: Human-Computer Interaction. •
International Federation of Library Associations and Institutions (IFLA) IFLA was founded at an international conference in Scotland in 1927. It has about 1600 members in about 140 countries and regards itself as the global voice of the library and information profession. It is an independent organisation created to provide librarians around the world with a forum for exchanging ideas, promoting international cooperation, research and development in all Þelds of library activity. IFLA’s aims are to:
•
-
Promote high standards of provision and delivery of library and information services
-
Encourage widespread understanding of the value of good library and information services
-
Represent the interests of our members throughout the world.
Society for Information Management SIM International was founded in 1968 as the Society for Management Information Systems. It styles itself as an association for senior information executives and has about 2,500 members, such as chief information ofÞcers from many large corporations and government agencies, as well as professionals in higher education. It has promoted benchmarking approaches for management of information systems. It has an Internet communications facility SIMNet for communicating between members in addition to its Network newsletter. It publishes MIS Quarterly and an electronic support publication MIS Central. It is also one of the governors of ISWorld Net along with IFIP TC 8 and ACM.
•
Society for Technical Communication The STC was founded in the USA in 1953, and adopted its present name in 1971. It represents a membership of about 25,000 who are dedicated to advancing the arts and sciences of technical
CHAPTER 2 27 The information professions
Figure 2.4: Websites of international information associations
28 PART A Overview
communication. The members include technical writers, editors, graphic designers, videographers, multimedia artists, Web and Intranet page information designers, translators and others whose work involves making technical information available to those who need it. •
Urban And Regional Information Systems Association (URISA) This association, which has been in existence since 1963, is subject-based and, although international, is North American in the main. However it is listed here because of the overall importance of geographic information systems. URISA is an interdisciplinary society of professionals dedicated to stimulating and encouraging the effective application of information technology and integration of urban and regional information for decision-making. It is an educational association of providers and users of spatial information in both the public and private sectors and for individuals concerned with the effective use of information systems by local, regional, and state/province governments. A prime concern is that of geographic information systems implementation and application. It provides a professional forum for those who believe that the effective use of computers and information systems technology can improve decision-making by public ofÞcials. URISA’s members comprise a diverse multidisciplinary cross-section of government, private industry and academic professionals. Many are in management, engineering, planning and data processing positions with government.
Considerably more detail about these organisations is available via their Internet Home pages. A number of these are itemised in Figure 2.4. Many are also described in more detail in the Information Industry Directory published regularly by Gale Research. Other international associations of interest include NFAIS, the National Federation of Abstracting and Indexing Services established in 1958. It is a US-based international federation of agencies and companies involved with indexing services, information centres or research into the extent and quality of documentation. It endeavours to develop communications, cooperation and coordination among the publishing, library and data centre sectors.
2.3
Curriculum
The societies have a major interest in the preparation of those who are to enter their profession. Cronin (1995, p. 57) has pointed out that “a curriculum is the operationalization of a discipline’s knowledge base and value system. As such it is the single most meaningful statement of an academic tribe’s raison d’être”. Educational requirements are probably the most signiÞcant factor of divergence between the different societies, as well as a major matter for debate within each organisation. The disciplines have come to the curriculum for information management from many viewpoints. For example inßuential meetings at Georgia Institute of Technology in the early 1960s that were concerned with what were seen as science information specialists (Conferences on Training Science Information Specialists 1962), conveyed the strong scientiÞc inßuence at the time with a joint approach to science information and information science. In the decades since, there has been continuous debate about what it is that information professionals need to learn, with disciplines inevitably emphasising their own orientation. Nevertheless, there have been moves towards joint development of curriculum. For example IS2000 is a set of curriculum guidelines for undergraduate courses being developed for North American business schools jointly by the Association for Information Technology Professionals (formerly DPMA, a North American society for information systems professionals), the Association for Information Systems, and the ACM Education Board. It builds upon a body of knowledge that was identiÞed in IS’97. A similar exposition has been produced for graduate courses and is known as MSIS 2000. A summary tabulation of IS’97 is shown in Figure 2.5.
CHAPTER 2 29 The information professions
Figure 2.5: IS’97 body of knowledge http://www.acm.org/education/curricula.html; with permission of Association of Information Technology Professionals
This body of knowledge has in turn arisen from the association of curriculum characteristics with abilities and knowledge that were used for an earlier manifestation of the curriculum (Cougar et al. 1995). The characteristics itemised by Cougar et al. are communication; computer applications systems; information technology and tools; interpersonal relationships; management; problem solving; systems development methods; systems theory and concepts; and professionalism. Each characteristic is associated with certain abilities and is expected to make use of speciÞed knowledge, for example: Characteristic Communication
Ability to... -
Problem solving
-
Using knowledge of...
Actively listen and express complex ideas in simple terminology Make presentations Write memos, reports and documentation
-
Interviewing skills Proper presentation of data Automated tools and techniques
Recognise the need for the application of analytic methods Formulate creative solutions to simple and complex problems
-
Collection, summary and interpretation of data Statistical and mathematical methods
-
30 PART A Overview
This group of abilities is derived from a systems perspective on information handling. By way of contrast, two of the other professional associations that we have listed, DAMA and IRMA have produced a model curriculum (Cohen 2000) that includes required components in: -
Information resources management (IRM) principles
-
Information systems technology
-
Algorithm concepts and information management
-
Data warehousing, data mining and decision support systems
-
Data resource structures and administration
-
IRM design and implementation.
And electives from: -
Communication technology and information management
-
Global information management
-
Executive information systems management
-
Selected topics in IRM.
The framework proposed for delivery of such a curriculum is shown in Figure 2.6.
Figure 2.6: IRMA/DAMA model curriculum (Cohen (2000)), illustrated with permission, IRMA
CHAPTER 2 31 The information professions
Turning from the systems-oriented approaches to the curriculum, it is worth considering an outline that assumes a records management outlook. This proposal looks at information based upon UNESCO’s guideline (RAMP) that associates records with archives administration (Cook 1993). Components of such a curriculum are as follows: •
Design and organisation of a records management program
•
Records creation -
•
Records maintenance and use -
•
Paperwork quality control programs Clerical work measurement Source data automation Automated & electronic data processing management Management of micrographics.
Program evaluation -
1
Records surveys and setting up schedules to control disposal or retention of records Record centre management Input of records Reference service Planning and administration.
Specialised areas, eg -
•
Filing classiÞcation systems1 Management of Þling systems Filing equipment and supplies OfÞce machines including those for copying and control of supplies Management of ofÞce space and equipment Design and control of central microÞlm services.
Records disposal -
•
Methods of generating correspondence Management of administrative directives including writing technical manuals for staff training Design and use of forms as a management tool Reports management - introduction and control of text processing and creating management systems for department executives Mail management - controlling ßow, establishing systems to assist decision makers.
Training.
Filing and classiÞcation are treated as separate procedures in this text though in records management they are not necessarily differentiated.
32 PART A Overview
The former Institute of Information Scientists, based in the UK, took some trouble to outline the components of an information curriculum (Institute of Information Scientists 1999), so that it could use the criteria for accrediting courses. IIS saw that the curriculum should consist of the four general areas: information science, information management, information technology and ancillary skills. A condensed form of their content follows, since the Þrst two of the four subject areas probably come closer than do those of any of other societies to outlining the contents of the chapters of this book. • Information science The theory and practice of creating, acquiring, assessing and validating, organising, storing, transmitting, retrieving and disseminating information: comprising, -
Information: its characteristics, providers and users: nature, properties and characteristics of knowledge and information ßows; generation, transfer and use of information; elements in the information chain; the information industry and its history; information needs and information seeking and user behaviour; communications systems theory, design and evaluation; human communication and communication in the organisational environment; user types; Þnding and analysing user needs
-
Information sources: sources of recorded information in general and special Þelds, irrespective of format; individuals and organisations that collect, extract and disseminate information such as information brokers and consultants, experts, libraries, information centres, and documentation centres; major information services; secondary sources of information such as abstracts and indexes, databases, catalogues.
-
Information storage and retrieval: media for information storage and choice and organisation of those media for various information types such as full text, abstracts, numeric and tabular data and audio-visual material, and combinations of these; theory of classiÞcation and indexing of information content; thesaurus construction; search strategies for retrieving references, data, full text or combinations of these; reference interview; use of manual, automated and mixed systems; use of human and technical networks for retrieval; expert systems; internal and external systems, services and networks; input, indexing and output for successful retrieval; evaluation of retrieval systems and secondary sources of information.
-
Analysis of information: use of appropriate information sources for regular and systematic collection of information; evaluation, interpretation and validation of that information, including the preparation of abstracts; building of specialist Þles for storage and retrieval of evaluated information; quantitative and qualitative analysis; preparation of state of the art reports, reviews, overviews and scenarios
-
Presentation of information: preparation of bibliographies and evaluated information reports; effective presentation of information, including oral and written presentation skills; proof reading, editing and presentation; reprography and publishing, including desk-top publishing; selective dissemination of information and other methods of current awareness
-
Theory of information science: theoretical studies of information, its nature, deÞnition, content and signiÞcance; development of theoretical models of information systems and processes; research into information science.
• Information management The management of the total information resources of organisations, comprising: -
Planning: information requirements analysis; impact of information on organisational performance; information units within the organisation; integrating information systems with corporate strategy; impact of technologies
CHAPTER 2 33 The information professions
-
Communications: theories and models of communication and their applications; communications audits; information ßow; value added networks; interpersonal communication and intergroup communication
-
Management information and control systems: decision-making process and the role of management information; data collection and data ßow analysis; systems analysis, design and speciÞcation; documentation management; information provision for management control and business analysis
-
Human resource management: job analysis, design and description; job evaluation; recruitment; selection; assessment; training; industrial relations; staff management, motivation and interpersonal relations
-
Financial management: accounting; cost analysis and control; decision support; programming, planning and budgeting, including the estimation of expenditure; performance assessment - objectives, cost-effectiveness and cost-beneÞt analysis; Þnancial forecasting; planning and policy
-
Promotion, economics and marketing: publicity and public relations; production of newsletters, bulletins; economic factors; marketing techniques and strategies
-
Political, ethical, social and legal factors: political climate; role of government and agencies; ethical and legal factors including privacy, secrecy, freedom of information, health and safety, data protection, transborder ßow; social factors.
• Information technology Technology, which may be used in information science or information management as follows: -
Computer systems: hardware and software: work stations; input/output and storage devices and systems; principles of operating systems and applications programs; software packages, especially for information storage and retrieval; programming; Þle design; record layout; database systems and database management; feasibility studies; speciÞcations; design; package appraisal; implementation; evaluation; documentation
Applications: information retrieval, videotex, teletext, computer typesetting, computer output micrographics, speech synthesis and voice recognition, automation of library functions, ofÞce automation, compact disc technologies, video scanning and digitising, satellite and cable TV, other methods of electronic publishing and document delivery, including telefacsimile; machine translation
-
Environment: health and safety; ergonomics; data protection; copyright; piracy; encryption.
• Ancillary skills Examples of important ancillary skills (not intended to be comprehensive): -
Research procedures: research proposals; investigation, data collection and sampling; statistical signiÞcance analysis; evaluation of results; report writing
-
Linguistics: natural and formal languages, linguistic classiÞcation; semantics, syntactics, pragmatics; relations of semantics and linguistics, psychology, logic and philosophy
-
Foreign languages: use of foreign language information sources; translating and abstracting from foreign languages.
34 PART A Overview
Courses were accredited by IIS if it considered that they contained a large proportion of any one of information science, information management or information technology together with a reasonable coverage of the two areas not taken as the main topic. These examples from different professional associations show similar scope with different orientation, according to the different disciplines. This is also reßected in the way that information management courses are situated in the universities. There are many examples where information management is taught in some guise as a graduate course from a vocational perspective, assuming that those entering it already have a discipline in a subject area – any subject area! Equally, there are many undergraduate courses, where information management is taught as the discipline. The wider framework in which the information management is conveyed has markedly different emphasis, depending on the school or faculty that harbours it – there are numerous examples in schools of management, of social studies, of professional studies, of information systems, of education – but not too often of information management itself. Reference to an international range of courses can be made in the list maintained by Wilson (2002).
2.3.1
Competencies
In recent years, there has been an increasing drive within some countries to codify the workforce requirements in terms of competencies. This has been embraced particularly at sub-professional levels; however there have been pressures exerted on the higher education sector to pay more attention to a competency-based approached to education. Many universities see this is being in conßict with an approach based upon broader educational objectives that aims to produce ßexible and adaptable graduates. This stems, among other things, from the concern that higher order intellectual skills are not amenable to measurement as competencies. More speciÞcally, there is difÞculty in assessing such abilities as problemsolving skills, judgment for decision processes and analysis. There is also discomfort with the procedures for assessment by performance (such as demonstration of tasks, though not necessarily rating capability), rather than attributes (examination and testing for knowledge).2 Nevertheless, sectors in the information professions have from time to time tried to establish core competencies. For example, Nichols et al. (1996, p. 12) have characterised the competencies of a library and information professional as: ability to conceptualise information, knowledge of internal and external information resources, understanding of information resource management, and ability to synthesise and tailor information. Similarly Danner (1998) cites working groups from professional associations that have deÞned competencies broadly as the interplay of knowledge, understanding, skills, and attitudes required to do a job effectively. Given this, there is seen to be a knowledge base that includes expert knowledge of the content of information resources, specialised subject knowledge appropriate for organisations or clients, use of 2
Professional associations have taken different stances on this matter. For example in Australia, the Australian Library and Information Association committed itself to support Arts Training Australia (the body designated to develop the competency standards in the library sector) from sub-professional levels through to the professional levels, although it is not clear to what extent the ‘library sector’ is deÞned by ATA and whether courses for information professionals are seen to be addressing that sector. The Australian Computer Society’s response was less committed, although it has published what it considers to be a core body of knowledge for IT professionals - both central skills and knowledge, as well as topics (Goldsworthy 1993, p. 121) - that would provide the necessary basis for competency development. Of relevance is the fact that professional organisations like the IEEE Computer Society and the ACM have drafted agendas on competency training and government licensing for computer programmers, despite reservations in the software development community about licensing.
CHAPTER 2 35 The information professions
appropriate technologies to acquire, organise, and disseminate information, ability to evaluate outcomes and conduct research, and participation as an effective member of a senior management team. These are associated with personal competencies - the set of skills and attitudes that enable effective work - such as commitment to excellent service, interest in seeking out challenges and new opportunities, effective communications skills, leadership skills, capacity for teamwork, personal business skills, and ßexibility. Many of the values that have been outlined in this Chapter are shared between the different specialisations of information management. Have the various disciplines within information management reached a common calling? It seems not. If they had, there would be a well-deÞned profession that could acknowledge different areas of specialisation within a general framework of skills. There are signs of this. For example, there is assembly of groups that bring information skills from the different disciplinary backgrounds to bear in projects that create information systems and services. In cases like these, the common calling seems to include a mission to provide intermediation for information resources, to solve problems of information organisation, to empower end users to use the resources effectively, and to carry these out by means of effectively communicating the possibilities with a service orientation.
2.4
Further reading
Since most of both the international and national associations maintain World Wide Web pages on the Internet, the most current information about them may be found by direct access to the respective pages. Debons et al. (1981) report on a national survey of information professions in the United States, and give detailed categorisations of information functions performed and statistical information associated with the functions. Recent literature has examined education from the perspectives of particular disciplines, for example Bearman (1993) on the education of archivists, Cox (1994) on electronic records archivists, Beheshti (1993) on programming in library and information studies education, and Friedman and Kahn (1994) on educating computer scientists from a social and the technical perspective. Cooper and Lunin (1989) produced an extensive bibliographic review of the education and training of information professionals. Forgionne (1992) suggests a mechanism for a multidisciplinary approach to education. Danner (1998) provides particularly stimulating discourse that compares computing professionals with librarians. The journals of the professional societies regularly examine educational and professional matters. Other journals that regularly deal with educational material include Education for Information, and Journal of Education for Library and Information Science. Mason, Mason and Culnan (1995) contains a detailed section on information professionalism in the context of a work on ethics of information management. Texts that deal with professional responsibility with an emphasis on information technology include Johnson and Nissenbaum (1995); Forester and Morrison (1994); and Weckert and Adeney (1997). This last undertakes a philosophical approach to develop a stance on such matters as privacy, intellectual property, censorship and responsibility based upon information ethics.
CHAPTER 3
3
Information science ............................................................... “… do we not feast upon trivialities to divert attention from the truly portentous issues that engulf us”
said Stephen Jay Gould in Questioning the Millennium.1 Some would say that it is not worth discussing what information is – that we should just get on with managing it. However, there is so much confusion caused by different interpretations of data, information and knowledge, that some discourse on them seems essential. The discussion inevitably leads us to seek basic principles, and to understand these within a framework as information or knowledge science.2
3.1
Principles
The emphasis of this text is more on the application of information science to information management, than it is on information science itself.3 However the management cannot be considered without reference to some foundations, so this chapter examines some of the more prominent principles and terminology that are a basis for the management. A discipline can be regarded as a deÞned body of knowledge accompanied by a set of principles that are developed and practised by a scholarly community that has a deep insight in the area. If that is so, information science is multidisciplinary in that it borrows from those aspects of psychology, linguistics, engineering, documentation, classiÞcation and computing, among others, which focus on information.
1
He was exploring the ramiÞcations of the decision of Dionysius Exiguus (‘Dennis the short’), to start the Christian calendar on January 1 of year 1 Anno Domini rather than year zero.
2
Contrast this with the social theorist, Herbert Spencer’s description of science as organised knowledge.
3
This implies a clearer distinction between information science and information management than is probably the case as yet. There continue to be works written as information science that describe applications rather than investigations.
CHAPTER 3 37 Information science
3.1.1
A SCIENCE of information?
The deÞnition of science itself is problematical. The philosopher of science, Thomas Kuhn, has been inßuential in describing what a science is, and how it develops. He looked for four elements in a scientiÞc discipline: symbolic generalisations, exemplars, shared commitment to beliefs, and shared values. Information science is still at a formative stage of its development. It would be difÞcult for the range of people who see themselves carrying out information science to maintain that they follow each of these elements. Kuhn (1970) is probably more instrumental than most in introducing the term paradigm into common usage. He saw science as undergoing paradigm shifts, where for a particular discipline there is a change in world view and a new framework of understanding for the community of practitioners which itself realigns and attracts new adherents who try to deal with the unanswered questions of the discipline. It is probably true to say that information science has yet to attain a coherent paradigm, but is still a set of partially overlapping ‘proto-paradigms’. The sense in which science is used here is the broad one of knowledge that is being systematised (arranged in an orderly manner) by the continuing investigation and interpretation undertaken by scholars and researchers who are trying to comprehend and improve the practice to which it is applied. The practice in this case is information management.
3.1.2
A science of INFORMATION?
One of the more debatable aspects of establishing the scope of information science is determining just what information is. To this point, ‘information’ is a term that has been used glibly in this book. Now it is worth considering some interpretations. One should also heed the advice of McGarry (1993) who in exploring the many meanings of information, warns against conceptualising it ‘as a kind of ßuff that is wafted between people’.4 Some elaborations of McGarry’s reßections, as well as some deÞnitions brought together by Liebenau and Backhouse (1990) are adapted to produce what follows. Information can be: •
A near synonym of fact. This is the everyday information we readily Þnd out and assimilate into our knowledge base: the football score, the price of bread, the name of someone. It is a something that is communicated or received concerning a fact or circumstance.
•
A reinforcement of what is already known. Claude Shannon, whose model of communication will be considered shortly, suggested that we have received information when what we know has changed. Information is that which logically justiÞes alteration of, or reinforcement of a representation of a state of affairs. Such representations may be as explicit as in a map or proposition, or they may be as implicit as in the recipient’s state of goaldirected activity.
4
Roszak (1994, p. 13), citing examples such as ‘intelligence’ and ‘order’, has noted how much confusion results, when scientiÞc rigidity of deÞnition is applied to words that have common sense meaning.
38 PART A Overview
•
The freedom of choice in selecting a message. This implies that if an attempt at communication is ignored, no information is received, though some information about the communicator at least, if not about the content of the message, would be transferred.
•
The raw material from which knowledge is derived. This type of interpretation is often also used for ‘data’, but in this case information is an intermediate organising state between data and knowledge. The data must be recorded, classiÞed, organised, related or interpreted within a context to convey meaning within the cognitive structure of a recipient.
•
That which is exchanged with the outer world, not just passively received. This is a cybernetic interpretation, for example information is “a name for the content of what is exchanged with the outer world as we adjust to it, and make our adjustment felt upon it. The process of receiving and of using information is the process of our adjusting to the outer environment, and of our living effectively under that environment” (Wiener 1950, p. 124)
•
Defined in terms of its effects upon the recipient. Here we have a pragmatic interpretation. If we were trying to measure it, we would be trying to measure an outcome such as a decision. If we look at an amber signal when driving, the information is considered in terms of whether we choose to stop or not.
•
Something that reduces uncertainty in a particular situation. This is a data-oriented deÞnition of information. The data may reduce uncertainty in a statistical sense, but not a semantic sense. Hayes (1993, p. 1) describes an alternative approach: “that property of data (i.e. recorded symbols) which represents (and measures) effects of processing of them”.
•
In verse ‘Noise and randomness are information’s constant companions Poetry is a tangle of bits on a pedestal, in the mind. Poetry is information Þreworks. A poem is a hard, sparkling diamond of information. Poetry is compressed insight, unstable and likely to explode.’ (Housman 2000)
It is noticeable that these deÞnitions tend either towards an approach in which information is self-contained and has a kind of objective existence independent of use, or towards an understanding that says information is deÞned by its use and human interpretation. The latter requires information to be constructed by the cognition of receivers (Dervin & Niland 1986). These approaches can be reconciled if one accepts that information is being analysed at a different level of interpretation for different purposes. In the copious literature of information studies, one may see data, information and knowledge all used interchangeably. One will also see distinction made between them, as successive levels on a graded scale of understanding. For example, although Debons, Horne and Croneweth (1988, p.2) consider prevalent everyday uses to be information as commodity, energy, communication, facts, data or knowledge, they articulate a continuum (they call it a spectrum) starting with an event that may be symbolised with data, that may be successively processed through a cognitive domain as information, knowledge and wisdom. Similar approaches to distinguishing these entities as successively higher levels of awareness have been
CHAPTER 3 39 Information science
suggested more recently by those grappling with the concept of knowledge management. An example of the continuum is depicted in Figure 3.1. It is in this same sense that these terms are used throughout this book, despite assertions by some (Clarke 1999) that there is no continuum.
Figure 3.1: Data-wisdom continuum
This is signiÞcant because if an information or knowledge science is to be developed in terms of shared understanding by its investigators, they must be conÞdent that they are talking about the same terms in the same way for development of their models and theories. Another way of emphasising the distinction that may be made between data and information is to say that context adds meaning to data. Figure 3.2 tries to exemplify this.
Figure 3.2: Data plus context = Information
Devlin (1999) extends this analysis by saying that information must be grasped according to situational analysis. He makes a distinction between the representation of information (red light, knot in a handkerchief, words on page) and what the information conveys. So the situation codes information by virtue of the situation being of a certain type. The information for the situation is subject to constraints (such as grammar, or the limited coding system of knots in a hanky). Constraints are the regularities that make intelligent action possible. The motorist who sees a red light in the type of situation such as a trafÞc intersection takes an action according to the constraint of the trafÞc laws. The extent to which knowledge about information itself was being systematised from quite different viewpoints was recognised by Fritz Machlup. He appreciated that contributions came from cognitive
40 PART A Overview
science, cybernetics, library science, linguistics, artiÞcial intelligence and computer science, and explored ways of synthesising the different approaches (Machlup & MansÞeld 1983). However the nature of the area of study that information science investigates is still very diffuse. Perhaps that is why a number of writers in the Þeld have turned to poetry to try to capture some of the indeterminacy. Often quoted in this context is T.S. Eliot. His lines from the Þrst chorus of The rock, are often used by those describing the different levels at which information may be interpreted: “Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?” Taken out of the context of the poem, these lines can be used to exemplify the qualitative differences between information and its interpretation. However, Eliot’s meaning may have been more spiritual in intent.5 One might regard Eliot as impenetrable, but perhaps in an earlier work of 1917 he presaged information in the computer age.6 Possibly the image here, like deÞnitions of information, allows different levels of interpretation. Before leaving the data-wisdom continuum, note the alternative take on beauty that you may see inscribed on a stone bench at one of our seats of learning: “Verily by beauty, it is that we come to wisdom”.7 One may not see information science as poetry or imagery but it is easy to see why the humanities have been summoned to deal with the difÞculties that information scientists have in deÞning what it is they are investigating.
3.1.3
Epistemology and information science
In contrast to a poetical approach to information, one can turn to abstract reasoning. Epistemology is the theory of knowledge. It is concerned, for example, with the relationship between an object and the ‘knower’ of the object, as manifested by perception or reasoning. Knowledge is consequent upon the nature of reality (metaphysics), and its representation is shaped by that aspect of metaphysics
5
Eliot, of whom it was said by Henry Root ‘He generally ßoats like a cork on a sea of knowledge’, versiÞes: The endless cycle of idea and action, Endless invention, endless experiment, Brings knowledge of motion, not of stillness; Knowledge of speech, but not of silence; Knowledge of words, and ignorance of the Word. All our knowledge brings us nearer to ignorance, All our ignorance brings us nearer to death, But nearness to death no nearer to God. Where is the Life we have lost in living? Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information? (Choruses from The rock 1934 in The complete poems and plays of T.S. Eliot. Faber, London, 1969)
6
It is impossible to say just what I mean! But as if a magic lantern threw the nerves in patterns on a screen: Would it have been worth while If one, settling a pillow or throwing off a shawl, And turning toward the window, should say: ‘That is not it at all, That is not what I meant, at all.’ (The Love Song of J. Alfred Prufrock in The complete poems and plays of T.S. Eliot. Faber, London, 1969).
7
From Robert Bridges, The testament of beauty at University of Western Australia
CHAPTER 3 41 Information science
concerned with describing the characteristics of reality - ontology. (Ontology is also of interest in relation to classiÞcation, but this is addressed in Chapter 10.) If one can assume that information is an originator of knowledge, can one adopt a metaphysical approach to information? Starting with the Cartesian assumption of reality (‘I think, therefore I am’) -, an assumption of our existence as a result of our perceptions -, one might say the process of identiÞcation of things is a way of symbolising our perceptions. IdentiÞcation carried out between people is then an exchange of information, or communication. From this viewpoint information science could subsume all other sciences, and would be presumably studied as a discipline that understands the explanatory strategies of other disciplines (Rayward 1996). One might contrast this with the theory that information is not dependent upon the existence of the human mind or a mental construct to help us understand the world we inhabit. Instead, it is a property of the universe like matter and energy. However, Stonier (1990) who has argued for this independence of information also sees information as a range with a datum at one end of a spectrum that has knowledge, insight and wisdom at the other.
3.2
A focus for information management
For the purposes of this text, those aspects of information science that relate to information in use are emphasised. This means the application of information science to study the needs of information seekers, and the preparation of information to address those needs. In this respect, one can employ the epistemology of information in a particular context, but conÞne oneself more narrowly than the disciplines canvassed by Machlup. By seeing as contributory, but not quintessential, those disciplines that are primarily interested in the processing and interpretation of data (or symbols, or information, as it may be to them), one can move cognitive science, computer science and cybernetics to the periphery of one’s focus and concentrate on processes that assist the transfer of information for identifiable needs of users at an information rather than data level (thus also moving telecommunications to the periphery). This focus for information management has been approached since the 1960s. The proceedings of one of the Þrst interdisciplinary meetings documented a serviceable deÞnition of information science: investigates the properties and behaviour of information, the forces governing the ßow of information, and the means for processing information for optimum accessibility and useability. The processes include the origination, dissemination, collection, organisation, storage, retrieval, interpretation and use of information. The Þeld is derived from and related to mathematics, logic, linguistics, psychology, computer technology, operations research, the graphic arts, communications, library science, management, and other Þelds (Conferences on Training Science Information Specialists 1962, p. 115) Such a deÞnition, though with computer science having a more central focus, has been used for ‘informatics’ (or ‘informatique’) in parts of Western Europe, and with social sciences being a more central concern, as ‘informatika’ in Eastern Europe. The focus on information in use and the organisation of its provision either directly or via the record is an approach introduced by documentalists such as Otlet, La Fontaine and Cutter at the advent of the twentieth century. Such use is concerned with any communicated information whether or not recorded. For a documentalist, the record is the physical information in any form, and the document is the record together with the medium that is carrying it, be it a handwritten letter or a digital tape. The study of the process of using information has appropriated a wide range of techniques initiated during the Þrst half of the twentieth century including:
42 PART A Overview
-
Analysis by people such as Lotka, of publishing behaviour during the 1920s
-
Applications of computational devices for text processing (the Þrst functioning electromechanical computers were running in laboratories in the 1930s about a hundred years after Babbage had envisaged them)
-
Cryptology, which gained impetus during the World War II with developments such as deciphering of the Enigma code.
In the second half of the century, additional techniques were introduced including: -
Mathematical modelling of communication, especially the Shannon-Weaver explication of the transmission and processing of data or information theory
-
Operations research, and its progress into systems theory
-
Cybernetics, the study of control as advanced by Wiener
-
Linguistic and documentation analysis
-
Analysis of information seeking behaviour, and the process of making sense from initially uncertain situations.
Study in each of these areas has contributed to our understanding of the systematisation of information processes. The systematisation itself of course long predates computer systems, with the realisation that the data accumulated in so many ways had to be organised to make it capable of providing information in use, and of advancing knowledge. There are many illustrations of such systematisation through the ages. An early example is the coded arrangement of clay tablets at the Royal Library at Nineveh from 600 BCE. A somewhat more recent approach is that of the Swiss Conrad Gesner’s 16th-century attempt to produce a bibliography of world literature, the multi-volume Bibliotheca universalis.
3.3
Areas of study
Four basic disciplines (Debons, Horne & Croneweth 1988, p.15) that provide a foundation for information science study are philosophy, mathematics, linguistics and behavioural science. -
Philosophy
This is used in trying to understand the nature of the universe. The rules of logic are used as a device in this discipline. When used with mathematics rules they enabled the development of computing machines and concepts important to information systems development. Philosophy helps with information retrieval by providing an understanding of inquiry systems - the way individuals ask questions.
-
Mathematics
This provides the formal language that enables the measurement of experience, and therefore provides the basis for statistical and other analytical tools. Together with logic, it also provides for computer science approaches to the manipulation of the symbolic representation of the information.
-
Linguistics
This analyses our communication through language. Communication provides the information transfer that leads to knowledge acquisition through our representation of events, their categorisation and classiÞcation; and linguistics gives us the formal study of language use in the expression of this.
CHAPTER 3 43 Information science
-
Behavioural science
This, including branches of psychology and sociology, provides understanding of the ways in which a person seeks, generates, uses and acts upon information, and of the signiÞcance of the social practice of information transfer.
Other writers have expressed the multidisciplinarity by positioning information science, not as one discipline, but as some thirty or forty disciplines of information (GrifÞths 2000; Machlup & MansÞeld 1983), and including: ArtiÞcial intelligence Bibliometrics Communication sciences Communicative theory Computer science Control theory Cryptography Cybermetrics Cybernetics Documentation Lexicology Library science
Linguistics Living systems research Pattern recognition Phonetics Scientometrics Semantics Semiotics Speech science Systemics System science Telecommunications
The areas of study in which these disciplines may be brought to bear for the practice of information management are concerned with intermediation between information content and its users, and are as follows: •
The nature of information Concepts leading to a better understanding of what information is and how it may be deÞned and measured unequivocally.
•
Information design The ways in which documents and databases may best be structured and used in association with metainformation.
•
Information retrieval The most effective ways of extracting appropriate items from databases.
•
Information access Barriers to control and the political framework that inßuences legislation relating to access with respect to copyright, freedom of information and privacy.
•
Informetrics Analysing patterns of production and use of information.
•
Information repositories The storage and access issues associated with maintaining documents in organised collections and providing an effective environment for use of the documents.
44 PART A Overview
•
Document analysis Content and context analysis of documents associated with cataloguing, indexing and classiÞcation studies, and natural language processing of the content of the documents.
•
Information use behaviour Analysis of information seeking, organisation and utilisation and communication by individuals.
•
Interface design In its broad sense, this concerns the interface between the records and those who have to use them, so it includes human factors, presentation formats of books, presentation formats on computer output such as permuted indexes, bibliographic representation, and data design and schemas for databases.
•
Management of information sources, services, systems Organisation and control of information environments, the people who work in them, and the resources needed to maintain them.
•
Evaluation measures Appraisal studies, such as cost beneÞt analysis, performance assessment, and economics and growth study of any of the above areas.
3.4
Communication of information ‘It takes two to speak the truth - one to speak and another to hear’
H.D. Thoreau, A week on the Concord and Merrimack Rivers. Wednesday. 1849. Communication involves change in the knowledge of the participants as a consequence of the information that is shared between them. There are various approaches to understanding this process.
3.4.1
Models for the process
Understanding of the communication process will be improved if one can share an underlying conceptual model of what is happening. Unfortunately, we are far from a shared paradigm. In fact the range of perception of how communication happens is a driving force for the establishment of different camps in information science. For example, some say that communication is any information transfer process where information is deÞned in terms of statistical uncertainty. Others would say that only the signiÞcance, competency, motivations and consequent interpretations that a message sender and receiver bring to a situation make it communication. Modelling of the communication process is only one of the approaches that may be taken to providing a conceptual framework for information science. Alternatives include classiÞcation or taxonomical approaches to naming and categorisation, and systems theory or a general systems approach to describing information transfer process in terms of their interactions and effects.
CHAPTER 3 45 Information science
However, it is through communication theory, modelling information transfer in terms of interactions between creators and users of information, that a framework is investigated here. This is because for one aspect of interpretation of information at least it is substantiated by a mathematical model. It also provides a useful framework for Part B of this book, which looks at various stages in the information transfer process in terms of a communication model. The most inßuential communication model is that used by Claude Shannon in 1948 to illustrate his communication theory (Shannon & Weaver 1964, p. 34). A representation of it appears in Figure 3.3.
Figure 3.3: Representation of Shannon communication model
It is important to note that the mathematical framework that Shannon established to support this model concerned communication, and that his model is pertinent to the data that are being transferred in a communication process. The model was developed in order to investigate such matters as bandwidth in telecommunication. Although he was at pains to indicate that the sense in which he was dealing with information was at the un-interpreted data level, the diagrammatic representation of what is happening has been appropriated subsequently by others to represent semantic information and knowledge transfer as communication. This is acceptable from a graphical viewpoint, as long as we remember that there is no underlying mathematical quantiÞcation of semantic information or knowledge. There are additional limitations to this graphical approach, for example: -
It is a simple one-to-one representation of communication, but as we know, mass communication by deÞnition has multiple receivers
-
A single receiver is usually receiving messages from multiple sources concurrently, or indeed concurrent messages from one source (sometimes referred to as multi-channel transmission)
-
Each element of the representation may in fact be a gross simpliÞcation of underlying processes; for example, if one takes the case of a microform as a channel of communication where the source is an author and the destination is a reader, then the encoding process of getting the words on to Þlm may in fact involve a number of linear representations of the communication process: · author→publisher; · publisher→proof-reader; · proof-reader→photographer; · photographer→data processor;
-
No account is taken of the knowledge state of the sender or receiver.
Nevertheless, in the absence of a better alternative, the model represents something with which we may share an understanding of information transfer and use to ask ourselves questions about the process.
46 PART A Overview
A number of adaptations have been made to Shannon’s communication model in order to make it describe communication in a more general sense. Figure 3.4 is a way of using the model as a metaphor for a broader look at information transfer in general, and so allowing us to interpret at different levels the message transfer in the communication process. This representation of the model uses an approach like that of Vervest (1987) to extend the provision for noise, incorporate provision for feedback and indicate the different levels at which information may be considered within the framework. A whole chain of transmissions may in fact be taking place within what may have arbitrarily been designated as a single transmitter-channel-receiver schematic. For example, in a radio or television broadcast, a presenter or announcer may be regarded as a source for the broadcasting process that is regarded as the channel, a receiver being a radio or television set, and a destination being a listener or viewer. However the broadcasting process itself may be segmented into a number of successive channels of communication, such as the air between an announcer and a microphone, a tape used to record the presentation, wires to a transmission antenna, and so on. On each of these, the message is carried by a signal of a different form.
Figure 3.4: Adapted communication model
Considered in this way, each receiver becomes a transmitter after transforming information. Meadow (1973, p. 6) describes the communication process conceived thus as a chain of transducers. However, he sees the successive sources and destinations as being the components of each transducer. If sources and destinations are to be regarded as human for the process to be communication, then a transducer should reasonably be regarded as successive transmitter-channel-receiver parts. There are other aspects of the model that require clariÞcation. These are explored below.
CHAPTER 3 47 Information science
3.4.2
Levels at which messages may be interpreted
Interpreting where information sits within a communication model, brings one back to consideration of the data-information-knowledge continuum. Within the framework of the model, messages may be analysed at the following levels: •
Syntactic Weaver referred to this as the technical problem (Shannon & Weaver 1964, p.4). It concerns the accuracy with which the symbols of communication may be transmitted. It is this level of information that Shannon’s model deals with, and provides the basis for determining such matters as how much syntactic information may be carried by channels. At this level therefore we are able to quantify information. This type of information may be described as a pattern or design that arranges data as an instrument of information. In some writing one may see reference to an empirical level of information that is concerned with the mechanical technicality of conveying data, but here it is not differentiated from the syntactic level.
•
Semantic Our concern here is with the meaning of the message transmitted. The message may not be interpreted without a syntactic foundation, but overlaid on this is the identity of the message, something that we can comprehend. This type of information is something that changes the knowledge state of the recipient.
•
Pragmatic Weaver referred to the pragmatic level as the effectiveness problem. It is concerned with the extent to which the message inßuences the desired conduct of its recipient. It is therefore concerned with outcomes. This type of information adds to the syntactic and semantic levels, and causes a decision to be made within a particular context.
The interpretation made by Meadow is a useful one. He notes Weaver’s point that physical transmission is a requirement if the semantic level is to be communicated, and refers to 3 levels of communication: technical, semantic and effective. He equates Weaver’s semantic and effective levels with what Hayakawa (1978, pp. 63, 106) calls the informative and affective functions of language, using affective deliberately to suggest arousing of response rather than efÞciency or effectiveness. Meadow avoids a deÞnition of information but assumes that, whatever it is, it is communicated only in structured sets of symbols that are representations of information but not the same as information: “If the reader of a book feels that he has found information therein, so be it” (1973, p.4).
3.4.3
Ways in which messages may be represented
In the model, a message is carried on the channel by a signal. The signal may only be understood at the destination if the receiver is capable of decoding it. The signal will comprise a series of signs organised according to a code that determines how the signs relate to each other, and may be transmitted in several different forms carrying the same message during the communication.
48 PART A Overview
In binary digital communication, the bits are the signs corresponding to the magnetic or electronic state of the channel, and with a code based upon the number of signs (just two signs in the case of the binary system) and the number of positions that are used to organise the signs. For example, if the receiver works on the basis that the signs are positioned in groups of Þve, then we have a 5-bit code and the two basic signs can represent 25 or 32 different things. Semiologists are concerned with message representation with signs irrespective of the form of representation of the sign. For them, a sign is something that has established signiÞcance to a recipient, and which refers to something other than the sign itself. For example, yawning may be a sign of boredom, or the SOS signs in a Morse signal may be used to convey distress. Signs represent objects. If they have a physical resemblance they are termed icons. If they have signiÞcance beyond what they directly represent, they are symbols. The signiÞcance is usually created by association that the sign has in a particular culture, for example the skull and cross-bones, or the Þve circles of the Olympic symbol.
Figure 3.5: Signs of the times
Language is a key form of human communication in that it tries to provide information to a recipient from the knowledge of a source. But knowledge and language are inextricable - to what extent does one modulate the other, and therefore the ensuing messages that are carrying information? A concern of linguistics is to try to determine the extent to which the knowledge that we try to convey is misrepresented by the tools that we use to convey it (language), or if it is actually shaped by that same tool. Other characteristics of communication that are signiÞcant are noise, redundancy and feedback.
3.4.4
Noise
Noise is anything that affects the Þdelity of the messages and distorts the meaning intended by the source. -
Syntactic noise
produces technical distortion of messages, for example the faulty decoding of bytes from a magnetic disk, the unfocused projection of Þlm or the poor typography in a book. This in turn has an effect on the semantic and pragmatic levels of information. Syntactic noise may be introduced at the channel or from the transmitting and receiving devices.
-
Semantic noise
is introduced by the source or the destination and is anything that distorts the original meaning of the message as it was intended by the source. This may include: Communication skill: The originator may not be able to express the message as was intended because of limited writing or speaking skills. The recipient may not be able to take the message in as intended because of limited listening or reading capacity. Knowledge: What both a source and a recipient know about the subject being communicated will affect the message. For example, a specialist in data communications, speaking with a novice in the area, may introduce noise by using jargon and acronyms without explaining them. Attitudes: Both source and recipient may affect the Þdelity of a message. If, for example, the source is a political journalist who
CHAPTER 3 49 Information science
has reluctantly moved to the sports pages of a newspaper, then the articles produced may be coloured by a derisory approach to what is being reported. Alternatively, an airline passenger inattentive to safety messages from ßight attendants because of boredom or distraction may not take in the content or import of the messages. Sociocultural system: The way information is interpreted will depend upon the cultural context in which it is conveyed. For example, if a source has some indication of social status of people conveyed by a title like ‘Reverend’ or ‘Professor’ or ‘Dame’, their messages may be assimilated more attentively than the same messages conveyed by someone of different social status. Alternatively, the same source may communicate the same message in different ways. As Berlo (1960, p. 50) points out, the position of sources in a social and cultural context will affect their communication behaviour. An army captain will communicate in one way to a group of colonels and in another to a group of sergeants. Is it possible to speak of Pragmatic noise? Perhaps it is possible to say that the way a message takes effect at destinations will be distorted by the attitude of the destination and the cultural milieu in which it operates. However, it is difÞcult to maintain that this is any different from semantic noise.
3.4.5
Feedback
The original model of Shannon was criticised for its linearity (derivative of an Aristotelian rhetorical outlook), and found to be wanting in describing the inßuence of destination on source. Cybernetic models like that of Wiener incorporated adaptive control. They describe communication in terms of a system that has a receptor, interpreter, and effector in a gestalt, where all the entities in the communication system are affecting all the others. The interpreter processes information, endeavours to maintain an equilibrium or homeostasis, feeds back and develops knowledge by assimilation of the received information. The principle of feedback has been incorporated in the model derived from that of Shannon, so that you may visualise a source modifying its messages as a consequence of information received from the destination. Typical examples at the semantic level involve sources that are presenting to audiences as performers or lecturers. The feedback may be verbal, but may also be through concurrent parallel channels - the term ‘multi-channel transmission’ is sometimes used in this respect - such as the gestures of body language, accompanying verbal messages and eye movements. The raising of an eyebrow may reinforce a verbal response. Interpretation of these gestures at the source affects the way that the originating message is being conveyed. At the syntactic level, feedback returns from the decoding device to affect the information coming through the transmitter - heard typically from loudspeaker through microphone. These deal with feedback in the adaptive sense, but since the intention here is to describe the various information storage media as channels, does the communication model still apply in relation to a book or a Þlm or a disk? One attempt to deal with this has been called inferential feedback. This assumes that for example, the source as writer of a book or developer of some software is inferring feedback by anticipating the response of the sources to some extent, and modifying the messages accordingly.
50 PART A Overview
3.4.6
Redundancy
If we are repetitive about something we say, the information transfer has a signiÞcant amount of redundancy.8 In data transfer terms redundancy indicates the extent to which a code is away from its maximum efÞciency. However in semantic terms, redundancy often helps us to understand information. For example, had the last phrase from the previous sentence appeared as ‘redudancy oftn hlps us understd information’, the meaning of the words would still be comprehensible because we can supply the missing letters, knowing that those missing ones are the only alternatives if we have to guess the word. This is a consequence of the redundancy in language. It is used with words as well as with letters in words. A source that transmits information with no redundancy will cause a receiver to make an uncorrectable mistake if the message is affected by noise before reaching the receiver. Cryptographers when trying to break codes use redundancy in language as part of the search for repeating elements. As the redundancy of information increases, its entropy decreases. Entropy is a measure of the extent of disorder or unpredictability of a message. This is considered further in the section on measuring information.
3.4.7
Life cycle model
An alternative way of looking at information ßow in general terms is to consider a life cycle approach. Figure 3.6 illustrates such an approach derived from the records cycle that records managers have used as a model for the document ßow environment, prior to dealing with the records environment as a continuum of information stages.
Wurman frets about redundancy in signs in Information anxiety. He objects to messages with a lot of redundancy such as ‘Unnecessary noise prohibited’.
CHAPTER 3 51 Information science
In this model, examples of each step are: •
Creation (production of records to support an operational function) Correspondence, directives, reports, forms, computer Þles, microforms.
•
Distribution (internal and external to organisations) Post, courier, fax, email, online access.
•
Use (utilisation and decision making based upon information content) Administrative, Þscal, legal, research.
•
Maintenance (organisation of the records for searching) Indexing, Þling, retrieval, metainformation.
•
Disposal Transfer to secondary storage or destruction.
One difÞculty with this approach is the confusion introduced by completing the cycle with ‘record destruction’, when in fact the information may have already ßowed on beyond the record. So it is the agent carrying the information that is destroyed rather than the information itself. A second difÞculty is that the model does not make explicit reference to what certain commentators call sensitisation to external sources. Because it has been developed primarily from a documentation perspective that is internal to organisations, it does not allow for the creation segment being modulated by information resources from the external environment. Some writers who have used the life cycle to illustrate information management practices, for example Marchand, Kettinger and Rollins (2001) have included a component called ‘sensing’ to make plain that management is concerned about scanning of the external environment. A more accommodating life cycle model is that developed by Browne (1985, p.100) in order to relate information enterprises to their activities. It is illustrated in Figure 3.7. Here, step 1: research and information generation allows for external sources being used to create information, and the steps of the cycle are then elaborated until step 14: interpretation that follows analysis and retrieval. The model then makes a distinction between two levels of step 15: evaluation and synthesis prior to packaging of the information. The levels make a distinction between a basic level of consideration that takes into account attributes such as currency and authority of information and a deeper level that assumes profound subject knowledge and extensive interaction with the substance of information as it is being articulated in new form. Each of the processes depicted in the life cycle is of concern to the process of information management. Figure 3.8 depicts another approach, called by Choo (1998a) the process model. In this model the process is seen as a life cycle in an organisational context. Information is created by the organisation’s actions, shown here as adaptive behaviour. These actions interact with the internal and external environment generating new information. Members of the organisation express the information needs in order to make sense of their environment, and acquisition is actuated by the needs.
52 PART A Overview
Figure 3.7: Information transfer life cycle (from Browne 1985, p.100)
The information, or descriptions of it, may be organised into storage of various types before the recipient actually receives it. The organising and retrieval processes, which must maintain the Þdelity of the original information (low noise), are a key role of information management.
Figure 3.8: Information management cycle (from Choo, 1998a)
Provision of a model for information management within an organisational framework has also occupied McPherson (1995). He has explored the concept of information mastery within organisations. This is seen to encompass information management with human networking and thought, reinforced by the effectiveness with which information technology is used to reinforce information management and cognition. His system schematic endeavours to allow for appropriate information storage and acquisition that matches the delivery of information to cognitive need. These models have in common the desire to express the various stages at which information must be considered if it is to be managed effectively. They may also be contrasted later in Chapter 5.3 with the publication cycle particularly as it is envisaged for scholarly publication.
CHAPTER 3 53 Information science
3.5
Measuring information ‘The union of the mathematician with the poet, fervor with measure, passion with correctness, this surely is the ideal.’
W. James Collected essays and reviews, 1920, Ch. 11: Clifford’s lectures and essays If there is to be empiricism in information science, we expect to adopt a variety of measurement techniques. The fundamentals of measurement are independent of application. When there is application to information studies, the difÞculties of limiting variables and the ambiguities of results often lead to unresolved debates. Experimental work should suggest repeatable data analysis and derive underlying principles, about which the science is conÞdent. Some of the better-known principles are outlined here.
3.5.1
Measuring syntactic information
Hartley had shown in the 1920s that in order to transmit a quantity of data telegraphically a product of bandwidth by time is required. He arrived at a deÞnition of information as the selection of signs or words in succession from a list. If a message contained N signs from a sign system that contains S possible signs, then SN possibilities existed, and the quantity of information could be deÞned as: h = NlogS Shannon built on Hartley’s work when developing his communication theory. He worked on the basis that ‘information’ was a selection of signs. His model shows that for the level of information that he is describing the information carried by a particular message is inversely proportional to the probability of the message being produced. Therefore, the less likely a message is to occur, the more information that it carries. I = -log (pi)
pi is the probability of message i occurring
Why are logarithms used here, and why to the base 2? The use of logarithmic function permits the information obtained by combining messages to be additive. If two dice are thrown, the amount of information obtained by rolling the two dice together will be the same as that obtained from rolling the two sequentially, that is 5.16 bits. In the sense that information is treated here, its amount can be determined as a function of the probability of each part of the message occurring (Losee 1990, p.5). Any message from a coding system may be interpreted as a set of yes or no decisions represented in binary code. For example, if the coding system were to be the Baudot code,9 which has thirty-two possibilities, one could ask of any particular character represented by the code, ‘is it in the Þrst sixteen of these characters? Yes or no? If yes, then is it in the Þrst eight of these sixteen’, and so on until one were to establish its position with a unique code. If the yes or no answers are represented by 1 or 0, then it may be 11010 that represents the character’s position uniquely. For this code set of thirty-two signs, one can with Þve choices establish any one of them unequivocally. One can say that if each of the signs were equally probable, then any one of them would contain the amount -log2(pi) = 5 bits of information (as operationally deÞned by Shannon). Similarly if the character set were simply the English alphabet with its twenty-six characters, any one character would contain -log2(1/26) ≈ 4.7 bits, assuming characters were equally probable in the language used for communication, or about twenty-three bits per word assuming an average word length of Þve characters. Of course this is not the case. Some words are much more common than others and both words 9
A 5-bit code for the character set used for example on telegraphic equipment; it was devised by JME Baudot (18451903), the French engineer who constructed the Þrst successful teleprinter.
54 PART A Overview
and characters within words are conditionally dependent upon associated terms, for example ‘q’ rarely appears without ‘u’. Information theory was the name that Shannon used for his communication model. The measure that he derived is for the syntactic level of information only - that is data. This is an approach that measures the amount of information by the statistical properties of signals transmitted. He was concerned with averages rather than information from single signs. The average ‘information’ or uncertainty in a system is referred to as its entropy.10 If a set of signals may be represented by X = (x1,x2.....xn ), and if: pi = p(xi) is the a priori probability that each of them will be transmitted, then for signal xi , H(xi) = -log(pi) = log (1/pi) where H represents entropy which may be measured in bits, and the signals are assumed statistically independent. For the set of n signals: H(X) = -∑pilog(pi) = ∑pilog(1/pi)
where ∑ represents sum from 1 to n
Shannon’s work was addressing the problem of how much information may be moved across a channel over which communication is occurring. This brings one back to the problem of just what communication is. There are two positions that may be taken here. One is that information only exists in the ‘senses of the beholder’. In other words, if there is not a human receiving a communication, then there is no information to talk about. An alternative is the view espoused by Stonier (1990) that information exits independently of the human mind. None of this has anything to do with understanding the meaning of the messages. Shannon, perhaps unfortunately, described what he was measuring as information, but he was not trying to deÞne information. He was not trying to analyse semantic or pragmatic message levels, merely trying to establish what we might think of as data rates. Shannon’s is the best known of several approaches that have been developed for trying to measure that which is communicated and has some effect upon a receiver. Others have developed the approach further, taking account of factors that extend the meaning of information beyond that used operationally by Shannon. For example, if one is to take the dependence of messages upon each other into account, then a formula must take into account the probability of receiving a message, after having already received another (for example, the probability of receiving a ‘u’ having already received a ‘q’ in English language words). However the measurement that is being described does not account for meaning - the semantic content. Bar-Hillel and Carnap did concern themselves with semantic content of information, but in terms of information contained in a statement based upon the language system used to construct the statement. They did not consider the process of communicating the statement. QuantiÞcation of this process is something that is yet to be understood, although this has not deterred attempts to tackle the problem (Cherry 1978, p. 236; Losee 1990, p. 21; Tague-Sutcliffe 1995, p.68). One therefore has the as yet unresolved difÞculty of measuring the communication process at the semantic and pragmatic levels.11 3.5.2
Informetrics
The analysis of patterns of production and use of information using quantitative mathematical and statistical techniques is one of the branches of information science. As much as any area of information study, it exempliÞes the different disciplines that are attracted to the Þeld. Probably the three seminal studies in the 10
The formula is the negative of the Boltzmann’s measure for thermodynamic entropy.
11
Hayes (1993) has endeavoured to extend the Shannon entropy formula to account for: - Data selection where a factor is introduced to account for signal importance to a user - Data analysis where the data are structured to have meaning (a semantic measure is therefore introduced) - Data reduction.
CHAPTER 3 55 Information science
area are those of Bradford (he was a librarian), Lotka (a statistician) and Zipf (his Þeld was philology). The principles that they developed (and which have since been generalised and extended by others) are often called laws, although that may be a rather strong term scientiÞcally. Although the functions were developed and represented differently, Egghe for example, has shown (Egghe 1985; Egghe & Rousseau 1990, p.292) that they are essentially describing distributions that are alike and are special cases of a hyperbolic distribution. Lotka’s measurement of literary productivity In his original work, Lotka (1926) examined the publication behaviour of a sample of chemists (as represented in the Chemical Abstracts cumulated index available at the time), and physicists (as reßected in Auerbach’s Geschichtstafeln der Physik). He found behaviour that has since been noted in many measurements by others. In his case, he found a regularity that has been found to be generalisable for productivity of authors in learned disciplines. This may be expressed: xαy = β
for y = 1,2,......yT
Here y represents the number of authors producing x papers; ∞ and β are constants; α ≈ 2 depending upon the Þeld of research, but for the case when α = 2 and where T represents the total
number of authors, then the constant β =6T/π2
(approximately 0.6T).
To put it another way, an inverse square law applies: the proportion of chemists producing x papers is roughly proportional to 1/x2 of the number who have authored one paper. Similar distribution has been shown in many other Þelds of scientiÞc endeavour. The table in Figure 3.9 shows one example of this for papers authored in the Þeld of econometrics. If plotted, the shape of the graph is similar, but the constants (which represent slope of curve and axis intersection) differ.
Figure 3.9: Leavens’ ‘econometric papers tabulation’ (Reprinted from Potter, 1981, p.27, Copyright (1981) The Board of Trustees of the University of Illinois.)
56 PART A Overview
Such empirical studies of the Lotka distribution, when plotted show a curve that has the general form shown in Figure 3.10. Theorists in this Þeld are not without a sense of humour. The inßection in the curve which has been shown to apply for larger values of rank being plotted, is known as the ‘Groos droop’ (after the person who originally documented it). Pareto similarities
Figure 3.10: Groos curve
At the end of the nineteenth century, the Italian economist, Pareto investigated income distributions in society. His analysis, obtained by plotting income against the proportion of the population receiving the income, had shown income distribution to follow a curve similar to Lotka’s. He was measuring production, and others have subsequently shown variations from his income distribution, but his Pareto curve, also known as the 80/20 rule, is of interest to us, because it repeatedly approximates situations where a limited number of key factors contribute to an effect, in a greater way than a mass of smaller factors. For example in a business environment, it may be useful to carry out a time analysis of a number of operations to isolate the limited number of most time-consuming processes, or a cost analysis to emphasise the procedures or components of highest cost, or a quality analysis in which case you may: -
Rank items in frequency of breakdown or error
-
Determine the proportion of the total for each breakdown or error type
-
Plot the two against each other
Such a procedure may be carried out fairly readily with a spreadsheet program (using features outlined in Chapter 7.6) so that the information may be presented for straightforward interpretation. For Pareto, y represents the number of employees with an income greater than or equal to x, and a curve plotted to reßect such a distribution looks like Figure 3.11. In the case of income earners, this means that the top 20% of income earners earn 80% of the income. In the case of authors, it means that the top 20% of producers of papers produce 80% of all the papers, or in the case of equipment breakdowns it may mean that 80% of all breakdowns are caused by 20% of the equipment.
Figure 3.11: Pareto curve
Zipf’s measurement Zipf looked at words used in text rather than documents. He arranged them in decreasing order of occurrence but found in effect the same relationship with word occurrence in text that Lotka had found for author occurrence in publications. The number of words occurring x times, equalled the number occurring 1 time, divided by x2. Zipf also found that if words were ranked in order of decreasing frequency of occurrence for a given passage of text, then a word appearing at position x on a ranked list will appear 1/x times as frequently as
CHAPTER 3 57 Information science
the word with highest occurrence. In later work, Zipf (1949) attempted to generalise his Þndings in terms of human behaviour and a principle of least effort. For Zipf: xρy = σ
for y = 1,2,......yT
Here x represents the rank of a word and y represents the number of times that a word is used in a text. The constants here are represented by ∂ and Ó to distinguish them from the constants of Lotka. Mandelbrot (he of fractal fame) arrived at a generalisation of this expression also showing that frequency of occurrence is a function of rank of words: (1 + Ηx)ρ’y = σ’ Egghe has demonstrated that Zipf’s is a special case of Mandelbrot’s formulation. Bradford’s distribution Bradford (1934) studied articles that are published in journals. He was not concerned with authorship. For a given subject Þeld he established that a relationship exists between the number of articles published and the number of journals that publish them. He ranked journal titles in order of productivity, the most productive (containing the most articles) was ranked 1, the second most productive ranked 2, and so on. He then grouped the journals into groups for which the cumulative total of articles in each group was approximately equal. Thus he found that: The top 9 titles contained 429 items
= 9 titles
The next 59 contained 499 articles
≈9x5
(45)
The last 258 contained 404 articles
≈9x5x5
(225)
The 9 can be described as the size of the core, and the 5 as a multiplier. Dividing the groups by the core gives groups approximately in the proportion: ≈ 1 : a : a2
(a being the multiplier)
By plotting the logarithm of rank against the cumulated number of papers up to that rank, a relationship may be demonstrated. It has since been generalised by Leimkuhler as: R(x) = λlog(1 + µx) Here:
x represents rank;
R(x) represents cumulative number of items produced; Constants are λ and µ. Each of the bibliometric distributions of productivity, in general terms, and with a logarithmic axis, follows the general form of the Pareto curve. Bibliometric coupling and co-citation The patterns of documentation in different Þelds of endeavour have been put to use to provide indicators of similarity of research material, or the extent of inßuence of search output. Work in this area has been fostered by the citation indexes of the Institute of ScientiÞc Information. ISI by recording in databases, the bibliographies and lists of references that accompany published literature makes citation analysis possible. In the sense in which the term is used here, citation refers both to the identiÞcation (citing) of an
58 PART A Overview
information source quoted or otherwise drawn on in a body of text, and to the description, in a bibliography accompanying such a body of text, of the information source cited in the text. A typical measure of the inßuence of either an author, or an entire journal, is impact analysis. This is a determination of the ratio of the number of citations received, to the number of citable publications. This can be carried out on a time series basis to observe variations. Although the ISI databases such as Science Citation Index provide the facility for such calculations, they are limited to the material that is held in the databases, which is subject to the editorial policy of the organisation. ‘Coupling’ requires comparison of the references that pairs of publications have in common in order to establish coefÞcients of similarity. It may therefore be used to establish cognate groups of papers by cluster techniques. ‘Co-citation analysis’ works on the basis that references cited in the same publication are themselves related. If a large number of publications are examined, then a measure of their strength of association is how often they co-occur as references in bibliographies. Coupling therefore is derived from similarity expressed by source documents, whereas co-citation is derived from similarity of citation. The in-built association through linking that exists on the Web makes similar citation analysis possible for Web sites – hence Webometrics (Almind & Ingerwesen 1997). For example, the degree of inßuence of sites may be measured by the extent of linkage to them from other sites (though this of course cannot take into account the extent of linkage that people have created through bookmark software in browsers). Co-citation and coupling may also be analysed by analysis of linking associations. Any of the measures derived from either databases or the Web are rough, because of the unavailability of full datasets and the difÞculty of establishing that what is available is representative. Nevertheless, in a world searching for indicators of performance, they are widely used. Information retrieval outcomes The process of measuring information retrieval outcomes, has probably occupied the minds of information scientists more than any other single principle of information management. The central issue has been to design systems that can present to requesters the most effective number of most useful documents that satisfy an information query. The issue is complex because the information query often cannot be expressed cogently and may be unstable, being modiÞed as a consequence of seeing results from an initial search request. A great deal of experimental work has taken place, much of it ßowing from the CranÞeld studies using aeronautical databases (Cleverdon, Mills & Keen 1966). Fundamental to evaluation approaches are versions of a retrieval matrix as a contingency table. In the table of Figure 3.13 the symbols used are: ∩
to represent intersection, or AND logical relationship between item classes
¬
to represent NOT
Therefore, for example D = N¬Ret∩¬Rel means that the number of items N in this cell comprises the total of those NOT retrieved that are NOT relevant
CHAPTER 3 59 Information science
Figure 3.12: Retrieval contingency table
The two most common measures based upon this matrix are: -
The Recall ratio:
Number of relevant items retrieved = Total number of relevant items in database
A A+C
-
The Precision ratio:
Number of relevant items retrieved Total number of items retrieved
=
A A+B
Other ratios that have been considered in experimental work are: -
Fallout
The proportion of unwanted records retrieved
B B+D
-
Omission
The proportion of relevant records not retrieved
C A+C
-
Noise
The complement of precision
B A+B
-
Specificity
Ability of system to reject unwanted records
D B+D
Determining the number of relevant records in a database when circumstances are other than an experimental set of known items is problematical, particularly when databases are large and dynamic. However, sampling procedures may be used to estimate the total number of relevant items. Just what is meant by relevance is also problematical. Relevant material may not be pertinent to an information query or if it is, the pertinence may change during an information-seeking period because of the changing ability of the recipient to make sense of the material. This is examined later in discussion of evaluation of information retrieval in Chapter 18.
3.5.3
User analysis
Informetrics includes analysis of use information, so it can hardly be divorced from analysis of users. Nevertheless, this separation is undertaken here in order to emphasise the more qualitative approaches that seem appropriate. As has been pointed out (Boyce, Meadow & Kraft 1994), separating the characteristics of users is necessary for predicting performance or explaining differences in performance of individuals with information systems.
60 PART A Overview
Whereas informetrics focuses on quantitative use of data, user studies by their nature tend to lend themselves more to qualitative evaluation. Typical user studies may try to determine: -
Ability to express an information need
-
Ability to transfer a concept into the search language of an information retrieval system
-
Cognitive model of an enterprise information system
-
Skill in being able to create surrogate information from large documents
-
Knowledge of information resources pertinent to a subject area
-
Institutional readiness for information system implementation.
It is possible to constrain each of these to quantitative analysis, if only by creating scales of ordinal or ranked data, for statistical analysis from surveys. However, more meaningful discursive analysis may come from case or ethnographic studies. It has been argued that the whole Þeld of information science has suffered an unfortunate divide between empirical approaches to information analysis based upon documents and systems and qualitative behaviourist models that investigate information seeking behaviour. This has led to a great deal of research that investigates the retrieval performance of systems without taking enough account of the motives of the people who are using the systems. User needs analysis is considered in some detail in Chapter 15.
3.5.4
Statistical analysis
There are many applications of information management that make use of statistical analysis techniques in order to support investigation. These include: -
Comparative studies of patterns of use of an information resource that wish to test for signiÞcant difference between identiÞed groups
-
Factor analysis to determine most prominent inßuences on user behaviour
-
Analysis of variance of collection distributions
-
Performance of vendors supplying information resources
-
Error sampling determination for software or data quality
-
Survey evaluation of sources, systems or services
-
Queuing theory that establishes optimum load factors for information service points.
As in other Þelds of endeavour, these are most useful for establishing the degree of conÞdence with which experimental results can be accepted. In information science the populations that are likely to be investigated include records in databases, items in physical collections and many aspects of the behaviour of information users, such as use of search terms, borrowing patterns from repositories, or understanding of information services that are provided for them.
3.6
Further reading
Key periodicals that report information science research are Journal of the American Society for Information Science and Technology (JASIST, formerly JASIS), Information Processing and Management, Journal of Documentation, Information Sciences, and IEEE Transactions on Information Theory. These will sometimes produce an issue concentrating on a topic, for example the January 1996 issue of JASIS focused
CHAPTER 3 61 Information science
on information retrieval research. The Annual Review of Information Science and Technology contains extensive bibliographic reviews of areas of research progress. The multidisciplinary nature of the Þeld means that there is also material carried in communication periodicals such as Journal of Communication and psychology journals such as Cognitive Psychology. Interest groups in the professional societies such as ACM, IFIP and IEEE also publish periodicals and maintain electronic discussion lists of information science research.
Overviews of information science Saracevic (1970) compiled a selection that was inßuential in mapping the territory of information science, including papers on information theory, basic processes such as communication of documented records and behaviour of information users, information analysis and retrieval, evaluation of systems, and a discourse by Goffman on a general theory of communication encompassing information retrieval. Heilprin (1985) brings together a number of papers in similar vein with an emphasis on trying to provide a basic model for information science. A more recent compilation is that of Williams and Carbo (1997). Buckland and Liu (1995) provide a bibliographic overview of works that have examined the history of perceptions of the science. Debons, Horne and Croneweth (1988) attempt to present a framework for the concepts and issues that contribute to a science of information in relation to building information systems. In this context they consider deÞnitions of information, the professions that work with information and the models of information systems with which they work. They also examine the technologies used for implementing the systems. Meadows (1987) brings together a collection of early information science material including seminal papers dealing with growth of documentation, citations and their use, information services and science, and statistical regularities in communication such as those relating to scientiÞc productivity. Pemberton and Prentice (1990) present a number of conference and seminar papers that consider contributions from a variety of disciplines to information science, lending support to the notion of its continuing interdisciplinarity. They also include some works on the relationship between information science and library science. Walker (1992) has collected a set of readings described as dealing with the information environment. These include a number dealing with the nature of information, information science and information society. More recent works include the concise work of Norton (2000) that is designed to stimulate fresh discourse, and the revisit by GrifÞths (2000) that reßects a desire to refocus on the foundations of information science disciplines.
Human communication There are many texts on human communication that have a cognitive orientation and are used, for example, in journalism schools to introduce concepts of behavioural inßuence on communication and mass communication. Typical of these are DeVito (2000) and Fiske (1990). These are mentioned because of introductory sections based upon Shannon’s concepts. Berlo’s (1960) work, though much older, provides examples of semantic noise that may be utilised within the communication model. Dervin and others have compiled a wide-ranging pluralistic look at communication models that emphasises the human aspects of communication (Dervin et al. 1989). Cherry (1978) carries out an extensive analysis of communication theory in order to show the elements of a science of communication.
62 PART A Overview
The nature of information Belkin (1978) extensively surveys the literature up to that time, and comprehensively reviews the range of concepts embodied in the term information. Machlup and MansÞeld (1983) present a collection of commissioned discussion papers dealing with the study of information from a variety of interdisciplinary viewpoints including cognitive science, computer science, library science and cybernetics, together with editorial commentary. They include a lengthy discussion by Machlup of the semantics of information and knowledge and science. McGarry (1993) considers the meaning of information at length, and in a historical context looks at the way it has been transmitted, stored and retrieved by humanity. Ritchie (1991) also looks at information and its characteristics in some detail, with reference to Shannon’s communication theory. Hayes (1993) considers the relationship between data and information, and using Shannon’s data transfer measure as a starting point, proposes data selection, data structuring and data reduction. Stonier (1990) describes his Þrst step in the development of a general theory of information in which he promotes a deÞnition of information having existence independent of the human mind in the same way that matter and energy have. He describes information in terms of being a measure of the extent of organisation of a system. By way of contrast, Dervin has written extensively from the point of view of information needs, seeing information as a stimulus that alters our cognitive structures (Dervin 1992; Dervin & Niland 1986). Buckland (1991) gives a detailed exposition on information and information systems in which documents are interpreted broadly as evidentiary objects. He pursues three meanings of information: informationas-process, information-as-knowledge and information-as-thing. Menou (1995) reviews concepts of information, and proposes a research agenda for its deÞnition and measurement. Roszak (1994) questions the data-knowledge continuum, approaches knowledge as something that makes information possible, and disputes the importance given to information in the information society at the expense of the self-originating idea. Tague-Sutcliffe (1995) looks in detail at interpretations of information as a prelude to demonstrating a technique for measuring ‘informativeness’ in information retrieval, and applying this to evaluation of information services.
Information science technique Boyce, Meadow and Kraft (1994) explain the importance of measurement in information science and go on to explain statistical, bibliometric and information retrieval system and service techniques. The work by Egghe and Rousseau (1990) is in similar vein for statistical techniques, but providing more detail on operations research and on informetrics. Losee (1990) also looks at measurement, coding and organisation of information, and then goes on to consider perception, decision making and language. He also looks at information retrieval principles, and information production and use. Flynn (1987) also uses an empirical approach, but within a framework of analysing different stages of data use from collection through manipulation to presentation. Vickery and Vickery (1992) in the revised edition of their book indicate that it presents and discusses a scientiÞc understanding of the processes of information transfer. They give examples of models of information transfer, systems for achieving this, how the systems are evaluated, and the social context of information. Bose (1993) has rewritten an earlier work that approaches information science from a classiÞcation and documentation perspective, to include sections on the systems approach and information technology.
CHAPTER 3 63 Information science
Wilson (1999) summarises the models that have been applied in information behaviour research. Further reading in this area is found at the end of Chapter 15.
Informetrics The journals referred to above regularly carry research material in this area. Saracevic (1970) contains a number of items that review the development of bibliometrics, as does Library Trends vol. 30, no. 1 of 1981. More recent overviews are those of Hertzel (1987) on bibliometrics, and the wider overview of Egghe and Rousseau (1990) that in addition to looking at citation analysis and bibliographic coupling, also has sections on statistics and operations research for application in information management environments.
CHAPTER 4
4
Information and organisations ............................................................... The introductory chapter included some everyday examples of information management, and how these applied to knowledge sharing within an enterprise. Some of the examples may also be applied to managing information on behalf of the organisation for the beneÞt of clients, either individual or corporate. This chapter focuses on information in the context of an organisation. Organisations continually have to assimilate a great deal of information that they generate themselves, and weigh this against the information from outside sources so that the two are incorporated for planning and decision-making. Both organisational bearing on internal information processes and external information - and how the latter is brought into the organisation – are therefore considered. Because the process of environmental scanning to monitor external information is discussed, reference is made to the characteristics of external information sources. Many of the associated information management processes are said to contribute to corporate memory. Personal memory is a different matter. With some exceptions, individuals have little need to apply a great deal of intellectual endeavour to their personal information management needs. Information management is simple for an individual, but complicated for an enterprise. We have little difÞculty organising our own record collection by category of music, or Þling our educational and employment records in an idiosyncratic order in a Þling cabinet, or even building a database of recipes - even if it is a bit of a nuisance when the keyboard gets clogged with ßour - because we are suiting our own information needs. The difÞculties with information increase markedly when it must be managed corporately. When used by an enterprise, it is desirable to have the information organised and interpreted and decided upon in a consistent manner by all users within that enterprise. A catering company with databases supporting its operations would presumably avoid the problem of ßour in the keyboard, by providing alternative avenues for its chefs to refer to their recipes. However, it may have to resolve information inconsistencies between the purchasing ofÞcer who thinks of butter primarily as a dairy product to be ordered and the executive chef who thinks of butter primarily as a sauce ingredient. The two must both be able to create and Þnd the shared information about butter in a manner consistent with its use for their respective purposes. If one has recorded information about butter purchases in terms of packages or truckloads, it will not serve the purpose of the other who may need to see what is on hand in terms of kilos. Similarly, the manager in an enterprise who is responsible for planning buildings may need to Þnd information that was created under the direction of the human resources manager responsible for personnel
CHAPTER 4 65 Information and organisations
records creation. The personnel database may record staff salaries and leave entitlements, but can it provide travel distances to employment that may be required by the building planner? The problem is compounded when internal information must be presented externally to the general public, or external information must be integrated with internal information to assist decision-making. Even when the information is well organised for presentation, there are marked variations in human approaches to seeking the information to fulÞl information needs such as improving knowledge, or assisting decision making. The ready availability of the required information is a key factor. The time and trouble to obtain additional information is quickly foregone if the anticipated beneÞt is perceived as being minimal. Information managers must create systems that are ßexible and viable for large and changing groups of people with different information behaviours. This may mean the provision of alternative approaches to the same database, such as menu-based or command-driven. Overlaid on these information problems is the issue of knowledge transfer. The chefs communicated some of their knowledge as information by recording their recipes. However, beyond the recipes is the working experience that enables them to time the feeding of large numbers, or to substitute ingredients that may be necessary to approximate the same meals. This type of knowledge is often passed on by mentoring or apprenticeship processes, or simply through facilitation of formal and informal meetings. The environment within which the people in the enterprise are encouraged to share knowledge must complement the technologies that assist information transfer. There are many factors that inßuence decision making by individuals within organisations. One approach to examining these is to attempt to understand what it is that has formed the culture of the organisation and its organisational intelligence.
4.1
Organisational intelligence
If the dimensions of a corporate context and human interpretation are added to information, it may be regarded as intelligence that will considerably inßuence the way knowledge is formed for application of skills and for the technical and political decision making processes. If enterprises are to be effectively managed, then the management of an enterprise must be able to extract from a plethora of sources the information that may be turned into knowledge and used effectively. What factors make information salient as far as management is concerned? One would expect the relevance of the information to be important, but the context in which information is communicated will also form organisational intelligence in an uncertain environment. One analysis of organisational communication proposes a framework of contingencies that affect the way information is turned into intelligence (Goldhaber et al. 1984). Figure 4.1 summarises these contingencies. Appreciation of these contingencies by the managers within an enterprise should assist them to remove barriers to organisational effectiveness by improving information ßow. One of these factors can be taken as an example and elaborated upon. The structural contingency in organisations varies widely. It has been characterised by Horton (1985) as follows: -
Job shop
rigid and unchanging information ßow; record keeping follows established conventions; Þxed information tasks; ßow from workermanagement (job sheets, time cards, machine fault notices...) as important as management-worker (job assignments, quality checks...)
66 PART A Overview
-
Batch flow
smaller upward and downward information ßows than for job shops; job sheets and assignments may be combined; main information ßow better deÞned; orientation towards internal information ßows (time cards, job sheets, materials usage, overhead costs)
-
Worker-paced
service industries; tailored to customer needs; minimal information for routine predictable tasks
-
Machine-paced
assembly processes; complex information ßows from top down; communication between workers, managers, senior management and outside suppliers; telecommunications use; great attention to planning use and control of information
-
Continuous flow
information ßow correspondence with process ßow; information technology extensively used for monitoring (pressure, temperature...); structured information sources and services
-
Mixed
combination processes such as cosmetic production accompanied by packaging, that have different information needs. Internal contingencies
External contingencies
1. Structural The way in which the working environment functions, for example batches, assembly line etc
1. Economic The extent to which the market is stable, or that competitive forces may have an impact (cf Chapter 20.2)
2. Outputs
2. Technologies The extent of scientific and engineering innovation and research and development influence
The range of products and services and the extent to which quality varies 3. Demographic The extent to which employees personal characteristics (education, gender, job sharing, contracted …) varies
3. Legal The regulatory framework imposed from different all levels of jurisdiction
4. Spatial/Temporal The importance of placement – where people placed physically in relationship to each other, to assist serendipitous meetings for example; also the effect of delay in dissemination
4. Social/Political/Cultural For example, consequences of religious or racial differences
5. Traditional The organization continues to follow convention relying upon historical precedent
5. Environmental Factors resulting from climate, energy availability, geography, population density
Figure 4.1: Communication contingencies based upon contingencies identified by Goldhaber et al. (1984, p.42)
An enterprise’s external and internal communication contingencies have direct bearing upon its organisational intelligence. The rate of contingency change along with its communication system effectiveness has direct bearing on its intelligence needs. A range of organisational conditions is tabulated in Figure 4.2, based upon the contingencies of Goldhaber et al. (1984) in order to indicate the extent of need for intelligence.
CHAPTER 4 67 Information and organisations
Communication contingency rates of change Organisational condition
Intelligence needs
High communication system effectiveness
Proactive coping ‘Marriage’
Low
Low Communication System effectiveness
Reactive stress ‘explosion’
High
High communication system effectiveness
Proactive relaxation ‘Honeymoon’
Low
Low communication system effectiveness
Reactive hibernation ‘Time bomb’
Moderate
High, coupled with:
Low, coupled with:
Figure 4.2: Organisational intelligence needs based upon contingencies identified by Goldhaber et al. 1984, p.43.
These may be summarised as follows: -
An organisation that is in a proactive relaxation or ‘honeymoon’ condition has a stable environment and an effective communication system. The organisation is probably a small one with a single product line, perhaps a catering or spare parts operation with focused information demands. It is ready to cope with changes in its environment because of its effective communication system. It is proactive because of its state of readiness, and relaxed because of the stable environment. It requires little organisational intelligence under current conditions.
-
An organisation that is in a proactive coping condition (‘marriage’, as they call it!) has conditions inside and outside the organisation that are dynamic and are felt directly by the organisation. The organisation has an effective communication system and can deal with the changes. Its needs are being met. This could apply to Þnancial services organisations, providing that they have wellfunctioning internal communication systems.
-
Where the environment is relatively stable and the communication system is ineffective, the term reactive hibernation is used. The organisation may think that it can cope, but problems may be looming. Typical of this organisation is the presence of institutional power - people controlling others by title, tenure or role. When environment changes, organisation will resort to crisis management. This type of situation may arise in public sector bureaucracies.
-
When the environment is highly unstable, a reactive stress condition applies. Contingencies affecting the organisation are changing at a rapid rate. The organisation is unable to cope with environmental uncertainty because of a weak communication system. This condition may result in the disintegration of the organisation if it is unchecked. It can result if a key manager with sole knowledge of the organisation’s external contacts leaves the organisation.
This terminology may be a little quaint for one’s liking, but it does help to characterise the way that an enterprise should examine its information needs. Such a diagnosis of a Þrm’s condition will help to establish knowledge requirements, but is of little use, unless conditions are then established to enable the use of internal and external intelligence and knowledge sharing to occur. These conditions include: -
Understanding that access to information alone is not knowledge sharing
-
Knowledge accretion is a social construct requiring collaboration between staff
68 PART A Overview
-
There should be communities of practice that are trusting and are able to share terminology, rather than hoarding personal expertise
-
Learning is seen as an activity that generates the knowledge resource.
4.2
Organisational decision-making
Can an organisation be intelligent? The question of the learning organisation is explored in Chapter 20. For now, the frameworks in which decision-making takes place are examined. There have been numerous approaches to examining the strategy for arriving at decisions. There have also been attempts (Lord & Maher 1991; Choo 1998b) to contrast and compare a variety of these. If one were to characterise the types of strategies identiÞed, one would arrive at approaches such as: •
Rational This assumes people seek an optimum outcome in a decision process and seek and thoroughly process all relevant information. Rational decision makers require information about many alternatives and have an extensive memory and capacity for processing information. This framework derives from scientiÞc and economic models of decision-making. Such models may omit assumptions about limitations of human capacity. A speciÞc case of the rational approach is the ‘bounded rationality’ approach developed by Herbert Simon. Here we assume non-routine decisions to be made within a framework of what is actually happening, rather than what someone thinks should happen. The issues are generally so complex that only a limited number of aspects may be considered at one time and the Þrst satisfactory alternative is selected. This limited capacity framework has been termed satisÞcing (a good enough solution is chosen, rather than best alternative – ‘she’ll be right, mate!’). A rule of thumb approach to problem solving is embraced.
•
Expert This approach assumes that experts have particular ways of thinking that set them apart from novices. These may be in the form of different cognitive structures developed from years of experience in a specialised area of research or practice, that enable simpliÞed and shortcut approaches to decision making, and quickly get to a point, whereas the novice may take a more circuitous path and a lot of energy. This approach assumes highly speciÞc information within a bounded domain.
•
Cybernetic This approach is a more dynamic model, acknowledging the effect of feedback on cognitive processes and taking into account learning from experience. The information requirement is selective access to current information, along with recall and evaluation of outcomes of past actions.
•
Political This approach assumes that there is relatively high technical certainty, but signiÞcant goal ambiguity about preferred decisions. The information requirement involves understanding of stakeholders’ positions, relative inßuence, and what underlies the stand that they are taking. Decision-making may be based upon bargaining advantages, such as a particular department’s ability to inßuence other matters like stafÞng or technology implementation, outside the framework of the decision at hand.
CHAPTER 4 69 Information and organisations
Chapter 20.3 looks at some alternative political frameworks applied speciÞcally to information management. •
Process This is an approach that focuses on stages of choice, where a series of phases structures the decisions. These phases may be identiÞcation, development and selection, each of which may contain components, for example search and design in the case of development. The search component may be pursued in a variety of ways, such as making use of the formal databases of the corporate memory or actively employing environmental scanning, which itself may be adopted in a number of ways (see Chapter 4.4 below).
•
Anarchic This is the situation in which no one approach may be identiÞed. Decisions are arrived at with good intentions through a loose collection of ideas, from a varying group of participants (as may happen in public and educational institutions), and an unclear understanding of IT capacity for process support.
The preceding typology is useful for emphasising differences in decision-making approaches. However, in reality, a disorderly mixture of inßuences applies to information used for decisions, including: •
Power Comprehension of information is inßuenced by how much of it is available. Individuals in organisations may ration or withhold information in order to assert their decision-making role in hierarchies.
•
Relevance Information available may be at inappropriate levels of speciÞcity, condensation or summary, and is therefore ignored.
•
Justification Decision makers for a variety of political and timeliness reasons may make decisions without reference to available information, but subsequently use the information to justify decisions.
•
Significance An organisation with an environmental scanning program such as that described below in Section 4.4 may not appreciate or give appropriate store to material that has been collected about general external factors.
•
Politics Information may be presented to decision makers in order to advance vested interests. If conßicts of interest exist within an organisation, a decision maker may have to interpret information in the light of the tendentiousness of its reporting.
•
Overload Too much available information may lead to assimilation of none.
70 PART A Overview
•
Ritual Being seen to make a decision in the right way may be more important than the decision’s quality. It has been pointed out (Dunford 1992, p. 281) that the decision-making process, as opposed to the outcome, is important as a symbol of rationality of an organisation.
4.2.1
Decision-making framework
What are the requirements of an information system that supports high-level decision-making? Among various analyses that have taken place, the ones itemised here are among those that an information manager would need to be cognisant of, if providing support services: -
Provide information throughout the decision process from problem recognition to solution implementation
-
Simultaneously provide information on the range of decision tasks in the decision process
-
Provide information that acknowledges the subtle changes that occur over time in the deÞnition and magnitude of the problem
-
Review information already presented to extend it as necessary, should decision makers return subsequently to a task already commenced
-
Assist the incremental process of building a solution by providing answers to the very speciÞc and narrowly focused questions posed by decision makers
-
Adapt to the shifting priorities of decision makers in relation to problems to be solved, and the pattern of periods of intense information activity alternating with relatively little activity
-
Initiate supply of information and not rely on the group to request all relevant information
-
Estimate how long decision makers are willing or can afford to wait for information that they perceive they should have, and establish provision priorities
-
Explore the corporate memory contained in people’s memories and the Þles of the organisation for potential solutions
-
Monitor the wider environment for information on events and trends having an impact on the problems of the organisation
-
Identify outside interests and stakeholders and facilitate the ßow of information between these and the decision makers
-
Give decision makers conÞdence that they are receiving enough information for their task without inducing information overload.
Browne (1993, p. 224), who has investigated this area, also investigated information outputs required. Her analysis was of a higher education environment, but many of the outputs that she identiÞes are generic in applicability. Some of the outputs she identiÞes have been included in the following list: -
Explanation of the genesis and background of the problems to be solved
-
IdentiÞcation of stakeholders inside and outside the organisation
-
Acknowledgement of different perspectives and interpretations of the nature of the problem
-
Explanation of any earlier attempts made to solve the problem, or a similar one
-
Analysis of existing policy and procedure that may provide a framework for solving the problem
CHAPTER 4 71 Information and organisations
-
IdentiÞcation of possible solutions and options
-
Clear distinction of the difference between solutions, the quality of the solutions and the criteria that may be used to evaluate them
-
Presentation of information at different levels and inclusion of summaries as well as detailed information to allow different users to access the content at appropriate levels
-
Provision of abstract information primarily in forms such as general principles, summaries and numerical data, rather than information based solely on individual viewpoint
-
Aggregation of information provided over time to enable return for re-use
-
Structuring to allow dissemination to stakeholders of parts of the information available
-
Provision of technical assessments of the options using expertise from outside and inside the organisation
-
Provision of background on the political environment and attitudes to different solutions as well as potential blocks to the implementation
-
Summarisation of the information activities of the group as a basis for developing the awareness of stakeholders outside the group with regard to what information has been considered in the decision process.
These examples show a variety of the features of information support for strategic planning and decisionmaking at high levels. They give a general picture of information characteristics for decision-making. However, for any particular environment, an information use analysis will beneÞt the provision of information support speciÞc to that environment, Elsewhere, some speciÞc information use examples are considered, including information seeking of managers at different levels in an enterprise and the information needs of executives (Chapter 15.2).
4.3
Information responsibilities in an enterprise
The responsibility for provision of the range of information for organisational decision-making is very diffuse. Even in organisations that have attempted to establish the responsibility under a person with a title like chief information ofÞcer, there have been difÞculties, often because of the variety of structures and vested interests extant in organisations, but also because of a focus on IT rather than information processes. With the increasing proportion of knowledge work and information management within many jobs that have a different primary focus, the need to establish information management responsibilities becomes more pressing. The scope of the information that is to be managed within an enterprise may be deÞned in the following terms: •
Internal information that is either: -
Highly structured such as that coming from data in numerical databases or being used for transaction processing
-
Loosely structured such as identiÞcation of knowledge sources and expertise
-
Minimally structured such as information carried in documents like reports, and memoranda.
72 PART A Overview
•
External information that is either: -
Highly structured such as that held in statistical databases or geographic information systems
-
Minimally structured such as that carried in print publications, news media and Þlm.
The distinctions between these categories are blurring, as ofÞce automation and publishing processes make documentary information more structured in computer form. At the same time, databases formerly conÞned to structured records now accommodate more data that are less structured in textual form. Nevertheless it is worthwhile to examine the distinction, because the four areas have tended to be the domains of different parts of an enterprise, whereas information management sees them all under one umbrella. Sprague and McNurlin (1993) examined the association of type of information with domain of responsibility in an enterprise. We have derived Figure 4.3 from their work to illustrate that corporate authority for dealing with information sources, systems and services may be widely dispersed. This may lead to problems in effective utilisation of these resources if there are technological solutions that make possible their integration and enhanced use. INFORMATION
INTERNAL RESPONSIBILITY
INFORMATION SOURCES
SOFTWARE SUPPORT
INTERNAL Highly structured
Information systems department
Transaction processes Organisational units
Process control Database Management Systems Management Information Systems
INTERNAL Less structured
Records management Archives Document management Word processing Files control Knowledge management
Corporate documents: • policy statements • memoranda • mail • printed forms Lessons leaned files Expertise collections
Word processing Document management Office automation Text retrieval Data mining Micrographics Optical digital storage Reprographics
EXTERNAL Highly structured
Business analysis Statistics unit
Public databases Internet
Online numerical databases CD databases Public networks Time-sharing services
EXTERNAL Less Structured
Library Business intelligence unit Strategic planning support
News services Films Printed publications Internet
Automated library systems Online catalogues Environmental scanning Current awareness services Monitoring services Videotex systems Push technology
Figure 4.3: Enterprise responsibility for information.
When analysis of this type is combined with identiÞcation of who is responsible for management and transfer of knowledge, we are taken in the direction of the rather higher-minded idea of the ‘intelligent organisation’. Such an organisation needs to be able to combine the professional rule-based and practical knowledge that the workers in an organisation have, that makes it possible to optimise the efÞciency of
CHAPTER 4 73 Information and organisations
operations, with the ongoing environmental knowledge that the managers of an organisation use to align its mission and objectives with its capabilities. This implies ongoing organisational learning based upon effective information gathering processes and a framework within which information may be used to create and apply knowledge from the information sources used. It also implies that effective information retrieval processes are available for reference to the ‘corporate memory’ through facilities such as historical database analysis and records management and archiving systems.
4.4
External information scanning ‘...le hasard ne favorise que les esprits prepares’ (chance favours the prepared mind)
L. Pasteur Address given on inauguration of the Faculty of Science, University of Lille, 1854. Success in business competition is often said to derive from good management of an enterprise’s information resources. Part of this management is the matter of being well informed about ‘the opposition’. This information is often known as business intelligence or competitor intelligence and the process of compiling it is a justiÞable concern of management.1 When formalised into a corporate intelligence gathering system, collection of competitor intelligence can be regarded as part of an environmental scanning program, which considers the outside environment as something broader than simply competitors. Environmental scanning is the process by which information about events and relationships in an enterprise’s outside environment is scanned for the purpose of assisting senior management in its task of planning an organisation’s future course of action. It requires: -
Gathering of information about an organisation’s external environment
-
Analysis and interpretation of this information in the context of an organisation’s business plan
-
Use of analysed intelligence in the organisation’s decision making.
Figure 4.4 illustrates the environment that the scanning process endeavours to cover. The types of general environment information that may prove useful in setting an organisation’s direction include:
1
-
Societal
information such as demographics relating to population movements, life expectancies, consumer activism, environmental awareness and leisure utilisation
-
Technological
information relating to new products, technology transfer from research to marketplace, automation applications and effects on productivity, research and development programs of government, universities and scientiÞc organisations
-
Economic
information such income distribution and disposable incomes, employment levels, inßation, interest rates and other Þnancial indicators
-
Political
information relating to potential changes of government, and regulatory framework for such matters as trade, employment and Þnancial services.
Business intelligence and industrial espionage are different matters. The latter refers to covert information gathering and is outside the scope of this text. Here, business intelligence is seen as part of environmental scanning and having recourse to public information only.
74 PART A Overview
General Environment
Operating Environment
Internal Environment (External information)
Competitor intelligence
Environmental scanning
Figure 4.4: Environmental scanning
The Operating environment concentrates on intelligence about an organisation’s competitors and consists of information about: -
Production
such as anything to do with product range and evaluation, quality control, packaging, delivery, production capacity and breakdown tolerance
-
Organisation
such as ownership, control and management structure, extent of decentralisation, directors, links with other companies, facilities, Þnancing and asset return
-
Marketing
such as the extent of advertising budgets, the placement of product information for target markets, market share, pricing policies and discounts, service policies and performance and customer distribution
-
Personnel
such as the range of human resources employed, their remuneration, the degree of movement in the workforce, the state of manager-labour relations and the decision makers in organisations.
Therefore one might say that POMP covers the more speciÞc environment of competitor intelligence, and STEP the wider environment beyond the immediate concerns of the competition.
CHAPTER 4 75 Information and organisations
4.4.1
Systems for environmental scanning
Since Aguilar (1967) investigated in depth the process of scanning the business environment, many models have been put forward for formalising the process. Some have been expressed in a cyclical manner so that the collection and analysing of information is followed by derivation of intelligence, which is disseminated and leads to modiÞcation through feedback of the requirements for further information. Most see that the framework in which scanning is carried out may take place in different modes. For example, Aguilar’s original suggestions for frameworks were simpliÞed by Fahey and King (1977) into irregular, regular and continuous modes. Figure 4.5 illustrates an extension of the characteristics of these modes. IRREGULAR
REGULAR
CONTINUOUS
STIMULUS
Crisis initiated
Decision and issue oriented
Planning process oriented
OPERATION
Ad hoc
Periodically updated
Structured data collection and processing
SOURCES
Primarily people, some documentary
Documentary & personal
Primarily documentary, some personal
SCOPE
Specific identified matters of interest, primarily POMP
Specific identified matters of interest, POMP and STEP
Environment in general, primarily STEP
INDUCEMENT
Reactive
Proactive
Proactive
INFORMATION COLLECTION
Retrospective
Primarily current
Prospective
DECISION TIME
Current and near-term future
Near-term
Long-term
ORGANISATIONAL IMPLEMENTATION
A variety of different participants
A variety of different participants
Unit dedicated to the process
Figure 4.5: Scanning modes
The frequency and formality with which the process of gaining this information is carried out also depends upon the economic means of an organisation. For example it has been differentiated at three different levels (J. L. Horton 1995) as follows: • Low level This includes swapping gossip with suppliers, customers and vendors and others who cross a market; reading local and national media; and subscribing to and reading key trade journals and newsletters reporting an industry in which a company competes. • Mid-range This includes the low-level approach plus: -
Developing and implementing an integrated organisational information strategy to disseminate business environment information regularly
-
Reviewing information about individuals who are key to organisational survival and success, for example owners, employees and customers
-
Maintaining a brieÞng document on key business issues
-
Automating supplier, distributor and customer contact
-
Maintaining one or more online data services focused on the company’s business environment
76 PART A Overview
-
Using work group information systems to place business environmental data on terminals for employees to consult as needed
-
Providing company-wide email
-
Appointing a person to coordinate and digest data ßows to resource Þles. Horton (1995, p. 112) calls this person an information editor.
• High level This adds the following to the actions already listed: -
A department to analyse and report business environmental information company-wide; this department would tie into company communications lines and would maintain a digest of events classiÞed by key business environment variables
-
Company-wide meetings to update employees on the business environment and its implications for business
-
Key measures for business environmental change and company response
-
Real-time reporting of a company’s business environment to special groups to help them understand company actions
-
Regular surveys, focus groups and panels with key individuals in the business environment who have direct economic power over a company
-
Ongoing investigations of change in the business environment and how the company should prepare for it
-
Retreats for managers in which the state of the company and the business environment is presented.
There are several institutional frameworks possible for carrying out the process. If a speciÞc unit is to be established, it may be within a department equivalent to corporate planning and have a name something like the strategic intelligence unit. This provides the advantage of being close to senior management, but may suffer from lack of contact with other divisions in an organisation. An alternative may be an information analysis centre, physically remote from senior management and possibly suffering politically because of that, but perhaps more neutral and accessible about information gathering from the organisation as a whole. On a smaller scale of operation, management may have to look at employing an outside agency to carry out the procedures. This may present the problem of the agency not fully appreciating or attending to the organisation’s needs. Alternatively the role may be distributed throughout departments in an organisation, or taken to the extent of writing it in as part of individual duty statements. To be effective, such an approach needs considerable coordination. This may push an organisation in the direction of establishing a unit. In all cases, the structure will be set up in an attempt to resolve the problems of reliability and credibility of intelligence being gleaned, evaluation time required to deal with the information and appropriateness of the product for senior management.
4.5
Sources of information
Naming, analysing and discussing speciÞc sources of information are outside the scope of this text. However, this section provides an outline of categories of information sources and some of their characteristics. The speciÞc sources used in a particular environment will depend upon the requirements of the information users. Some examples of guides to sources are given at the end of this Chapter.
CHAPTER 4 77 Information and organisations
4.5.1
People - the specialists
Most people prefer to get information from other people. Particularly if they know the source, it helps to adjust their impression of authority of the information. (If one thinks in terms of the communications model discussed in Chapter 3, it may be that they are trying to reduce the social contribution to the semantic level of noise. It also means that they are getting closer to a knowledge source in the expectation that the information they receive will be more easily assimilated as knowledge.) Within organisations, the different specialists often can make a business intelligence contribution. Typical of the operational units that can provide intelligence are: -
Customer relations
because of their regular contact with the organisation’s clients; personnel in these areas are likely to be the Þrst informed if customers see advantages in competitors’ products or services, or introduce new features
-
Human resources
because of its knowledge of industrial relations, its monitoring of press employment advertisements and consequent knowledge of job relativities, and for its ability to identify employees who have worked for competitors
-
Accounting
because they will be aware of the Þnancial condition of customers who may also be competitors
-
Legal department
for its collection of material on the regulatory environment, and its monitoring of cases that involve enterprises that have the same business concerns
-
Research and development for its knowledge of competitor products, and the need of its staff to follow the contents of technical literature such as scientiÞc periodicals and patents.
Depending upon the organisation, other departments such as sales, purchasing, public relations and property could all have contributions to make. Structured approaches to gathering such information vary from having regular meetings with personnel concerned to asking them to complete reports on potentially useful material, and to use of enhanced conferencing software that permits a degree of information assessment. An example of such software is grapeVine, which is designed to gather, Þlter and classify and consolidate information emanating from multiple personal sources, thereby building a knowledge base from information that originates in the form of issues and opinions.
4.5.2
Documents
A document in this text is used to mean either a print document or its digital equivalents. Most of the material that is mentioned has at least some examples now available in digital form via networks or on compact disk. • Directories Directories are signiÞcant, not least because they include directories of people, and it may be possible to identify someone who can provide the required information, without having to commit to further reading. They may be general ones such as Who’s who in its various guises, or more specialised ones for particular industries or professions. In the business Þeld, there are directories
78 PART A Overview
that provide information about the Þnance of companies, their products and services, and their directors. Typical of these are the print and computer-based Kompass directories for a number of countries. Organisations that create and maintain information about companies in a number of countries and in many print and database forms include Dun & Bradstreet. Many professional associations, including the ones in the information management profession mentioned in Chapter 2, maintain directories of membership, sometimes with research specialties or special interests. Government agencies, at both national and local levels, are a cornucopia of information. It could be said that the principle role of many government employees is the gathering and dissemination of information, so they are excellent sources. If one wishes to identify an appropriate person, where does one start? Coming to terms with the government structure is part of the problem, and government directories usually provide an indication of departmental structures as well the responsibilities of key personnel. Countries that have freedom of information legislation (as described in Chapter 21) may require government agencies to provide public descriptions of themselves and how to approach them for information. Many countries Þnd the vast information gathering resources of the United States are something that may partially complement their own resources. For example United States Government manual gives guidance on the different arms of US government as well as independent and quasi-government agencies. The Congressional directory, as its name implies, is a directory to the US Congress. Many countries have national equivalents of these directories. Telephone directories should not be neglected as a source of information. For information relating to organisations, Yellow pages directories can be very useful. Many such directories are now available in CD-ROM form or are searchable through the Internet. These general organisational listings carried in telecommunications company directories are complemented by organisations which have a formal page structure on the Internet that includes access to their internal telephone directory listings. • Serial publications Serial publications are those that are issued with periodicity and with the intention of indeÞnite continuation. Useful serials for environmental scanning include some directories but also the newspapers of the national and local press, trade magazines, and the annual reports of companies. Some environmental scanning programs consist partly of subscriptions to news clipping services that try to cover the interests of the company. There are also computer-based equivalents of these, sometimes called current awareness services or SDI (Selective Dissemination of Information). The worth of such services, print or manual, will depend upon the extent of coverage of material, the effectiveness of the ‘Þltration’ process for determining what is of interest to the company, and the currency of the service. Computer-based services are increasingly able to offer a greater range of major newspaper and other press coverage, and SDI services have long been able to provide abstracts from specialised research databases. However the needs of a small organisation may be much more localised, and recourse to print may be the most appropriate approach. News services are the most important source of information dealing with current events. The traditional print form of publication, the newspaper, remains very signiÞcant. However, broadcast services have greater immediacy, and online news services draw on a wider range of sources and have great utility in the business community. These services may provide continuous updates to news through press or business news agencies such as Reuters, Information Access Company, AAP, Dun & Bradstreet, and Standard & Poor’s, or they may have consolidated databases such as NEXIS in the USA or TEXTLINE in the UK.
CHAPTER 4 79 Information and organisations
• General print information Although serials, including directories are increasingly appearing in computer form which makes them more amenable to automatic searching, there is a great deal of useful material that appears only in print form. This ranges from books relating to particular industries or companies to prospectuses for companies, to government regulatory information and reports or conference papers. Such is the great range of reference material that it may be necessary to come to terms with what is in a particular Þeld by consulting a guide. There are many guides for the literature of particular areas, but it is possible to step further back in order to identify guides about guides. Prominent guides to materials in general include latest editions of Walford’s guide to reference materials (Library Association, London) and Guide to reference materials (American Library Association, Chicago). The American Library Association also produces a Reference books bulletin. The governments of the world publish a myriad of documents, and there are many print and database directories to this information. A starting point will often be a directory of the respective government’s own publishing service. However, government publishers do not publish all the material of the different branches of government, and it is often necessary to seek out other catalogues and directories. A major source of government information will usually be the relevant statistics bureau and its catalogues. Many government agencies now maintain their own Internet sites. While it may sometimes be possible to obtain an electronic version of a publication directly at the site, it is often the case that details of how to obtain the print version will be provided.
4.5.3
Structured databases
The computer-based equivalents of print material may be regarded as databases, but they are not necessarily structured to permit effective searching and information retrieval. Structured databases have been created in such a manner as to facilitate a wide range of approaches to searching. Chapter 11 looks at search strategy information retrieval procedures, but it is appropriate at this point to consider some of the electronic databases and take account of how they should be approached prior to searching. As with print material, it is important to establish the overall content and scope of a database before deciding upon how it will be used. With print material this often involves scanning of contents pages and indexes. With databases, the ease of using information retrieval software often means that searchers neglect to examine the structure of a database before using it, and consequentially make poor use of the database, even though they pick up some relevant material. The structured database can be thought of as being constructed of a hierarchy of building blocks. The database consists of Þles (or relations) that consist of records (or tuples or segments) that consist of Þelds (or attributes) that consist of characters (or bytes) that consist of bits. It is of importance prior to developing a search strategy to establish what type of database is to be searched and how the Þeld and record levels for the database are deÞned. Databases may be categorised as: • Reference These databases contain pointers to the material that is sought rather than containing the documents themselves. Typical of these are the bibliographic databases such as the many online bibliographies produced by abstracting and indexing services, for example Biological abstracts and INSPEC, as well as online library catalogues that refer to the books rather than contain the books. Another reference example is the referral database that may contain pointers to people such as workers on particular projects, or to goods and services information such as the US Department of Commerce trade opportunities database, or to telephone directories, such as Electronic yellow pages.
80 PART A Overview
• Source These databases contain the actual material that is being sought. They may be: -
Numeric
containing transactional, statistical, time series or properties information
-
Full-text
such as newspapers, law reports or dictionaries
-
Image
containing photographic or representational data
-
Software
containing programming code
-
Electronic services
these are not usually structured databases, but in such forms as email and WWW pages may be regarded as databases by some users
The producers of the databases include government organisations, professional societies, academic institutions, commercial organisations (some of which specialise in database production) and individuals. The quality control varies as widely as it does with print material. As with print material, the extent to which editorial processes and validation of input are imposed has signiÞcant effect on the utility of the databases. Although many databases are available freely for searching over the Internet, those that are regularly updated and have strong editorial policies are usually made available, not directly from the producers, but through commercial database vendors via a common interface over the Internet and on compact disk. There are directories of both the producers and the vendors, for example the Information industry directory and the Gale directory of databases (Gale Research, Detroit) are both wide ranging in scope. There are numerous guides to the Internet many of which describe databases available through interfaces such as the Web, or Wide Area Information Servers (WAIS). One of the earliest vendors to make databases available over networks, and one that makes available a wide range of different types of information is Dialog. Effective searching of such databases is facilitated when searchers are familiar with the content and information structure. Services such as Dialog usually make available a great deal of this type of information through the medium of database guides and associated material in order to assist searchers. The retrieval software itself will regularly incorporate a portion or all of this within its help facilities. In the case of Dialog, Bluesheets that are available in print and CD as well as over the Internet (Figure 4.6) carry this information in succinct form.
Figure 4.6: Dialog Bluesheets - part of introduction http://library.dialog.com/bluesheets/
CHAPTER 4 81 Information and organisations
Figure 4.7: Thomas register online
Figure 4.7a: Thomas register online - extract from searchable fields listed on Dialog Bluesheet http://library.dialog.com/bluesheets/html/bl0295.html
Figure 4.7b: Thomas register online - extract from sample record listed on Dialog Bluesheet http://library.dialog.com/bluesheets/html/bl0535.html
The Bluesheets provide a description of a database and, among other things, its subject coverage, equivalent material in print, sorting and formatting options for retrieved output, both default and additional indexes that are created from available Þelds, usage rates and special features that apply. Figures 4.7 and 4.8 show examples of database search options and output records from the Thomas register online and World translations index databases.
82 PART A Overview
Figure 4.8: World translations index
Figure 4.8a: World translations index - extract from searchable fields listed on Dialog Bluesheet http://library.dialog.com/bluesheets/html/bl0295.html
Figure 4.8b: World translations index - extract from sample record listed on Dialog Bluesheet http://library.dialog.com/bluesheets/html/bl0295.html
CHAPTER 4 83 Information and organisations
4.5.2
Word Wide Web search engines
The databases mentioned in the previous section are available via the Internet, but are different in character from the databases that have been created on the Internet to assist searches of Websites. These databases are either manually created directories to Web sites that may be organised by a system of categorisation such as the Yahoo facility, or created by software, which is variously called ‘spiders’, ‘webcrawlers’ or ‘robots’. A spider is a program that automatically and regularly conducts searches of the countless Web sites and stores their addresses (Uniform Resource Locaters) and index information in databases that may then be searched with search engine software such as Altavista, Google or Searchezee (Figure 4.9).
Figure 4.9: SearchEzee search engine site http://www.searchezee.com
The search engines search databases that cover a vast array of information and have in common the fact that they index records that have been created with some version of HTML (Hypertext Markup Language), but little else. Unlike many of the databases available via services such as SilverPlatter or Dialog, there is no uniformity of record structure, no editorial policy for inclusion of material, no focus for subject content, or quality control of the information reported. The search engines also have not had retrieval facilities developed to the extent of the software that is used for searching the structured databases. Nevertheless, the vast amount of material, available on the Web makes it a valuable information source, as long as the currency of an individual site is balanced against its credibility and authority (see Website evaluation in Chapter 18).
4.5.5
Personalised agents
In the section on serial publications above, reference was made to current awareness or SDI services. The large amount of unÞltered material on the Internet has led to the further development of such software so that it searches material on the Internet using artiÞcial intelligence principles that try to reßect the subject proÞle of individual searchers, based upon the way that they are searching. Such software is sometimes called an intelligent agent, and the general concept, embracing as it does, the idea of reporting automatically to users on new material becoming available, is called ‘push technology’ An example of such software is Agentware i3 Server, technology based on the Dynamic Reasoning Engine (DRE), a platform technology developed by Cambridge Neurodynamics Limited. Personalised Content Push Server, an intelligent information engine, covertly observes a user’s area of interest and is able to
84 PART A Overview
provide recommendations about related material by dynamically invoking an agent that assesses content. For example, if an article is selected for viewing, the agent will adapt to this and automatically create a new selection of articles based upon the content of the current displayed article. The Personalised Content Push Server also responds to explicit queries and retrieves information based on the user’s speciÞc requirements. It has for example been employed in LineOne, the proprietary Internet service of Springboard, which is a joint venture owned by News Corporation and British Telecom. Such software will be used increasingly to support the information gathering processes of individuals in organisations, particularly so that requisite information may be Þltered from the vast amounts that are deliverable.
4.6
Knowledge transfer
The preceding sections of this Chapter have tried to show that information management is concerned with information emanating from both outside and inside organisations. Attention must be paid to the sources of information, and how they are identiÞed and structured. Along with this, the setting in which the information is to be assimilated must be understood so that knowledge may be formed, and decision making effected, from an appropriate combination of people and documentary sources. A learning organisation is characterised by its ability to foster an environment in which its documentary and intellectual resources are combined effectively to facilitate knowledge transfer and further accretion of intellectual capital. This intellectual capital has been described as comprising tacit, explicit and cultural knowledge, though cultural knowledge will be subsumed here under the broader rubric of structural knowledge: • Tacit knowledge This is the combination of learnt, intuitive and experiential knowledge that enables us to carry out actions or make decisions without really having to think about them. It may be expressed in manual skills, or professional judgements, but is difÞcult to verbalise. It is often highlighted by Polanyi’s expression “we know more than we can tell” (Polanyi 1966), although it has been suggested that it is better to restrict tacit knowledge to implicit knowledge that may be individual-automatic or socialcollective and that “we know more than we know we know” (Spender in Choo, Detlor & Turnbull 2000, p. 38). It is shared by being applied rather than by writing it down. It is learnt by participating: doing, imitating or being apprenticed. • Explicit knowledge Knowledge that has been formally articulated and is documented in objects such as drawings, software, manuals, photographs, memoranda, and the like, and may be rule-based such as transaction routines, or standard operating procedures … and is therefore information until interpreted by learners into their own mental framework. • Structural knowledge This comprises the system by which an enterprise’s processes occur; the organisational structure that assigns responsibilities, accountabilities and relationships; the strategy that expresses the goals of the enterprise and the way in which it seeks to achieve them, and its culture (Saint-Onge 1996). Culture may be described as an exchange of meanings. In organisational terms, it is the collective opinions, shared mindsets, values and norms of those within the enterprise. As Boisot (1998) has pointed out, the potential value of knowledge is largely a function of how it is used and in what context. The culture created through institutional structures is itself a knowledge asset. The other
CHAPTER 4 85 Information and organisations
types of knowledge are applied dependently on the culture within which they are embedded. The culture establishes what knowledge is taken for granted, how collaboration occurs, who is taken seriously, and what has bearing upon decisions. Nonaka, who has written extensively on knowledge creation and transfer, refers to the Japanese concept of ‘ba’, which is like a shared knowledge space, unifying physical, virtual, and mental spaces (Nonaka & Konbno 1998). The theory of organisations has developed through a number of orientations over the last century, but always the importance of communication has been central to models. At Þrst, the problem of how to resolve the tension between individual and organisational needs was engendered following the Industrial Revolution, and led to establishment of systematic or ‘scientiÞc’ approaches to work in industrial settings with attention paid to discipline, division of labour, and authority vested in management from which came unity of command. This also led to the realisation of bureaucratic tradition with its formalised rules, procedures and authoritarian structures. Subsequent organisational theories have taken on board more attention to social factors (favouring the primacy of people as social beings), to systems factors (see the Chapter 17 on information systems) or to cultures (trying to go beyond saying that culture is a characteristic of processes, but instead showing that it comprises shared beliefs and values interacting with the structure and system to inßuence behaviour). What is appropriate for a particular enterprise will depend upon its business processes and culture. A cereals production Þrm in agriculture and a Þnancial consultancy Þrm in the services sector both have a need to manage the application of knowledge. The former may carry this out with reference to structured demographic and marketing information, and research applying to cropping, nutrition, and pest management – this in an organisational environment that draws upon a mixed structure of machine-paced and continuous ßows, diverse output and decentralised production and management, and requiring structured meetings, training sessions and apprenticeships for knowledge transfer. The consultancy Þrm may use as primary sources earlier internal documents that have been produced along with the expertise of its professionals. Structure will be worker-paced, and the outputs may have a high degree of uniformity in the sense of always comprising reports and advice. Knowledge transfer may take place via a combination of digital messages and informal meetings, and ‘think tanks’. The environments are very different, but understanding the environments and sources helps to facilitate the learning of the organisation by enabling the identiÞcation and implementation of sources, and creation of a knowledge sharing milieu. Methodology for analysing such situations is described in Chapters 15 and 16. Enterprises that have signiÞcant intellectual capital within their employees and an organisational culture that facilitates its transfer are seen to have added value to their physical and Þnancial resources. Devlin (1999) gives an example of IBM paying $3.5 billion for Lotus when the book value of Lotus was $250 million. It is justiÞed on the basis that they were doing more than buying the information recorded in Þling cabinets and databases – they were buying the knowledge in the minds of the employees and the organisational culture that fostered its development. Companies make efforts to manage knowledge by proactively capturing expertise in what may be termed ‘stores of knowledge’. Despite the fact that considerable effort may be put into its structuring, it still can only turn it into knowledge in the minds of employees after they have assimilated it contextually. All too often, obtaining information is equated with gaining knowledge. It is the vital step of learning that turns information into knowledge. Many examples of this are given by Brown and Duguid (2000). One of these examples is the faster speed of children learning vocabularies in everyday conversation, as opposed to learning from dictionaries. They emphasise the distinction between learning about and learning to do or become. Knowing what, doesn’t mean knowing how. Much of what we might regard as basic communication skills is being rediscovered in enterprises, as it is found that knowledge repositories are only as good as the people who use them. These personnel must
86 PART A Overview
be committed to their use, and be prepared to function within an environment of communication and learning. These communication characteristics have been expressed by Skyrme (1999) as knowledge networking with the following features: -
There is a structural feature of nodes and links
-
Links provide paths for communications, knowledge ßows and developing personal relationships
-
Nodes may be individuals or teams working across boundaries for a common purpose
-
Nodes are the focal points for activity or formal organisational processes
-
The pattern of nodes and links continually changes
-
There is often no discernible boundary to a network
-
Networks interconnect
-
One-to-one and multiple conversations take place, asynchronously or synchronously
-
Knowledge ßows in both deliberate and unanticipated ways.
It sounds a bit like real life doesn’t it! In fact the statements are so open-ended that one would think it goes without saying. The reason it has been found necessary to articulate them is that, as opposed to much organisational communication, this type of networking assumes openness and collaboration across departmental and organisational boundaries, and building multiple relationships for mutual beneÞt. This may be contrary to traditional methods of competitive management. This theme will be discussed in Chapter 20, which looks at learning organisations within the framework of information and planning. If there is to be such a thing as a learning organisation then facilitation of knowledge transfer within the organisation must be fostered by the management culture.
4.7
Further reading
Communication and use of information Goldhaber et al. (1984) investigate factors that affect the way information is communicated within organisations and suggest ways to develop what they call organisation intelligence, which they see as a way in which the management of organisations can effectively assimilate information and turn resulting knowledge into appropriate action. Browne (1993) and J.L. Horton (1995) both look at the inßuence of information on management processes within organisations. Browne’s emphasis is on the factors contributing to management decision making and she frames these within cognitive styles of decision making; Horton’s emphasis is on the way managers communicate, but he considers how information inputs contribute to what it is that they ultimately communicate. Anthony and Govindarajan (1995) do not focus on information processes, but their book contains many case studies of management control that they see as intermediate between strategy formulation and task control. It therefore has relevance for analysis and examples of management communication. The emphasis of Maguire, Kazlauskas and Weir (1994) is on innovation within organisations. Their analysis includes consideration of organisations as information processing systems, and of external and internal information sources.
CHAPTER 4 87 Information and organisations
Liebowitz (2000) analyses different deÞnitions of organisational intelligence, and links them with ways of making effective use of intangible assets. Choo (1998b) describes a learning organisation as a decisionmaking system making sense of knowing creation in an uncertain environment. He explicates several models of interpretation of this. A set of readings has been compiled by Srikantaiah and Koenig (2000) in which they have addressed the four areas: background and issues; the culture of learning and knowledge sharing; tools; and applications in particular environments.
Environmental scanning Aguilar (1967) produced the seminal work on the utility of environmental scanning based upon a doctoral thesis. Subsequent examination of the possibilities of the process was carried out extensively in the management journal literature. Typical of this are works by Fahey and King (1977); Cleland and King (1975); and Ghoshal and Kim (1986). Gilad and Gilad (1988); Fuld (1995); Kahaner (1996); Stanat (1990); and McGonagle and Vella (1999) have produced books that describe corporate approaches to business intelligence gathering, and that suggest procedures for carrying it out and sources that may be used. In each case the sources are speciÞc to North American business, but the same categories of sources are appropriate in other contexts. Lester and Waters (1989) give a British viewpoint on the same process. Hussey and Jenster (1999) provide a European perspective and include case studies. McGonagle and Vella (1996) have tried to synthesise competitive intelligence, strategic intelligence and other forms of information gathering such as reverse engineering into what they call ‘cyber-intelligence’, and Choo (1998a) has looked at environmental scanning in the context of information management in an enterprise.
Information resources Information resource guidance tends to date very quickly. Online Internet sources are no exception, as many are not regularly updated. The better resources are well structured so that they give a feel for the types of documentation as well as actual sources. Books that provide detail on speciÞc information resources with an emphasis on business, tend to concentrate on material for speciÞc geographic regions. For example, an Australian guide is that of Denison and Stewart (1998). Lavin (1992) provides considerable detail on print and electronic sources with a North American emphasis. Haythornthwaite (1990) is in similar vein but drawing on material from Britain and the European community. In each case the texts are of generic value, and may be used as a guideline for the material of other countries. Peete (1999) has produced a disk-based Internet business resource guide. Armstrong has been involved in editing a series of resource guides focussing on different areas of knowledge such as industry (Armstrong 1994), and management (Armstrong 1996) that are wide ranging geographically, and include brief summaries of databases. He also has co-edited a guide to digital sources (Armstrong & Hartley 1997), and a multi-volume searching manual (Armstrong & Large 2000) that brings together writers in specialist areas such as agriculture and patents. The manual is more oriented to resource identiÞcation and structure rather than search strategy. Cousins and Robinson (1992-97) have edited an extensive compendium dealing with business database access. Websites that provide guidance to business information on the Web include brint.com (Brint, 2002a) and globalEdge (Michigan State University, 2002).
88 PART A Overview
Beyond business resources, for most information on the Web, it is best searched directly in order to identify its general information resources. Among the sites that provide some general structured guidance to information resources are Yahoo (2001). Many libraries maintain ‘virtual reference desks’ that provide access to the Internet organised by subject. Typical of these is Explore the Internet (United States Library of Congress 2001a). Of the numerous print guides to the Internet, Krol’s have been valuable, a more recent example being (Connor-Sax & Krol 1999). Hahn has produced many guides, for example his Golden directory also published as Internet yellow pages and his reference guide (Hahn 1996), which is wide ranging, as is that of Morville, Rosenfeld and Janes (1999). Gale Research publishes an ongoing guide and database of Internet databases (Gale guide to Internet databases, 1995-).
PART B Operational information management, concentrates upon the technical operations of information management, and approaches them in the context of information being created, used and organised, retrieved and stored in a staged process. At each step of the process, it is the information ABOUT information, rather than the information itself going through its stages of use, that is of principal concern in the procedures that are examined.
CHAPTER 5
5
Creation of information ............................................................... Feels what they feel, loves what they love, learns to hate what they condemn, Takes his pen in tears and triumph, and he writes it down for them.
Will yer write it down for me? Henry Lawson Information is created in many ways. Information management requires an understanding of the different ways in which information is likely to be documented, and the ways in which the documents are constructed. This understanding improves the subsequent processes of distribution, organisation and retrieval. The more that document structures are made self-organising, the less effort has to be expended upon subsequent description of the documents for retrieval. This Chapter considers examples of information creation by data acquisition and by publication, in each case looking at the considerable inßuence of IT in organising the documents as they are produced.
5.1
Snippets of history
Information is created for communication. It is with the recorded form of communication that information management so often concerns itself. The record has been with us for a long time. It represents history, whether it be as daubs on cave walls, or knotted cord.1 Early systems of writing such as the cuneiform of the Sumerians, which came to be used widely throughout the Mesopotamian region, and the hieroglyph of the Egyptians, were pictographic transition systems between images and symbols. Some claims have been made recently that with the widespread use of icons on screens we may be in a transition stage back to images for recorded communication. Recorded communication started to approach a more familiar form with the development of writing and the use of a variety of materials to carry the writing forms: animal bones in China and clay in Babylonia for inscriptions, papyrus in Egypt because of the availability of reeds around the Nile delta, parchment from
1
Examples of early documents that predate text, include the tjuringa or stones used by Australian aborigines for recording spiritual totems, and the quipu, or cords used in Peru for transaction records using knots in the cords.
CHAPTER 5 91 Creation of Information
animal skins, including vellum from calves. The Chinese were using paper and ink during the Han Dynasty around the third century. When ways were found to reproduce original records with ease, we had to become more conscious of what we now call information management. We had to start thinking about how to organise the records, at the point at which they were created, and subsequently when they were stored for future use. Figure 5.1: Developments in publication
Figure 5.1a: An extract from the Dun Huang print of the Diamond Sutra depicted by Zen Buddhist order of Hsu Yun at http://www.hsuyun.org/Dharma/zbohy/chinese_text/diamond/diamond.jpg
Figure 5.1b: Early Korean movable type from Yi Gyubo’s Donggukisanggukjip, which itself makes reference to earlier movable type http://www.korea.net/koreanculture/kw/culture/heritage/image/11_04l.jpg
Figure 5.1c: Computer typesetting of music from A Music Of Sighs, for solo flute, by Daniel Powers (Swans Wing Press http://www.swanswingpress.com)
Block printing probably started in China as a development from the use of seals used for stamps, and may have had its beginnings in the Sui Dynasty (581 - 617 AD). The earliest printed book in existence, known as the Diamond Sutra is dated at 868 AD. It was block printed onto sheets and pasted together into a roll.
92 PART B Operational Information Management
It has found its way to the British Museum. By the tenth century, block printing had been mastered. Some vast works were completed using woodcuts - the cutting and printing process is known as xylography.2 For example over a thirteen year period during the tenth century, the Buddhist canon the Tripitaka was produced on 130,000 blocks. There is a Korean version of the Tripitaka, the P'Alman Taejangg_ong, of which the approximately 80,000 original blocks, cut back-to-back, still are resident in their entirety at Haein-sa temple in Korea. The invention of movable type was a precursor of the publishing industry that we have today. It is attributed to Pi Sheng between 1041 and 1049 during the Sung Dynasty. Single characters were formed in clay, which was baked for hardness. The characters could be regrouped and reused for printing. Clay type was succeeded by wooden type. Wang Cheng invented wooden movable printing in 1314 and used a revolving wheel as a typesetting device (Huang 1970). It was in Korea that printing from movable type was developed in a signiÞcant way. Bronze typefaces were developed there in the thirteenth century. The earliest extant book known to be printed from movable type is Korean, the 1377 Buddhist treatises of the Buljo Jikji Simche Yojeol. It is now resident in the Bibliothèque Nationale in Paris. The Europeans came to use movable type in 1450, notably through Gutenberg. The roman script of the European languages was much more amenable to rearrangement and reproducibility than were the ideograms of the orient. Printing was a catalyst of the Renaissance. This has led some modern reappraisal of the times to call it a period of communications revolution. It is as well to remember that our cultures have had centuries of familiarity with the book style, and the layouts that have been developed for it. These styles of presentation have inevitably been used as the foundation for computer-based documents. Although computer-based presentation of information often endeavours to make people more comfortable by emulating the book, we can expect that as we all develop more familiarity with the computer, presentation styles will evolve to take more advantage of the medium. For example, two forms of presentation styles that lend themselves to the computer medium are overlapping images that may be cascaded and hypertext for linking parts of compound documents. Figure 5.2 depicts cascading of images that have been invoked using hypertext links.
Figure 5.2: A cascaded compound document with Microsoft Word; screen shot reprinted by permission from Microsoft Corporation 2
The woodcut continues to be used creatively to this day. Artistic printmaking has ßourished at different periods in countries as diverse as Japan and Australia and France.
CHAPTER 5 93 Creation of Information
The recording of information that is created today is carried out in many situations. However there are two general procedures that are regularly a concern of information management, because they involve a considerable amount of information about information. These are the processes of business records control and of publication.
5.2
Business records
We have become used to seeing Þgures that indicate the increasing size of the services sector of the economy at the expense of the primary and manufacturing sectors. A signiÞcant proportion of the services sector comprises clerical and administrative tasks, in contrast to trade and construction tasks. Business records are a primary tool of trade for the clerical and administrative workforce. There are many examples of business records. Some are primarily for internal organisational use, and others will be used both within and outside the originating organisation. Examples include: • Internal Finance applications, inventories, duty statements, employment applications, work schedules, reports, policy documents, and databases. • External Purchase orders, statistical reports, invoices, requisitions, cheques, account statements, correspondence, databases. Many business records are created using information collection forms. It is an imperative of business to maximise the utility and minimise the cost of such forms. Forms may cause businesses all sorts of problems because of the many ways that they may be interpreted. For example, in the US, a Þnancial magazine sent out identical proÞles of a hypothetical family to be completed by Þfty tax preparers. From these tax professionals they received Þfty different answers, varying from each other by as much as several thousand dollars (Wurman 1989). The public sector too must wrestle with the problems of forms. One of the more signiÞcant pieces of legislation concerned with information management in the United States has been the Paperwork Reduction Act. Investigatory material prepared as part of the evolution of the act (United States Commission on Federal Paperwork, 1977) reported such problems as: -
Cumulative processing burdens:
“One company had to comply with Federal requests for 8,800 reports from 18 different agencies each year”.
-
Poor program design:
“The Truckers daily log, required by the Department of Transportation to ensure that drivers do not drive more than 10 hours a day, results in 1.2 billion sheets of paper annually. Unfortunately, the log which a truck driver should Þll out every 15 minutes of each day, whether driving or not, neither identiÞes possible violators nor helps in their prosecution”.
These and many other examples are quoted by Barnett (1996) as he draws attention to the costs associated with business forms. In particular there is a major cost of addressing errors when forms have been incorrectly completed, often because of poor design.
94 PART B Operational Information Management
5.2.1
Forms management
A forms management program should be concerned with monitoring the process of providing information onto forms that are being completed at the behest of other organisations to fulÞl regulatory and legislative requirements. It must also be concerned with forms generated by the enterprise itself, and attend to: -
Forms design and constitution
-
Forms control and audit.
Figure5.3: A business form
CHAPTER 5 95 Creation of Information
A typical business form is shown in Figure 5.3. In this case it is a form used for requesting creation of another form. Elements of the information management framework that apply with most forms are: • Forms design and constitution This comprises -
Title
which should be meaningful and clearly distinguishable from the collection information
-
Code/Reference number
as means of identifying the form for management, edition and reproduction purposes, including version and revision indication, and possibly numbering mnemonics for particular work-groups – bar codes may be used for this purpose and to assist scanning
-
Dates
to indicate the date of form creation and review
-
Circulation
information to indicate where the form must be routed – this is particularly in the case of forms that must have data entered or be acted upon by different departments within an organisation
-
Instructions
and help that clearly and unambiguously indicate how a form should be Þlled in, or additional help information that may be sought if necessary, such as examples of completed forms
-
Presentation features
relating to the design of the form that should endeavour to enhance utility by effective use of: ·
Typography (boldness, size, case and fonts)
·
Boxes and box captioning including drop-out colour that may be used for captioning but is not sensed by scanners
·
White space and margin settings
·
Line spacing and ruling, and shading
• Forms control and audit Forms management programs endeavour to reduce the number of forms in use by eliminating redundant ones. At the same time, best practice for use of the existing ones may be pursued by ensuring that the components referred to above are taken into account during analysis and design of business procedures, and that forms control takes place by: -
Data administration that involves indexing, registration and identiÞcation of print forms, or recording of computer screens within data dictionaries, or more desirably an integration of the two within an information repository (see Chapter 8)
-
An ongoing forms review program, requiring coordination and liaison with departments that use the forms, and procedural analysis of how this takes place
-
Stock control of paper-based forms involving budgeting, requisition and procurement from printers, checking of received forms against speciÞcations, monitoring of inventory, and distribution and storage (thus avoiding being the bank that runs out of loan application forms)
96 PART B Operational Information Management
5.2.2
-
Periodic program reporting on each of the above for performance analysis
-
Utilising appropriate standards and guidelines so that there is a coordinated corporate approach to: ·
Materials matters such as paper size, paperweight, and inks used, for example, paper sizes, if complying to the international standard (ISO 216, 1975), appear in size ratios of 1:√2
·
Use of corporate logos and symbols on documents for corporate identity
·
Implementing forms consistently within the framework of an institutional policy of conventions for components and design.
Computer-based forms
Many forms are used in conjunction with data entry on computer screens. The computer screen design is either an equivalent for a print form that has been used to collect the information, or is the source form itself, being used to capture data during telephone conversation or using direct entry by users into an online system.
Figure 5.4: Part of a screen-based form at http://www.moneyideas.com.au/onlineform.asp
A screen-based form for use via the Internet is shown in Figure 5.4. The constituents that have been itemised for print forms in the earlier paragraphs also apply to screen-based forms, but screen design, particularly when the forms are used for multiple transaction data entry, must also account for: -
Transaction organisation so that editing may be carried out in the most amenable fashion as a result of the grouping of Þelds on the screen. This means that attention must be paid to: ·
Minimising the number of different screen formats for a transaction
·
Juxtaposing Þelds according to the function they perform, or the department that is using them, or the frequency of use
CHAPTER 5 97 Creation of Information
· -
-
-
5.3
Introducing pull-down screens or break points or links to other screens at points that are natural for workßow
Keying procedures so that cursor positioning takes place by default, as much as possible paying attention to: ·
Automatic positioning at the beginning of Þelds, and justiÞcation of contents
·
Provision of delimiters (such as the dash '-' within dates) or units of measurement (kilos, dollars...) so that these do not have to be entered with data
·
Presentation of Þelds in such a way that data entry for them is an obvious requirement, perhaps by use of reverse highlight or underscore
·
Straightforward error correction
Data structure in such a way that Þeld lengths are minimised to reduce keying errors and that data entry is assisted where appropriate by: ·
Labelling of captioning in brief familiar or mnemonic form to explain Þeld content
·
Segmenting long Þelds into smaller subÞelds
·
Aligning Þelds for visual ease
Validation of data so that field content is verified where practicable and so that: ·
Fields are conÞrmed as speciÞc data types when feasible (numeric, date, alphabetic...)
·
Fields are validated within ranges, for example with age data or salary data so that impossible values are not permitted, or users are warned if values are outside speciÞed ranges
·
Field content is checked against a controlled vocabulary such as a dictionary of allowed terms or a table of permitted values held in a data dictionary that controls the Þelds.
Publication Times have changed since a certain author was executed for murdering his publisher. They say that when the author was on the scaffold, he said good-bye to the ministers and to the reporters, and then he saw some publishers sitting in the front row below, and to them he did not say good-bye. He said instead, “I’ll see you again’”.
J.M. Barrie After dinner speech at the Aldine Club, U.S.A., 1896 5.3.1
The concerns of print publication
Prior to the printing of books, publishing could be described as the copying processes that took place in the scriptoria of the monasteries. The reading public was very limited in extent. It remained circumscribed for quite some time, even after the widespread adoption of printing. However, as the reading public grew, so did a role for publishers in looking after production processes for authors, identifying material of potential interest to the readers, and reorienting idiosyncratic material to reach a wider audience - a role much wider than the typesetting and printing of books. For creative writers in particular, this has often been an uneasy association, witness the quote above, or Oscar Wilde’s dismissal of the publisher as “..simply a useful middle man”. That part of publishing which involves book production has evolved from the days of the incunabula, those printed books produced prior to the Þfteenth century, to a point where it can be said to involve a
98 PART B Operational Information Management
formal process of creation, transformation and distribution. This craft has been well honed over the last Þve centuries, and now there is a wealth of experience and diversity in its stages of copy preparation, typography, illustration, presentation, display and packaging. From preparation to presentation may all be regarded as part of the information creation process, and one that has been profoundly affected in recent times by electronic publication. Before looking at some aspects of electronic publication, consider some of the information about information (called here metainformation), which has been part of the publication process prior to electronic publication: • Copy preparation The process of copy preparation, or composition, has been brought much closer to authors by word processing and desktop publishing software. Nevertheless, there remains a need for a created work to be fashioned in the house style of a particular publisher. Characteristics of text that may need to be modiÞed include margins, pagination, headings, subsidiary text, inclusion of illustrative material, contents and indexes. Formal preparation procedures have existed for a long time per medium of proofreaders’ marks. An example is shown in Figure 5.5, although such annotation may now be quite unnecessary if the changes can be made directly on wordprocessed documents.
Figure 5.5: Copy proofing from Style Manual....Australian Government Publications, 3rd ed., 1978, copyright Commonwealth of Australia reproduced by permission
CHAPTER 5 99 Creation of Information
There are many international standards relating to document creation. For example: -
Content organisation and presentation of indexes (International Standards Organization 1996b)
-
Spine titles on books (International Standards Organization 1985b)
-
Presentation of theses (International Standards Organization 1986b)
There is even a standard on how you go about internally numbering a document (International Standards Organization 1978) that suggests an Arabic numbering system with levels indicated by dots - it has been used in this text! Of course, innovative creation of documents may lead to deliberate avoidance of such standards. 1st level
2nd level
3rd level
1 2
2.1
.
2.2
.
2.3
2.3.1
.
.
2.3.2
.
.
.
.
.
.
.
.
2.3.10
.
.
2.3.11
.
2.9
.
.
2.10
.
12
.
.
13
.
.
.
Figure 5.6: Numbering a document (ISO 2145, 1978); reproduced with permission of ISO; Standard may be obtained via http://www.iso.ch
• Readability The readability of a page is inßuenced by both the typography and the presentation. The typography (or type face design) when appropriately balanced with other factors such as type size, line length, leading, page pattern (including margins), contrast of type and paper, and suitability to content, can markedly improve the readability (and reduce the noise) of the content. A diverse range of type has been developed over the centuries. Books on the history of publishing often bring together a number of different fonts for comparison (Winckler 1978).
100 PART B Operational Information Management
Figure 5.7: Font examples
• Illustration Creators of books may produce their own illustrations, or work together with illustrators. In either case, it is necessary for the creators to work in conjunction with designers and printers to ensure that reproduction of illustrations is optimal. Book illustration methods developed from the early woodcuts to the use of other production methods such as etching and lithography, and in more recent times by a great variety of photographic processes. The processes in general are often divided into: -
Relief
such as with woodcuts, and modern half-tone blocks, where the image is carried above the surface of the printing plate like type, and impressed onto the paper
-
Recess
also known as intaglio such as with metal engraving, where the image is carried below the surface of the plate
-
Lithography
or collotype where the image is transferred from ßat surfaces.
Image composition is limited only by the creativity of the illustrators and their inspiration. Figure 5.8 illustrates the possibilities.
CHAPTER 5 101 Creation of Information
Figure 5.8: Illustration examples for print documents
5.3.2
Periodical publications
Serial publications occupy a signiÞcant part of the publication sphere. Because there is much metainformation that has been developed speciÞcally for them, they are worthy of being singled out for special mention. People are generally most familiar with them on the stands at a newsagency. They are the primary means of both mass and scholarly communication. The shelves of the world’s libraries would have a lot of shelf space to spare if we didn’t have periodicals. Electronic periodicals have been produced for many years, but it was only during the 1990s that they started to become a signiÞcant component of electronic publishing, as networking and low cost personal computers in homes have made access more widespread. Julius Caesar is often credited with starting the issue of periodicals, although Osborn (1980) notes that annals of a sort were transcribed onto the tombs of Þrst-dynasty Egyptian kings, and that the Chinese TingPao began as a handwritten newspaper during the Han Dynasty (206 BCE - 220 AD). When consul of Rome
102 PART B Operational Information Management
in 60 BCE, Julius instituted the Actus Senatus, a sort of precursor of the Hansard reporting system that we have been used to in more recent times for proceedings of Westminster-style parliaments. His successor, Augustus proscribed publication of these proceedings, though he did permit records to be deposited in the imperial archives and public libraries where they might be inspected following the permission of the city prefect. At around the same time, Julius also initiated the Acta Diurna. These were written on whitewashed boards and posted daily in well-frequented places. These predecessors of newspapers persisted until Constantine made Constantinople capital of the Roman Empire in 330 AD.3 • Printing and the scholarly journal A little over two centuries after printing was introduced to Europe, the Þrst scientiÞc journals appeared - Journal des Sçavans (later Journal des Savants) in Paris and, shortly afterwards in 1665, the Philosophical Transactions of the Royal Society of London, still being issued at the time of writing. The scholarly periodical has grown to occupy an important place in scholarship and particularly in the sciences where it has played a key role in an information cycle being integral to: -
Formal dissemination of information
-
A peer review process by means of which editors and referees, who are specialists in the same subject area, adjudicate upon, suggest modiÞcations, and ultimately accept or reject a paper for publication
-
Assignment of priority - as being the Þrst to have accepted and to document discoveries, processes or models is an important reward for scientiÞc endeavour
Figure 5.9: Scientific publication cycle (with permission P Carey & Green, University of Washington Libraries at htttp://courses.washington.edu/info220a/pubcycle_files/frame.htm) 3
They contained ofÞcial news such as imperial decrees, and social news such as births, deaths and marriages. J. Warrington 1961, Everyman’s classical dictionary, Dent, London, p.6.
CHAPTER 5 103 Creation of Information
-
A secondary and tertiary dissemination process by means of abstracting and indexing services (themselves periodicals), and handbooks, sets of tables and the like, derived subsequently to the primary publication process
-
An archiving process within which periodicals are gathered in libraries for future reference so that future researchers may more readily ‘stand on the shoulders of giants’.
Periodicals have been intrinsic to the process of scientiÞc information reporting, and are usually shown as taking their place within a publication cycle such as in Figure 5.9, where both the journals and the abstracting and indexing services are periodicals, and may be print or digital. The increasing production of the electronic scholarly journal, has led to re-appraisal of what is meant by publication and of the respective roles of e-journals, publishers and digital reprint repositories.
5.4
Electronic publishing
Electronic publishing is a complex and dynamic Þeld as a consequence of the many technological and procedural relationships involved. Standera (1987) examined the relationship between the technologies and processes of the day. We have moved on from the way that these associations were made. However we can still depict, as in Figure 5.10, a position for electronic publishing that records the many inßuences on its development.
Figure 5.10: A position for electronic publishing
This diagram is useful for reminding us how the technological convergence of computing and telecommunications is concomitant with an association of the media and the printing industries, and a relaxation of the distinction between published information and information produced primarily for internal systems.
104 PART B Operational Information Management
Electronic publishing has evolved in a number of areas, some as a transition stage in the production of print products, others as an output from databases with no printed equivalent in mind. For example, publishers of abstracting and indexing services before the advent of interactive computers used computer-typesetting to produce the many print pages of their publications. Once computers reached the capacity for online access, the information for publication was collected into databases for remote information retrieval. More recently, the documents have also been made available on disk, principally CD-ROM (Compact Disk Read Only Memory), for access through associated software on personal computers. Increasingly, electronic publication is taking the place of documents for which no print equivalent is intended. Examples include encyclopaedias, reference databases and electronic journals. It is also reinforcing our awareness of the game as a communication device, to the extent that games software may be associated with databases to enable information to be communicated and to entertain at the same time.
5.4.1
Desktop Publishing
Desktop publishing is one of those marketing terms that have been used to represent the processes of document creation and page layout on a personal computer, with subsequent output to print. It may be characterised by: -
Integration of high-quality mixed text and graphics displays
Graphic design and typesetting to rival phototypesetting
-
Network interface and integration with facsimile transmission
-
A personal, rather than a institutional approach.
Since the early desktop systems were introduced, the page layout and typesetting software has been developed to the extent where it has all the capabilities of commercial phototypesetting. It may therefore be used as a front-end for commercial systems. Flexible output to paper has been made possible by the use of page description languages, notably PostScript,4 for applying the page composition software that has been used for layout. Distinction between desktop and commercial publishing in terms of functionality therefore no longer exists, except that the printing industry uses the expensive photographic plate-making equipment to enable multi-page plate making for large-scale reproduction. If the three headings considered earlier for print publication are reviewed, it can be seen that electronic print publishing has addressed each of them well. • Copy preparation This is now much closer to the author than before, providing fully ßexible layout and display capability, and enabling authors to deal with self-publication more easily, avoiding publishers and going directly to printers if they so desire. Unfortunately, the technical capacity for doing something does not necessarily mean it gets done effectively. Years of experience in publishing houses with typography, blocking and shaping of sections of text and graphics on the page, and grid patterns such as consistent use of a column presentation may be lost - or perhaps the results should be thought of as post-punk presentation. 4
Developed by Adobe Systems for Apple Computer and subsequently used widely on other platforms.
CHAPTER 5 105 Creation of Information
• Readability A great range of fonts has been produced for wordprocessing and they have been applied within publishing software. They have been used not just for the linear fonts, examples of which are shown in Figure 5.7, but also for ßexible presentation of decorative text. • Illustration Use of clip art, incorporation of drawn art in transferable image format from graphics packages, or scanned paper illustration material (with cropping and image adjustment), and colour utilisation from a wide range of palettes are restricted only by the imagination of the designer/authors.
5.4.2
Compact disk publishing
Electronic publishing with intention of providing information for computer use rather than print use, may be carried out using computer networks as a dissemination medium, or may involve production of some type of digital storage medium for local use. A number of these media are examined in Chapter 14. The most popular for publishing purposes to this time has been the CD-ROM, principally because of the effort that has gone into standardising the mastering and replication processes for the audio compact disk market. CD-ROM has found widespread application for dissemination of abstracting and indexing databases, games, maps and charts, forms, software and instructional manuals. The range and diversity of multimedia applications incorporating text, sound and imagery are now signiÞcant. The production process must take the following information management aspects into account: • Design An overall structure for a document should be laid out so that the importance and required production time of the different components can be assessed, and so that the proportion of text, imagery and other multimedia formats, and their interrelationships can be mapped. • Authoring One might well ask why one needs a word like ‘authoring’, when ‘writing’ has been used up until now. Is it another example of the inßated language that we so often see in the IT arena? It depends how one applies it. Authoring is simply the writing of digital documents, but this writing gives the author more personal control over design features such as layout and image incorporation. If one is going to write them electronically, it is important to assess which is the most appropriate software for one’s purposes. Wordprocessing software might do the trick if one’s emphasis is primarily on text. However, should one wish to design animation for example, into a document, then more specialised authoring software will be appropriate. • File organisation The CD-ROM production software will generally handle Þle organisation automatically. However thought will still have to be given between the extent of information that will be present on the CDROM, the different Þle formats that are to be integrated, and associations between the CD-ROM and remote networked information by means of links established within the Þles.
106 PART B Operational Information Management
• Indexing Indexing processes for print and many digital documents have mostly been carried out after the production of the document itself. Because CD-ROM publishing lends itself to hypermedia, the indexing process needs to be considered at the time of authoring. It can be argued that the index might be at least partially constructed before the document itself is written. In this way, the document is expanded around its inbuilt conceptual relationships. • Mastering It is appropriate to have a role for a producer who will oversee the integration of design and authoring, ensuring that the production of multimedia segments is scheduled, and then that the whole package is integrated as a complete composition. A master copy may be produced on magnetic disk based software, preparatory to mastering for CD-ROM reproduction.
Figure 5.11: Document produced for compact disk (“This is IT” CDROM familiarisation material for new students)
5.4.3
Network publishing
The immense capacity for fast distribution of ideas via the Internet has given rise to an abundance of what might be called electronic publications. Many of them are publications in the sense of being publicly available through avenues such as World Wide Web pages. They would not be called publications in the sense of going through some type of quality control procedure carried out by another party. Typical of these are the many personalised home pages with attachments of interest to their creators that are available over the Web. This same avenue is being adopted by commercial and professional organisations for making available documents that are designed and edited speciÞcally for network distribution. In many cases these documents must be viewed on a personal computer by means of interpretative software that is also loaded onto the machine (‘plug-ins’). Figure 5.12 shows an example of an interface to a network document that may then be viewed through such software - in this case Adobe Acrobat.
The periodical, and its publication signiÞcance were mentioned earlier in the Chapter. Because of this signiÞcance, production of electronic periodicals was attempted early in the development of networked systems.5 However, although informal newsletters were immediately successful, adoption of the more formal publications has been much slower. A number of factors have inhibited development. These included: -
The market existing for the print form of a publication was fully represented on or conversant with the Internet
-
Network bandwidth availability has only comparatively recently been able to support the graphic imagery carried by the print equivalents
-
Major publishers have had difÞculty coming to terms with pricing, marketing and equilibration between print and electronic equivalents
-
The research community has been tentative, except in some specialised areas, because of the entrenched practice of having hard copy publication in hand to represent evidence of research for advancement.
However, now that visibility is increasing, digital archiving and exchange of documents is prevalent, editorial and reviewing processes are being implemented and better use is being made of alternative interfaces, the periodical is becoming a prominent publication on the Internet. In the case of scholarly publication, there is a growing realisation that the electronic periodical provides, together with the prestige of authoring: 5
Electronic distribution per medium of CD-ROM has been the case for many years with abstracting and indexing services. Additionally, early programs such as ADONIS involved distribution of images of the printed journals themselves.
108 PART B Operational Information Management
-
Automatic archival processes
-
Easier submission and reviewing processes
-
More interactive scholarly discussion through ‘interactive’ communication supplanting ‘letters to the editor’ in the print form
-
A more straightforward way of bringing together collaborative work.
Figure 5.13: Electronic journal D-Lib Magazine via http://www.dlib.org/ (originating site)
Additionally, despite the fact that many important journals are not produced primarily as digital publications, the efforts by major publishing houses to provide digital portfolios of their publications in online databases has been matched by companies that act as aggregators of periodicals, making many products from smaller publishers available online in consolidated full-text databases.
5.4.5
Computer-supported collaborative work
Collaboration in publishing was mentioned above in relation to electronic journals. The concept of computer-supported collaborative work (CSCW), a term coined in the 1980s with ‘cooperative’ then used for ‘collaborative’, is however a much broader one with respect to creation of information. CSCW is about providing computer-based tools and techniques for the effective support of the collaboration of two or more people in reaching a common goal or jointly performing a task. Groupware is the term used for the software that provides the shared workspace for the participants. It may be as simple as electronic mail facilities. It may be as developed as purpose-built software that provides a tool for the organisation of development teams of experts, or distributed product design and development, or business transformation. The results of the collaboration may lead to a document that is to be shared further, so groupware can be regarded as publication support software.
CHAPTER 5 109 Creation of Information
Increasingly, groupware is being used in association with Information and Communications Technology (ICT) activities for cooperative work (Diamond Bullet Design 2002; European Telework Online 2002). This teleco-operation facilitates document creation, and further diffuses the concept of what is meant by publishing.
5.5
Markup
As digital documents are created, they are structured, particularly for text, by a process known as markup. At its simplest, this provides for the spaces, punctuation and other symbols that we use when laying out the format of text. Without markup, this phrase would be: Withoutmarkupthisphrasewouldbe However, there is more to markup than embedding characters for readability purposes. It provides an ability to manage documents that permits: -
SpeciÞcation of the structure of a document independently of the presented format
-
Creation of multipurpose documents for repeated utilisation
-
Integration with word processing and document authoring systems
-
Coupling or fragmenting of documents that may be linked by formal pointers to each other
-
Full-text query and retrieval
-
Incorporation of metainformation that self-describes documents for information retrieval
-
Cooperative work in a networked environment.
For digitised text, two types of markup can be identiÞed: procedural and descriptive. • Procedural markup This consists of speciÞc commands embedded in text to indicate to wordprocessing software how text is to be formatted. The controlling software is able to separate the markup from the text by reference to a limited table of control codes indicated in the text by an unambiguous delimiter that the software does not confuse with text. For example some procedural markup may appear as follows: In this example, anything proceeded by ‘.’ is a formatting command, so the markup includes instructions relating to size of heading, indents (both temporary and permanent) and spacing. Many other formatting instructions relating to such matters as line length, hyphenation, tabbing and font displays such as underlining and subscripting could also have been incorporated.
.h1 The document’s title is here .sp 2 .in +6 .h2 This is a secondary heading .sp .P This is a paragraph of normal text... .sp .ti +4 * First item in a bulleted list .sp .ti +4 * Second item in a bulleted list .sp .P This is another paragraph of normal text
Descriptive markup does not use speciÞc formatting commands. Instead it outlines a logical document structure that may be interpreted for presentation purposes according to local software. For example, Figure 5.15 shows that both a structure and a form for a document may be identiÞed.
110 PART B Operational Information Management
Figure 5.15: Document form and structure
Descriptive markup such as that shown in Figure 5.16 is concerned with describing the structure. The particular logical form may then be implemented for speciÞc circumstances using styles appropriate to those circumstances. This document heading is at first level The title of the document is here This document heading is at second level This is a paragraph of normal text.... First numbered item in the list Second numbered item in the list Third numbered item in the list This is another paragraph of normal text... This is another paragraph of normal text...
Figure 5.16: Descriptive markup
CHAPTER 5 111 Creation of Information
5.5.1
Generalised markup
Generalised markup assumes that descriptive markup has been used so that the structure may be deÞned and then the form applied later according to requirements. It therefore requires formalism for deÞning documents that may be used, shared, and re-used in systems. Two logical document models for generalised markup have been carried through to the point of international standardisation. These are Standard Generalized Markup Language (SGML) and Open Document Architecture (ODA). The two are conceptually very similar. The differences tend to be in usage rather than deÞnition. ODA contains a lot of presentation semantics for document exchange between wordprocessor software, but SGML does not (it is for authors and publishers, so perhaps the publishers did not want the authors to get carried away with presentation and layout). They are appropriate to use with electronic publishing whether or not print presentation of a product is required because: -
Draft formats may be edited on screen and presentation style adjusted within the document structure
-
Different editions for different countries or different environments (for example large print readers) may be produced relatively easily
-
Secondary publications such as bibliographies consisting of surrogates of sets of main documents may be produced
-
Electronic information retrieval can make use of structured sections within documents.
• Open Document Architecture ODA makes provision for both a logical and a physical layout structure for a document (International Standards Organization 1994-1997). It can be regarded as an evolution from standards developed with document interchange in mind, where the interchange is by means of facsimile transmission. Unlike SGML, ODA texts make use of ODIF, the binary Open Document Interchange Format for storage and transmission. ODA’s layout facilities have features for: -
Document structure:
this involves the logical view of the document (components such as paragraphs and headings), and the provision for different views of the same document - the layout in terms of pages, sets of pages, and blocks, permitting different layout structures to be used for the same document
-
Content architecture:
the blocks may contain text, geometric graphics such as computer drawings, or raster graphics from image encoding. Content descriptions include such information as identiÞcation of that part of the text which has to be italicised, or areas of text shown in highlight form
-
Styles:
these contain the information necessary to control layout and make the connection between the logical view and the layout view.
• Standard Generalized Markup Language William Tunnicliffe of the Graphic Communications Association in the United States registered the concept of generic coding of documents as GenCode during the 1960s. Goldfarb subsequently used
112 PART B Operational Information Management
GenCode to develop the Document Composition Facility Generalized Markup Language (GML) at IBM. GML was used as the basis for an American, and then an international standard (International Standards Organization, 1986c). A document deÞned by SGML consists of: -
The SGML declaration that speciÞes the character set such as ASCII or EBCDIC and any variations from the character set that apply. It also assigns delimiters that are used. It is normally common to all documents at a site. An example of a declaration is shown in Figure 5.17. This declaration could be applied to a variety of different document types, which are themselves each deÞned by a DTD.
-
The Document Type DeÞnition (DTD) that deÞnes the structure of a class of documents (for example memoranda or technical reports), of which the document is one. DTD components include: - Þeld of a memo could optionally be conÞdential or public, as in the Figure with a default value of ‘public’
An example of a DTD is shown in Figure 5.18, in this case to deÞne a simple class of documents called NOTE. -
The document itself, marked up in SGML as an instance of the DTD.
Figure 5.17: SGML declaration example
CHAPTER 5 113 Creation of Information
ELEMENTS
MIN
CONTENT
(EXCEPTIONS)
--->
Figure 5.18: DTD: Document Type Definition example
• Hypertext Markup Language Just as there are DTDs for print documents such as memoranda and articles, so a DTD has been created for use on the Internet World Wide Web. Hypertext Markup Language (HTML) is an application of SGML. There have been several stages in HTML development each of which has had its own DTD. HTML is therefore one family of DTDs among many, and has now developed to XHTML. Web browsers such as Netscape and MS Internet Explorer are able to interpret documents (Web sites) that have been created within the bounds of the DTD framework. To show how this is applied, Þgure 5.19a contains a frame off a page from the Cambridge ScientiÞc Abstracts Website The Nutshell. A frame is itself permitted by the DTD to enable sites to be displayed from different component parts. The frame itself has a different address, which is invoked to help make up the document. Figure 5.19b shows most of the HTML markup that has been used to create the frame. The example shows how a simple but effective Web page can be established with minimal use of the Hypertext Markup Language. In this case the following aspects of HTML are utilised: -
Header information:
this is enclosed within the tags and does not display on the Web page. This area of the document may be used for various META tags that help to describe the page. In this case, only a tag has been used.
-
Body information:
all tags within the body refer to material that is to be displayed by Web browser software. The tag itself may be extended by attributes that deÞne colours, text, or background for the page. In this example a ‘.gif’ graphics Þle is deÞned to represent the background of the page.
114 PART B Operational Information Management
Figure 5.19a: The Nutshell frame http://www.thenutshell.co.uk/content/free/welcome.htm
Figure 5.19b: HTML source & explanation for The Nutshell frame http://www.thenutshell.co.uk/content/free/welcome.htm
-
Paragraphs:
the
is used to delineate strings of text. It may also be extended by attributes as in the example where the paragraphs are modiÞed by an attribute that centres the paragraph.
-
Tables:
HTML provides limited formatting capacity, but page designers often use tables to force the placement of different parts of a site within cells. The
tag initiates the table, in this case without any borders showing on screen; the tags
start and stop rows
CHAPTER 5 115 Creation of Information
in a table – there is just the one row here and the
tag is used to deÞne the row cells in such a way that an image is incorporated side by side with text. -
Images:
the tag refers to images in local Þles that are to be displayed – in this case there is an image of the Nutshell logo incorporated within the table.
-
List:
different types of formatted lists may be created with HTML to improve organisation of information – lists may be created with several different formats including automatic numbering of bullet points, or simple bullet point creation as in the example shown here which incorporates
tags in order to generate the bulleted list.
-
Anchors:
the key to establishing links to other pages is the ability of browsers to recognise links that have been established by anchors within HTML; in this example, within the list there are two anchors. The Þrst of these is: Institute of Information Scientists which encloses the linked text within an anchor and points in this case to a site name. If a speciÞc Þle name is not speciÞed for the site, the link defaults to an index.html Þle within the named directory. The second anchor provides a similar link to Bowker-Saur.
Although standard versions of HTML exist, browsing software has been developed to interpret non-standard extensions. For example Internet Explorer permits the deÞnition of margins within the tag, and it can interpret these while other browsers ignore the attributes. • Extensible Markup Language (XML) XML is a subset SGML, developed by W3C (World Wide Web Consortium: W3C, 2002a) for universal document transfer on the Web. Unlike HTML, which is a speciÞed application, it is in fact a meta-language that permits speciÞc applications to be deÞned. However, it is a simpliÞed version of SGML, and is designed to enable easy interchange of structured documents over the Web. XML provides for DTDs in order to check that documents are created and interpreted with valid structures. These DTDs verify for example that users embed headings of different levels within the correct structure – facilities that HTML does not provide. However it does not require a DTD and may assign a default deÞnition for undeclared markup components. Among its characteristics are: -
It is not a predeÞned set of tags, of the type deÞned for HTML, that can be used to markup documents
-
It is more stringent than HTML, requiring that both start and end of all elements is indicated
-
It is not a template for a particular type of document – it provides the framework within which templates may be deÞned, and is ßexible enough to deal with all document types
-
It can produce compound documents from multiple Þles
116 PART B Operational Information Management
-
It provides support for software such as validators and browsers per medium of processing control information
-
It can encode illustrative material within text Þles.
As is the case with structured database deÞnitions, XML provides the ability to deÞne using document elements. These can be applied to the many document types that appear on the Web. For example if one had a DTD deÞned for a document type ‘weather forecasts’ that stated elements and attributes for all types of forecasts, one could provide an instance of one looking as follows: November 11, 2001 06:00EST 12.00EST Sydney NSW Australia clear 27 22 E 3-5 57 6 This may then be displayed according to whichever display formats are required, perhaps using a style sheet within a browser.
5.6
Further reading
Forms In his book on screen format design, Galitz (1993) looks in detail at screen graphics, menu and inquiry screens. He devotes sections to components of data entry screens and their relationship to printed equivalents. Shneiderman (1998), in a wide-ranging text looking at human factors and the computer interface, provides some guidelines for form ‘Þllin’. Barnett has published a number of books on forms management procedures with an emphasis on print forms. In a relatively recent one (Barnett 1996), he looks at elements of forms programs such as identiÞcation, organisation and design standards.
CHAPTER 5 117 Creation of Information
Publishing A detailed description of all aspects of the book production process is given by Peacock (1995). Vervliet (1972) has edited a large number of contributions about the book in different cultures, before and after the invention of printing, in a book that is profusely illustrated and includes colourful examples of much historically and aesthetically signiÞcant material. Winckler (1978) has edited a book similar in scope of contribution but with a more modest budget for illustration. Cookman (1993) provides many examples of desktop publishing design styles. Freedman (1992) gives design examples but also explains principles and skills of the DTP process. Standera (1987) looks in detail at both print- and non-print-based electronic publishing and associated issues such as electronic document delivery, and considers the implications for participants in the process. Ernst et al. (1993) have brought together a number of documents from the Harvard program on information resources policy that look at the way digital technology is inßuencing publishing processes and human communication in general. Bielawski (1996) describes electronic document production using examples based upon Lotus SmarText software. Eisenhart (1996) looks at management aspects of what is seen as a paradigm shift in publishing. His emphasis is on business strategy for publishing companies, with a comparison between print and electronic media approaches.
Markup The SGML standard itself is ISO 8879 (International Standards Organization 1986c). A detailed concordance to the syntax, characters, abbreviations and representations of the standard has been produced by Smith and Stutely (1988). Texts on SGML include those by Bryan (1988), Goldfarb (1990) and van Herwijnen (1994). Four document type deÞnitions are provided in ISO 12083 (International Standards Organization 1994). Numerous XML texts have been produced in recent times. It is probably most useful to look at one that takes account of the derivation from SGML (Flynn 1998), and then make reference to the many Web sites that provide guidance and pointers to SGML and XML material, notably that of Cover (2002), which provides a comprehensive lead-in. Instructional HTML material is found in many of the books dealing with the Internet and its use, but material that concentrates upon its use for authoring includes works by Graham (2000), and by Musciano and Kennedy (2000). The HTML site maintained by W3C (World Wide Web Consortium: W3C 2002b) is the starting point for all developments in the language. There are numerous developers’ guides for working with markup on the Web, for example Webmonkey (2002).
CHAPTER 6
6
Distribution of information ............................................................... Ships that pass in the night, and speak each other in passing; Only a signal shown and a distant voice in the darkness;
Tales of a Wayside Inn, ‘The Theologian’s Tale: Elizabeth’, H.W. Longfellow Having created information, we need to communicate it. In this chapter some aspects of communication leading up to telecommunication are reviewed, followed by an outline of the basis for information transfer facilities on computer networks. This is to provide a framework for looking at the information management that is required for two applications of information transfer, namely trade and bibliographic information interchange.
6.1
Getting the message across
The capacity for using devices additional to our direct physical senses for communication is a distinctive feature of humanity. We may have started off by gesturing and grunting to each other face to face, but our ability to communicate was expanded as we got the idea of coding systems, and started to convey messages at a distance. Smoke signals or beacon Þres, drums, or rams’ horns have all proved effective in their time. Although we have continued through time to make a lot of use of smoke signals and drum beats for communication, we have developed increasingly structured methods of communication when the message transfer is required to cover a distance. There is some evidence that the Egyptians had organised systems of relaying messages in the second millennium BCE, but the ancient system that is most recalled, possibly because it was written about with admiration by early Greek historians such as Herodotus, was the relay system used by the Achaemenid kings in Asia Minor from about the sixth century BCE. Using relays of horses they were able to get messages from Susa (in present day Iran) to the Aegean coast, about 2,500 km away by their royal road, within a week. The Greeks themselves were less organised in this respect. Maybe this is why in 490 BCE, Pheidippides was reputed to have run by himself from the battle of Marathon to Athens, to report victory of the Athenians over King Darius.1 The Romans developed a highly organised message sending system, the cursus publicus, which may be regarded as a cornerstone of their empire. (Variations on this Latin have been used to describe our 1
The race that we now know as the marathon was revived over a similar distance at the Þrst modern Olympics in Athens in 1896.
CHAPTER 6 119 Creation of Information
equivalent contemporary systems!) Their system was essentially an ofÞcial one. Separate arrangements would probably have been required to communicate unofÞcial material such as Paul’s epistles to the Romans. The fall of the Roman Empire caused a slow disintegration of the system. Some subsequent rulers saw the advantages of trying to maintain the message distribution process. However by the Middle Ages of Europe there was no equivalent, and certainly nothing comparable with the extensive messenger services that Marco Polo noted during his oriental perambulations in the thirteenth century. Perhaps this is what prompted the Tassis (later von Taxis) family from the region of Northern Italy to initiate message delivery services. These ßourished in later centuries, still managed within the family network, and spread across Europe under the patronage of the Habsburg Empire. The posthorn symbol that they adopted is still used by some European postal administrations. During the eighteenth and early nineteenth centuries, the beginnings of the state-based postal systems that we know now came into being. This was when states began to realise that their ofÞcial ‘packet switching’ services could have revenue supplementation by delivering private mail as well. In the United States, their Þrst general postmaster, Ben Franklin, did much to widen the scope of delivery services. The structuring of the postal system that we have today was shaped greatly by Rowland Hill in Britain, who in 1840 successfully introduced prepaid postage and fees that were relatively independent of distance.
6.2
Figure 6.1: The posthorn symbol postmark for a postage stamp http://www.norbyhus.dk/skerso.gif
Telecommunication
Since the tele- part of telecommunication implies distance, one might think of the smoke signals that were referred to earlier as telecommunication. However, we seem to have settled on an understanding that implies electronic facilitation of communication. Even so, it is necessary to look back prior to electricity for the origins of our approaches to digital coding.
6.2.1
Coding antecedents
The ancient Greeks had a system for distance communication, hill to hill, conveying the letters of their alphabet by an arrangement of large vases. Medieval prisoners used a similar coding system using tapping on their cell walls (rather than vases on a podium) during their long periods of incarceration in dungeons. As one might expect, the military has long been involved in coding systems. By the late eighteenth century, semaphore devices were being used by the French and the English as a sophisticated form of ßag waving of military instructions. In fact the term telegraph (distant writing) was used to describe Chappe’s system of signalling at the time. Our understanding of telegraph changed as we developed our ability to encode signals electrically.
6.2.2
Wired communication
The nineteenth century saw the invention and development of the message sending technologies that have led to the ICT that we take for granted today.
120 PART B Operational Information Management
The telegraph Needle-telegraph systems in which an electrical current deßected a set of pivoted magnetic needles were developed by the likes of Cooke and Wheatstone2 in Britain in connection with coded dispatch information for the newly developing railway systems early in the nineteenth century. These systems, which relied upon internal coding rules, had other applications too - there was one linking the summer and winter palaces of the Tsar in St Petersburg. Figure 6.2: Wired communication
Figure 6.2b: Overland telegraph – a privately constructed ‘branch line’ off the telegraph in Lake Eyre; with permission from Centenary of Federation South Australia’s Connecting the Continent website; photo by Adrian Adams http://www.connectingthecontinen t.com/ctcwebsite/lakeeyre/
Figure 6.2a: Cooke and Wheatstone’s telegraph, with permission as depicted at http://www.btinternet.com/ ~britishempire/empire/images/ fiveneedletelegraph.jpg
In 1842, Alexander Bain in Scotland developed a form of telegraphy that had the potential to transmit text and simple line drawings - the forerunner of telefacsimile. However the name that we tend to remember in connection with the development of telegraphy is Samuel Morse. He devised his dot-dash code based on the absence or presence of electronic pulses for serial transmission of messages, after realising the implications of Faraday’s work on electromagnetism and applying electromagnets for increasing the intensity of magnetic Þelds. He demonstrated a test version of what would become telegraphy to the US President and cabinet in 1838, and received funding for a practical test, which was successfully demonstrated between Washington and Baltimore in 1844. His telegraphic system was also applied in Europe, but such is the way of things that his ‘standard’ coding systems did not prevail. There were separately developed European (continental) and US Morse codes. The European ones catered for diacriticals used on characters in several European languages, but there was also variance with some common characters, for example: Letter
US Morse
E
•
É (E acute) F
2
•-•
They patented their Þve-needle device in 1837.
European Morse
• ••-•• ••-•
CHAPTER 6 121 Creation of Information
Most numerals were represented differently in the two systems as well. Morse’s telegraphy was also associated with train dispatching information. In the USA, railways and telegraphy evolved together. Although some wiring was subterranean, it was found preferable to string the lines from posts. Thus depending upon your point of view, we have another contribution to urban visual pollution, or a romantic landscape tamed by humanity. The telegraph key was the encoding device for Morse code. Other key developments in wired communication were the introduction in Germany of full duplex transmission enabling concurrent two way communication, later extended to quadruplex by Edison, and Baudot’s development of time division multiplexing for encoding signals using a Þve-position code (echoes of those Greek vases). This led on to automatic telegraphy, which used keyboard/teleprinters to transmit and receive the Þve-bit Baudot code. These advances gave rise to monumental projects involving cabling to carry signals. One such project is depicted in Figure 6.2b. The Australian overland telegraph involved installation of 36,000 poles between Port Augusta and Darwin between 1870 and 1872. Eleven repeater stations each staffed by two telegraphists and four linesmen maintained the line. The Þrst message, sent by Todd, the superintendent of telegraphs, was: “We have this day, within two years, completed a line of communications two thousand miles long through the very centre of Australia, until a few years ago a terra incognita believed to be a desert”. The telephone The principles of sound vibration and the idea that vibrations could be produced by electrical pulses were understood by the 1830s, but it was some forty years before this was exploited. On the east coast of the US in 1875, Bell experimented with an apparatus for transmitting sounds, which he patented as a device to transmit speech by electric circuits, and the following year was able to speak to his assistant across a line.3 The Þrst commercial telephones went into operation in the USA in 1877. Manually operated switchboards were introduced to enable connection of callers on single lines. These installations gave way to dial-up of numbers through automated switchboards. Distance communication was improved and extended by use of ampliÞers inserted into the lines as repeaters. Multiple channel communications moved from twisted wire to coaxial cables and then optical Þbres, as we have sought more and more bandwidth, and more and more channel capacity.
6.2.3
Wireless communication
Faraday had demonstrated that electrical currents could produce a magnetic Þeld, and in 1864 Maxwell had shown theoretically that electrical disturbances moving at the speed of light could be detected at considerable distances. Hertz was able to demonstrate this experimentally over short distances, then Marconi was able to extend this work and by 1895 had developed a telegraph system without wire. Fessenden in the USA achieved radio transmission of voice, and by 1901 Marconi was able to transmit radio messages across the Atlantic Ocean. Vacuum tubes could amplify signals received at an antenna; so interested enthusiasts began radio broadcasting over short ranges. As early as 1919 there was a radio station operating in the Netherlands. If sounds could be encoded for transmission, could images also be conveyed? Images may be coded in the same way, but as they carry much more data, they take up a far greater range of the frequency spectrum. Although television requires much greater bandwidth than radio, it has been developed following the same principles. Early work on image transmission by Nipkow in Germany in the 1880s used a mechanical spiral pattern of holes on a rotating disk. Much development work was done in the USA (for example, by Zworykin who in 1923 invented the iconoscope for television transmission), Europe and the then Soviet Union after the 3
“Mr Watson, come here; I want you” (perhaps a portent of many messages to come via telephone).
122 PART B Operational Information Management
turn of the century, although possibly the most publicly known work is that of Baird who during the 1920s took the mechanical principles and worked them up to a point of being able to transmit moving pictures. However it was the cathode ray tube that formed the basis for widespread exploitation, and Shoenberg who systematised its use in Britain so that the BBC was able to launch a service in 1936 - abruptly terminated by the onset of World War II. The distances and bandwidth over which radio signals may be conveyed were continually extended. Transoceanic transmission was carried out using shortwave signals in the 1920s, and microwaves were applied during the 1940s in World War II for line of sight communication across multiple channels. Satellites have now been used extensively for many years as microwave relay stations, relatively stationary in geosynchronous orbits. They were used commercially for example during the 1964 Olympic Games in Tokyo. Increasingly, wireless communication is being used within organisations via radiowave transmission, cutting down on cabling costs. Figure 6.3: Wireless communications
Figure 6.3a: Early TV star, Felix the cat beamed in experimental transmissions in 1928 by RCA http://www.cinemedia.net/SFCV-RMIT-Annex/ rnaughton/DEAD_MEDIA_MECH_TV.html
Figure 6.3b: Ceduna satellite receiving station with 30m antenna (photo: P McCulloch) http://wwwra.phys.utas.edu.au/observatories/ceduna.html
Whatever the channel that is used for telecommunication, there is a set of protocols that are necessary before computers are able to exchange messages. Since the message form for transfer is fundamentally in binary code (bits), it is necessary for a sending computer to be able to convey its bits onto a communication channel, and for a receiving machine to be able to take the data in such a way that they may be successfully interpreted. This means for example, that bits moved around in parallel within a computer may have to be converted to serial representation of the same data to be carried on a communication line. From there they may require further conversion to analogue signals carried for remote telephone systems. This process is carried out by modems (modulating-demodulating devices) linked with the computers at either end of a message interchange. However the telecommunications network component of the ICTs is now fundamentally based upon a core digital network that is capable of processing signals from all types of media including television and telephone in the same way, and effectively operating as a ‘bit pipe’, for whatever it is that customers wish to connect.
6.3.1
Transmission conventions
Whether machines are transferring data between themselves locally or remotely, nothing can happen unless a series of conventions has been adopted to enable machines to share data. Innovation in computing has inevitably been accompanied by incompatibility between the computer equipment of different manufacturers. The widespread networking that we now take for granted would not be possible if it were not for agreement on standards for data transmission, so that differently encoded data may be shared between otherwise incompatible equipment. The ponderous process of developing and agreeing to standards on an international basis is often overtaken by a surge of organisations adopting a de facto standard by reason of its popularity. This has been the case with transmission protocols, where the Transmission Control Protocol/Internet Protocol (TCP/IP), which is the protocol suite deÞned for the US Defense Advanced Research Projects Agency Network (ARPANET), has been adopted by other networks, enabling ARPANET to grow into the Internet. It has therefore outpaced the years of development work that have been undertaken by the International Standards Organisation to provide an architecture for standards development with Open Systems Interconnection (OSI). The two protocol suites have in common a layered approach to functionality, but there is not equivalence between their layers. OSI is the more formally documented approach, and the reference model is shown in Figure 6.4. In TCP/IP, the process layer handles application, presentation and some session functions, and the network access layer handles physical, data link and some network functions, but generally there is not direct correspondence between the functionality of the levels. Process Host-to-host Internet Network access
Application Presentation
Application Presentation
Session
Session
Transport
Transport
Network Data link
Network Data link
Physical
Physical communication path
TCP/IP
Figure 6.4: Open Systems Interconnection Layers with TCP/IP correspondence
Physical
124 PART B Operational Information Management
In information management, one is concerned with managing information at the application layer, but in order to make the application layer possible, the layers beneath it must also detail the conventions necessary for data transmission, beginning at the fundamental physical layer, and building up to the applications layer as follows for OSI: -
Physical layer
this deals with the electrical and physical connections to a network of the users equipment
-
Link layer
this organises the bit stream for the physical layer and provides reliability using error-free packets
-
Network layer
this establishes a network connection to the transport layer and handles any conversion necessary for dealing with other network types
-
Transport layer
this is to ensure reliability of data transfer from the underlying layers to the session layer. It has a message transfer facility that is independent of the network type
-
Session layer
this synchronises interaction and manages data exchange between application layer sessions, so it must take data from the presentation level and manage the duration of a network transaction as a structured dialogue
-
Presentation layer
this is concerned with the syntax of message transfer so that a shared coding system is used by the messages being carried during the established session
-
Application layer
this makes provision for the application program at the user interface.
OSI (International Standards Organization & International Electrotechnical Commission1989-1994) has given rise to a great many standards that give expression to the ISO protocol suite. The layers permit standardisation of the operation of the different physical devices and software used for networking. Figure 6.5 shows the respective functionality of repeaters, bridges, routers, application gateways and switches with respect to the layers. Application
gateways
Application
Presentation
gateways
Presentation
Session switches
Transport
Physical
repeaters
routers
switches
gateways
Network
routers
switches
gateways
Data link
bridges
routers
switches
gateways
Physical
Physical communication path
Figure 6.5: Connection mechanisms
Session Transport
bridges
Network Data link
gateways gateways
CHAPTER 6 125 Creation of Information
The transfer protocols make possible the distribution of data by a variety of methods for user application services. The most widely used protocols are the following: -
File transfer:
from one machine to another, as its name implies, enables the sending of a Þle containing data in whatever form: wordprocessed, software, spreadsheet etc, so that a copy goes to a remote machine – under TCP/IP this is known as Þle transfer protocol (FTP), and within OSI it is known as Þle transfer access and management (FTAM). It is independent of machine and operating system. The respective protocols enable users to differentiate structured and unstructured Þles and data types, for example binary and American Standard Code for Information Interchange (ASCII).
-
Message handling:
electronic mail known as simple mail transfer protocol (SMTP) for TCP/IP and message-oriented text interchange standard (MOTIS) by ISO, provides the basis for user interchange of messages via unique system mailboxes for user. On the Internet, transfer of messages is made possible by the domain naming system whereby the remote IP addresses must be established before the transfer can take place. MOTIS provides for different types of user agents, for example electronic mail is one, and Electronic Data Interchange (EDI) another.
-
Remote terminal:
this protocol makes it possible for a user to work with a remote computer carrying out database searching, software development or whatever application tasks are desired, as if the user were connected directly to that computer with no intervening network – the TCP/IP application protocol that permits this is Telnet, and the OSI equivalent is virtual terminal application service element (VT ASE).
6.3.2
Character codes
One of the problems that network development has had to overcome has been that of different terminal equipment manufacturers supporting different character sets. Although this is concerned with the data level of information transfer, it must be understood for as shall be seen subsequently, coded character sets inßuence matters of information management at the application level. The Baudot 5-bit code used internationally for telex transfer showed the worth of character-level standardisation for distribution of information. ASCII was developed in the 1960s in order to make provision for a larger character set for computer communication.4 A 7-bit code provided for 128 (27) characters and therefore accommodated both an upper and lower case set of characters used for the English language, together with numerals and punctuation symbols. With the relatively universal acceptance of a limited character set for encoded data, it has been standardised internationally as ISO/IEC 646 (International Standards Organization & International Electrotechnical Commission 1991b). Figure 6.6 shows the character set from the standard. Although this table accommodates a standard UPPER CASE/lower case character set, it does not make provision for extensions to the Roman set that have diacritical marks on letters as used in many European languages, nor of course does it make provision for other alphabets such as Cyrillic or Kanji.
4
Another widely adopted code was that introduced by IBM, the Extended Binary Coded Decimal Interchange Code (EBCDIC), having an 8-bit code and providing for a 16x16 matrix of 256 characters.
126 PART B Operational Information Management
b7
0
b6
0 0
b5 b4
b3
b2
0 0
0
b1
0 1
1
1 1
0
1 0
1
1 0
0
1 1
1
1 0
1
0
1
2
3
4
5
6
7
0
0
0
0
0
NUL
TC7
SP
0
@
P
`
p
0
0
0
1
1
TC1
DC1
!
1
A
Q
a
q
0
0
1
0
2
TC2
DC2
“
2
B
R
b
r
0
0
1
1
3
TC3
DC3
£
3
C
S
c
s
0
1
0
0
4
TC4
DC4
$
4
D
T
d
t
0
1
0
1
5
TC5
TC8
%
5
E
U
e
u
0
1
1
0
6
TC6
TC9
&
6
F
V
f
v
0
1
1
1
7
BEL
TC10
‘
7
G
W
g
w
1
0
0
0
8
FE0
CAN
(
8
H
X
h
x
1
0
0
1
9
FE1
EM
)
9
I
Y
i
y
1
0
1
0
10
FE2
SUB
*
:
J
Z
j
z
1
0
1
1
11
FE3
ESC
+
;
K
[
k
1
1
0
0
12
FE4
IS4
,
<
L
1
1
0
1
13
FE5
IS3
-
=
M
]
m
1
1
1
0
14
S0
IS2
.
>
N
^
n
~
1
1
1
1
15
S1
IS1
/
?
O
_
o
DEL
l
Figure 6.6: Character set from ISO 646; reproduced with permission of ISO; Standard may be obtained via http://www.iso.ch
Other points to note include: -
Positions 5/12, 7/11, 7/12 and 7/13 are unassigned and may be used for agreed-upon characters between partners in a particular exchange
-
Characters in the Þrst two columns are non-print and are reserved for communications codes that enable hardware to establish sessions and synchronise.
There has been much extension of the character set to accommodate other than the basic set of characters. ISO developed a series of standards for other alphabets such as Latin and Cyrillic, in some cases making use of the additional 128 positions in an 8-bit code set to extend the basic Roman set, in others using escape sequences from the standard character set. This means that when certain codes are used within a data sequence, they may be used as an indication to move to a different character set. These codes or escape sequences are also the subject of standardisation in a separate standard. However, the development of many character sets was impaired by such problems as character sets that included glyph variants, such as ligatures, that are a difference in presentation form rather than a different character. So there has been evolution of a standard called UCS (International Standards Organization & International Electrotechnical Commission 2000a) that endeavours to embrace all variants. This is a 32-bit code set in order to accommodate Chinese/Japanese/Korean (CJK) symbols, and has grown from a combination of ISO’s own work, and that of Becker and associates in the US, on UNICODE (universal,
CHAPTER 6 127 Creation of Information
covering all modern written languages; unique, having duplication of characters that appear in more than one language; and uniform, with each character being the same bit length). The Þrst 256 code positions are identical to ISO Latin-1 as shown in Figure 6.7. These are within the 8,192 positions for standard alphabetic characters, followed by 4,096 positions for punctuation, mathematical operators, technical symbols, shapes, patterns and ‘dingbats’. Then there is a section of 4,096 positions is for CJK symbols, and more than 20,000 spaces for CJK ideographs. Languages covered include all modern languages used for communication, as well as historic forms of languages such as Greek, Hebrew, Latin, Pali and Sanskrit. This type of ßexibility is important for the documentation and library communities for representation of texts and effective transliteration.
Figure 6.7: Latin component of UNICODE(ISO/IEC 10646-1:2000) reproduced with permission of ISO; Standard may be obtained via http://www.iso.ch
6.3.3
Facilities for information transfer
The protocols referred to in the earlier section make possible remote terminal connection, mailing and Þle transfer which are three of the principal telecommunications mechanisms for distributing digital data. Others are also available, and this section is a brief review of facilities and their functions.
128 PART B Operational Information Management
•
Facsimile transfer A way of sending images over telephone lines between compatible sender/receivers such as a facsimile machine and a personal computer Þtted with a fax modem.
•
File transfer The ability to exchange Þles of data between computers without establishing a remote interactive session is a much-used procedure on the Internet through the medium of the FTP protocol. Many sites act as repositories for heavily used Þles of data or software, to the extent that ‘mirror’ sites are often established that carry the same Þles. These sites will often be established in several countries, and users in those countries are encouraged to access their own sites in an endeavour to cut down on international data trafÞc. Of course users also need searching facilities in order to identify the location of Þles. The most prominent search facility for Þle identiÞcation is Archie, which was developed at McGill University in Canada. There are many Archie server machines around the world that permit free access (which means that they carry a heavy load and may have poor response times) and enable simple Þlename (but not Þle contents) searching using case-sensitive search strings.
•
Email The prevalent means of exchanging computer messages is electronic mail. Email software normally makes provision for a free text message that must be accompanied by header information to permit successful distribution. At a minimum, the sender’s address is also sent, but the sender may also add signature information for further identiÞcation. Email software now normally also incorporates the ability to:
•
-
Put mail into designated folders
-
Establish distribution lists
-
Reply to or forward mail, cutting and pasting those parts of the original mail that are pertinent
-
Keep addressee Þles
-
Attach software or wordprocessed Þles created independently; for example Multipurpose Internet Mail Extensions (MIME) is a standard for transferring non-textual data attached to email on the Internet.
Mailing lists Software such as LISTSERV5 and Majordomo has been established for administration of correspondence between discussion groups. A group may establish itself in a special interest area such as the use of a particular application program, or a recreational activity, maybe bungee jumping (bungy if one is from New Zealand), and agree to have the list managed by the administration software. The software will automatically handle people who wish to subscribe to or leave the list, provide help on request for subscribers, provide lists of subscribers and the like. It handles all of this information using a separate administrative mail address from the one used for posting messages to the list itself.6
5
There is no E on LISTSERV because it was originally developed to run on IBM’s VM/CMS operating system, which had program names limited to eight characters.
6
A common pitfall for users (not just novices) of mailing lists is to send administrative requests to the mail list address whereupon they may be distributed to the full mailing list.
CHAPTER 6 129 Creation of Information
When a mailing list carries a lot of messages, a choice may be available to digest the messages. This means that all the messages on a periodical basis, perhaps daily, are cumulated into one message for the beneÞt of a user who may be on a number of mailing lists and wishes to avoid confusing messages. Many mailing lists choose to have a moderator who monitors all incoming messages and Þlters out material that is considered inappropriate for the list. As one may imagine, this may lead to controversy and the disintegration of a list, if the policy of the moderator is perceived to diverge from that of the majority of subscribers. •
Newsgroups There is software on the Internet that makes mailing lists publicly available if there is enough support to do so. The most prominent mailing list distribution facility is Usenet for which there is no fee for service. Usenet was developed in the late 1970s by graduate students at the Duke University and the University of North Carolina, and was later developed further at other universities. It is now widely used internationally in a variety of software environments. There are commercial equivalents such as ClariNet, for which there is a subscription fee. There are not deÞned lists of subscribers to a list such as Usenet, because software is set up to enable the contents of a mailing list to be forwarded to a multi-user machine where anyone may use the News software to view contents of and communicate with lists that are available. This has the distinct advantage of only having to carry one copy of each list message to each of the distributed multi-user machines. The process of establishing a newsgroup, or of turning a mailing list public on Usenet involves going through a proposal and response process with potential users, unless it is an ‘alt’ group. If it is accepted as a newsgroup, it will be named within the hierarchical newsgroup-naming scheme. The top level of this scheme has directories that have names such as follows: alt Alternate bit BITNET (US based, predominantly IBM sites) comp Computers ieee Institute of Electrical and Electronics Engineers misc Miscellaneous rec Recreation sci Science There are also many localised top-level names. This system of categorisation leads to some classiÞcation headaches. For example, one can Þnd discussion on CD-ROMs on many public lists including newsgroups at: aus.computers.cdrom bit.listserv.cdromlan comp.publish.cdrom.hardware comp.publish.cdrom.multimedia
A type of software, generally called newsreaders, has been developed for accessing and presenting newsgroup information. Figure 6.8 shows how an extract from a set of newsgroups may display at the top level of entry for such software. The number of messages within each group at the time is displayed.
130 PART B Operational Information Management
u 1479 434 rec.sport.swimming u 1480 1070 rec.sport.tennis u 1481 816 rec.sport.triathlon u 1482 501 rec.sport.volleyball u 1483 1075 rec.travel.air u 1484 832 rec.video u 1485 291 rec.video.cable-tv u 1486 743 rec.video.production u 1487 187 rec.video.releases u 1488 730 rec.windsurÞng u 1489 2121 rec.woodworking u 1490 4 sci u 1491 104 sci.aeronautics u 1492 195 sci.anthropology u 1493 181 sci.aquaria u 1494 1182 sci.archaeology u 1495 1220 sci.astro u 1496 6 sci.astro.Þts u 1497 37 sci.astro.hubble u 1498 105 sci.bio.technology u 1499 1084 sci.chem u 1500 292 sci.classics =set current to n, TAB=nextunread, /=search pattern, c)atchup, g)oto, j=line down, k=lineup, h)elp, m)ove, q)uit, r=toggle all/unread, s)ubscribe, S)ub pattern,u)nsubscribe,u)nsub pattern, y)ank in/out Figure 6.8: Extract from a newsgroup display of categories that may be generated by Usenet
•
Bulletin boards Bulletin Board Systems (BBSs) usually provide a combination of Þles that users may download, and a specialised mailing list. Individuals on personal computers often maintain them. Many now provide gateways to wider Internet services such as News.
•
Telnet Telnet provides interactive connectivity to another computer, so as long as one has an account on that computer, or if there is public access to online software such as a library catalogue, one can work directly on that platform.
•
Online discussion Another name for this might be called ‘talking to someone’ or telephoning them, but many people have warmed to the idea of interactive communication via typing at the keyboard. A popular way of doing this on the Internet is by use of software called Internet Relay Chat (IRC) developed in Finland. IRC discussion may take place in a channel focused on a subject of mutual interest to a group of people.
•
Hypermedia The conceptual basis of hypermedia is discussed in Chapter 11. It has been applied in many standalone software packages, but probably its best-known implementation is as the Web. The Web
CHAPTER 6 131 Creation of Information
software was developed by Tim Berners-Lee at the Research Centre for High Energy Physics, CERN in Switzerland. It was developed to make use of Hypertext Transfer Protocol (HTTP) for sharing of research documents in physics. As a result of the development of browsers such as Mosaic, Netscape, and later MS Explorer, that provide point and click interfaces to the Web, and of markup languages and authoring software, the Web has become ubiquitous on the Internet. •
Simulated reality A heading like this might easily lead into a debate about the nature of reality, but it is used as a catchall here to represent the kind of software that simulates a circumstance with which a user may move around. It involves communication in the sense that the user must issue instructions to software that has been given certain parameters by someone else. A typical example is the Multi-User Dimension (MUD) or MUD Object-Oriented (MOO) software that provides for the creation of an environment or game scenario in which the user may participate. Communication may be with other participants or with the person who set the scene and who may modify it as the game progresses. The term computer-mediated communication (CMC) is sometimes used to describe the above facilities when they involve personal or group interaction.
6.4
Distribution of trade data
Internal processing of accounting information within organisations is subject to processes based upon the data modelling that has taken place for the organisation’s systems. If data related to trade and Þnance is exchanged with other organisations, the procedures are known as electronic data interchange (EDI). With the advent of the Web and the development of digital business-to-customer relationships, electronic commerce has captured the imagination of the sector of the population that has access to these trading facilities. It is as well to remember that electronic commerce has been with us for a much longer period on a commercial basis using either ad hoc business-to-business data-sharing arrangements, or the more formal approach of EDI. We will examine application of EDI as it provides an example of the extent to which structuring may be deÞned for business-to-business data transfer.
6.4.1
Electronic data interchange
EDI is the transfer of structured trade data by agreed message standards, from one computer system to another by electronic means. Structured is included in the deÞnition so that electronic messaging such as email does not come under the EDI umbrella, and trade is included to show that the emphasis is on business documentation relating to purchasing, invoicing, order fulÞlment and so on, rather than such applications as bank transfers, airline reservations, library cataloguing or design data for engineering applications. Normally the systems in question are those of different organisations, rather than within one organisation, so the emphasis is on trading partners. Early applications of EDI took place in the USA in the 1960s, but the technology at the time including the techniques for encoding and transmitting digital data, inhibited developments until the 1970s. Since that time applications have been put into place in industries such as customs and excise, shipping via air and seaports, and automobile component supply such as for example through ODETTE, the Organisation for Data Exchange by Tele-Transmission in Europe. EDI supplanted paper-based systems where it was found to have the advantages of: -
Introducing staff savings by reduction of repetitive data entry, transcription error correction and veriÞcation
132 PART B Operational Information Management
-
Speeding transactions between trading partners, compared to manual mailing systems, and providing automatic transaction acknowledgment
-
Reducing paperwork
-
Enabling reduction of inventories, reduction in payment periods and improvement in customer service
-
Helping to promote integration of corporate information services.
However, its introduction faced challenges such as: -
The cost and reorganisational necessity of introducing a new system
-
The need for structured operating procedures so that the companies may interface successfully
-
Security and telecommunications reliability
-
Procedures for indication of proof of contract need to be accommodated by procedures such as electronic signatures, or by signing interchange agreements.
The international standard for EDI is the EDIFACT standard (International Standards Organization 19881999), which was brought about from the conßuence of work on the American National Standards Institute X12 set of standards7, and a working group of the United Nations Economic Commission for Europe, each of which had been developing standards for local areas. The general framework that EDIFACT provides includes an encapsulating structure with such data elements as message headers, references and trailers. To take invoice information as an example, this information would encompass speciÞc data elements for names, addresses, currencies, payments, allowances, taxrelated information and the like. There are a number of EDIFACT transaction types, for example invoices, orders, manifests, receipts and customs forms. Figure 6.9a shows an example data set, and Figure 6.9b shows part of the corresponding message for an implementation, in this case the time-series subset of the general statistical EDIFACT message (European Commission 1995). Country X - Exports by commodities and destination country 1980 prices in millions of US dollars (Export of machinery is in 1985 prices, and since QI 92 in 1990 prices) Reporting Country Commodity Country X Food and animals
Partner Country
QI 91
QII 91
QIII 91
Country Y Country Z
cll c21
c12 c22
c13 c23
Mineral fuels
Country Y Country Z
c31 c41
c32 c42
Machinery
Country Y Country Z
c51 c61
Other goods
Country Y Country Z
c71 c81
QIV 91
QI 92
QII 92
QIII 92
c14 c24
cl5 c25
c16 c26
c17 c27
c33 c43
c34 c44
c35 c45
c36 c46
c37 c47
c52 c62
c53 c63
c54 c64
c55 c65
c56 c66
c57 c67
c72 c82
c73 c83
c74 c84
c75 c85
c76 c86
c77 c87
Figure 6.9a: Example EDIFACT data set tabulation for time series statistical data from ISO 9735:1-9; reproduced with permission of ISO; Standard may be obtained via http://www.iso.ch
7
BISAC, the book industry standard for procurement of books, was in turn derived from X12.
Figure 6.9b: Example of EDIFACT data set corresponding to time series data in Figure 6.9a from ISO 9735:1-9; reproduced with permission of ISO; Standard may be obtained via http://www.iso.ch
134 PART B Operational Information Management
A transaction can contain a number of messages. An EDIFACT interchange message consists of a series of segments, each of which has a three-character identiÞer. Those identiÞers beginning with ‘UN’ are called service segments and are deÞned in the EDIFACT syntax (International Standards Organization 19881999). In Figure 6.9b UNH introduces a message header, but this itself would have been preceded by UNA (service string advice) and UNB (interchange header). Some segments are data elements. Others contain multiple data elements. The message encloses a number of user-deÞned segments that support the business function required of it. Within these segments, standard delimiter syntax is used. In the example: + Represents data element separator If there are unseparated sequences of these in the example then data have not been supplied. Each data element in the format has a Tag and a Name but these do not appear in the transmission. Their interpretation is determined by their sequence in a segment.
:
Represents component element separator
‘
Represents segment terminator
The segments comprise one or more data elements which themselves may be simple or composite. In the example some of the identiÞers used for segments are: ASI
Array structure identiÞcation
BGM
Beginning of message (may contain message identiÞcation number)
FTX
Free text allowing full name of metadata and language if required
NAD
Name and address of sending organisation
SCD
Structure component deÞnition
To take one of the more straightforward segments in the example: ASI+STRUCTURE+++1’ This is deÞned to have 4 elements called: 1. array structure identiÞcation,
value ‘STRUCTURE’
2. party identiÞcation details,
no value supplied
3. status, coded,
no value supplied
4. maintenance operation, coded
code value 1 = ‘new’
Detailed segment rules of this kind with tables of code representations are provided in the User guide (European Commission 1995), and directories of elements are available online (United Nations 2002). EDI has been operational for some time independently of the Web on the Internet. Because the Web is so ubiquitous, there has been motivation to provide an underpinning for business-to-business electronic commerce by combining markup development, where effort has lately been focussed on XML (Chapter 5) with EDI standards, for example within the ebXML framework (United Nations Centre for Trade Facilitation and Electronic Business & OASIS 2002).
CHAPTER 6 135 Creation of Information
6.5
Distribution of bibliographic data
The distribution of bibliographic data is of consequence to organisations like database producers that produce reference databases such as abstracting and indexing services, and to organisations whose libraries exchange cataloguing data. Standard formatting approaches predate telecommunications network applications. They were developed for exchange of information in digital form, but on the basis of passing media such as disks and tapes between participants. The formal standardisation approach to such data has parallels with ASCII and ISO 646, in that a model was developed in the United States, and then adopted as an international standard. In this case the development of Machine Readable Cataloguing Format (MARC) by the Library of Congress led to an American National standard ANSI Z39.2 in 1971 which was subsequently adopted internationally as Format for bibliographic information interchange on magnetic tape (International Standards Organization 1996a)
Figure 6.10: Bibliographic interchange format (from ISO 2709); reproduced with permission of ISO; Standard may be obtained via http://www.iso.ch
136 PART B Operational Information Management
ISO 2709 deÞnes a general vessel within which a variety of formats describing materials bibliographically may be contained. The structure is illustrated in Figure 6.10. The components of the format are: -
Record label:
this is a unique character string that identiÞes the record and contains control information indicating the type and size of record.
-
Directory:
this contains an entry for each of the Þelds present in the data part of the record – each of the entries comprises a representation of the Þeld identiÞer, often known as a tag, the Þeld length based upon the number of characters and delimiters in it, and an address representing a position relative to the starting position of the data Þelds.
-
Data:
the various data Þelds follow the directory. They contain the data for a bibliographic record. Some of the data are in coded form – these are sometimes termed control Þelds. Then follow textual data in variable length Þelds – the extent to which they are deÞned will depend upon the implementation. Accompanying the data are content designators such as indicators and subÞeld identiÞers that provide information about the tag.
The conventions used for such data Þelds will depend upon implementation. For example UNESCO, under the auspices of its General Information Programme, produced a format envisaged for use with abstracting and indexing services generally known as the Reference manual (Dierickx & Hopkinson 1986). The format embodied in this was used by the American Geological Institute to produce its Georef service, by UN agencies such as Comisión Economica para America Latina in Latin America and as the foundation for MEKOF-2, a speciÞcation of data elements for use between several Eastern bloc countries. Some time later a more inclusive format, the Common Communications Format (CCF) was produced by UNESCO. It endeavoured to Þnd commonality in bibliographic description used by libraries and abstracting and indexing services, but then evolved into distinct formats for bibliographic (Simmons & Hopkinson 1992a) and factual (Simmons & Hopkinson 1992b) information. A typical example using this format and describing a periodical document is illustrated in Figure 6.11.
Figure 6.11: Data fields within ISO2709 based upon CCF (Simmons & Hopkinson 1992a) reproduced by permission of UNESCO
CHAPTER 6 137 Creation of Information
The tags that appear in the Þrst column of the example do not appear in the data Þelds. They appear in the directory. So if we take the example of the Þeld with tag 610, its data Þeld entry would comprise: 0000†A341.16:63†CU$ Symbols used in Þgure 6.11 have been used to represent non-print characters from ISO 646 in Figure 6.6 as follows: † corresponds to ISO IS1, that is hexadecimal 1F. $ corresponds to ISO IS2, that is hexadecimal 1E. ‘A’ following † represents the subÞeld containing classiÞcation notation, in this case 341.16:63 ‘C’ following † represents the subÞeld identifying the classiÞcation scheme, in this case U for UDC. Its directory entry would appear as: 610001900329 In this example, the Þrst three digits ‘610’ represent the tag in question, the next 4 ‘0019’ represent the length of the data Þeld corresponding to the tag, and the ‘00329’ represents a relative position within the data Þelds where the data for this data Þeld commence.
6.5.1
MARC
Prior to the advent of computing, many libraries had sought ways for reducing the workload of cataloguing copies of the same materials that they each had in their collections. One avenue adopted was that of the purchase of standard cataloguing records from institutions that had major collections. The Library of Congress in the US was one of these, and in the 1960s it started experiments with magnetic tape distribution of cataloguing data. By 1968 it had, along with participating recipients of its service, moved to MARC II format, a reÞned version of its original machine-readable cataloguing. It was this format that formed the basis for ISO 2709 as well as many national implementations of the same format. The Universal MARC format, UNIMARC, was developed to enable sharing of records internationally, so that countries having developed idiosyncratic national formats could share data between each other via their national bibliographic agencies, usually national libraries. UNIMARC was another speciÞc implementation of ISO 2709 (International Federation of Library Associations and Institutions 2002c). An example of a MARC implementation is shown in Figure 6.12, in this case based upon USMARC. Components of this Þgure are considered in Chapter 7. In this example, the object being described is a Cannon ball. The three-digit tags shown on the left hand side are the tags that would appear within the directory shown in Figure 6.10. The indicators of Figure 6.10 in this case occupy the Þrst two positions of each of the data Þelds. In this example, they are usually blank (shown as ‘b’ with ‘/’). The identiÞers in this example are the single characters following the †. For example, in the title Þeld, which is 245, the two indicators are 0, the Þrst identiÞer is ‘a’ which in this Þeld represents the title proper, and the second indicator is ‘h’ which in the title Þeld represents what is called the general material designation or type of object.
138 PART B Operational Information Management
Figure 6.12: MARC example, from (United States Library of Congress Network Development and MARC Standards Office 1994, Appendix B, p.7)
The interest in MARC format in this section has been to exemplify a distribution format. The full expression of MARC is for more than distribution. It is for description of information objects within the framework of cataloguing rules, so it will be revisited in the following chapter, especially its cataloguing aspects. Because MARC was originally developed for tape transfer, it might be regarded as a legacy system. However, the tagging procedures and rules are so enmeshed within most online library cataloguing systems that its use remains pervasive.
6.5.2
Web distribution
The structures for digital distribution of bibliographic information predate the Internet, but as Internet protocols are developed, distribution mechanisms are being developed for the Internet also. The Web is self-distributing, in the sense that the hypertext transfer protocol and client-server architecture enable access to any data that are mounted on a Web-accessible server. A signiÞcant standard that arose from aspiration to share data between library cataloguing systems is the Z39.50 protocol (International Standards Organization 1998b). This is an application level application within the OSI framework, though it is minimally dependent upon lower layers. It speciÞes data structures and interchange rules that permit a client to retrieve records from a server (or target), and is much more generic in application than for cataloguing data. It is possible to enable multiple concurrent connections to machines by building on top of the Z39.50 protocol, which doesn’t specify an applications program interface to the client or the server. Z39.50 works with databases in a more abstract way than for example SQL (Chapter 11), conÞning itself to logical entities based on the kind of information that is stored in the database, rather than speciÞc database implementations. Its functions (Lynch 1997) include: -
Searching using a SEARCH protocol that allows the client to transmit a search request to a server and produce a result set of records that are maintained on the server along with a report of the number of records comprising the result set.
CHAPTER 6 139 Creation of Information
-
Presentation using a PRESENT protocol that permits retrieval of result sets (or speciÞc records from the sets) by the client, and options for controlling the contents and format of the records that are returned
-
Search management so that progress reports on an active search may be provided, or active searches may be terminated, or intermediate results provided based upon individual terms
-
Result set management, for example by sorting
-
An explanatory protocol, EXPLAIN, for detailing database availabilities and access points and record structure
-
Record syntaxes for transferring records, including a generalised record syntax, and speciÞc domain ones such as one that accommodates MARC. oai:arXiv:quant-ph/9901001 1999-01-01 Quantum slow motion Hug, M. Milburn, G. J. We simulate the center of mass motion of cold atoms in a standing, amplitude modulated, laser Þeld as an example of a system that has a classical mixed phase-space. 1999-01-01 e-print http://arXiv.org/abs/quant-ph/9901001 The Los Alamos arXiv Metadata may be used without restrictions as long as the OAI identiÞer remains attached to it.
Figure 6.13: Example OAI record http://www.openarchives.org/OAI/openarchivesprotocol.htm
Prior to the impetus that the Web has given to distribution of marked up documents, a distribution format had been developed for SGML, the document interchange format SDIF (International Standards Organization 1988). Now that there has been extensive commitment to its simpliÞcation per medium of XML markup on the Web (Chapter 5), there have been moves to establish a distribution mechanism for metainformation
140 PART B Operational Information Management
so that it may be shared for construction of databases. For example, the Open Archives Initiative (2002) has proposed an OAI distribution structure that makes metainformation available in response to a request as follows: -
Header:
comprising a unique identiÞer and a datestamp (relating to creation, modiÞcation or deletion)
-
Metadata:
a single manifestation of the metadata from an item, though the OAI protocol supports multiple manifestations (formats) of metadata for any single item
-
About:
an optional area to hold data about the metadata part of the record, for example concerning rights, and terms and conditions of use.
An example of such a record is shown in Figure 6.13. This chapter has looked at just some of the many forms of information distribution taking place digitally. They are representative examples, as in each case they require the expression of structured protocols for identifying the various elements particular to the application, which in turn is distributed according to the lower level systems interconnection protocols.
6.6
Further reading
Data communications Halsall (1996) is a comprehensive text on data communications that gives detailed description of network protocols. Stallings has produced a considerable body of work in this area, including business data communications work with Van Slyke (Stallings & Van Slyke 1998), and Goldman (1998) provides a more business-oriented approach. Works that focus on OSI include Dickson and Lloyd (1992) and Jain and Agrawala (1993).
Internet There is a plethora of books written about the Internet. Its speed of development means that for much material it is more appropriate to refer to publications online, for example the list of topics in the ‘Uncover’ area of december.com (December Communications 2002), or the IFLANET resource site (International Federation of Library Associations and Institutions 2002b), and GRNT (TERENA: Trans-European Research and Education Networking Association 2001), than to printed material. Among many books that explain the various communication facilities in addition to identifying resources are those by Aboba (1993), Connor-Sax and Krol (1999) and Moody (1996). Wilde (1999) is good for the emphasis on the architecture that markup such as SGML and XML provides.
EDI The UN directories for EDIFACT and trade data (United Nations 2002) give details of all formats. Berge (1991) provides an introduction with principles; Hendry (1993) details the functions of an EDI system with reference to applications such as transportation, retail, manufacturing, electronic funds transfer
CHAPTER 6 141 Creation of Information
and the standards that are involved. Parfett’s (1992) work on EDI gives an overview of principles from a European perspective. Krcmar, Bjørn-Andersen and O’Callaghan (1995) bring together a number of case studies on practical EDI application in Europe.
Transfer of bibliographic data Gredley and Hopkinson (1990) give detailed description of bibliographic exchange and standards for MARC format for both national and international formats. Hagler (1997) looks at these same formats, but within the wider framework of bibliographic organisation and retrieval. There are many detailed manuals that give descriptions of national formats, the most inßuential, and the one that provided the initial framework for bibliographic data, is the US MARC, which has been consolidated for a range of different information materials (United States Library of Congress Network Development and MARC Standards OfÞce 1994), and for which there is a Website (United States Library of Congress Network Development and MARC Standards OfÞce 2001). A guide to the many national MARC databases and services (International Federation of Library Associations and Institutions IFLA Universal Bibliographic Control and International MARC Programme 1993) has been regularly produced. There are many World Wide Web pages dealing with different MARC format matters of interest, and these should be sought with resource discovery tools. The UBCIM core programme has a page that includes pointers to UNIMARC (International Federation of Library Associations and Institutions 2002c) Erickson (1997) provides an explanation of Unicode along with a summary of character coding conventions preceding it.
CHAPTER 7
7
Organisation of information by agent ............................................................... Today we have naming of parts. Yesterday, We had daily cleaning. And tomorrow morning We shall have what to do after firing. But today, Today we have naming of parts.
Henry Reed (1946) Lessons of the War, `Naming of Parts'. The previous chapter concentrated upon the distribution of information. Once the distribution has happened, information that has been communicated may be acted upon immediately, or it may be stored for subsequent reference and action. When the storage process involves signiÞcant heterogeneous accumulation of recorded information, some form of organisation is required for the stored collection in order to make retrieval possible. The forms of organisation depend upon the environment in which the repository exists and the ways in which users approach the information. The organisation of a newspaper archive differs from that of a public library, or a set of personnel records or a collection of insurance claims. The organisation often requires the creation of additional records or documents (metainformation) to assist access to the collected information. That is why we have data deÞnitions and catalogues and indexes. Until we started to create electronic records, organisation and creation of metainformation was typically carried out subsequently to the creation of the original recorded information, so it was possible to conceive of organisation being part of an information life cycle. Now that so much information is created electronically, much of the organisation process may be carried out prior to, or simultaneously with, the creation of the recorded information. Therefore a database is structured before record content is produced. A hypertext document may be written by Þrst producing its linked structure, and then producing the text and images to which the links point. For typical business processing transactions, metainformation is conÞned to describing the entities and attributes for data elements of objects identiÞed during the analysis process. For example, in a payroll application it will comprise named elements such as employee name and salary and the characteristics of those elements, such as how many numerical characters are set aside for salary. The metainformation
CHAPTER 7 143 Organisation of Information by agent
may be carried separately from the actual source information database in another database such as a data dictionary. For many information management applications that involve description of bibliographic records, metainformation can operate at the level of the business transaction described above, but it may also be applied to additional descriptions of the original record, that have been developed for a variety of contextual and subject content purposes. For this reason, it is useful to consider descriptive metainformation from two aspects, agent and content as follows: 1. Agent The description deals with what the information medium is, the form that it takes and through which vehicle or instrument it is conveyed. For example, if the medium is a computer disk, the agent description includes such things as who wrote what is on it, what it is called, how much data it contains, and what type of equipment it may be read upon. Metainformation about an agent may be of many types. In the examples that follow, detail concerning each of the individual standards may be found via the IFLANET site (International Federation of Library Associations and Institutions 2002a). Aspects of description that have been addressed for digital applications include: -
Document description:
this has been most inßuentially developed for libraries per medium of MAchine Readable Cataloguing (MARC) format, for which there are many national instances within the ISO 2709 format for bibliographic information interchange (Chapter 6).
-
Responsibility:
this is information pertaining to the intellectual creation of the material. It has been adopted, for example, in the Text Encoding Initiative (TEI) using library cataloguing rules as a basis; it enables speciÞcation of elements such as author, sponsor, funder, principal researcher and other contributions, although the form of the creator is not categorised into personal, meeting or corporate, as in library cataloguing.
-
Administrative:
an example is embodied in the Encoding Archival Description (EAD) format; here it is applied to elements like access, or appraisal for retention scheduling information; it may also be used for location and Þnancial aspects.
-
Provenance:
an example is document source that is provided for in the Computer Interchange of Museum Information (CIMI) format to track origin and ownership. Acquisition information may also be pertinent here.
-
Configuration:
this provides descriptions to assist with processing of data such as Þle format or record size. These are typically carried in header or label information of digital records; alternatively it may be an element such as type used in Summary Object Interchange Format (SOIF) for specifying Þle types such as ‘binary’.
144 PART B Operational Information Management
-
Connections:
an application is the relation element speciÞed within Dublin Core that is intended to provide a means to express relationships between a discrete resource and other resources that may also be considered as discrete – these may be for example a periodical article and the periodical itself, an item in a collection and the collection itself, a Þle within a database and the database itself. Alternatively the relationship can be to resources that control the content of information in a particular Þeld, such as a thesaurus of descriptors or an authority Þle of organisation names.
-
Conditions of use:
this refers to elements that describe or link to availability statements, such the rights management element of Dublin Core that makes provision for links to a copyright notice, or a service for provision of information about terms of access, or rights management of the resource.
-
Preservation:
documentation of physical condition or requirements for preservation or digitisation or backup of originals.
2. Content The description deals with what the information on the medium is about. If the computer disk described above as the agent contained data on water level readings, then one might say that it is about hydrology. The process of content description is called knowledge representation, because it requires the rendering of people’s knowledge into a form that other people can draw upon. The knowledge is represented as information, and it is the information that may then be organised and managed. Content may be characterised by: -
Topic:
this may be expressed as keywords or descriptors from controlled vocabularies that are used to describe subject matter. Alternatively, a facet from a scheme embodying a standard notation may be used.
-
Coverage:
this relates to extent of the intellectual content encompassed spatially or temporally.
-
Role:
as used, for example, in museum description, this may be the context in which the subject matter may be used, for example, game playing or a type of exhibition.
For both agent and content organisation, it has been found useful to assist organisation of information by developing additional controlling information. This metainformation exists in such collections as data dictionaries that control contextual information about agents, and thesauri that control content description. These are dealt with in some detail in following chapters. Here the focus is more upon the data naming, modelling and description processes as they are used to organise information within the framework of software development irrespective of application. Then description processes in the particular case of the bibliographic environments common to information management are examined, again pointing to organising principles. Finally some typical personal computer software is considered to see the extent to which it makes use of such organising principles.
CHAPTER 7 145 Organisation of Information by agent
7.1
Data description
It should be noted that in the section above, information description was discussed but the heading for this section uses data description. This is in deference to common usage over time in the systems analysis and data processing disciplines, which has possibly come about as a result of the disinterested approach that an analyst will adopt when deÞning and naming data subjects - hence data elements rather than information elements. The very process of deÞning and naming the data that the analysts carry out, is an organising process that helps to build the data into information by providing context. Consider the need to come up with some data description for the purpose of building an inventory of different compact disks that is to be maintained for a repository. A typical item to be described is shown as it may appear in a catalogue record in Figure 7.1.
HF1106 .M Mazany, Pete, 1957Mike’s bikes [computer Þle]: integrated business learning online/ Pete Mazany, Andrew Sharpe. - 2nd ed. International ed. - [Auckland, NZ]: Irwin McGraw-Hill, 2000. 1 computer laser optical disc: sd., col. ; 4 3/4 in. Title from title screen. “A powerful tool that enhances your business skills” -- Cover. A computer-based business simulation system that enhances the integrated learning of a variety of business-oriented subjects. Issues included are: business strategy, marketing, new product development, operations, Þnance and accounting. System requirements: IBM PC or compatible, minimum 486; 8Mb RAM (16Mb recommended); 20Mb free hard disk space; CD-ROM drive; Windows 95/98/NT; sound card and speakers; Microsoft Excel. 1. Management - Simulation methods - Software. 2. Business - Simulation methods - Software. I. Sharpe, Andrew, 1969-
Figure 7.1: A record describing a compact disk
The punctuation used in this record and the parts of it in the top right hand and bottom left hand corners are conventions used by some libraries when printing or displaying catalogue records. Although catalogues are predominantly online, some legacy conventions remain and may be carried through to online displays. The top right entry indicates the way that the record has been classiÞed, and the added entries at bottom left indicate additional author and subject headings that may be used for lookup. In hypertext systems it may be possible to link directly to such lookup points (shown in bold to indicate the links) to Þnd additional records listed with them as data elements. The concern here is not the principles underlying why this catalogue record is displayed as it is. Instead, this record could be considered to be representative of all other records describing instructional disks in a database designed to describe software. If one establishes a description in a particular database package, it is likely to contain metainformation that does at least three things: deÞnes the names of elements, speciÞes a size in characters for each one, and nominates the characteristic of each element’s characters along the lines:
146 PART B Operational Information Management
Data element Title Author Publisher Date_of_Pub Disk_size . etc
Length 30 25 25 4 4 . etc
Type Alpha Alpha Alpha Date Numeric . .
.
If the database were to be established for personal use, one would not worry much about what to name the data elements. One would know what one meant. If the data relationships are simple, for example if the database packages being catalogued only ever have one author, one version and so on, then one may be able to get away with deÞning the database in a single table such as the one above. However, if there are multiple instances possible for Þelds, such as the subject headings that appear at bottom left in the example of Figure 7.1, or if the database is to be built for multiple users in an organisation, who may approach the data in different ways, then signiÞcant attention must be paid to both the data naming and the data relationships. Creation of consistent computer programs and databases leads to easier maintenance of those programs and databases over time. An important aspect of this is to have consistent naming standards for the data that are to be manipulated. When software development is being carried out in such a way that integrated data architecture for an organisation is sought, naming consistency between different applications assists in user understanding and quality control. The software for assisting consistency is incorporated within data dictionary or information repository systems. This is examined later in Chapter 8. Here the focus is on formalised approaches to description. Although there are a number of widely known formal approaches to data analysis that are important techniques for deÞning data relationships in systems analysis, there is little in the way of a generally accepted approach to standardised naming of the data that are being described. One who has tried to remedy this is Brackett (1994; 1996). Within the framework of what he calls a common data architecture, he has established a methodology of naming. His terminology and method are considered here.
7.1.1
Data naming What’s in a name? That which we call a rose By any other name would smell as sweet.
Shakespeare Romeo and Juliet Act II, scene 2 Brackett anchors his approach in semiotic theory (another inßuence of semiotics was examined in Chapter 3, in the context of a communication of information model), and maintains that data naming must be considered in terms of the syntactic, semantic and pragmatic approaches to signs and symbols: their representational syntax, meaning and use. If one assumes that one is deÞning databases and transaction processes for an organization, then any things that have to be described are subject data, which will be stored in a data repository, and must be named. • Data subject This is a person, place, thing event or concept about which an organisation collects or manages data. Data subjects are identiÞed by an enterprise’s view of the real world. For example, a bank would
CHAPTER 7 147 Organisation of Information by agent
have data subjects such as ‘accounts’, ‘customers’ and ‘payments’. An educational institution would have data subjects such as ‘teachers’, ‘students’ and ‘results’. A law enforcement agency would have data subjects such as ‘offenders’, ‘crimes’ and ‘apprehensions’. Many organisations share at least some similar data subjects. For example, most have to pay their employees, so they may have data subjects such as ‘employees’ and ‘payrolls’. However, consistent naming of the data subjects is often not employed. For example ‘staff’ may be used in one context and ‘employees’ in another. Data subjects are commonly known as entities for data description. • Data characteristic This is an individual feature or trait of a data subject. A data subject is described by a set of data characteristics such as ‘birth date’, ‘salary’, ‘height’, ‘colour’ and ‘account type’. Different data subjects may have the same data characteristics. For example, both a student and employee can have a ‘birth date’, though of course the values that are adopted may well vary. Data characteristics are commonly known as attributes for data description. • Data value This represents the individual facts and Þgures contained in data characteristics. An employee’s age may have the value ‘2 years’; an account balance may have the value ‘$150’. The value may also be in encoded form, for example age may be represented as ‘1’, ‘2’ or ‘3’ where ‘1’ represents under 30 years, ‘2’ represents between 30 and 60, and ‘3’ represents over 60. The set ‘1’, ‘2’, ‘3’ in this example would be called a data code set as it is a set of related codes. • Data occurrence This is a logical element that represents one existence of a data subject in the real world. For example, ‘Commonwealth Bank’ is represented by a data occurrence in a ‘Financial Institution’ data subject. Data occurrences may also be called instances in data description. • Data repository This is a speciÞc location where data are stored. Formalising the naming for each of the above elements assumes that: -
Syntactics
requires that the formal data names used for any of the preceding elements have a speciÞc, consistent format; if the format varies from one data name to another, it becomes more difÞcult for both software developers and users to understand the data names. Using names that are formed differently like ‘Name_of_employee’ and ‘Acct:no’ in the same repository is a confusingly different approach to syntax.
-
Semantics
requires that a formal data name have a speciÞc meaning; therefore a name like ‘Name_of_employee’ needs to have a deÞnition such as ‘the family name of an employee followed by all forenames spelt out fully’.
-
Pragmatics
requires that the formal data name should have practical value; elements should not be deÞned unless values are going to be retained as data occurrences for database or transaction use. ‘Name_of_employee’ may not require use if there are already elements for ‘Family_name_of_ employee’ and ‘Forename_of_employee’, unless for some applications it may be appropriate to group these two together under one name.
148 PART B Operational Information Management
Using a naming convention is meant to avoid problems such as those that occur in large organisations when the same data are used for different applications within the organisation. Symons and Tijsma (1982) describe an example of this for the name ‘delivery date’ that came from an analysis of data libraries in different parts of N.V. Philips Gloeilampenfabrieken in the Netherlands. Figure 7.2 shows different deÞnitions of data characteristics that seem to be referring to the same data. Name
Definition
*(Translated from Dutch)
1
‘Required Delivery Period’
Date on which the customer expects the ordered goods available.
2
*’Asked Delivery Period’
*Asked period (in weeks or months) within which goods must be sent to the customer.
3
*’Date Outstanding Order’
*Date on which a still outstanding order on a supplier for an article code must be available for issue by the factory store.
4
*’Delivery Date’
*Indicates the Monday of the week in which the component must be delivered.
5
‘Delivery Date (Despatch)’
(No definition)
6
*’Order Delivery Date’
*Date on which a particular order must be delivered. According to Concern calendar code.
7
‘Delivery Date’
Date on which a purchase order is scheduled to be delivered by a supplier.
Figure 7.2: Naming of the same subject in different data libraries from Symons & Tijsma (1982) by permission of British Computer Society TYPE CLASSIFICATION
EXAMPLES
(Identifiers) 1
Points or areas in geographical ‘space’
Countries, districts, ports, addresses, etc.
2
Points or area in organisational space
Organisations, departments, functionaries
3
Points or extents of time
Periods, dates, time of day
4
Individual persons Examples of subclass: sex, grade
Employees, pensioners, etc.
5
Products and resources that are necessary for manufacture, transport an packaging Example of subclass: Inspection-procedure
Machines, tools, containers, ships
6
Information units and the resources necessary for information storage Example of subclass: Processing status
Transactions, orders files, volumes, media
7
Units of measure and currencies
8
Accounts, projects, activities and the like
9
Miscellaneous including abstract concepts
Colour, language
(Measures) M
Amounts
P
Prices and Tariffs
Q
Quantities
R
Ratios, percentages, Indices, Factors
T
Times
Financial amounts
Lengths, weights
Figure 7.3: Type classification method used by Philips from Symons &Tijsma (1982) by permission of British Computer Society
CHAPTER 7 149 Organisation of Information by agent
This is typical of enterprises where different divisions of the enterprise are using the same data but in different ways. Any form of system integration should take into account the different naming approaches that have been undertaken within an organisation and endeavour to rationalise them. In this case, a systematic naming methodology was implemented with success. It is particular to their company, but has an approach that can be adopted generally. They adopted a classiÞcation scheme for data elements, part of which is shown in Figure 7.3. For each of these type classiÞcations they produced a List of Permitted Aspect-Terms (LOPA) that were the limited number of terms needed to represent the aspects across different type classes. An example of an entry from LOPA is shown in Figure 7.4. This example shows how to resolve an unambiguous deÞnition of the ‘delivery date’ problem in Figure 7.2. When detailed records of such standardisation are maintained, they are kept in an information repository or data dictionary. ASPECTS ASPECTTERMS
KIND OF VALUE Code Description
SUBJECT CONCEPT Date Week
VALUE FORM Abbreviated Full YWW YWWD YYMMDD YYDDD
STATUS Ordered Confirmed Actual
ROLE Issue Shipment Receipt
RELATED CONCEPT Order Product
Example “Formal definition” Code, Date, YWWD, Ordered, Receipt, Product Corresponding possible “User definition” Date, coded in the form YWWD, stated in an order, at which the customer wants to receive the goods
Figure 7.4: Extract from List of Permitted Aspect-Terms from Symons & Tijsma (1982) by permission of British Computer Society
7.1.2
Generalised naming
A data naming convention should make possible the successful management of the various data in an enterprise by enabling users to communicate effectively using shared understanding of meanings. A naming convention should mean that different people independently trying to establish a name for the same data subject should produce the same name. One approach is to try to develop a standardised approach to abbreviating names, for example ‘month end payment’ EOM-PMT, and ‘automatic customer payment’ AUTO-CUST-PMT. This would entail a standardised list of abbreviations. It also implies a distinction between a business name by which an element is known to general users (‘automatic customer payment’), and an access name (AUTO-CUSTPMT). Attempts have been made to go beyond abbreviations to establish rules for naming. Is it possible to establish a generalised syntax that can be used for data naming in any speciÞc context? Brackett (1996) has proposed one. He notes that a commonly used naming convention is entity-attributeclass. Others have alternatively described this as: role-type-class prime-descriptor-class
150 PART B Operational Information Management
entity-adjective-class entity-attribute-class word entity-description-class entity keyword-minor keyword-type keyword entity keyword-descriptor keyword-domain Other terminology that is applied includes ‘modiÞer’ which is usually an alternative to attribute, and ‘qualiÞer’ which refers to a further elaboration of class. For example, if class were date then qualiÞer might be month. In Brackett’s terms the entity is the data subject deÞned above, the attribute is the data characteristic deÞned above, and the class is a standard keyword that indicates the type and general format of the attribute such as ‘date’, ‘number’, ‘description’. However, such conventions often do not follow semiotic principles in that name components do not provide any meaning about the structure of the data name, they may have inconsistent abbreviations, and they do not indicate the data use or structure, the physical location in repository (data site) or variations in data characteristic that need to be uniquely identiÞed. He therefore proposes the standard naming taxonomy: Data site:Data subject.Data characteristic,Data characteristic variation -
Data site name uniquely identiÞes physical data location. Examples might be: Payroll Head office payroll Vehicle register Tool catalogue
-
Data subject names uniquely identify data subjects, hence: Payroll:Employee Vehicle register:vehicle
-
Data characteristics names differentiate data subject attributes, hence: Payroll:Employee.Name Vehicle register:Vehicle.Model
-
Data characteristic variation consists of words that uniquely identify the variation of a data characteristic. In the case of Vehicle.Model the differentiation may be by text name ‘Ford’ Falcon sedan’ (Body type) or ‘Ford Falcon XRS8’ (Model code).
So examples of differentiated names would be: Vehicle register:Vehicle.Model,Body type Vehicle register:Vehicle.Model,Model code Similarly, a telephone directory may be deÞned to include Directory: Person.Full Name,Inverted For a particular enterprise, the types of names used would be evolved within a set of categories. This approach has been used for example, in the ‘OF’ language. IBM introduced this as a generalised naming system or data description language in connection with its data dictionary software DB/DC. The principle behind this was to introduce a naming convention of the general form: CLASSWORD/connector/PRIMEWORD/connector .../MODIFIER
CHAPTER 7 151 Organisation of Information by agent
The classwords are a limited set of terms for general description of data. For example (Holloway 1988, p. 86): N
Name
Alphabetic data that identify specific entities
Customer name Supplier name
#
Number
Alphanumeric data that identify specific entities Order number
Part number
The connectors are limited to a set of six with corresponding symbols. These symbols are used to connect classwords establishing a semantic relationship between them. Of
.
(Dot, full stop or period)
Which is/are
*
(Depending on whether preceding term is singular/plural)
Hyphen
-
(to produce a compound phrase from 2 or more terms)
Or
:
And
&
By/Per/Within
/
Therefore a data name that is meant to represent the ‘vehicle model in coded form that is damaged or stolen’ might be: MODEL*CODED.VEHICLE*DAMAGED:STOLEN In this case, the classword category is likely to be ‘Code’ (symbol C).
7.2
Data modelling
The application of data description to computer databases must be accompanied by data modelling that makes it possible for the analysis that has taken place of processes in the real world to be converted into something that the physical computer system can handle. We normally separate data models into physical data models that reßect the internal constraints of the computer system on which the data are stored and logical data models that reßect the way that users conceive of the data irrespective of how it is stored. These may be understood within the structure of a three-level architecture, the internal level representing the machine-storage view, the external level representing the various user views of the data, and a conceptual level that reconciles the different user views through creation of logical models.1 The separation of levels achieves data independence. This means that changes in the logical models may be made without affecting how the data are stored, and vice versa. The application programs are unaware of the physical parameters such as Þle sizes applied to data sets or the number of individual records. These parameters are handled by a database management system. Data modelling procedures are extensive and have a theoretical framework developed enough to warrant textbooks in their own right, of which there are many. Some of these are referenced in the further reading at the end of this Chapter. A brief outline of the central concepts follows.
7.2.1
Internal data models
The physical organisation of data on digital storage devices must take into account the physical characteristics of the devices, such as magnetic disks, the requirement of retrieving data quickly from large Þles, and the maintenance of such Þles by inserting, modifying, and deleting data while making best use of the available storage. A database administrator will want to implement the relationships between hardware, software and data that permit appropriate space utilisation and user access. Figure 7.5 summarises different approaches to Þle organisation.
152 PART B Operational Information Management
Organisation
Access
Sequential
Record order in files is the same as the order in which they were written to file
Indexed
Separate tables are established to provide an index to the records which may be stored sequentially or nonsequentially
Direct
A hashing algorithm produces a place to locate individual records situated at relative addresses
Figure 7.5: File organisation
• Sequential organisation This provides merely for a Þle being ordered in the sequence that records are written onto it. The sequence is typically in the order of a primary key (the data element that uniquely identiÞes a record). This type of organisation is used in transaction processing such as for payroll work where every record in the database needs to be accessed for a particular processing run. For the sake of illustration, in the example the key is assumed to be a data element named SPORT containing a value. Each record contains the key on the left hand side, and all of the remaining data within the record on the right hand side. A sequential organisation would appear as in Figure 7.6:
Archery
Cricket
Hockey
Netball
Tennis
Figure 7.6: Sequential organisation
• Indexed sequential organisation This has the records arranged in the same way as for sequential organisation, but in addition other Þles are built that provide keys and indexes to the records so that traversal of a complete Þle of records is unnecessary in order to get to a desired record. Such organisation is appropriate for Þles that need to provide random access capability but periodically need transaction processing on the entire Þle. An equivalent in print form is the type of dictionary you see that indicates at the top of the page outside the main text, what the Þrst word is on the page. If the stored records are non-sequential, a full inverted index is created to provide access. If the records are sequential, an indexed sequential organisation may appear as in Figure 7.7: It is noticeable that this organisation produces a tree-like structure. Those who imagine that the records at the base are the roots of the tree need to stand on their heads and think again! In data modelling, the top-level node (level 0) is termed the root node. Successive branches produced to get to a record are sibling nodes and given level numbers. The base level where the required records are contained, and where there is no further branching, contains the leaves. Trees have a strong inßuence on the access times from databases, and much work has gone into optimising their structures.
1
Database management systems are usually considered at three levels of abstraction as a consequence of work done during the early 1970s by the American National Standards Institute’s DBMS Standards Planning and Requirements Committee (ANSI/X3/SPARC).
CHAPTER 7 153 Organisation of Information by agent
Figure 7.7: Indexed sequential organisation
Figure 7.8: Direct organisation
• Direct organisation This means that records are assigned numbers as they are created. The records are retrieved based upon correspondence of the key with the record number. The numbers may be relative to the beginning of a Þle, in which case the Þle appears ordered like the sequential Þle, but a record may be retrieved by its number without traversing the Þle. Alternatively the records may be ordered by what is called a hashing routine. In this case, an algorithm (or set of instructions) takes the primary key value and converts it into a record number. The object of this is to provide a randomised distribution of new records into storage.2 These types of Þles are created in order to facilitate random updating and retrieval for databases such as those used by the stock exchanges, or with reservations systems.
7.2.2
Conceptual data models
The many people who make use of a database do so in quite different ways. The conceptual modelling process produces a logical model to accommodate each of these views. A director of an organisation is concerned to see summary and condensed data that reßect factors of interest to the organisation as a whole; a data administrator is concerned with the standardisation and particularities of all data needed to describe the organisation’s processes; an applications programmer needs to invoke relationships between a limited amount of speciÞed data by means of the software available to the organisation; a business analyst may wish to extract and manipulate certain data elements that contain statistical data. They differ in the ways that they view the data. This variety of external views must be analysed before the description and organisation of data is undertaken. The analysis is carried out by examining, understanding and modelling the information relationships within 2
Provision must be made for organising records that initially have the same record number generated for them, as is possible with hashing routines.
154 PART B Operational Information Management
an enterprise, as described in Chapter 17. Many techniques may be adopted to assist this process. There is computer-based assistance available using CASE (Computer-aided Software Engineering) tools. These tools are capable of modelling using different types of schematics for data and process relationships. Many are able to support the naming and description of databases using modelling of relations, as well as maintaining information repositories as described in Chapter 8. If one were to reconsider some elements of the database that contain records like those shown in Figure 7.1, one could think in terms of a Þle that describes computer software or data as in Figure 7.9 that takes some of the elements, including the repeating ones for author and author_loc. Record #
Title
Version
Type
Author #1
Author_loc #1
Author #2
Author_loc #2
Publisher
24
Mike’s bikes
2
CAI
Mazany, P.
Auckland
Sharpe, A
Auckland
IM-H
38
Arrac
1.4
Program
Austin, G.
Amsterdam
Scott, M.
Manchester
Snooz
56
MicroSQL
1.20
Program
Infosystems
Brisbane
134
Verex
4.1
Data
Grant, K
Sydney
Lee, P
Singapore
Infplaza
198
Arrac
2.0
Program
Austin, G
Singapore
Scott, M.
Brussels
Snooz-Allen
222
Verex
4.2
Data
Dorn, D.
Sydney
Infosystems
Mirrin
Figure 7.9: Flat file representation of software description
Different logical models have been developed for describing information that is to be carried in database management systems. They are distinct logical ways of looking at what in Figure 7.9 may be called a ßat Þle representation of part of the information to be considered. In this Þgure one can regard the software package as the data subject or entity to be described, or alternatively the agent to be organised. Its data characteristics (attributes or Þelds) are represented by the column headings. The data values appear in the tables below the headings. However there a number of alternative ways in which we may conceive of the data in order to organise it for database processing software: • Hierarchical and network modelling When organisation of computer databases was Þrst taken seriously during the 1960s, the model Þrst used for structuring data was the hierarchical model. Although database systems that have been created in more recent times do not use this approach, there are still many applications in operation. Typical of these are applications based upon IBM’s Information Management System. This model worked on the basis that data were organised in parent-child relationships. Therefore we could regard our description of the software as a parent, and the author of the software as a child. Initially this model did not allow for children to have more than one parent. This meant that for a data element like ‘location’ there would have to be separate children for author location and publisher location. A group called the Conference on Data Systems Languages Data Base Task Group (CODASYL/ DBTG) promoted development of the so-called network data model for databases during the early 1970s. They developed a model that was implemented on a number of mainframe computers by organisations such as DEC, CDC and Univac. The network model grew from the hierarchical model, and in particular permitted any record type to be associated with any other record type. Network modelling of the same data is illustrated in Figure 7.10. A ‘location’ element appears and may be used in association with both author and publisher. Similarly, authors may be associated directly with publishers, as well as with software. The arrows joining these would not have been permitted in the hierarchical model, which would have required separately deÞned author elements for software and publisher, and separately deÞned location elements of author and publisher.
CHAPTER 7 155 Organisation of Information by agent
Figure 7.10: Database organisation from network model
• A relational modelling approach If one were to take the ßat Þle data from Figure 7.9 from earlier and decompose it into tables, according to a set of principles that enable an organised data structure, manipulation by software standardised according to SQL principles (Chapter 11), and with an ability to maintain data integrity, then one would be following a relational modelling approach. Relational modelling is the process of turning tables into relations. Each relation is named and comprises a set of named columns that organise the undeÞned number of rows that will ultimately hold instances of data – the data values for each entity. We achieve this state if we have minimised inherent redundancy, and can allow modiÞcation of tables without introduction of inconsistencies. This is achieved when designing the tables by going through a process of normalisation. -
First normal form
is achieved by removing repeating groups from a table; the data in Figure 7.9 are converted to this in Figure 7.11 making it a relation in the Þrst normal form.
-
Second normal form
is achieved by eliminating partial dependencies. This means taking a Þrst normal form table and making every non-key attribute fully functionally dependent upon the primary key; in the example already used, ‘record#’ does not fully deÞne a row – a row is deÞned by a composite key comprising ‘record#,author’ – therefore attributes such as ‘title’ and ‘version’ are only partially functionally dependent upon the key because they depend upon the ‘record#’ only. Figure 7.12 shows the creation of two relations that together create full functional dependence upon ‘record#’ as primary key.
156 PART B Operational Information Management
-
is then achieved by taking relations in second normal form and eliminating transitive dependencies – a transitive dependency is dependency between non-key attributes. In the example above, ‘title is dependent upon ‘record# and ‘type’ is dependent upon ‘title’ so one should establish further relations in order to remove this dependency. These are shown in Figure 7.13. Here the relations ‘softvers’ and ‘softitle’ have been created from the software relation to transform it to third normal form.
Third normal form
Record#
Title
Version
Type
Author
Author_loc
Publisher
24
Mike’s bikes
2
CAI
Mazany, P.
Auckland
IM-H
24
Mike’s bikes
2
CAI
Sharpe, A.
Auckland
IM-H
38
Arrac
1.4
Program
Austin, G.
Amsterdam
Snooz
38
Arrac
1.4
Program
Scott, M.
Manchester
Snooz
56
MicroSQL
1.20
Program
Infosystems
Brisbane
Infosystems
134
Verex
4.1
Data
Grant, K
Sydney
Infplaza
134
Verex
4.1
Data
Lee, P.
Singapore
Infplaza
198
Arrac
2.0
Program
Austin, G
Singapore
Snooz-Allen
198
Arrac
2.0
Program
Scott, M.
Brussels
Snooz-Allen
222
Verex
4.2
Data
Dorn, D.
Sydney
Mirrin
Figure 7.11: First normal form of flat file from Figure 7.9 eliminating repeating groups Software
Author_pl
Record#
Title
Version
Type
Publisher
Record#
Author
Auth_loc
24
Mike’s bikes
2
CAI
IM-H
24
Mazany, P.
Auckland
38
Arrac
1.4
Program
Snooz
24
Sharpe, A.
Auckland
56
MicroSQL
1.20
Program
Infosystems
38
Austin, G
Amsterdam
134
Verex
4.1
Data
Infplaza
38
Scott, M
Manchester
198
Arrac
2.0
Program
Snooz-Allen
56
InfoSystems
Brisbane
222
Verex
4.2
Data
Mirrin
134
Grant, K
Sydney
134
Lee, P
Singapore
198
Austin, G
Singapore
198
Scott, M
Brussels
222
Dorn, D.
Sydney
Figure 7.12: Second normal form derived from Figure 7.11
CHAPTER 7 157 Organisation of Information by agent
Softvers
Softitle
Record#
Title
Version
Publisher
Title
Type
24
Mike’s bikes
2
IM-H
Mike’s bikes
CAI
38
Arrac
1.4
Snooz
Arrac
Program
56
MicroSQL
1.20
Infosystems
MicroSQL
Program
134
Verex
4.1
Infplaza
Verex
Data
198
Arrac
2.0
Snooz-Allen
222
Verex
4.2
Mirrin
Figure 7.13: Third normal form of software relation from 7.12
This may seem normal enough for the novice, but there are circumstances where it becomes necessary to carry out further normalisation to prevent anomalies. This happens in situations where a relation has multiple functional dependencies, or multivalued dependencies where there are at least three attributes. For example in the author_pl relation of Figure 7.12, each ‘record#’ may have multiple values of ‘author’, and each ‘author’ may have multiple values of ‘author-loc’, so a fourth normal form may also be derived. • Object-oriented modelling Object-oriented programming techniques have been adapted for use in database modelling in what is generally termed an object-oriented data model (OODM). In OODM, the object is like that which has previously been called a data entity or data subject. However it may be deÞned to have not only attributes, but also methods that act upon it. An object is an abstraction from something in the real world. From the sample database earlier that describes software on CD, the program ‘Mike’s bikes’ may be regarded as an object that has attributes such as ‘author’ and ‘version’. The OODM approach encapsulates both attributes and methods. One might for example, also include a method called ‘Presentsum’ in the object description. ‘Presentsum’ could carry out a computation on the cost of the individual contracts contributing to the software, adding them up and presenting a total. Encapsulation means that such properties in the description are not readily visible - they are hidden from the user. The data may be used and methods may be employed nevertheless, using interfaces between objects. Assuming the example of description of software were also to include amounts paid to the authors for creating the software, then the object ‘software’ that is invoked by a user to itemise software automatically determines the cost based upon a calculation of the amounts paid to authors. From within that object, there may be an interface to another object that contains a summing calculation. The person who invokes the task is not conscious of the calculation being performed. Objects that have similar attributes and methods are called classes. A group of publishing programs may be grouped into a broader class called ‘publishing software’, which may be part of a greater class called ‘software’. Inheritance in OODM means enabling each class to take on methods and attributes of the broader class. In the above example, the types for programs, ‘CAI’ and ‘data’, are all part of the larger class software, and as depicted are mutually exclusive. Figure 7.14 shows an example of object modelling that works on the basis that the record of Þgure 7.1 is one of the class of objects that may appear in a catalogue. If the class were items that are available in a library, then there could be subclasses for different types of material. The attributes and methods of the class are common to all subclasses – for example any item in the collection may be lent, so lend is a method applying to the object, and is inherited by subclasses such as books and
158 PART B Operational Information Management
sheet music. Some attributes and methods are particular to subclasses. They are ‘uncommon’ such as the method ‘backup to disk’, and the attribute ‘computer’ (meaning which computer the software should be run on), which are speciÞc to the software subclass. Inheritance is just one of the relationships that may exist between objects, but it is the strongest. A weaker relationship is aggregation, which applies when an object forms part of another object. The catalogue object may form part of another object called collection (which has its own attributes such as size and location). Then there is association, which applies when some attributes of one object are determined by its association with another. In Figure 7.14 the catalogue object may interact with another object called loan record. If the catalogue object has attributes such as loan status or return date, these would be determined by the attributes of the loan record with which it is associated. Weaker still is the dependency association. This means that one object needs to be informed about another. So in Figure 7.14, the catalogue object may have a dependency upon another object that deals with purchase orders. A purchase order for another copy of a document that is an instance in the catalogue will affect the catalogue object (for example an attribute dealing with number of copies).
Figure 7.14: An OODM with attributes, methods, and exclusive subclasses
The concept of primary keys does not apply in OODM. Objects have an identity external to data values that are stored in the object. The identity of an object is immutable, and will not change as the values of its attributes change. The need for normalisation that we have for a relational database is unnecessary with the object modelling, and objects can include other objects whose properties are inherited. The data code set of instances (domain) for any attribute in OODM may be of any data type such as integers, decimal, etc., or abstract data types deÞned by users that may include rules and event conditions with respect to applications such as geographic information systems, computer-aided design, or process control.
CHAPTER 7 159 Organisation of Information by agent
This section has attempted nothing more than a superÞcial look at data modelling. The intention of this introduction, however, was to give some feel for why the description of information for transaction databases and the description of information for bibliographic databases are sometimes at odds. This is despite the fact that in each case metainformation is being created in order to encompass information processing requirements.
7.3
Document description
Whereas data description is concerned with analysing information ßows in organisations and creating documents in the form of databases to capture and utilise the information relationships between any entities or objects, document description is concerned with taking existing documents in whatever form as the entities and describing their information content. Some general principles have been developed for this.
7.3.1
Levels of description
When describing published documents (where a document in a broad sense means the medium of record for any information that has been recorded), it is useful to categorise levels of description as follows: -
Analytical level
is description of a component part – the level of description where the bibliographic item is part of a larger item, and for which the bibliographic description cannot stand alone. Description of the component is dependent upon description of the containing item. Examples are an article in a journal, a track on a sound recording, a conference paper on a microform.
-
Monographic level
of description means that the item being described is complete in one physical part, or is intended to be completed in a Þnite number of parts. It is a self-contained bibliographic entity such as a book (which may be in several volumes), one map from a series, one issue of a compact disk containing a database that is updated on a regular basis.
-
Serial level
of description refers to an item issued in successive parts, and intended to be continued indeÞnitely (even if not another part appears after the Þrst). A periodical is the most obvious example, but another example would be the compact disk database referred to under monographic level, but considered as the ongoing series rather than as an individual item.
-
Collective level
of description refers to a made-up grouping of at least two, but usually more separately titled works. Examples are a collection of paintings in an art gallery, or grouping of memorabilia in a variety of formats about an individual.
One can see that items may be addressed at different levels within themselves. For example, some digital mapping data may be described analytically as a Þle relating to a particular city, which is part of a magnetic tape (described monographically) containing several Þles of cities in a country, which is part of a series of Þles (described serially) issued for various countries, which has been grouped together with some printed photographic images of the same places and the lot described as one thing, collectively! An approach to levels has been followed, but in a somewhat different manner in the Þeld of archives and records management. A key document for guiding archival description is Manual of archival description
160 PART B Operational Information Management
(Procter & Cook 2000). It interprets archival arrangement in terms of levels (Cook 1993) along the following lines: Level 0 Repository holdings 1 Management groups
2 Groups 3 Series
4 Items
5 Pieces
Examples - Any system of multi-repository Þnding aids representing more than one collection - OfÞcial archives for different levels of government, health care, schools - Business - Ecclesiastical - Finance committee of a group - SpeciÞc school - Land use reports - Board minutes - Registers - Log books - One volume of minutes - A log book - Discrete Þle of correspondence - Pages - Memoranda
The four lowest levels are based upon observable physical entities: -
Level 2 Group:
for a whole accumulation from the work of some distinct organisation or activity – it could reasonably be equated with collection for published material. In archives it is given substance by ‘respect de fonds’, the principle of provenance. Archivists avoid mixing materials of different provenance, and provide clients with explanation of background, context and origin of materials held. Level 2 is thus also referred to as ‘fonds’. Subgroups (sub-fonds) may contain archives of subordinate functional or administrative groups that are natural units in an enterprise.
-
Level 3 Series:
characterised by physical unity and deÞnable function in the administrative system that created the archive; therefore the set of documents has been created by the same original process of Þling or compilation. It is sometimes also known as the class. In large repositories, it may be preferred as the basic unit for archival and intellectual control.3
-
Level 4 Item:
this is the physical unit of handling, the object that may be picked up, packed for storage and produced for usage, somewhat equivalent to monographic level for published material. It may also be known as the Þle.
-
Level 5 Piece:
the smallest indivisible units, akin to analytic description.
The general international archival description standard (International Council on Archives 2000) accommodates multilevel description, under a broadest grouping category of ‘fonds’, but is not prescriptive concerning intermediate levels, and suggests a lowest level of ‘item’.
CHAPTER 7 161 Organisation of Information by agent
7.3.2
Bibliographic description
The format and content of library catalogue records have a long history of development. In recent years there has been increasing standardisation, particularly among English speaking countries with respect to catalogue records. This has been codiÞed in International Standard Bibliographic Description (ISBD) and the Anglo-American Cataloguing Rules (AACR). ISBD consists of a formalised approach to description of all types of information materials for libraries, ISBD(G); accompanied by specialised subsets that deal with particular types of materials such as serials ISBD(S); books ISBD(M); so-called non-book materials ISBD(NBM); electronic resources ISBD(ER) (formerly computer Þles - CF); printed music ISBD(PM); antiquarian materials ISBD(A); and cartographic materials ISBD(CM). Most of these formats were devised during the 1970s, and appeared as revisions during the 1980s and 1990s. Figure 7.15 shows some examples of bibliographic description for cartographic materials, in this case maps, based upon ISBD(CM) (International Federation of Library Associations and Institutions 1987). ISBD essentially deÞnes a layout and a group of delimiters that separate what are called the different elements. (Note that in data deÞnition terms for databases, these may well be called attributes of an entity – where the entity is the bibliographic description. This is another example of the terminological variation between different disciplines in information management.) 1.
Middle East / J. Bartholomew & Son. - 1:4 000 000 ; conic proj. - Edinburgh: Bartholomew, cop. 1985. - 1 map : col. ; 71 x 93 cm. folded in cover 26 x 14 cm. - (World travel map). - ISBN 0 7028 0337 5.
2.
B.N.O.C. U.K. continental shelf licence interests, 31st December 1981 / cartography by BNOC Exploration Drawing OfÞce. - [1:2 000 000 approx.] ; Lambert conical orthomorphic proj. two standard parallels (W 15°-E 6°/N 64°-N 48°). - Glasgow : British National Oil Corporation, [1982?]. - 1 map : col. ; 93 x 54 cm.
3.
Plan de Paris à vol d’oiseau / original dressé et dessiné par G. Peltier de 1920 à 1940. - Réfection en 1974 par André Charon. - [1:7 000 env.] (E 2°20’/N 48°52’). - Paris: Blondel La Rougery, 1974. - 1 plan en 2 feuilles: en coul. ; chaque f. 128 x 87 cm, sur f. 132 x 91 cm.
4.
Germany = Deutschland = Germania = Allemagne = Alemania / Philip International. - 1: 1 000 000 (E 5°-E 14°/N 55°-N 47°). - London: George Philip, cop. 1976. - 1 map : col. ; 96 x 85 cm, folded to 25 x 13 cm. - (Road map = Carta automobilistica = Carte Routière = Mapa de Carreteras ; 3)
Figure 7.15: Examples of ISBD for cartographic materials
For presentation in print form, different areas of the description have prescribed punctuation. For example, in what is called the Title and statement of responsibility area that begins each of the four descriptions shown, the layout is as follows: Element Accompanying delimiter Title proper General material designation enclosed by [] Parallel title preceded by = Other title information preceded by : First statement of responsibility preceded by / Subsequent responsibility statements preceded by ; Similar prescriptions apply to each area of the description.
3
In Australian practice, there is a standardised form for series registration.
162 PART B Operational Information Management
International standard description has now also been adopted for unpublished documents as well, through the medium of archival description embodied in ISAD(G) (International Council on Archives 2000). General archival description of a document consists of the elements depicted in Figure 7.16. 1. Identity Statement Area [where essential information is conveyed to identify the unit of description] Reference codes Title Dates of creation of the material in the unit of description Extent of descriptive unit (Quantity, bulk or size) Level of description 2. Context area Creator(s) Administrative/Biographical history Archival history Immediate source of acquisition or transfer 3. Content and structure area [To enable users to judge relevance based upon summary of scope and content of unit of description appropriate to level of description] Scope and content Appraisal, destruction and scheduling Accruals (to inform about foreseen additions) System of arrangement 4. Conditions of access and use area [To provide regulatory information] Conditions governing access Conditions governing reproduction Language/scripts of material Physical characteristics and technical requirements Finding aids 5. Allied materials area [where information is conveyed about materials having an important relationship to the unit of description] Existence and location of originals Existence and location of copies Related units of description Publication note 6. Notes area [for information not accommodated by other areas] 7. Description control area Archivist’s note (explaining how and by whom the description was prepared) Rules or conventions Date(s) of descriptions
Figure 7.16: Standard class description elements for archives, reprinted with permission ICA
7.4
Markup for document organisation
When markup was examined in Chapter 5, it was in the context of document creation. Over the last few years standards have progressed to the point where markup can embody the way the agent of information (a document) is described, so that markup may now be regarded as an organising structure. 7.4.1
Using data definitions for documents
The Data Type DeÞnition (DTD) that was introduced in Chapter 5 is that part of SGML that determines the organisation of a document. It speciÞes a class of document and then is used to declare the elements within a document. For example, the DTD described there and shown here as Figure 7.17 is for a communication called a Note that may be used internally in a company ofÞce environment.
CHAPTER 7 163 Organisation of Information by agent