Data Base Systems: Proceedings, 5th Informatik Symposium, IBM Germany, Bad Homburg v. d. H., September 24 - 26, 1975 (Lecture Notes in Computer Science)

Editors: H. Hasselmeier | W. G. Spruth

Lecture Notes in Computer Science
Edited by G. Goos and J. Hartmanis

39

Data Base Systems
Proceedings, 5th Informatik Symposium, IBM Germany, Bad Homburg v.d.H., September 24-26, 1975

Edited by H. Hasselmeier and W. G. Spruth

Springer-Verlag Berlin · Heidelberg · New York 1976

Editorial Board: P. Brinch Hansen · D. Gries · C. Moler · G. Seegmüller · J. Stoer · N. Wirth

Editors
Helmut Hasselmeier
Dr.-Ing. Wilhelm G. Spruth
IBM Deutschland, EF Grundlagenentwicklung
Schönaicher Straße 220
703 Böblingen/BRD

Library of Congress Cataloging in Publication Data

Informatik Symposium, 5th, Homburg vor der Höhe, 1975.
Data base systems. (Lecture notes in computer science ; 39)
English or German.
Sponsored by IBM Germany and the IBM World Trade Corporation.
Bibliography: p.
Includes index.
1. Data base management--Congresses. I. Hasselmeier, H. II. Spruth, W. G. III. IBM Deutschland. IV. IBM World Trade Corporation. V. Title. VI. Series.
QA76.9.D3152 197[?] 001.6'442 75-46[?]0

AMS Subject Classifications (1970): 00A10, 68-02, 68-03, 68A05, 68A10, 68A20, 68A50
CR Subject Classifications (1974): 4.30, 4.33, 4.34, 4.0, 4.22, 4.6

ISBN 3-540-07612-3 Springer-Verlag Berlin · Heidelberg · New York
ISBN 0-387-07612-3 Springer-Verlag New York · Heidelberg · Berlin

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law, where copies are made for other than private use, a fee is payable to the publisher, the amount of the fee to be determined by agreement with the publisher.

© by Springer-Verlag Berlin · Heidelberg 1976
Printed in Germany
Printing and binding: Offsetdruckerei Julius Beltz, Hemsbach/Bergstr.

Contents

H. Remus: Considerations on the Development of Data Base Systems (Überlegungen zur Entwicklung von Datenbanksystemen)
G. Richter: On the Relationship between Information and Data
A. Blaser / H. Schmutz: Data Base Research: A Survey
C. Schünemann: Fundamentals of the Storage Hierarchy (Grundlegendes zur Speicherhierarchie)
M.M. Astrahan, D.D. Chamberlin, W.F. King, I.L. Traiger: System R - A Relational Data Base Management System
P.E. Mantey / E.D. Carlson: Geographic Base Files: Applications in the Integration and Extraction of Data from Diverse Sources
P. Lockemann: Data Base User Languages for the Non-Programmer
U. Schauer: A System for the Interactive Processing of Large Volumes of Measurement Data (Ein System zur interaktiven Bearbeitung umfangreicher Messdaten)
O. Saal: Data Base Organization at Hoechst Aktiengesellschaft (Datenbankorganisation bei der Hoechst Aktiengesellschaft)
E. Edelhoff: Use of Data Bases in the Non-Scientific Sector of a University (Nutzung von Datenbanken im nicht-wissenschaftlichen Bereich einer Hochschule)
R. Heitmüller: Use of a Data Base System at the Hessisches Landeskriminalamt (Einsatz eines Datenbanksystems beim Hessischen Landeskriminalamt)
I.A. Clark: Relational Data Dictionary Implementation
H.L. Hill: Data Base System Evaluation
H. Wedekind: Data Security in Data Base Systems (Datensicherheit in Datenbanksystemen)
R. Bayer: On the Integrity of Data Bases and Resource Locking
T.B. Steel: Data Base Standardization - A Status Report

PREFACE

The papers in these Proceedings were presented at the 5th Informatik-Symposium which was held in Bad Homburg, Germany, from September 24 - 26, 1975. The Symposium was organized by the Scientific Relations Department of IBM Germany and sponsored by IBM Germany and the IBM World Trade Corporation.

The aim of the Informatik-Symposium is to strengthen and improve the communication between universities and industry by covering a subject in the field of computer science from both a university and an industry point of view.

During the last 5-10 years, Data Base Systems have developed from a highly speculative "Management Information System (MIS)" approach to a practical production tool. In the late 50's and early 60's, the application program was viewed as the nucleus of an application, with multiple data sets as accessories to the application program, and multiple, more or less unrelated application programs serving the needs of a larger enterprise or organization. The modern approach views the data base as the nucleus of a data processing operation, surrounded by multiple application programs operating on its data.

This switch has significantly increased the need for features and characteristics that permit quick adaptation to an ever-changing set of external requirements. In the old approach, external changes could usually be confined to one or a few application programs and their associated data sets. Because of the tight coupling between application programs and their data in a Data Base System, external changes are much more pervasive than they used to be. As a consequence, practical Data Base System implementations require a degree of universality and generality unknown in previous data processing installations.

In organizing this Symposium, we structured the subject matter into four topics. The topic of data structures covers the logical view the user has of internally stored data. This topic is closely related to the subject of data base languages. In doing this, we specifically tried to avoid repeating the popular arguments over the pros and cons of the various data representation models, e.g. the hierarchical, network, and relational models.

The second topic deals with components and technology. Today the magnetic disk is the main technology for the storage of large amounts of data. Its peculiarities affect to a large extent the structure of today's data base systems. A major change in data base structures can be expected if and when we succeed in replacing magnetic disk storage with another, more amenable storage structure.

System aspects is the third topic. It includes problems of data security and data integrity. The evolution of data base systems has generated numerous ethical, social and moral questions. It is the responsibility of the data processing community to assure technically acceptable solutions for those issues.

User aspects is the fourth topic of the Symposium. Data Base Systems require a number of tools for their installation, maintenance, and evaluation. Refinement and enhancement of these tools may be one of the major prerequisites for the further development of Data Base Systems.

The editors would like to express their thanks to everybody who contributed to the Symposium by preparing a talk, providing advice for its content and organization, or assisting in its administration.

Boeblingen, October 24, 1975

H. Hasselmeier

W.G. Spruth

Considerations on the Development of Data Base Systems
(Überlegungen zur Entwicklung von Datenbanksystemen)

Horst Remus, IBM Palo Alto, California

Summary

In the development toward integrated data processing, two steps are particularly noteworthy:

- The data base as the hub, with the application programs essentially governing the traffic with the data base (query or update).
- The teleprocessing network, which permits simultaneous access to a program or a data base from several user terminals.

The data base as the hub of the data processing system, in contrast to the file as an access file for one particular program (with OPEN and CLOSE of the file issued by that one program), requires specific considerations regarding its organization. A further step is the introduction of general data base systems built on the idea of data independence. Other considerations concern response time ("performance") and data protection and data safeguarding ("integrity" and "recovery"). To the user, the system presents itself in two parts:

- The data model
- The language with which these data are manipulated ("user interface").

Problems still to be solved point toward data bases with simultaneous access from several systems and toward data bases distributed over different nodes of a network.

1. EVOLUTION TOWARD THE DATA BASE

We consider sets whose elements are data or information composed of alphanumeric characters. The following operations arise for such sets:

a) Query, i.e. extracting certain partial information from the whole set.
b) Report generation, i.e. the (usually summarizing) condensation of the information set, or of parts of it, according to certain characteristics that are not necessarily given automatically by the structure of the set.
c) Updating the information set, i.e. adding, deleting or changing parts of it. (A special form of updating is the format change, i.e. adding or omitting information relative to every existing piece of partial information.)

Historically, the structure or form of organization of information sets has developed as follows (Figure 1 attempts a schematic representation):

The first step toward collecting information is the list, the simplest form being the continuous list. In its original form, the data carriers are media on which one can write legibly. Queries are made manually: the list is searched for the entry in question (normally starting at the beginning of the list). Report generation is impossible in most cases, since individual queries are very time-consuming. Updating is done manually, by adding a new entry at the end or by striking out entries that have become superfluous. A change in the list format (additional information per entry) normally causes no difficulties, since the additional information is available only for the newly added entries anyway.

The next step is the ordered list, with the same media as data carriers.

An ordered list is created from a continuous list by sorting according to an ordering key. A continuous list may also be ordered automatically, e.g. chronological lists such as parish registers. Queries are considerably simplified, which in turn makes report generation easier. Updating runs into problems with the insertion of entries: any amount of space reserved for this purpose is eventually exhausted. The result is either that the order is destroyed or that a new list has to be created. Supplementary lists, with references to them in the base list (instead of entering the complete information), offer a partial way out. Such procedures quickly become unmanageable, however; works on chess opening theory, for example, are reissued again and again.

The next step would be to break the list up into individual entries: the card file. It places certain special demands on the media. The difficulties of the ordered list with regard to adding entries are eliminated.

The invention of the punched card, and the electromechanical handling of information that came with it, made it possible to automate individual manual processing steps. Semi-automatic individual queries, however, are normally too time-consuming. Report generation can be largely automatic, but the punched-card file has to be specially prepared for the program, i.e. the wiring of the tabulating machine (sorting, merging and other special operations). Updating is semi-automatic. The format change becomes a problem; it usually leads to the creation of a new card file.

The use of other media such as disk or tape permits fully automatic processing and leads to the file. Like the punched-card file, the file is normally organized relative to one particular application. The programmer "opens" (OPEN) and "closes" (CLOSE) the file, depending on whether the associated application is running or not. If the application is not running, the file may even be physically removed from the system; in any case it is normally not accessible to other applications. Query and report generation are likewise possible only for particular application programs. Simultaneous processing of several applications from one and the same terminal, or of one or more applications from different terminals, becomes problematic. Updating and format changes require the automatic creation of a new file.
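The contrast between the continuous list and the ordered list described above can be illustrated with a short Python sketch. The sample names and the use of the standard `bisect` module are assumptions made for illustration; they do not come from the paper. A continuous list is searched from the beginning, whereas an ordered list permits a binary search but forces later entries to move whenever a new entry is inserted.

```python
import bisect

def query_continuous(entries, key):
    """Scan from the start of the list, as in a manual search."""
    for position, entry in enumerate(entries):
        if entry == key:
            return position
    return None

def query_ordered(entries, key):
    """Binary search; valid only if `entries` is sorted."""
    position = bisect.bisect_left(entries, key)
    if position < len(entries) and entries[position] == key:
        return position
    return None

def insert_ordered(entries, key):
    """Insert while keeping the order; later entries shift by one place."""
    bisect.insort(entries, key)

# Hypothetical entries, in order of arrival and sorted by ordering key.
continuous = ["Meier", "Schulz", "Bauer", "Huber"]
ordered = sorted(continuous)

insert_ordered(ordered, "Koch")
```

The shifting of entries in `insert_ordered` corresponds to the insertion problem of the ordered list on paper, where the space reserved for additions eventually runs out.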

A multitude of applications and users for one and the same set of data leads to the data base. Its special requirements are explained in more detail below.
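The three operations introduced at the start of this section (query, report generation, updating, plus the format change as a special case of updating) can be sketched on a small in-memory record set. This is a minimal illustration under assumed names; the record layout and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical information set: each element is one record.
stock = [
    {"item": "nail", "size": "2 inch", "quantity": 500},
    {"item": "nail", "size": "3 inch", "quantity": 200},
    {"item": "screw", "size": "2 inch", "quantity": 300},
]

def matches(record, criteria):
    """True if the record has every given field value."""
    return all(record.get(k) == v for k, v in criteria.items())

def query(records, **criteria):
    """a) Query: extract the records matching all given field values."""
    return [r for r in records if matches(r, criteria)]

def report(records, group_by, total_of):
    """b) Report: summarize by a characteristic chosen at report time."""
    totals = defaultdict(int)
    for r in records:
        totals[r[group_by]] += r[total_of]
    return dict(totals)

def update(records, new_record=None, delete_where=None):
    """c) Update: delete the records matching a filter and/or add a record."""
    kept = [r for r in records
            if not (delete_where and matches(r, delete_where))]
    if new_record is not None:
        kept.append(new_record)
    return kept

def change_format(records, field, default):
    """Format change: add a field to every existing record."""
    return [{**r, field: default} for r in records]
```

Note that `report` groups by a field chosen by the caller, which mirrors the remark that the reporting characteristic need not be built into the structure of the set.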

2. DATA BASES AND DATA BASE SYSTEMS

Implicit in the definition of the data base are the concept of minimal redundancy and the need for a structure that the user can understand, the data model. A data base is normally accessed by a number of users with different kinds of applications at the same time. This requires continuous supervision of the data base by a data base administrator. Besides preserving the integrity of the data base, these system programmers aim to optimize performance factors such as response time and storage. They are therefore interested in the physical organization of the data base, including the way indexes and pointers work.

Application programmers or "end users" are interested in the logical data model and in ways to retrieve and update data base elements.

To understand what both the data base administrator and the application programmer require of data base systems, the applications of data bases must be examined more closely. First, recall the difference between batch processing and real-time processing (Figure 2). In batch processing, items are processed in groups according to some characteristic or concept, either at fixed dates or after a certain quantity has accumulated for processing. In real-time processing, every step is carried out immediately on the entire data set.

In addition, two parameters of the applications are of particular importance:

- predictability
- the frequency of similar accesses (repetitiveness).

Both characteristics occur in a range of combinations. For example, one does not know in advance which part of the inventory a stock clerk will ask about. What he wants to know about it, however, is known precisely.

In general, data base operations can be divided into the following kinds (Figure 3):

1. Efficient execution of repetitive work (traditional batch processing).
2. Queries defined in advance ("How large is the stock of 2 inch nails?").
3. Random, poorly structured and unforeseen queries ("How many engineers in Hamburg have a monthly income of more than DM 6000?").

A system that handles nos. 1 and 2 is called an "operational" or "supervisory" system; a system that handles no. 3 is called an "information" or "executive" system. Examples of the two groups would be:

"Operational" systems: a bank with terminals at every counter, flight reservation, air traffic control.

Information systems: a library with retrieval of information by keyword, market information for management, a data base with personnel data.

One and the same data base should normally permit both kinds of system to be used.
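The difference between a predefined query (type 2) and an unforeseen query (type 3) can be made concrete with a small sketch based on the two example questions above. All data values and function names are hypothetical and serve only as illustration.

```python
# Hypothetical stock and personnel data, for illustration only.
stock = {"2 inch nails": 500, "3 inch nails": 200}
personnel = [
    {"name": "A", "profession": "engineer", "city": "Hamburg", "monthly_income": 6500},
    {"name": "B", "profession": "engineer", "city": "Hamburg", "monthly_income": 5200},
    {"name": "C", "profession": "engineer", "city": "Bremen", "monthly_income": 7000},
]

def stock_level(item):
    """Type 2: a predefined query; only the parameter value varies."""
    return stock[item]

def ad_hoc_count(records, predicate):
    """Type 3: an unforeseen query; the condition itself arrives at run time."""
    return sum(1 for r in records if predicate(r))

# "How many engineers in Hamburg have a monthly income of more than DM 6000?"
well_paid_hamburg_engineers = ad_hoc_count(
    personnel,
    lambda r: r["profession"] == "engineer"
    and r["city"] == "Hamburg"
    and r["monthly_income"] > 6000,
)
```

For `stock_level` the access path can be planned in advance, whereas `ad_hoc_count` has to examine every record because nothing is known about the condition beforehand. This is one reason the two kinds of system place different demands on the same data base.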

3. SPECIAL REQUIREMENTS FOR DATA BASES

The requirement of minimal redundancy has already been mentioned. Most tape libraries contain a wealth of redundant data. Uncontrolled handling of redundancy can (as in many office filing systems) make frequent rearrangement or reorganization necessary. A further issue is, of course, the consumption of storage space and the associated cost. Multiple copies of the same data can moreover yield differing information if they are not at the same update level. The aim of a data base organization should therefore be to avoid redundancy wherever this appears economically justified. For reasons of data integrity, and to allow faulty data to be restored, some redundancy may nevertheless be necessary.

A further requirement is versatility in the representation of data relationships. Different programmers use different logical files, all of which are nevertheless based on the same data base.

The performance aspects of a data base system are very important. The decisive performance factors are the response time for the users of a terminal and the number of transactions per unit of time that a system can handle. There are systems with a low traffic volume for which the number of transactions per unit of time (throughput) is of minor importance. Systems with a high traffic volume are, for example, flight reservation systems and large banks. Applications already exist today that require 10 or more transactions per second. For such applications a further rapid increase is to be expected (addition of further bank branches etc.). To get a better grip on the required increase in performance, further measures should be considered, such as splitting the data base into several individual data bases (decentralization) or access to one data base from several computers. For traditional batch processing systems, response time is irrelevant; their design criterion is the efficiency of batch processing. Certain applications require a dialogue with a response time of 2 seconds or less. The performance of the processing unit naturally influences the performance of the data base system.

Data and their interrelationships must not be destroyed by machine malfunction or other "accidents" (data integrity). Every system must therefore provide for data integrity checks.

In many cases data must be protected against access by unauthorized persons ("security and privacy", i.e. data protection). This translates into the requirement that the system verify the authorization of the user and of his actions (e.g. by means of a password). The controls should be designed so that skilled programmers cannot easily circumvent them. The actions of a user should also be monitored and recorded, so that misuse can be detected afterwards. It is equally necessary that the data base itself can be checked continuously.

There is also the requirement to write application programs independently of the data organization and access technique (data independence). IMS [3], for example, offers a certain degree of data independence: new data segments can be added at certain points of the hierarchy without program changes, and the length of a record or the division of the data base into data groups can be changed.
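The two protection requirements named above, checking the authorization of the user and of each action, and recording actions so that misuse can be found afterwards, can be sketched as follows. This is an assumed toy design, not a mechanism described in the paper; the class and method names are hypothetical, and the fixed salt is used only to keep the sketch short.

```python
import hashlib
import hmac

class GuardedDataBase:
    """Toy data base that checks authorization and records every action."""

    def __init__(self):
        self._data = {}
        self._users = {}       # user -> (salt, password hash, permitted actions)
        self.audit_log = []    # (user, action, key, allowed) tuples

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def add_user(self, user, password, actions):
        # Deterministic salt for the sketch only; a real system would use a random one.
        salt = hashlib.sha256(user.encode()).digest()[:16]
        self._users[user] = (salt, self._hash(password, salt), set(actions))

    def _authorized(self, user, password, action):
        if user not in self._users:
            return False
        salt, stored, actions = self._users[user]
        password_ok = hmac.compare_digest(stored, self._hash(password, salt))
        return password_ok and action in actions

    def execute(self, user, password, action, key, value=None):
        allowed = self._authorized(user, password, action)
        self.audit_log.append((user, action, key, allowed))  # recorded even if refused
        if not allowed:
            raise PermissionError(f"{user} may not {action}")
        if action == "write":
            self._data[key] = value
        return self._data.get(key)
```

Refused attempts are logged as well, since the point of the record is to reveal misuse after the fact.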

4. DATA BASE STRUCTURES

The function of a data base is to store the data and the relationships between the data. The logical description of a data base is called the data base schema. A schema thus defines the data model for the user. A subschema is the breakdown of the data base for one particular application program. Figure 4 shows how the various parts of a data base system interact, and in particular the meaning of the terms schema and subschema.

Figure 5 shows the breakdown of a data base for job placement. The relationships between the individual files are clearly visible. The employer file gives the details for the field "employer number", and the skill file gives the details for the field "required skill" in the job list. This exhibits one principal form of data base structure: the hierarchical arrangement. The files "employer number" and "skill group" are subdivisions of the file "job list" (parent-child relationship).
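The relation between schema and subschema can be sketched with plain dictionaries, loosely following the job-placement example. Figure 5 is not reproduced here, so the file names, field names (including `salary`) and helper functions below are assumptions made for illustration.

```python
# Hypothetical schema for a job-placement data base.
SCHEMA = {
    "job_list": ["job_number", "employer_number", "required_skill", "salary"],
    "employer_file": ["employer_number", "name", "address"],
    "skill_file": ["required_skill", "description"],
}

# A subschema names only the files and fields one application program may see.
SUBSCHEMA_JOB_MATCHING = {
    "job_list": ["job_number", "required_skill"],
    "skill_file": ["required_skill", "description"],
}

def is_valid_subschema(schema, subschema):
    """A subschema must be a subset of the schema, file by file and field by field."""
    return all(
        file in schema and set(fields) <= set(schema[file])
        for file, fields in subschema.items()
    )

def project(record, file, subschema):
    """Present a stored record through the subschema, hiding all other fields."""
    return {field: record[field] for field in subschema[file]}
```

An application written against `SUBSCHEMA_JOB_MATCHING` never sees `employer_number` or `salary`, so those fields can change without affecting it.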

The possibility of expressing relationships between individual data fields in the data base structure has led to three principal forms of data base organization:

1. The hierarchical data base structure (Figure 6). Here the highest level has one and only one node, the "root of the tree". Every node on any other level is assigned exactly one node on the next higher level. Knuth [4] accordingly defines a tree or hierarchical structure as "a finite set T of one or more nodes with
   a. one specially designated node, the root of the tree, and
   b. m ≥ 0 remaining disjoint (unconnected) subsets T1, ..., Tm, each of which is a tree. These subsets are called subtrees."
   IMS [3] uses the hierarchical data base structure.

2. If a node is to be traced back to more than one node on a higher level, the description can no longer be given by a tree. The resulting structure is called a "network structure". Because the word "network" is used in so many ways in the data processing industry, the term "plex structures" is often used in English-speaking countries. Figure 7 shows some simple examples of network structures. A hierarchical or tree structure is, of course, only a special case of these. One example of a simple network structure is a family tree. More complex structures arise when multiple relationships that cannot be determined algorithmically exist between the elements of different levels. By introducing multiple indexes and redundancy, network structures can be reduced to tree structures. The work of the CODASYL group [1] leads to a network structure.

3. The requirement to do without redundancy and to be able to represent the relationships between data base elements as an algebraic calculus leads to the "relational data base" of Codd (see the detailed description in [2]). The basic operations for forming new records are union and intersection. From a mathematical point of view the language appears very elegant, but for performance reasons implementations have so far found little acceptance. The advantages of files with records all on the same level center on the clarity of the data model and the simplicity of the language with which relationships can be manipulated. Representations in "relational data base" form can be reduced to the hierarchical or network structures above by using multiple indexes and redundancy.

In connection with data base structures, lists and rings (chains or lists, rings) are often mentioned. These structures, however, refer to the way in which records within a file are linked to one another. They therefore describe techniques by which logical structures are obtained from physical ones, whereas the structures described under 1-3 are particular forms of logical structures. A decisive element of both the list and the ring structure is the pointer, which points from one record to the next. In ring structures, two-way pointers are normally used.
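The distinction between hierarchical and network structures, and the two relational operations named under point 3, can be sketched briefly. The representation of a structure as a mapping from each node to its set of parents, the sample data and the function name are assumptions made for this illustration.

```python
def classify(parents):
    """Classify a structure given as {node: set of parent nodes}.

    Returns "hierarchical" if there is exactly one root and every other node
    has exactly one parent, otherwise "network" (plex structure).
    Assumes the structure is connected and free of cycles.
    """
    roots = [node for node, p in parents.items() if not p]
    single_parent = all(len(p) == 1 for p in parents.values() if p)
    return "hierarchical" if len(roots) == 1 and single_parent else "network"

# A tree: every node except the root has exactly one parent.
tree = {"root": set(), "a": {"root"}, "b": {"root"}, "c": {"a"}}

# A family tree: a child is traced back to two nodes on the higher level.
family = {"father": set(), "mother": set(), "child": {"father", "mother"}}

# Relations as sets of tuples: union and intersection form new relations.
engineers = {("Meier", "Hamburg"), ("Schulz", "Bremen")}
hamburg_staff = {("Meier", "Hamburg"), ("Huber", "Hamburg")}
either = engineers | hamburg_staff
both = engineers & hamburg_staff
```

Because all tuples of a relation sit on the same level, the set operators apply directly. No navigation from parent to child is needed, which is the simplicity the text attributes to the relational form.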

5. DATA DESCRIPTION LANGUAGES

A language that describes the logical data structure should meet the following requirements:

- The breakdown into data sets such as files, records, segments and data elements should be clearly describable.
- Each type of such a unit should be specifically named (e.g. two different record types should have different names).
- The subdivision of a given data set into particular subsets should be clearly recognizable (which data elements are contained in a particular data grouping etc.).
- The sequence must be specified, and repetitions should be indicated.
- The language should express which data elements are used as indexes.
- Relationships between record types, segment types etc., which form the basis of the data structure, must be specified and clearly named.

According to J. Martin [5], different levels of data description languages arise depending on the point of view of the user (Figure 8):

1. The language for the application programmer, which describes the data base subschema (e.g. the data division in COBOL, or the PSBs in DL/I (PSB = program specification block)).

2. The general description of the schema of the data base, which is used by the data base administrator (e.g. the DL/I logical data base description). The COBOL data division, for example, does not allow the relationships in a schema to be described. It therefore cannot be used here.

3. The physical data description (e.g. the DL/I physical data base description). In contrast to the logical data description, which is completely detached from hardware and storage considerations, these considerations are of great interest here for performance optimization.

Apart from DL/I, CODASYL's data description language DDL is probably the best-known data base description language.
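The requirements listed at the start of this section can be collected in a small data model. The sketch below is not DL/I or the CODASYL DDL; it is an assumed illustration in Python, and every class and field name in it is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FieldDescription:
    name: str
    kind: str
    repeats: int = 1          # number of occurrences (repetition)
    is_index: bool = False    # is this data element used as an index?

@dataclass
class RecordType:
    name: str                 # every record type carries its own name
    fields: list              # ordered: the sequence of fields is significant

@dataclass
class Relationship:
    name: str
    parent: str
    child: str

@dataclass
class DataDescription:
    record_types: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_record_type(self, record_type):
        if record_type.name in self.record_types:
            raise ValueError(f"duplicate record type name: {record_type.name}")
        self.record_types[record_type.name] = record_type

    def relate(self, name, parent, child):
        for end in (parent, child):
            if end not in self.record_types:
                raise ValueError(f"unknown record type: {end}")
        self.relationships.append(Relationship(name, parent, child))

    def index_fields(self, record_type_name):
        record_type = self.record_types[record_type_name]
        return [f.name for f in record_type.fields if f.is_index]
```

The checks in `add_record_type` and `relate` enforce two of the requirements directly: distinct names for distinct record types, and explicitly named relationships between declared types.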

6. HARDWARE CONSIDERATIONS

Data bases of the order of more than 4 billion bytes are known. That corresponds to 40-50 IBM 3330 disk units. It is conceivable to back a disk unit with a larger storage unit having a longer access time, much as the virtual storage concept does between core memory and disk. The IBM 3850, announced about a year ago, provides for example 10^3 to 10^4 times more storage space with an access time longer by a factor of 10^2. The user sees the system as a single disk system, but for performance considerations the hardware parameters are of the greatest importance. For example, there are strong dependencies between response time, transaction rate and main storage size, or storage availability at the lowest level of the storage hierarchy. Response time grows with the transaction rate and falls as more main storage becomes available (less paging). The transaction rate can be increased with more main storage.

Other hardware parameters are, of course, the speed of the computer and the structure and components of the communication network.
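The orders of magnitude quoted in this section can be checked with a few lines of arithmetic. Only the figures stated in the text are used as constants. The 30 ms access time and the function name `scaled` are purely hypothetical, chosen to show how the scaling factors apply.

```python
DATA_BASE_BYTES = 4_000_000_000      # "more than 4 billion bytes"
DISK_UNITS = (40, 50)                # equivalent number of disk units

# Implied capacity per disk unit, in megabytes (10**6 bytes).
per_unit_mb = [DATA_BASE_BYTES / n / 1_000_000 for n in DISK_UNITS]

# Scaling factors quoted for the larger, slower storage level.
CAPACITY_FACTOR = (10**3, 10**4)
ACCESS_TIME_FACTOR = 10**2

def scaled(disk_capacity_mb, disk_access_ms):
    """Return ((min, max) capacity in MB, access time in ms) for the larger store."""
    capacity = tuple(disk_capacity_mb * f for f in CAPACITY_FACTOR)
    return capacity, disk_access_ms * ACCESS_TIME_FACTOR
```

With 40 to 50 units, the quoted data base size implies roughly 80 to 100 megabytes per disk unit. Under the hypothetical 30 ms disk access time, the slower level would answer in about 3 seconds, which shows why the hierarchy matters for response time even though the user sees a single disk system.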

7. OUTLOOK

The additional requirements for extending existing data base systems or developing future ones center on the following aspects:

a) Increased performance. Growth of the data base and of the number of data base users calls for higher transaction rates and shorter response times. The answer lies in more suitable data base organizations and in minimizing administrative functions. Certain aids from the manufacturers permit a "tuning" of the data base, and there are further improvements that the user can influence. Certain improvements can be achieved by more suitable use of hardware (multiprocessing or similar techniques).

b) Continuous operation. The requirement of 24-hour access to the data base has certain consequences for the implementation. First, an interruption caused by a malfunction calls for fast restoration of the data base and prompt resumption of operations. This requires keeping a quickly accessible "journal". In addition, work should be done on the best techniques for error prevention, detection and correction. A further requirement is to reorganize the data base while routine operation continues. A dictionary [7] can serve as an essential aid in managing the data bases.

c) Ease of installation and use. The parameters that lead to an optimal organization of a data base are very complex. System manufacturers generally help with automatic organization aids or with hints in the documentation. The question of installability is largely identical with the possibility of understanding the physical representation of the data base. Here again a dictionary [7] can be useful.

Ease of use depends essentially on the nature of the languages for data manipulation and data description and on the interface to the programming languages.

Further functions that simplify use have to do with the automatic regulation of the flow of information. What matters here is the handling of control information (control blocks), as is done, for example, in the standard network architecture. To simplify later use, data bases and the associated systems must be designed with later modification or extension in mind.
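The role of the "journal" mentioned under b) can be sketched with a toy store that records every update and can rebuild its state from a checkpoint plus the later journal entries. The design and all names are assumptions made for illustration; the paper does not prescribe a particular recovery mechanism.

```python
class JournaledStore:
    """Toy key-value store that records every update in a journal first."""

    def __init__(self):
        self.data = {}
        self.journal = []          # stands in for a journal on stable storage

    def write(self, key, value):
        self.journal.append((key, value))   # journal entry before the update
        self.data[key] = value

    def checkpoint(self):
        """Return a copy of the current state and the journal position."""
        return dict(self.data), len(self.journal)

    def recover(self, snapshot, position):
        """Rebuild the state from a checkpoint plus the later journal entries."""
        self.data = dict(snapshot)
        for key, value in self.journal[position:]:
            self.data[key] = value
```

The more recent the checkpoint, the fewer journal entries have to be replayed, so the checkpoint interval directly affects how quickly operations can resume after a failure.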

References

[1] CODASYL, "1974 Status Report on Data Base Activities".
[2] Date, C.J., "An Introduction to Database Systems". Addison-Wesley, Reading, Mass., 1975.
[3] Information Management System, "System/Application Design Guide". IBM Form No. SH 20-9025.
[4] Knuth, D.E., "The Art of Computer Programming, Vol. I, Fundamental Algorithms". Addison-Wesley, Reading, Mass., 1968.
[5] Martin, J., "Computer Data Base Organization". Prentice-Hall, Englewood Cliffs, N.J., 1975.
[6] Senko, M.E., Altman, E.B., Astrahan, M.M. and Fehder, P.L., "Data Structures and Accessing in Data-Base Systems". IBM Systems Journal 12, 30-93 (1973).
[7] Uhrowczik, P.P., "Data Dictionary/Directories". IBM Systems Journal 12, 332-350 (1973).

[Figure 1: Development toward the database (Entwicklung zur Datenbank). A table comparing six stages (sequential list, ordered list, card index, punched-card index, file, database) with respect to data medium, report generation, query, updating, and format change. For the database, all four activities are automatic and unrestricted.]

[Figure 2: Batch processing (Stapelverarbeitung) and real-time processing (Echtzeitverarbeitung).]

Figure 3: Characteristics of data base systems (after James Martin)

Characteristic | Operational systems | Information systems
Access | planned or preprogrammed | spontaneous, not preprogrammed
Typical examples | bank counter, flight reservation | sales analysis, personnel information
Typical users | bank tellers, foremen, lower management | information staff, middle management, assistants to higher management
Normal purpose | support of routine operations | support of planning and urgent information needs
Response time | seconds | minutes or hours
Implementer of the application | programmer | information specialist
Implementation time | weeks or months | hours
Typical languages | COBOL, FORTRAN, PL/I | IQF, GIS

[Figure 4: Mode of operation of a data base system. Labeled elements: data base system, system buffer, work area of the program, application program A.]

[Figure 5: Breakdown of a job placement data base. Labeled elements include name, address, availability, experience, working climate, education, salary, social benefits, talent group, talent file, employer number, employer file, and job list.]

[Figure 6: Hierarchical data base structure, showing a root and levels 1 to 4.]

[Figure 7: Data base network structures.]

[Figure 8: Levels of data description. Labeled elements: application programmer, subschema, schema, global or general data base description (data base administrator), automatic execution by the data base system, physical description, physical storage allocation.]

On the Relationship between Information and Data

Gernot Richter, Gesellschaft fuer Mathematik und Datenverarbeitung (GMD), St. Augustin

Summary

On the background of a general model of information systems a view is analyzed which explicitly distinguishes between information and its representation. Using a conceptual system (IMC) which has been designed to talk about information structures and their manipulation, some ideas on the representation of information are presented. The significant role of type declarations is shown. For the concepts of format and data a definition is outlined. In the light of these considerations some topics concerning data base technology are discussed, concluding with a plea for conceptual differentiation between information and representation in the field of data base management.

1. A model view of information systems

For information systems a view has been proven to be very useful which considers them as systems consisting of communicating functional units (Funktionseinheiten) in the sense of [DIN]. Recently in [ANSI] these functional units have been introduced as roles or work stations. Years ago this kind of functional unit, characterized only by its function within the system rather than by its technical realization, has already been identified and applied in [ABN] following some ideas of C. A. Petri. There the term office (Instanz) has been chosen for the functional units under consideration.

Offices influence each other by communicating messages. So the need has been recognized to introduce a complementary kind of functional unit which allows for the exchange of messages between offices. To this kind of

functional units the term channel (Kanal) was given in [ABN]. The concept of interfaces as used in [ANSI] has a direct relation to the concept of channel: An interface is a system of rules which govern the communication via a considered channel. Also a channel is characterized only by its function within the system, serving as a facility where messages can be posted and taken by the communicating offices.

This yields a model view of information systems which provides decomposition into two distinct classes of functional units:

- offices, characterized by the processes they can perform
- channels, characterized by the states they can assume.

This model view applied to data base management systems has recently gained some publicity since the publication of [ANSI] and is under discussion both in the world of scientific research (IFIP/TC-2 and IAG) and in the area of standardization (ISO/TC 97/SC 5).

With the above model in mind we want to take a close look at the communication of two offices via one channel. This seems to be an adequate minimum configuration to examine the interrelation between information and data. To illustrate this configuration we use the graphic notation of [PET], where offices are depicted by boxes and channels by circles (in the cited paper only elementary offices and channels are considered). This yields fig. 1. In the adopted model communication between both offices is done by exchanging messages via the linking channel. The arrows in the above figure only indicate the possibility of access and are no functional units.

A further aspect is depicted in fig. 1: The exchange of messages makes sense only if both communicating offices have a common background of understanding, which allows them to interpret the messages found in the channel. The assumption of such a "universe of discourse" is a very useful auxiliary model for the understanding of communication also between technical functional units.
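The configuration of two offices and one channel can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Office` and `Channel`, the message strings, and the dictionary used as common background of understanding are hypothetical and do not come from [ABN], [ANSI] or [PET].

```python
from collections import deque


class Channel:
    """A channel is characterized by the states it can assume:
    here, the sequence of messages currently posted in it."""

    def __init__(self):
        self._messages = deque()

    def post(self, message):
        self._messages.append(message)

    def take(self):
        # Returns None when the channel holds no message.
        return self._messages.popleft() if self._messages else None


class Office:
    """An office is characterized by the processes it can perform:
    here, posting messages and interpreting the messages it takes."""

    def __init__(self, name, background):
        self.name = name
        # Assumed common background of understanding: message -> meaning.
        self.background = background

    def send(self, channel, message):
        channel.post(message)

    def receive(self, channel):
        message = channel.take()
        if message is None:
            return None
        # A message outside the common background cannot be interpreted.
        return self.background.get(message)


common_background = {"OPEN": "the valve is open", "SHUT": "the valve is closed"}
office_a = Office("A", common_background)
office_b = Office("B", common_background)
link = Channel()

office_a.send(link, "OPEN")
understood = office_b.receive(link)

office_a.send(link, "???")
not_understood = office_b.receive(link)
```

The second exchange shows why the common background matters: the message arrives in the channel, but office B has no rule for interpreting it.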

2. Model information and abstraction

So far no reference has been made to a distinction between information and data. But words such as "represent" and "interpret" indicate a kind of mapping between two things. It is the goal of this section to show that there are two mappings to be considered. Both have the nature of an abstraction, i.e. omission of features not to be considered - but they start at different points.

One kind of abstraction starts with the so-called initial information (Ausgangsinformation), which is to be understood as the whole knowledge or ideas a person has about something (of the real world or anything else). For a certain pragmatic context, i.e. pursuing an intended purpose, it might be that not the whole information is needed but only the "relevant" part of it. The information about a person e.g. is different for administrative purposes and for medical purposes; information about a technical process for teaching purposes will be different from what is needed for engineering purposes. So it is the (respective) intended purpose which controls the abstraction process. In similar considerations in [STEEL] the above abstraction is called the "engineering abstraction", which yields the "engineering model". In [DURI] the result of the abstraction process has been called model information (Modellinformation). The term model information indicates that we are still on the information level.

In the present context we do not adopt any definition of information; the concept is used in the sense of knowledge or idea (about something). Thus information is viewed as being of mental nature.

It is obvious that, depending on the respective intended purpose, various abstractions can be performed on the same initial information. It is not of interest in this presentation whether the model information "exists" or not - whatever that means. However, we found the approach very useful which assumes a level of model information (as did also other authors).

Model information cannot be communicated directly because of its mental nature. There must be a representation of it (on a medium) which can be handed out to the addressee (or which can be stored for later use). Such a representation is what usually is called "data". The distinction between information and its representation is the background on which all the following ideas have been developed.

Now it is possible to show that the other abstraction mentioned above is of a quite different nature. Consider some messages (here in the sense of data) which by agreement have the same meaning. What is "same meaning" in the present case? Any exchange of messages between the communicating offices is assumed to have the goal to exchange model information. As already pointed out, a message is considered to be a representation of model information. There are rules for mapping the messages to the pertinent model information. Such a mapping usually is called "semantics" and the process of mapping "interpretation". If several messages are mapped onto the same model information, they all have the "same meaning". So we have an abstraction from various representations to the one model information by ignoring the respective representational peculiarities.

There is one problem which might have been apparent already in the above discussion. Considering the communication between an author and his audience, he has the need of representing the model information he wants to write about. For this purpose a kind of reference language is beneficial, which can be used for (graphical) representation of model information and the interpretation of which is agreed upon. Such a graphical language will be presented in the following and is called canonical representation. It is used whenever emphasis is laid on the represented model information rather than on one of its possible representations.
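The notion of "same meaning" as an abstraction from representations can be made concrete in a small Python sketch. The rule table `SEMANTICS` and the messages in it are hypothetical examples chosen for illustration; they are not taken from the paper.

```python
# Hypothetical agreed rules ("semantics"): message -> model information.
SEMANTICS = {
    "7": "number seven",
    "seven": "number seven",
    "VII": "number seven",
    "8": "number eight",
}


def interpret(message):
    """Map a message (data) to the model information it represents."""
    return SEMANTICS[message]


def same_meaning(first, second):
    """Two messages have the same meaning if interpretation maps
    them onto the same model information."""
    return interpret(first) == interpret(second)


def meaning_classes(messages):
    """Group messages by the model information they represent,
    ignoring their representational peculiarities."""
    classes = {}
    for message in messages:
        classes.setdefault(interpret(message), []).append(message)
    return classes
```

Under these assumed rules, `same_meaning("7", "VII")` holds although the two messages share no character, which is exactly the abstraction from representational peculiarities described above.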

3. Outlines of a conceptual model of information

Before dealing with any problems of representation the properties of the model information itself have to be identified. What is an adequate view of model information with respect to applications? This question brings us into a (at least in the past) very controversial area of argumentation about the advantages and deficiencies of so-called "data models" (hierarchic, network, relational, ...). For general considerations we can avoid this topic by adopting a view which covers the various "data models". This view has been outlined in [DURI] and is reflected in a conceptual system called Information Management Concepts (IMC). These concepts have been developed as a means for talking about model information, in particular in the context of data base management systems. Simultaneously, rules for graphic representation of model information in terms of IMC were developed. Both the basic concepts of IMC and the related canonical representations will be outlined in this section to facilitate the treatment of the topic of "data" (in the sense

of representation).

In IMC any portion of model information which can be referred to in a communication is called a construct (Gebilde). A construct may be the information about a book in a library, a family, a car in an administration, a process in a factory. A construct is either an atom (Atom) or an aggregate (Aggregat). Whereas an atom is declared (in a given situation) to "be", i.e. is considered as an elementary construct, an aggregate is a compound construct, the composition of which is relevant within the considered communication. A construct which can be viewed as a part of another construct is a component (Komponente).

Depending on the way of aggregation an aggregate is either a collection (Kollektion) or a nomination (Nomination). A collection is a finite unordered set of immediate components. In a nomination a (mathematical) function is involved: the domain of a nomination is a set of names, and the names (Name) serve as selectors for the immediate components (in the same manner as e.g. in the Vienna Definition Language, cf. [ZEM]). Names only have this function within a nomination; beyond that a name has no meaning in itself. These two generic types of aggregates therefore differ in the way their immediate components are selected.

To show examples of atoms, collections, and nominations we first have to introduce the canonical representation techniques of IMC. In IMC representation a box always represents a construct. The composition of aggregates is shown either by nested boxes (fig. 2) or by trees (fig. 3). A combination of both representation techniques is possible. The presence of names is depicted by small circles, and the names are written close to the circles. A detailed example of a "relation" and of a "set network" in IMC representation is given in [DKR].

If we look at fig. 2 or 3 we notice that a construct may appear in different contexts. In a representation we can point at the various locations where (the representation of) the same construct appears, whereas on the conceptual level we cannot. Therefore on the conceptual level a concept is needed which

allows to distinguish between different appearances of one construct (within a considered embracing construct). In IMC the concept of spot (Stelle) has been introduced. A spot can be defined as a sequence of pairs (name, construct). In case of a collection the empty name is inserted at the name position in the pair. The first pair of a spot defining sequence always consists of the empty name and the reference construct, in (= relative to) which the spot is considered. So with the symbols of fig. 4 the construct in question appears at the spots

(-,c1) (home address,c2) (city,c3)
(-,c1) (place of birth,c3)
(-,c1) (branches,c5) (-,c3)

which are spots in c1. (The lower case c's stand for the respective construct.) The same construct also appears at the spot

(-,c2) (city,c3) in c2 and
(-,c5) (-,c3) in c5.

Another example is c7 which appears in c1 at the following two spots:

(-,c1) (home address,c2) (street,c6) (number,c7)
(-,c1) (date of birth,c4) (year,c7)

It turns out that the concept of spot is essential for the discussion and understanding of some sophisticated aspects in data base management systems, not least those concerning the interrelationship between information (constructs) and data (representations).

Fig. 2 and 3 show, by the way, that in canonical graphic representations always constructs at spots are depicted. As by definition any spot structure is hierarchic, one might be tempted to label IMC a hierarchic information system. But it is obvious that in all existing information models (hierarchic, network, relational, etc.) the spots form hierarchic trees.

So far only individual constructs have been considered. Nothing about types or declarations has been said nor used tacitly. A type in general is a set. But not any set is a type. First of all, it has to be determined what are the elements of such a set. In the present context we focus on types of constructs (Gebildetyp), thus the elements are constructs. In the world of data base management systems instead of

"element" the terms "occurrence" or "instance" of a type have been adopted.

But not even any set of constructs is a construct type. A type of constructs has to be declared for a communication, saying that only constructs which belong to the specified type(s) are admitted for exchange. More precisely: As only representations of constructs can be exchanged via the channels of the considered information system, a type definition/declaration specifies what constructs will be represented and can be "understood" by interpretation. A language in which type declarations are made should be called a "type declaration language", but unfortunately is often called a "data definition language". This is one example of the sloppy terminology which is so characteristic for the field of data processing.

Not even "type declaration language" would be sufficiently precise. As will be shown below, also other types have to be declared (on the representational level). Therefore, strictly speaking, such a language is a "construct type declaration language" (CTDL). As far as only the composition of constructs by others is specified in a recursive type declaration, a graphic language can be applied in analogy to the canonical construct representation. An example for a graphic type definition is shown in fig. 5, an occurrence of that type is represented in fig. 6, where in both figures the small box in the upper righthand corner provides a place for inserting the name of the type or type designation (Typenbezeichnung), as we prefer to say. This "type plate" is also used in the construct representation, if emphasis is put on the fact that the construct is an occurrence of a particular type (cf. fig. 6 and 10).

It would be beyond the scope of this paper to discuss all aspects involved in the concept of type in general and of construct types in particular. The one or the other will be addressed in the following paragraphs.

After this very short outline, concepts to talk about model information and a canonical representation technique are available. The concept of type has been emphasized because of its great importance for the questions of representation to be discussed in the next section.
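The concepts of atom, collection, nomination, and spot outlined in this section can be illustrated by a minimal Python sketch. The class names and the function `spots` are hypothetical, and since the figures are not reproduced here, the structure of c1 below is an assumption chosen so that the constructs c3 and c7 appear at several spots, each spot being a sequence of (name, construct) pairs with `None` as the empty name.

```python
class Construct:
    def __init__(self, label):
        self.label = label


class Atom(Construct):
    """Elementary construct: declared to 'be', no relevant composition."""


class Collection(Construct):
    """Aggregate: finite unordered set of immediate components."""

    def __init__(self, label, components):
        super().__init__(label)
        self.components = frozenset(components)

    def immediate(self):
        # The empty name (None) stands at the name position.
        return [(None, component) for component in self.components]


class Nomination(Construct):
    """Aggregate: function from a set of names to immediate components."""

    def __init__(self, label, named):
        super().__init__(label)
        self.named = dict(named)

    def immediate(self):
        return list(self.named.items())


def spots(reference, target):
    """All spots in `reference` at which `target` appears. Each spot is a
    tuple of (name, construct label) pairs starting with the reference."""
    found = []

    def walk(construct, path):
        if construct is target:
            found.append(tuple(path))
        if isinstance(construct, Atom):
            return
        for name, component in construct.immediate():
            walk(component, path + [(name, component.label)])

    walk(reference, [(None, reference.label)])
    return found


# Assumed example structure (not taken from the original figures).
c3 = Atom("c3")                                   # a city
c7 = Atom("c7")                                   # a number
c6 = Nomination("c6", {"number": c7})             # a street
c2 = Nomination("c2", {"street": c6, "city": c3})
c4 = Nomination("c4", {"year": c7})
c5 = Collection("c5", [c3])
c1 = Nomination("c1", {"home address": c2, "place of birth": c3,
                       "date of birth": c4, "branches": c5})
```

Constructs are compared by identity (`is`), not by value, because a spot is about the appearance of one and the same construct; `spots(c1, c3)` then yields three spots, one of them reached through the collection c5 with the empty name.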

4. Data as representations

For convenience the term "data" is used in the following instead of "digital data", indicating that only representations are considered which consist of characters (cf. [DIN]). Other representations (pictures, sounds, etc.) are not investigated with regard to their relationship to information.

Referring to the configuration of two offices with one channel (fig. 1), let the piece of paper on which fig. 7 appears be a realization of a communication channel. The question is whether the addressee interprets the five representations there as representations of five, four, three, two or one construct. The example suggests the answer that there might be a multitude of such agreements between the communicating offices. So according to one agreement all representations on the paper might be interpreted as "number seven", according to another agreement the representation 4 + 3 might be taken for an arithmetic expression and not be interpreted as "number seven", or there might be a difference on the construct level even between a "bar seven" (a 7 with a crossbar) and a "plain seven" (7). So the interpretation of the various representations is the subject of agreements which usually are taken for granted in everyday communication. The shape of characters in usual text etc. is irrelevant, but in mathematical texts it is not. On the contrary, to distinguish carefully between different fonts is default in mathematical literature, because they have a different meaning which usually is agreed upon at the beginning. Or: In many programming languages the interspersion of blanks in some places is of no relevance, in other places it is.

These examples may show that the relationship between information and representation (data) has to be established in advance in order to make possible a mutual understanding in communication via a channel. What are the provisions to be made?

For a communication to be possible there must be a common background of understanding, i.e. a prior predefined mapping of representations onto constructs. In the course of communication further agreements may be used to extend this common background: One office passes the declarations to the other, the latter one accepts or rejects them. The declarations comprise

- construct type declaration
- representation type declaration.

Construct type declarations were discussed in the preceding section. The construct type declaration determines the constructs which can be communicated via the considered channel. The construct type declaration language is a part of the above mentioned common background.

The representation type declaration refers to a declared construct type. It determines what are the admissible representations of constructs of this type which can be exchanged in the regarded channel. Considering the set of all representations of occurrences of a given construct type we arrive at the concept of representation type (Darstellungstyp). The representation type declaration language (RTDL) is a further part of the above mentioned common background.

An example may illustrate the relationship between construct type and representation type and their respective occurrences. (The ad-hoc languages used have not been discussed here and should be understood intuitively.) Although it is a very simple example, many figures have been necessary to depict the ideas presented so far, which gives an indication about the magnitude of usually implied declarations.

Fig. 8 shows a declaration of the four construct types CALENDAR-DATE, MONTH-NAME, YEAR and DAY-NUMBER. The latter three are types of atoms, the first one is an aggregate type. Additionally the type composition is shown in IMC representation.

Fig. 9 shows a pertaining declaration of four representation types: MONTH REPR, YEAR REPR, and DAY REPR are the representation types for the construct types MONTH-NAME, YEAR, and DAY-NUMBER, respectively. DATE REPR is the representation type for the construct type CALENDAR-DATE.

In spite of the extensive declarations many implicit assumptions still remain: The character sets to be used, the arrangement of characters on the medium (paper e.g.) and other details. They all have to be counted to the pre-existing common background of the communicating offices.

Fig. 10 shows two occurrences of the construct type CALENDAR-DATE (and of course the occurrences of the component construct types) and some occurrences of the representation type DATE REPR.
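The interplay of construct type declaration and representation type declaration in the CALENDAR-DATE example may be sketched in Python as follows. As the declarations of fig. 8 and 9 are not reproduced here, everything concrete in the sketch is an assumption: the value ranges of the atom types, the names "day", "month", "year" within the nomination, and the layout "DD.MM.YYYY" chosen for DATE REPR.

```python
import re

MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November",
               "December"]


# Construct type declarations (information level): admitted constructs.
def is_month_name(construct):
    return construct in MONTH_NAMES


def is_year(construct):
    return isinstance(construct, int) and 0 <= construct <= 9999


def is_day_number(construct):
    return isinstance(construct, int) and 1 <= construct <= 31


def is_calendar_date(construct):
    # CALENDAR-DATE is an aggregate (nomination) of three atoms.
    return (isinstance(construct, dict)
            and set(construct) == {"day", "month", "year"}
            and is_day_number(construct["day"])
            and is_month_name(construct["month"])
            and is_year(construct["year"]))


# Representation type declaration (data level): assumed DATE REPR layout.
DATE_REPR = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")


def represent_date(construct):
    """Produce an occurrence of DATE REPR for a CALENDAR-DATE occurrence."""
    if not is_calendar_date(construct):
        raise ValueError("not an occurrence of CALENDAR-DATE")
    month_number = MONTH_NAMES.index(construct["month"]) + 1
    return f"{construct['day']:02d}.{month_number:02d}.{construct['year']:04d}"


def interpret_date(data):
    """Map an occurrence of DATE REPR back onto the represented construct."""
    match = DATE_REPR.fullmatch(data)
    if match is None:
        raise ValueError("not an occurrence of DATE REPR")
    day, month_number, year = (int(group) for group in match.groups())
    if not 1 <= month_number <= 12:
        raise ValueError("month out of range")
    construct = {"day": day, "month": MONTH_NAMES[month_number - 1],
                 "year": year}
    if not is_calendar_date(construct):
        raise ValueError("not an occurrence of CALENDAR-DATE")
    return construct
```

The construct carries the month name, while the representation carries a two-digit number; the mapping between the two lives only in the declarations, which is the sense in which they belong to the common background of the communicating offices.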

This example suggests that the concept of format belongs to the concept of representation type. Up to here the assumption has been maintained that only one representation type can be declared for each construct type. This restriction should be dropped now. If multiple declaration of representation types for one construct type is provided, each of the declared representation types could be called a format (Format), in close relation to the common use of this term. Referring to the above example of fig. 9, instead of the one representation type DATE REPR we could declare three representation types (= formats) for the representation of constructs of type CALENDAR-DATE (two "positional" formats, one "key-word" format).

It can be observed that the separation of construct type declaration and representation type declaration (= format declaration) is not explicit in existing systems. The specification of the input area layout is often simultaneously the declaration of the construct type and of the format. This might be a reasonable economical working approach. But to understand the relationship between information and data one should be aware of the double function of such a "data definition".

Applying the view which has been presented so far of the relationship between information (constructs and construct types) and data (representation and representation types) we outline a flow of information between two offices via one channel: An office B may be requested by an office A to retrieve a construct with given properties (e.g. from a data base). Office B finds the specified construct (i.e. a representation of it), identifies the type of it, chooses one of the pertaining representation type declarations and puts a representation of the construct in question into the channel. As this representation conforms to the representation type declaration established for the regarded channel, office A is able to interpret the data (knowing the representation type and construct type).

Some reader might have noticed that in the CALENDAR-DATE example an argumentation is missing, why the representations do not show all details of the represented constructs (cf. fig. 10). Actually, this is not necessarily so, it only corresponds to the practice in data processing, because it is the representation which occupies storage, and not the construct. More extensive representations, with less extensive declarations, could be provided in a representation type declaration for various reasons (security, etc.). Of course, that would require more capacity of the involved channels (storage). In any case the question

arises, whether such a "representation" is really a representation of a construct. Strictly speaking, it is not. Only together with its specifications, which allow the interpretation of the construct, i.e. the type declarations, a full representation is there. Therefore a representation in the above sense shows only the individual part (Individualteil) of the represented construct, because the representational part common to all occurrences of that type is in the type declarations. This leads to the idea that data (e.g. in "input data") usually means individual part of the full representation rather than the full representation itself. With this in mind, the use of the word "data" in the criticized term "data definition" can partly be justified: The "data definition" defines in the representation type declaration the admissible data, i.e. all admissible individual parts of construct representations. However, it should be entirely clear by now that the omission of the word "type" is misleading.
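The idea of several formats for one construct type, and of data as the individual part that can only be interpreted together with the declaration, can be sketched in Python. The three formats below (two positional, one key-word) and all names are hypothetical; the original declarations are not reproduced, and for brevity the month is represented here by its number.

```python
def _date(day, month, year):
    return {"day": int(day), "month": int(month), "year": int(year)}


def _interpret_keyword(data):
    # Key-word format: fields are identified by key word, not by position.
    fields = dict(item.split("=") for item in data.split())
    return _date(fields["DAY"], fields["MONTH"], fields["YEAR"])


# Assumed format declarations: name -> (represent, interpret).
FORMATS = {
    "POSITIONAL-DMY": (
        lambda d: "{day:02d}.{month:02d}.{year:04d}".format(**d),
        lambda s: _date(*s.split(".")),
    ),
    "POSITIONAL-YMD": (
        lambda d: "{year:04d}-{month:02d}-{day:02d}".format(**d),
        lambda s: _date(*reversed(s.split("-"))),
    ),
    "KEYWORD": (
        lambda d: "DAY={day} MONTH={month} YEAR={year}".format(**d),
        _interpret_keyword,
    ),
}


def represent(construct, format_name):
    """Office B: produce the individual part (the data) in a format."""
    return FORMATS[format_name][0](construct)


def interpret(data, format_name):
    """Office A: interpret the data, knowing the declared format."""
    return FORMATS[format_name][1](data)


date = {"day": 24, "month": 9, "year": 1975}
all_data = {name: represent(date, name) for name in FORMATS}
```

The three strings in `all_data` differ, yet each is interpreted onto the same construct when the matching format declaration is known; without the declaration the string alone is not a full representation.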

5. Practice oriented remarks

In this concluding section some applications of the ideas about information and data as discussed above shall be tried.

First a preliminary remark: There might be the impression that the conceptual system of IMC has been offered as a new proposal of a data model to compete with other, well known data models. That would be a misunderstanding. IMC is aiming to be a conceptual tool for speaking about information, on this level comprising the various data models. Nevertheless it is a specific conceptual model and as such offers a view on model information which allows to form a variety of information structures, but has its own limitations, too.

It is not the task of this paper to outline the features of hierarchic, network, relational or other data models. But it might be of interest in this context, to what these attributes refer. They refer to the so-called "data structures" which can be established in a system of the respective model and which are supported by the system's functions.

With the terminology introduced above we would of course say "information structure" instead of "data structure" as meant here. Data structure in our understanding is the structure of the representation of information, which normally is left to the implementor in order to achieve efficiency, security, or any goal else of this nature. For communication and manipulation purposes the possible structures of constructs and related questions concerning model information are of main interest: what levels of aggregation are available, are there nominations or collections, what are the restrictions for the nesting of constructs, are there special generic types adjusted to the application in question (e.g. "relations", which in terms of IMC are collections of equally domained nominations, called collectives (Kollektiv)), what is the support for orientation in extensive constructs, what properties can be used to address constructs (independently of their representation), and many other questions. The answers to these questions together with the pertaining operations on the constructs render a data model hierarchic, network or relational (or something else).

It is a matter of course that a data model is also influenced by representation techniques, and that efficiency and other aspects are of relevance. The problem of "redundancy" is one of them. It is not intended here to consider the benefits and disadvantages of redundancy. But it has to be clarified that redundancy does not refer to the level of constructs, but to the level of their representation. It has been shown that a construct may appear at several spots as a component of an embracing construct. Spots at which the same construct (necessarily or by chance) appears are called parallel spots (Parallelstelle). If the appearance of the same construct at several spots is required, this has to be specified in the construct type declaration by so-called "consistency constraints" (cf. the SOURCE clause of [DDLC]). Once a specification of this kind has been established, the system (as one of the communicating offices) is free to decide whether it will store the representation of the construct each time it appears (at a parallel spot) or less often (usually once). The more often the representation is stored, the higher the degree of redundancy is in principle. It is conceivable that the same technique could be applied (and actually is done sometimes) with the RESULT [...] consistency-conditioned parallel spots.

feature of [DDLC]).

said

also

for

other

than

Such a s i t u a t i o n is also given

On the model i n f o r m a t i o n type level

RESULT clause specifies that the atom at the s p e c i f i e d spot is the

result of the e x e c u t i o n of a specified procedure, at other spots as input. additionally

is

In both the

specified,

SOURCE

which uses c o n s t r u c t s

and

the

RESULT

clause

whether a r e p r e s e n t a t i o n of the depending

atom is m a i n t a i n e d p e r m a n e n t l y

(ACTUAL)

by the system,

or is made up

only when r e q u i r e d for passing it via the c o m m u n i c a t i o n channel to r e q u e s t i n g office causes

redundancy.

(VIRTUAL). However

In the strict sense, also

i n t e r p r e t a t i o n of the ACTUAL and VIRTUAL

another,

the

the ACTUAL feature less

restrictive

feature is conceivable,

where


the

system still remains

assumed above)

free to follow the s p e c i f i c a t i o n

Doing a closer look to the d i s c u s s i o n of redundancy one encounters

a

(the "system")

is a

unit with a storage as a private channel fig.

11

is

configuration

often

preferred

containing

(input channel,

two

stated.

representations) RESULT

rather

than

are

the is

a

With

this

what is the object channel

which

the

As a matter of fact this is seldom clearly

input format declaration

(e.g.

sequence of atom

(e.g.

SOURCE feature,

made up to one complex declaration package,

d e c l a r a t i o n into the same package.

well known under

1.

a diagram

we have also three places to

complexity of which is still more increased by

"optimization"

fig.

and data base format declaration

feature)

functional

If we consider a r e p r e s e n t a t i o n tyFe declaration,

is applied to?

In particular,

To show explicitly

computerized

channels or still better three channels

the question has to be answered, declaration

configuration

(the "data base"),

data base, output channel)

represent constructs.

type

(in the context of

system

is a slight modification of that used so far.

that one of the offices

like

(as

or to understand it only as an efficiency constraint

data base management systems) which

verbatim

label

"schema',.

minimization

of

packing

the

construct

Such d e c l a r a t i o n packages The

consequence

of

the

are

such

an

the number of characters to be

written by the programmer at the expense of

quality

of

software,

in

particular of clarity.

Finally some remarks on the relationship between information and data on the one hand and their manipulation on the other hand might be appropriate. It would be an obvious question to ask whether constructs or their representations are manipulated. Strictly speaking, only representations can be handled, as was stated previously. But so-called data manipulation languages do not refer to the representational level only. Primarily they are designed for the manipulation of constructs.

This will be illustrated by an example of the retrieval of a construct: The properties which are specified as parameters of a request refer to a construct rather than to a representation of it. The delivery of the found construct is done by putting it into the respective channel in an agreed representation, i.e. meeting the output format.

Another example is "navigation". This term refers to moving from one spot to the other in an extensive construct. Also here no reference to the representation of this construct is involved. Only upon request the navigator gets some representation of the construct (at the spot) at which he has arrived. In case of a data base management system, he does not receive the representation on which the retrieval has been performed, but an output representation.
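The retrieval and navigation examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of IMC: constructs are modelled as nested dictionaries, and the names `component`, `retrieve`, `represent` and `Navigator`, the two output formats, and the sample persons are all hypothetical. The point of the sketch is that the request and the movement from spot to spot refer to the construct, while a representation is made up only when one is asked for.

```python
import json


def component(construct, path):
    """Follow a sequence of nomination names from the root to a spot."""
    for name in path:
        construct = construct[name]
    return construct


def retrieve(constructs, path, value):
    """Select constructs by a property of the construct itself."""
    return [c for c in constructs if component(c, path) == value]


def flatten(construct, prefix=()):
    """Yield one 'path=atom' line per atom of a construct."""
    if isinstance(construct, dict):
        for name, part in construct.items():
            yield from flatten(part, prefix + (name,))
    else:
        yield "/".join(prefix) + "=" + str(construct)


def represent(construct, output_format):
    """Make up a representation only when the output channel asks for one."""
    if output_format == "json":
        return json.dumps(construct, sort_keys=True)
    if output_format == "lines":
        return "\n".join(flatten(construct))
    raise ValueError(f"unknown output format: {output_format}")


class Navigator:
    """Moves from spot to spot; no representation exists until here() is called."""

    def __init__(self, root):
        self.root = root
        self.path = []

    def move(self, name):
        # Raises KeyError (or TypeError on an atom) if there is no such spot.
        component(self.root, self.path + [name])
        self.path.append(name)

    def here(self, output_format):
        return represent(component(self.root, self.path), output_format)


# Hypothetical constructs, loosely modelled on the person construct of fig. 2.
jackson = {"name": {"family name": "JACKSON", "first name": "JOHN"},
           "home address": {"city": "HOUSTON"}}
miller = {"name": {"family name": "MILLER", "first name": "ANN"},
          "home address": {"city": "CAMBRIDGE"}}

found = retrieve([jackson, miller], ["home address", "city"], "HOUSTON")
nav = Navigator(found[0])
nav.move("name")
print(nav.here("lines"))
```

The same spot can be delivered in another agreed representation simply by calling `nav.here("json")`; neither the request nor the navigation step has to change.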

A counter-example,

representation in the data base in the output channel Although

a

information,

this

implementor,

does

the

user

has

reguirements. time

exert

language refers %o the level of model

not

imply

representations

accessed in order to execute several

representation

and

interests access.

to

way

application

of

adequacy

and

resources

will decrease. from

manipulation given

to

computer

computer

and

level.

a

efficiency.

concepts,

security

However,

of

update /

compromise

between

in overall

computing

information

differentiation

hand

computing

(traffic density, balanced

facilities to system interfaces,

view of inforaation

to

A good choice of

More and more it becomes evident,

includes to support conceptual presented

cost,

functions as well as a forecast

the involved people and the intended

to this goal.

On the other

the influence of storage and biased

access

it is up to the

which refer to storage and

should yield

considerations

actual

also the policies of

time,

acting in the future

etc.)

to

move

influence

has

some influence to the information

user's

no

B~t again,

in what way he has provided to be

He

These requirements

retrieval ratio, efficiency

that

manipulation commands.

construct types and of manipulation the

where the

is the same as

(librarian's counter).

takes place in the system.

which

is a library,

(room with book-shelves)

~'data manipulation"

representations

however,

but an output

time

that we have

stractures

and

where more preference is application. wherever

This goal

useful.

The

and data is intended to be a contribution


References

[DIN]

DIN/Fachnormenausschuss 44300 "Information Institute

[ANSI]

ANSI/X3/Sparc/DBMS

Study

GMD/Arbeitsgruppe the description (German).

[ PZT ]

Prozesse".

[DURI]

R. Durchholz

and

[DKR]

Beschreibung

Verlag,

"Concepts

T. B. Steel

"Data

Jr.,

IFIP-TC-2

"Abstract 10/5,

(German).

Datenbanksysteme,

E. Falkenberg

base

J. W. Klimbie

Conference

(German).

a "A

status

technical

1975 Elektronische

G. Richter, und

"Design of a data programs

(DAGS)"

Systementwuerfe

und W. Klutentreter,

fuer (Hrsg.),

1974

Description

CODASYL

data

197~

basic system for application Datenmodelle

Report".

for

Namur, January

Objects"

In:

CODASYL/Data

1967

1968

W. Klutentreter,

GMD, St. Augustin,

diskreter Haendler,

standardization

Special Working

Rechenanlagen R. Durchholz,

base

of the DDL",

H. Zemanek,

for

systems"

Basel,

Data Base Management,

(eds.), North-Holland,

base management

[ DDLC ]

zur

Birkhaeuser

In:

"Terminology computer

ueber Aufomatentheorie,

G. Richter,

systems".

American

1971

and K. L. Koffeman,

report".

Report.

fuer Betriebssystemnormung,

(Hrsg.),

DIN

German

1975

of models of job processing

in-depth evaluation [ZEM]

Interim

February

"Grundsaetzliches

Unger,

management

[STEEL]

Group,

In: 3. Colloguium

(~NI),

(German).

March 1972

Institute,

GMD, St. Augustin,

C. A. Petri, Peschl,

vocabulary"

for Standardization,

National Standards

lABS]

Informationsverarbeitung

processing;

Language Committee

DDL Journal of Development,

(DDLC), June 1973

"June 73

Figure 1: Configuration of communicating functional units

Figure 11: Extended configuration of communicating functional units

(Labels appearing in the two diagrams: office, office B, "user", "system".)

Figure 2: Constructs in IMC box representation (nominations: name, family name, first name, home address, city, street, street name, number, place of birth, date of birth, year, month, day, branches; atoms include JACKSON, HOUSTON, WASHINGTON, LOS ANGELES, ANN ARBOR)

Figure 3: Constructs in IMC tree representation

Figure 4: Construct representation of fig. 2 with additional lettering for reference purposes

Figure 5: Graphic construct type definition (construct type EMPLOYEE)

Figure 6: Occurrence of construct type defined in fig. 5 (EMPLOYEE with PERSON, SKILLS containing SKILLCODE 1120 and SKILLCODE 1135, and NUMBER)

Figure 7: see next page

construct type MONTH-NAME
    atom: JANUARY, FEBRUARY, ... DECEMBER

construct type YEAR
    atom: 1900 ≤ INTEGER ≤ 1999

construct type DAY-NUMBER
    atom: 1 ≤ INTEGER ≤ 31

construct type CALENDAR-DATE
    nomination: MONTH --> construct type MONTH-NAME
                YEAR  --> construct type YEAR
                DAY   --> construct type DAY-NUMBER
    non-occurrences: MONTH     DAY
                     FEBRUARY  30
                     FEBRUARY  31
                     APRIL     31
                     etc.

Figure 8: Construct type declarations

representation type MONTH REPR
    represented construct type MONTH-NAME
    string: 1 or JAN --> atom JANUARY
            ...
            12 or DEC --> atom DECEMBER

representation type DAY REPR
    represented construct type DAY-NUMBER
    string: DECIMAL representation

representation type YEAR REPR
    represented construct type YEAR
    string: DECIMAL representation

representation type DATE REPR
    represented construct type CALENDAR-DATE
    string: (DAY REPR "-" MONTH REPR "-" YEAR REPR)
         or (YEAR REPR "-" MONTH REPR "-" DAY REPR)
         or ("D:" DAY REPR /// "M:" MONTH REPR /// "Y:" YEAR REPR ; delimiter ",")

Figure 9: Representation type declarations

Figure 7: Five construct representations on paper (among them "4+3", "SEVEN" and "seven")

Figure 10: Construct type occurrences and representation type occurrences of fig. 8 and 9

    CALENDAR-DATE (DAY 4, MONTH OCTOBER, YEAR 1967):
        4-OCT-1967    D:4,Y:1967,M:OCT    1967-10-4

    CALENDAR-DATE (DAY 14, MONTH MAY, YEAR 1973):
        M:MAY,Y:1973,D:14    D:14,M:5,Y:1973    14-5-1973    1973-MAY-14

Figure 11: see first page (fig. 1)
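The distinction between construct types (fig. 8) and representation types (fig. 9) can be made concrete with a small Python sketch. It is an illustration under assumptions, not the notation of the paper: the class `CalendarDate` and the functions `month_from_repr` and `parse_date` are hypothetical names, the "///" of fig. 9 is read as "in any order" (which the occurrences in fig. 10 suggest), and leap years are ignored, so that FEBRUARY 29 is accepted in every year. The sketch shows that several strings of fig. 10 are representations of one and the same construct, and that the non-occurrences of fig. 8 are rejected at the construct level, independently of the representation used.

```python
from dataclasses import dataclass

MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]

# Highest admissible DAY-NUMBER per MONTH-NAME; everything above it is a
# non-occurrence (FEBRUARY 30, FEBRUARY 31, APRIL 31, etc.).
LAST_DAY = {month: 31 for month in MONTHS}
LAST_DAY.update({"FEBRUARY": 29, "APRIL": 30, "JUNE": 30,
                 "SEPTEMBER": 30, "NOVEMBER": 30})


@dataclass(frozen=True)
class CalendarDate:
    """Construct type CALENDAR-DATE with nominations DAY, MONTH, YEAR."""
    day: int
    month: str
    year: int

    def __post_init__(self):
        if self.month not in LAST_DAY:
            raise ValueError(f"no MONTH-NAME atom: {self.month!r}")
        if not 1900 <= self.year <= 1999:
            raise ValueError(f"no YEAR atom: {self.year}")
        if not 1 <= self.day <= LAST_DAY[self.month]:
            raise ValueError(f"non-occurrence: {self.month} {self.day}")


def month_from_repr(text):
    """MONTH REPR: '1'..'12' or 'JAN'..'DEC' --> MONTH-NAME atom."""
    if text.isdigit():
        number = int(text)
        if 1 <= number <= 12:
            return MONTHS[number - 1]
    else:
        for month in MONTHS:
            if month[:3] == text.upper():
                return month
    raise ValueError(f"no MONTH REPR: {text!r}")


def parse_date(text):
    """DATE REPR: any of the three string forms --> CALENDAR-DATE construct."""
    if ":" in text:
        # Tagged form, e.g. "D:4,Y:1967,M:OCT"; the order of the parts is free.
        fields = dict(part.split(":", 1) for part in text.split(","))
        day, month, year = fields["D"], fields["M"], fields["Y"]
    else:
        # Dashed form; a four-digit first part can only be a YEAR REPR.
        first, month, last = text.split("-")
        day, year = (last, first) if len(first) == 4 else (first, last)
    return CalendarDate(int(day), month_from_repr(month), int(year))


print(parse_date("4-OCT-1967") == parse_date("D:4,Y:1967,M:OCT"))
```

A request such as "all dates in OCTOBER" would then be evaluated on `CalendarDate` values and never on the strings, which is the separation of levels argued for in the text.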

Data Base Research: A Survey

A. Blaser, H. Schmutz

IBM Wissenschaftliches Zentrum, Heidelberg, Tiergartenstr. 15

Abstract The

research

Most

of

models

activities

the of

issues

information~

implementation industry of

ac%ivl%ies

respect

area

of

tial

future

%0

da%~

OF

and

between

with

%rends

Introduction Models

3.

Data

Manlpulation

4.

System

data

modelling user

and and

and

data

data

systems

are

institutes

reviewed.

center

around

manip~lation~

system

and

Comparison

analysis.

requirements

development.

potentially

architecture

base

base

research

shows

emerging

are

principles

with

with

differences

Conclusions

and

aspects~

respect

drawn in

to

the

poten-

research°

Languages

Problems

~.

Storage

6.

Modelling

7.

Summary

8®

Bibliography

Structures and and

objective

and

Search

Algorithms

Analysis

Conclusions

INTRODUCTION

and

in

of

CONTENTS

Data

past

by

documented

research

des~n

I®

The

area

and

established

base

2.

1.

%he

interactive

±echniques

emphasis

TABLE

in

considered

of

present

192/

/49,

this

paper

research

is

primarily

activities

in

to

the

provide

data

an

base

overview

area.

This

over

~a--


per

does

not

er~

information

information

survey

retrieval

systems

of

such

an

introductlon

to

available

Ll~htfoot:

Jardlne

and

of

T

data

still

help

is a

or

have

been

such

a

The

the

scheme

an

first

our

shown

programs

is

seen

by

base

the

We

will

~

is

is

which

sical

or

internal

is

actually

we

can

The

use

are

the

of

between selec±

the

conceptual

conceptual

a~e

specified

in

the

in

conceptual the

never

in

subpart

a

and of

the

definition

external information

the

It

with

the

It

is

a

standard-shown

designer

IMS

in

through the

views

exist

of

a

serves

in

as

the

the

double

phy-

form~

help

of

All

of

The

these and

mappln~s

purpose:

sufficient

a

C[!),

administrator

language.

a

central the

Informatlon

[ mapping

mappings.

and

a

conceptual

with

the and

langua@e.

examplel

way

base

as a

at

syntax

base

of

to

installatlonT may

For

way

aspects

referred

"correct"

data

serve

the

or

mapping

neces~ry

informa--

legal

form

of

of

system

system

{fig.

is

information,

pepresents

the

the

data

mapping

and

of

usually

It

physical

internal

information

Is

reflects

Given

responsibility data

been

the

retrieval

information.

corresponding to

the

unconscioesly,

information

directly.

of

information

memory.

of

what

defining

used

vlews

of

type

grammar

is other

view

stored

construct

mapplng~

mappings

for

to

has

major

views

for is

~iven

a

describesv

view

(D.Ao

schemes

or

point.

group

For

specifies

point

a

to

during

responsible This

A

conceptu~l

Experlence

which

knowledge~

persons~

flow

level.

reference

by

similar

central

schema

The

J.A.

S[stems

accepted.

consciously

shows

administrator".

a

and

definition

widely

employed

very

data

group

to

in

commer-

addition

in

our

make

authors

the

information.

similar

Users

question,

is

scheme

conceptual

therefore

a

experience

conceptual is

the

Barnett

-- A

of

danger

interested

of

~{anagemen[ 1974}

already

I and

schema

{A,J.

the

ago.

and

view

IMS

Base

implemented~

fig.

as

of

iS

some

[IMS)

D~ta

the

decade

in

who

depth

Vurth--

aspects

aware

reader, In

Is

and To

who a

integrated

"data

debates.

book.

well

Amsterdam,

This

in

are

System

architecture

concept~l

information

the

of

nearly

is

The

stored,

system?

software,

survey.

(ANSI/X3/SPARC}

mappln~s~

tion.

this

~ which of

such

[n

base

non-compute~orlented

the

systems~

~olland~ in

WedekindVs

scheme

data

base

Management

subject

scheme

group

Date's

to study

No~th

base

simplification ization

recommend field,

to

data

We

the

referenced

a

and

addressed.

Development.

editor)

Is

systems not

Information

litemature

available

and

data

Evolutionary

What

are

limitations

cially

with

commercially

for

{a} a

to

spe--

Fig. 1: Structure of a data base system

Users: PU = parametric user, IU = interactive user, APR = application programmer, DBA = data base administrator
Views: E = external, C = conceptual, I = internal

cific

use

subview {a)

of

%0

the

a

view~

represents

protection, tems

is

primary

scheduling

query

language

typified

two

by

data

is

{b}

"more

possibly~ natural"

purpose

v user

and

to for

is

isolation

the

a

of

transform %he

i.e.

%he

specific

importance

etc.~

slmilar fairly

via

high

the

some

is

incorporated

at

a

the

selected

use.

for

Polnt

reasons

essentially

of

a

a

of

for

higher

sys-

needln~

than

written

into

the

which We

a

the

query

a

user

and

manl--

experts.

well

This

defined

ac--

interacts

language

language

-- s u b l a n - -

manipulation

programmers

wlth

programmers

manipulation

host

data

application

by

the

vlew

data

application

data of

talk

help

This by

users

performs

performs

have solver

language Ke

who

structure.

general~

level

view.

we

problem

query

without

sublanguage. In

The

programs

First

interactive

user"y

simple

language as

consider.

conceptual

level

application

rela%ionship.

%fen

to

the

"parametric

~rogramming

in

called

to

parameters

system

gu~ge

also

groups

"non-DP-professional".

for

with

end-user

user~

very

different

%ions

major

the

at

pulation

is

and

aspects.

ape

is

base

which

%he

There

of

data

language

data

manlpula-

language.

In

practlce~

ly

to

data

be

large

used

base

by

only

problems

the in

preven%ion~

of

one

management

to

access

amounts

person.

system

s±ored

stored

information It

that

it

wlth

reeovery~

therefore

allows

information.

connection

is

are

protect[on~

unlike-

and

in

a

concurrent

a

number

schedullngT

efficiency

to

requirement of

creates

Integrity~

and

a

sharing

Concurrency

system

extremely

of

deadlock

solving

all

these

problems.

While

commercially

plications pert

of

large

number

used

the

for

language

the

rage the

problem activities

conceptual

of

structures

trends

Sections

research

modellin~

contain

research

in

some

the

research and

search

~easuremen%

conclusions and

tO

major

with

and

2

9

and

describe

to

of

IMS)

a

are

devoted

system

and respect

problems

to

deserving

6

will

efforts primary

and results

research.

level

query

5

data

we

wlll

describes as to

section T

and 4

such refer

a

model

models

Section

find

data

section

techniques

Section

to

SUp--

we

high

data In

aspec±s.

analysis

ap-

on

the

user

research.

implementation

algorithms.

towards

single

this

support

concentrates

Correspondingly

oriented to

p~imarily

research

solver.

view

area

(like

users~

interactive

languages

contributions

systems

parametmic

of

system.

manipt*la%ion discuss

employed

Involving

a

sto-

some 7

of

will

recognizable


2.

DATA

The

conceptual

ence as

MODELS

in

a

data

close

for

a

as

view

has

base

management

possible

conceptual

model

between

a

of

~s

system.

in%ultive known

in

notions as

The

information

the

a

of

vlew

how

world.

conceptual

A

refer-

should

information.

models.

of

point

such

of

data

Peal

~nd

central

Cle~rly~

possihili%ies

exists

world

introduced

ape

set

which

Peal

%o

view

provides

information

been

Proposals

conceptual

to

encode

Of

course~

be

data

conceptually The

information

is

mapping not

for-

is

that

malized.

Closely of

a

connected

extremely sors~

S

sets er~

has we

visor

a

number

are and

exactly shows

one the

biasing

data

theory.

CODASYL /124/~

One

and

we

~. I.

CRM

gree.

model

information A

n--ary

D1

where such

the as

relation

the

Di

D2

are

x

a

a

to

be

a

of

close

name.

Furth-which as

ad-

taught

students.

to

these

and

ks

an

pPofes-in

professor

course

n~mber

of

P

advises

one

a

consider

objects and

he

and

~*~

a

by

Fig.

reality

2

without

Of

less is

based

the

s±imulatln~ andv

by

and and

are

Sibley as

Codd

the

Information

ideas

Ash

E.F.

on

in

a a

notions

of

Algebr~

of

due /3/.

stimulus series

of

to

Mealy

The

most

for

data

papers

to

subsection.

{CR~)

set

finite

[39--41~

of

43/

named

subset

of

rel~tions a

of

caPtesian

assorted

de-

produc~

Dn

potentially

ks

or

acceptance

finite is

values

set

of

next

numerical a

by

more

/74/,

Model

is

set~

exactly

has

attempts

developed

relation

x

all

and

terms

~elational

the

us

sets

the

students

courses

attempt

earlier

Rovner

devote

will

of

have of

model

let

model.

are

been

within

attended an

We Each

which student

models

An

has

Codd~s

data

%he

Feldman

research

which

of Other

unique

A

and

models

/34/.

successful base

any

courses.

number

in

purposes

situation®

of

data

conceptual

world

him.

professor

a

C is

a

Information

Conceptual set

by

attend

towards

of

illustration

professor

every

Tough± may

and

~ which

for

notion

Fo~

real

students~

know

the

schema.

simplified of

courses

In

with

conceptual

or

infinite string

n--tuples®

To

sets

values, any

of

"scalar"

in

other

wordst

the

elements

Pelationv

data

values a

n-sPy of

%he

professors:  P#:1 PN:A    P#:2 PN:B
courses:     C#:1 CN:M    C#:2 CN:C    C#:3 CN:O
students:    S#:1 SN:L    S#:2 SN:L    S#:3 SN:M

teaches:  professors  one : many   courses
advises:  professors  one : many   students
          students    many : many  courses

Fig. 2: Example situation and schema

tuples

~re

tion the

n~med

with

is

homogeneous#

s~me

attribute of

example

information

A

P#

in

contains

The ed all

of P

only

such

relatlons in

fig.

are

elements

al

in

any

be

modeled.

of

other

in

tuple

are

(a)

build

the

can

advised

by

one

is

of

3~

reference.

the

same

This

the

A

rela--

relatlon

allows

a

have

tabular

representation

used

P~

of

in

set

professor

to

some

the

reference

appear

to

the

first

form.

normal

{and

not

sets in

between ways

to

thls

(or

set

examwhich

indicat-

means t or

that

structur-

information

and

can

p~ofessers.

information:

their

wlth

that

For

schema

lists

%hls

ele-

P.

This

way

dis--

key

domain

in

students

store

students

store

or

the

Two

value

relation.

corresponding

all

and

a

their

P~

two

of

is

consequences

least

refer

another

in

as

~elationshi9

at

S

references.

values %o

or

which

scalar has

and/or

keys

same

but

~ and

a~e

the

there

be

the

socalled

This

Consider

fig.

values~

fig. in

way).

Principally we

in

domain~

all

a

of

them.

different

actually

integer

shown

4

with

may have

tuple

key

ease

elements

in

relation

domain

a

shown

relation

another is

two

for

CRM.

a

a

PefePnce

element

play

in

any

names

associated

as

in

wl±hin

tuples

ments. is

relation

domalas

tinct

i.e. names

listing

Some

a

attribute

unique

numbers}

the

professor

his

unique

tuple

o~ {h)

we

which

store

form".

The

every

the

case

relationships are

these

tlon.

to we

However~

therefore relation "thlrd cies.

SC.

The

Codd

professor

[a)

in

professors

%he

the has

relation

of

to

number)~

"firs±

or

normal

%o

/41/~

to

many

normallzato

many

and

relation, the

remove

Date

and

students].

satisfy

Is

additional

an

serve

Codd

pro~essors ~

normalizationsT

essentially

referred

in

advises

relationship

further

be

students

professor

converse

defined

[or not

form.

and

one

int~oductlon

which

Is

would

thls

student/course

ferm"v

Peade~

is

(i.e.

s¢o~e

the

requires

normal

(b)

many

the

Case

between

one

cases

student

student.

For%unately~

courses In

with

advises

the

"second"

some

/49/,

and

redundan-

or

Wedeklnd

/192/.

The

advantages

those7 ers

who

with

ics. tlon~

Since

a

o@ are

background a

relative

relations

used

agree

relation

CR~ to

are

its

in ls

the

set~

complementatlon In

domain

in

tables~

elementary

a

simplicity

apparent

"think"

set etc.

names.

in notions

and

of

operations can

More

its

particular

like

immediately Importantiy~

for

discrete unlonT be

appeal

%o

researchmathematln±ersec-

applied

pro3ection

if may

the be

Fig. 3: Normalized CRM relations (tables P (P#, PN), S (S#, SN, P#), C (C#, CN, P#) and SC (S#, C#))

S  (S# int, SN char, P# int)    key (S#)        ref (P# to P.P#)
P  (P# int, PN char)            key (P#)
C  (C# int, CN char, P# int)    key (C#)        ref (P# to P.P#)
SC (S# int, C# char)            key (S#, C#)    ref (S# to S.S#, C# to C.C#)

Fig. 4: CRM schema
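The schema of fig. 4 can be tried out with Python's standard `sqlite3` module. The following is a sketch under assumptions rather than anything taken from the survey: SQL has no "#" in plain column names, so `Pno`, `Sno` and `Cno` stand for P#, S# and C#; the `key` clauses become primary keys and the `ref` clauses become foreign keys; and all domains that take part in a reference are declared as integers. The professor, course and student names follow fig. 2, but which professor teaches or advises whom, and who attends what, is invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks references only on request
conn.executescript("""
CREATE TABLE P  (Pno INTEGER PRIMARY KEY, PN TEXT);
CREATE TABLE S  (Sno INTEGER PRIMARY KEY, SN TEXT,
                 Pno INTEGER REFERENCES P(Pno));
CREATE TABLE C  (Cno INTEGER PRIMARY KEY, CN TEXT,
                 Pno INTEGER REFERENCES P(Pno));
CREATE TABLE SC (Sno INTEGER REFERENCES S(Sno),
                 Cno INTEGER REFERENCES C(Cno),
                 PRIMARY KEY (Sno, Cno));
""")

# Names as in fig. 2; the reference values are assumptions.
conn.executemany("INSERT INTO P VALUES (?, ?)", [(1, "A"), (2, "B")])
conn.executemany("INSERT INTO C VALUES (?, ?, ?)",
                 [(1, "M", 1), (2, "C", 1), (3, "O", 2)])
conn.executemany("INSERT INTO S VALUES (?, ?, ?)",
                 [(1, "L", 1), (2, "L", 2), (3, "M", 2)])
conn.executemany("INSERT INTO SC VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3), (3, 1)])

# Students attending at least one course taught by professor A:
# joins over the reference domains, followed by a projection.
rows = conn.execute("""
    SELECT DISTINCT S.Sno, S.SN
    FROM S
    JOIN SC ON SC.Sno = S.Sno
    JOIN C  ON C.Cno  = SC.Cno
    JOIN P  ON P.Pno  = C.Pno
    WHERE P.PN = 'A'
    ORDER BY S.Sno
""").fetchall()
print(rows)
```

Two students share the name L, so the student number, not the name, has to serve as key. The many : many relationship between students and courses needs the separate relation SC, whereas the two one : many relationships are carried by the P# reference domains in S and C.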

applled bined

to

relations

using Or

duct

well

Pierce

minology. viewed

es,

the

shown

The

stoned

%hey

to

example,

no

COBOL?

PL/I~ to

ponds

has

used the

a

ALGOL68y most

to

that

±o

of

be

pPedlcate

in

ter-

Codd~s may

calculus

investigated

compro--

relation

"relational

subject

obvious

is

Join

every

may

cartesian

both

be

may

be

approach--

calculus"

and

has

/41/.

are

or

has

the

as

called

o~der

Codd

we

and

structure such

approach

first

and

been

is

which

PASCAL

example

and

equivalen±

intuition

portant

generalization

~elations.

It

different

composition

different

equivalent

43/.

of

of

algebra"

model

/20,

structures

for

new

are

relational

a

predicate

define

that

oN

slightly

"relational

erations of

a

to

a

relations methods

product~

In

as

applied

and

known

It

a

within

a

of

not

structure

the

which

the

consid-

full

science.

record is

used.

violating

critical

offer

computer

hierarchic

frequently

structure

number

does

This

"first

Pang,

It

has,

organization

of

simpler

corres-

is

one

only

normal

form"

imcon--

dition°

The

example

shown

how

one:many to

a

of the

%he

rela%!onshlp

schema

constrain%

many:many

is of

between

~ffected

the

by

professor

constr~in%~

a

students

and

constraints. -

student

completely

professors

Per

example~

relationship

new

relation

the

fact,

is

has

to

has

if

the

relaxed

be

intro-

duced.

Another tion

consequence has

%o

be

of

no

Interest

often

example, name

in

"M",

sor

number

we

allow

a

user

the

key

at

every

two

problems.

user

has

privacy well

to

For

but

for

FirstT know

that

In

the

projection~

in

the

projection

no

a

in

the

two

of

to

if

persons

with

thus invalid.

making One

can

get

%0

a

man's

thls

same

imagine

ways

student

has

the

name.

with

profesThis

may

which

not

does

may

with

is

not There

he

is

on

to

employee's

allowed

managem's

if con-

allowed

the

are

is For

ariseT

salary

statistles

the

his

mana~e~,

number

which

si%uatlons

user

salaries.

man

of

it he

informa-

information.

advisor

domalns~

privacy

the

any

to

compare

find

the

adviso~ this

basic

informatlon9 basic

particula~

of

that

Critical

the

number

Second,

the

of

and

with

order

the

way

a

salary

man

other

more

subset

reasons

of of

that

other

example~

associated

constraint.

happen

only

employee's

numbers

of

learn

howeverT

see

values.

name

%0 is

help

requester

%he

has

there

to

salary, man

know

is

the

the

tolerable,

rain

the

to

since

look

manager

with to

"system"

2~

considered

normalization

encoded

order

the

be

at

of

to

look

immediately salary~

the

contradicting projected

out

it

may

appea~

as

salary

distribution

around

the

one

the

first

%uple

prob-


lem

though

lem

is

by

in

SQUARE} are

for

McGee

wlth

common

the

implicit

computer Of

in

tions

for

mercially

Within less

in

model

[like

implementa-

model

P.

and

lime of

as

in

a

it

is

are

those

developed

model

/3S~

/1/

im--

clean, joins s

in

many

employed

of

Senko

is /15S/.

find

other recently

with

of

of

the

as

data

the

a

and

of

graph

model

by

relational

known

/20~

or

designed

binary

also

decom-

more

graph

graph

generally

model

to

model.

up

system a

as

synonym

all

shows

well

papers

PL/I. struc-

models

graph

discusses

DIAM

IBM

adapta--

a

Practically

model

sort

the

execution

are

now

graph

graph

of

1963)0

restrictions

on

~ Other

Science

data

ALGOL68

56/.

models %he

in

The

appears

activities

some

group

in

{as

during

based

research

{as

description

essentially

the

a

and

the

model

9 Amsterdam,

{1969)}.

McGee

model

model

We

formal

Some

papers.

the

Mathematical

of

3

is or

syntax

states

/I,

the

CODASYL

38/.

a

~eneral

are

of

section entities

objects

Holland

"Schema"

origin

this

abstract

PASCAL~

systems

of

an

the

syntax".

set and

and

that

their

over

abstract

6,

activ[tles

graph

37 s

graph

the

s llke

origins

activities

PL/Iv

this

base

The

North

On

In

graphs

interpreter

schema.

number

DBTG

opera%loon

•

model s in

a

entity

a

some

Towards

Walk:

to

Fehder

of

claimed find

discussed

Programming

Wabstmact

data

The

Swenson

K.

this

research

semantics"

the

COBOL,

in

or

case be

model

to

19627 the

model

Automatic

therefore

and

CongP.

influenced to

the

Schmid

file

special cannot

labeled

(McCarthysJ.: IFIP

dels

"data

of

conform

Altman, and

systems These

file

models

McCarthyWs

u

base

data notion

explicitely

/121/.

Astrahan,

prob-

Models

%o

specified

consciously

1968

second

structures.

to

available

data

though

data

back

extensions

refer

sets". flat

a

flat

have

of

"declaration

signers

The

experimental

It

goes

languages

be

of

turn

which

(Lucas

and may

in

entities

has

Reviews

tures

authors.

in

section.

CaM,

the

model).

work

programs

the

between

Laboratories

Annual

what

Data

Proc.

McCarthy's Vienna

in

relations

the

is

explicit

science

to

the

a homogeneous

this

kinds

behind

information)

schema

of

homogeneous

both

or

Computation.

of

the

idea

binary

to

"duplicates

model in

Oriented

The

named

This later

of

Graph

some

allowing

foundation

sense

2,2.

least

known

implementing

/122/o

theoretical make

at

are

examples

described

plementatlons

ways

by

actually

described~

model

elegant

solved

INGRES~ tions

no

mo-

known In

$6~

as

Abrlalls

63~

model

I$~/o to

es--


tabllsh

a

connection

between

relations

in

CRY

and

the

the

DIAM

real

world

/isi/.

Among

the

serves

earlier

special

ceptance

of

model,

the

data

base

attention

a

data

/155/0

base

It

system

set

model,

activities.

The

DIAM data

same

mathematical

rigor

than in

stressed

pure

The

subsequently

matical

The

notion

nodes

as

terminology

thing

"that

thought, Some 5,

7

eeg.

IABC'. the

relations

graph.

To a

the

is

related

the

model

We

will

model

and

to

fig.

I®

on

the

discuss

the

as

such

a

the

that

its

world

more

DIAM

model

straightforward find

acdata

with

fact, real

a

the

standardiza-

the

in

de-

Its

defined

to to

not

mapped

the

entry

and it

possible

of

way

clean

to

mathe-

to

can

stored

as

as

node entry

fact

events"

a

be

any--

or

/34,

finite

a

set

of

are

labeled

the

node

by

uniquely

entities

are

to

shoe

be

directed

drawing

nodes 6

only

(which

entry

fig.

may

in

in

155/.

identi-

entities.

Since

between

other and

of

frequently

entity

being and

abstraction most

like

entities.

in

an the

An

of

between

is

of

with

concepts,

relations

the

that

entity.

entities

simplicity

sets

any

graph

labeling the

graph

over

CRM

we these model

fin--

nodes

in

edges

of

a

in

the

entities)

we

information and

named

and

other

inter--

node

represent nodes

as

a

edges

with

the

representation

example.

of

the

via

in

graph

used.

notion

clear

an

represented

as

are

domains

explicit

be

unary

~

our

node

is

distinction

model

relation

advantage

the

a

between

node

relations

consistent

be

denotations

graph

Fig.

for

model

To

relationships

For

name.

first

from

binary

schema

binary

its

been

be

and

types

of

node

relation.

clean

Other

given

unary

ween

impact

not

associations,

can

the

and

has

has

graph

such

unique

represent

prete

relation

call

relations

graph,

assume

graph.

reality

help

In

binary

we

have

or

the

a

objects,

with

Information

from

still

of

can

in

and

This

graph

of

in

has

entities

fied

A

described

essential

used

it

essentially shown

model

formalism. Since

as

effort

foundation.

objects

Its

here

contributed

closeness

mathematical

detail

more

had

CRM.

as

the

activities

structure

tion

designers

entity

research

symbolic of

the

for

"practical"

to

develop

model

This

removes

domain

entities,

reasons calculus

or

fact

that

to

names.

More

importantly,

is

sense of

the

need

which

mathematical

is

the

not

present

without

convenience.

algebra

oriented

the Like

only

distinguish

bet-

due

to is

in

CRM,

~t

need

to

deviate

CRM,

It

for

languages

with

is the

Fig. 5: Information as a graph

Fig. 6: Schema to the graph model (legend: relation R drawn as one : one, one : many, many : one, or many : many)

Fig. 7: Subgraph of E, MGR and SAL

same

rigor

/12~/.

a

furnish

user

restricts fig.

W

with

to

for

the

an

the a

other

science

does

force

side

subview

of

can us

be

to

does

data

Since

not

base

may

which

mapped

exclude

it

the

relationships,

illustration,

computer not

On

seen

practically

these

all to

structures

difficulty

a

(i.e.

be

conveniently

provide

subgraph)

by

the

known

some

form

from

our

to

which

user.

See

structures

in

of

graphs,

high

level

it data

modeling.

2.3.

The

It

is

not

in

the

Equivalence

at

sense

all

surprising

that

in

ple

straightforward

responding ween

of

the

language.

In

question

The

of

of

Bobrow

/17/,

models

are

by

Neuhold

of

creates

3.1.

Low

Level

As

we

can

see

an

application

to

as

second

the to

DBTG most

a

we

in

The

models.

DIAM

one

on

how

This

[s~ of

equivalent

model is

a

model

to

a

of

be

simCOP--

choice

bet-

"convenient"

or

however, the

therefore

can

there

question

question will

or

are

cases

schema

decided

also

of

data

Sibley

same

for

First

a

and

a

while

system.

This

a

has

models /167/

for

superimpose

need

/43/.

DATA

tially

a

models

not

data

come

only

manlpule--

back

to

the

McGee

(at

model

in

investigated /122/.

least

in

creates

a

A

model

on

"superimposition

results

been

a

new

theory"

this

Rs

direction

by

Different

the

world

mapping By

a

it

was

are

of

prob-

problem9 stated

reported

by

/82/.

3.

onset

the

section

coexist

on

how the

Codd

E.F.

Frasson

in

/134/y to

~ even

namely

which

the in

other.

be

but

equivalence

likely

researches)

the

must

model next

eonve~%

in

becomes

in

Moreover,

to

models

the

different

equivalence.

question

lem7

way

data

the

encoded

versa.

schema

processing

question

tion

vice

different

"natural" a

and

equivalent

two

that

information

encoded and

CRM

of Data Models

MANIPULATION

a

"low case

Versus

in

High

fig.

is

program

records in

LANGUAGES

are

of

data or

oe

access

are

Logic

accessed

interactively

typically

programming level"

Level

retrieved

language.

as is

"one the

in at

one

This

a

by

type

record

at

higher

level

a

external

form

terminal, one of

time

and

In

logic".

the

processed

processing

"multiple

either

is

first

sequenreferred

Typical records

via

for at

a

the time


logic". level

Research logic.

program in

allocation

subset

needs

by

a the

Even

modest

may

very

of

access compared well

plication

be

its

the

use

to

and

even

Is

still

Of

the

thelm

more

type common

specify to

sub space

in

a

to

today's

level

towards systems

their

processing

Results

through user

a

view,

high

oriented implemented

in

oper-

resource

external

systems,

of

to

and

the

higher

application

going

the

specified

available

for

is

primarily

though

the

scheduling

and

be ape

the

required

program

for selection

has

in

is

conceptual

commercially

it

the

towards

also

logic"

data

the

relevance

as

that

time

projects

data to

of

programs

the

effect,

research to

a

oriented

information

between

nature though

interactive are

In

realize

at

of

this

mapping

primarily

to

records

purposes, Is

are

important

which

system

program

logic,

is

"multiple

on

The

which

It

case

advance

ate.

activities

ap--

Installa-

tions,

Subsequently searchers then

we

some wlll

is

In

which

they

of

data

models.

Some

Table

I lists CRMt

data

manipulation

referenced. with

are

We

languages

start

based

to languages

used.

Finally

languages

with on

which

are

will

come

we

some

of for

the

IS/1

IBM MIT

experimental

some

location

MacAIMS

data

models,

characterized back

to

it

would

.........

be

systems, more

which

correct

to

remark

reference

System    Location        Language       Reference
IS/1      IBM UK          algebra        Todd
MacAIMS   MIT             algebra        Goldstein
RDMS      MIT/MULTICS     algebra        Steuert
MORIS     Milano          calculus       Bracchi
SQUARE    IBM Research    mapping        Boyce
SEQUEL    IBM Research    mapping        Chamberlin
INGRES    Berkeley        calculus       Held
ZETA      Toronto         definitional   Mylopoulos
DAMAS     MIT             calculus       Rothnie

Table

I.

Some

by

re-

CRM Implementations

the

other

developed

the

A by

special the

way

equivalence

Implementations

though

System

the

devoted

CRM

3.2.

ment

be

continue

subsection

of

relational

systems

claim

to

imple-

claim

that

they


implement four

homogeneous

represent

concept

of

tional XRM,

a

data

and

files

graph

SEQUEL

is

for

system)

a

enduser ing

derived

and

dy

an

To

give

us

consider

is

an

is

IS/1)

This

tion

In

{P

~

the

query

the a

data

the

relations

which

on

language stands CUPID

top its

of

as

currently

tool

berela--

to

low

a

resembles

INGRES

is

of

RAM,

supporting

definitional

by

top

homoge-

oriented

management

used

and

the

rela~

of

query

keywords.

system

data

on

on

top

The

first

between

better)

on

graphics

language

of

let

level

implementer

the

primto

stu-

access.

different

styles

of

query

languages,

let

query:

name

of

the

algebra

{S;

C2

is

a

=

~M"

advisor

of

approach)

));

sequence

{operator

=

)%s).

calculus

Ci

=

to

query

OF

PROF

IS

P

RANGE

OF

STUD

IS

S

INTO

R(PROF.PN)

RETRIEVE

=

a

the

we

student,

whose

name

C5)

%

obtain:

C2

selection

v*') I a

refers

oriented

Cl

of

RANGE

=

second

the

selection

value

language

WHERE

(operator and

in

the

to

INGRES)

PROF.P#

=

=

iIth

';'), a

a

projec-

domain.

we

STUD.P#

obtain:

AND

STUD.SN

~M ~

Here

The

answer

PROF

and

STUD

existential

a

aspect

the

product

{operator

QUEL)

is

has

specifically

rela%ion~tl

expression

cartesian

ZETA

or)

syntax.

English

and

system

of

binary

based

In

implemented

implemented

directed"

level

is

compact

more

language

the

"M"P

the

(

It

following

is

a

a

shown)

somewhere

relations

is

stored

QUEL

~WsynTax

impression the

n-ary

has

119/.

high

a

optimization

What

In

a

his

DAMAS

It

with

Toronto,

provides

implement

i%Ives.

it

/S3)

~%

calculus,

turn

systems

is is

supports

offers

nine

SQUARE

in

SQUARE

from

which

interfaces

tions

XRM

the

which

supporting

/110/.

developed

user

relational

which

model

Of

approach

/111/.

management)

the

file.

experiments, an

management

flat

data

early

"mapping",

algebra

neous

flat

is are

in

the

variables

quantifications

result in are

relation the

~)

predicate

applied

by

a

unary

calculus default.

relation. sense

Clearly

ever

which


In

SEQUEL,

All

of

the

"mappln~"

FROM P WHERE P.P# IN SELECT P# FROM S WHERE S.SN =

nine

systems

tion

research

data

solution

of

pointed

out

for

the

three

ing

research

Some

above,

may

model.

In a

there

First

data

effort

this

graph

is

most

DIAM

that

their

already

ZETA

first

genera-

contribution

to

significant

as

development,

we

know

system

to

manipulation called

DIAM

that

At

least

such

SEQUEL

is

on@o-called

model

oriented

entities.

formulated

P{PN}

FERAL

the

where

recently on

model

/82/.

graph

system,

model,

A very

IMS and describes

which

interesting a

query

nition

This

generated

SN

allo~ on

hierarchical

DIAM

as

languages

of

work

query of

a

the

their which

Language}

continues

language graph

{or

in

binary of

preceding

as an

/157-159/.

composition

the

=

rela--

relations

subsection

can

or

very

and to

QUEL

a

data

In

query

data

to

another form

Nice~

the

system

given

computer, can

DBTG

then

in to

be

a

and

students

one

comparison.

Implements

the

with

similar

language

to

model

/123/.

is

definition

data

least

language

developed

map

at

manipulation

similar

a

professors need

developed

offers a

VM1 ;

between

IS/1

research

language

language

possibly

is

some

CRM

Independent

its

for

query

connection

described of

language

PS

where

top

using

follows:

identifier

McGee

The

as

property

example

for

the

/72/.

FERAL

query

as

discuss

(Representation

interesting

FERAL

establishes

single

with

The

in

RIL

will

are

model,

language II

activities

we

data

with

tional}

form,

called

follow--on and

follow-on

subsection

between

fers

be

though

by

research

oriented

has

es

might

means

INGREST The

;

Systems

FERAL

a

This

Increased

planned.

Senko's

be

be

what

problems,

SEQUEL,

mentioned,

using

its

systems. base

systems

Non-CRM

already

are

base data

is

represent

'M';

R,

3.3.

data

obtain:

PN

the

As

we

SELECT

the

System

approach,

a

SIMS

/194/

language.

The

their

internal

hierarchical

accessed

by

the

with A

graph

advantagconceptual

which data

ofdefi-

form

and

conceptual query

language

without

actually

tures

objectives,

which

though

SIMS

report

generation,

reports

is

computer level

0.4.

User

this

A

a

by

design

SIMS

most

of

problem Dana

data,

meets

other

with

these

experimental

fea-

systems,

implementations.

layouts

which

and

specifically

to

Presser

of

with

report

designed

computer

solve

for

about

this

generated

the an

task

help

of

a

interesting

/46/.

Aspects

we

apply

missed

earlier

the

natural.

interface

of

Into

access

will

discuss

specific

some

data

technique

manipulation

with

have

as

CRM

efforts

has

general

purpose

programming

a

powerful

their

75/.

build

an

question, of

%rac% the

of

might

groups.

respect

is

system

management imental

languages

to

the

whose

interface

to

Further

approach

The

feasibility

language"

is

Thompson,

found

of

the

subject

in

at

a

as

Is

and

systems

offered fo

for

traditional

best

by at

way

a

It

Kraegeloh

in

natural

report

and

lan--

some

is

ZETA

user

R~ND~Z-a

as

natural its

about

the

data

exper-

/184,

Implemented language

at-

called

TORUS

uses

to

believe

least

system,

already

of

proposes system

the

the

/42/.

natural

base

researchers

language

are

protection inclusion

/149/.

whether

Some

data

efforts

Schauer

data

APL

is

Toronto.

147,

lin~uis-

/156/.

"communication

with

TO r a t h e r

sceptical

data

manipulation

languages

many

applicable,

since

"universe

of

the

in of

implemented

which to

be

top

/59/.

attractive

Petrick

systems,

the

syntax

natural

being

study

language

more

developed

references can

a

and for

lan-

the

these

develop

language

query

combine of

proposal

computer.

in

to Two

the

to

open,

formal the

freedom

such

being

/131/.

a

embed

a

language on

still

to

to

language

like

data

computer

currently

language

tLc

a the

proposes

which

language

that

goal

query

currently

target

facilities.

ALGOL

CRM

groups

make

Codd

an

rigorously

end--user

possibility

guage~

I02/.

is

defining

Its

describes

measurement

which

all

VOUS~

into

Interactive

evaluation

way

research

Earley

structures

as

computational

specific

/44~

data

the

research

with

mechanisms

of

i.e.

the

user.

guage

A

the

non--trivial

section

series

to

are

of

language

designers the

a

one

seems

high

In

Is

converting

of

the

computer

considerations. these

discourse"

considerations is

essentially

in In

natural the are

case not

restricted

to

the

simply

A

objects

completely

IS

to

119,

graphical

form.

Into

spaces

free

formulated

easily used

a

menu

extended method.

and has

It

by

Is

to

with

geographic

can

point

user

obtain

The

question,

cessful Tigations chology of

the

(or

slight

3.5.

As

pointed

ferent

out

data

languages. one

(CRM)

guages

are

attribute name

the

form

the

of

McDonald

display

device,

CRM

by

is

a

a

can

be

McDonald's

query.

such

the

help

Schauer's of

Zloof's is

system

asso--

in

displayed

approach

abstract

which map

while

/143,

that

questions

the

contents)

to

questions

skill

more

to

suc-

Inves-

experimental

question

indicate

Is

reasoning.

Of

methods

posed

opposed

users

In

(with

information

within

the

to

The

some

"examples"

modification 19

in

entities.

the

as

of

and

which

queries

error.

oriented

employ

fills

picture

device

answered

to

user

of

/2S/

relations

Simple

and

graphic

unskilled the

Zloof,

like

subareas

seems

the

are

easily this

CRM

very names

followed

be

which

and

illustrated

{or

"can

schemata.

To

a

of

psy-

183/. of

One

syntax

are

semantics

o~

a

are

/143/.

Equivalence

can

for

in

described

a

the

semantics

user-interface

answers

earlier,

corresponding

diagram

GADS

to

for

models

equivalence

and

by

stored

probability

display

or

independent

Model

a

the

extension

locations

of

taken

Example

the an

use

experiments

significance

pate

is

way,

generally

significant

flew

unbiased

reported

more

low

cannot

under

find

a a

one

another

are to

with

related

whether

than

base

description.

locations.

to

information

By

draw

to

a

data

requires

of

Query

expresses

natural

is

relation

example

clated

the

method

description

the

) which

query

Their

Zloof's of

CUPID,

the

the

In

in

approach

149/.

display

stored

dictionary,

different

/198,

used

verbs

data

structured

Schauer

of

and

by

similar and a

we

the to

introduce

SEQUEL.

followed

In A

will

the

for CRM

we

variable by

an

the

briefly

two graph

deal is

that

dif-

respect

indicate of query

with

name.

query

languages, Both

relation by

to that

the

model,

denoted

attribute

know with

equivalence

informally

(GRAPH)

variables.

period

to

we

equivalent

we

extended

other

examples,

made")

Subsequently be

end

and

by

be

a

lan-

names, relation


Example

S

relation

SN

S.SN

In

GRAPH

we

relations), a

period

name

attribute variable

deal

with

A

denotation

set

followed

by

relation

name

a

relation

denotation.

with

obvious

the

the

name

or

sets

(unary is

relation

a

relation A

meaning

set

name

that

A

followed

denotation

the

and or

denotation.

name

set

relations)

a

variable

name

relation

by

may

relations

set

a

also

be

'W~unsU

is

a

followed

by

a

used over

by

denotation

period

a

(binary

followed

as all

a

variable

elements

of

set.

Example S,

S.SC,

PS,

It

should

while

The

CRM

be

noted

is

bound

period

is

composition

A

query

in

from

is

of

the

CRM

names, with

In

and the

GRAPH

set

is the

both

arise.

to

to

in

the

definition

of

sets

levels,

FROM

list

a

The

recursive

used

as

the

operator

for

functional

right.

of

predicate

is and help

languages

variables)

form:

list[

the

is

languages

list1

a

(or

relations

GRAPH

two

both

in

with

ambiguity.

that

relations

denotations

built

In

list1

sets

PS.SC.CN

left

SELECT

In

S.SC.CN

PS.SC,

list2

WHERE

attribute is

over

predicate:

names,

variables

list2

is

a

which

can

list be

of

relation

built

starting

list2.

list the of

the

of

relation

predicate relations

use

subsequent

of

is

denotations, over

starting

subscripts

examples

are

set with

may such

list2

is

denotations the

sets

a

list

of

which

can

he

in

list2.

be

necessary

to

that

ambiguity

does

avoid not


Query 1: Name of the professor who advises student M.

CRM:   SELECT PN FROM P,S WHERE P.P# = S.P# and S.SN = 'M';

GRAPH: SELECT PN FROM P WHERE P.PS.SN = 'M';
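As a hedged illustration (not part of the original text), Query 1 can be evaluated in both styles with a short Python sketch. The sample professors and students and the helper names query1_crm and query1_graph are assumptions made only for this example; the relation names P, S and PS and the attributes P#, PN and SN follow the text.

```python
# Hypothetical sample data; only the names P, S, PS, P#, PN, SN come from the text.
P = [{"P#": 1, "PN": "A"}, {"P#": 2, "PN": "B"}]
S = [{"S#": 10, "SN": "M", "P#": 2}, {"S#": 11, "SN": "N", "P#": 1}]


def query1_crm(professors, students, student_name):
    """CRM style: join P and S on P#, then select on SN and project PN."""
    return sorted({p["PN"]
                   for p in professors
                   for s in students
                   if p["P#"] == s["P#"] and s["SN"] == student_name})


# Graph style: binary relations between entities, keyed by entity identifier.
PN = {1: "A", 2: "B"}      # professor -> name
PS = {1: {11}, 2: {10}}    # professor -> advised students
SN = {10: "M", 11: "N"}    # student -> name


def query1_graph(pn, ps, sn, student_name):
    """GRAPH style: follow the composed path P.PS.SN from each professor."""
    return sorted(pn[p] for p in pn
                  if any(sn[s] == student_name for s in ps.get(p, ())))
```

In the first function the connection between professors and students has to be spelled out as an equality on P#, while in the second it is simply followed along the named relation PS, which mirrors the difference between the two query formulations.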

This

simple

the

two

query

data

lationshlp of

may

used

be

GRAPH

illustrates

models.

between

pertles

these

will

Query

2

CRM

uses

Names

are

of

the

while

in

graph

the

CRM

apparent

courses

essential

with

the

by

attended

do

a

between

logical

re-

unique

por--

of

these

make

we

some

help

model

as

in

difference

that

the

to

has

composition

more

VMt

requires

encoded

Therefore,

functional even

=

normalization

entities

become

P.PS.SN

already

entities

directly.

simply

This

WHERE

relationships

comparison natural

in

where

language.

query.

next

students

which

are

advised

by

'B'.

CRM: SELECT CN FROM P, S, C, SC WHERE P.PN = 'B' and P.P# = S.P# and S.S# = SC.S# and SC.C# = C.C#;

CRM

form

GRAPH:

SELECT

The

brevity

should, the

and

graph

model with

between can

then

ies

in

a

implement user

a

be

over

CRM.

macro in

accept

GRAPH

simple the

P

terms

their and

on

advantages

Is

top of

of the

as

these

a

to

the

superiority extend

definitions

encodings.

algorithm.

to

essential

possible

accepts

CRM

'B';

compared

an

convert

forward

language

it

which

=

form

conclude

fact,

queries

the

P.PN

GRAPH

to

In

of

WHERE

the

used

processor

straight

GRAPH all

of

not

entities

has

FROM

elegance

however,

language

the

PS.SC.CN

This queries

With

other

macro

into

model.

the

The

CRM

relations processor

CRM

words,

CRM implementation

graph

of

querwe

such

can that

differences


of

the

is

primarily

away

languages

with

the

and

of the

other

are

of

of

sections

underlying

syntactical

help

implementation. quent

a

their

nature

syntax

little

other

deserve

and

they

Issues

of

relevance

questions are

appear

since

macros.

practical

Many

models

in

like

the

on can

one

level

be

data

given those

process

a

the

transformed model

versus

right

sort

discussed

of

which

in

receiving

of

subse-

more

atten-

tion.

4.

SYSTEM

4.1.

Introduction

The

major

PROBLEMS

problems

concurrent

access

gram

management

with

locking

and

last,

enough

in to

a data

and and

but

data

shared

not

least, times

system

by

scheduling,

like

many

with error

with

high

enough

the

whole

to

make

IMS

users,

system

or

recovery

response

base

are

with

data

with

pro-

integrity,

independence

data

transaction system

with

application

enforced

isolation,

connected

rates

short

and

attractive

for

the

user.

The

implementation

may

turn

natural full

out that

data

in

with

fact

outside

the

area

nection

to

provide

does

and

high

reference

of

data

independence

attached

at

a

purposes, is

Among

and to

of

the was

all

least

storage

though of

the

systems at

supported

SEQUEL,

nearly

therefore

portion

conception

activity in

It

large

functions,

independence

solutions

so

systems

models

to

data

experimental

manpower,

original

follow--on

will

with

Its

for

and aim

system in

to

the

we

references the

respect

even time

the

being problem

above.

base

data

sections

in

projects

DIAM

researchers

that

data

system,

management 3~

R,

plans

a

costly

research

base

mentioned

tional

few

System

experimental,

The

such

quite

section

ambitious

structures.

areas

of be

only

of

set

mentioned very

to

multi--user research bibliography,

a

far

have

not

not

mean

that

level

and

systems. in

query

considerable

the

area

data In

developed they

languages. number

integrity

addition, Of

full

have

security

of

In

and

subsequent

recovery mender

opera--

problems

relevant

and the

size

ignored

pafers in will

authorization

in confind in


4.2.

A

data

it

/175[

Data Independence

base

allows

without

system

transformations

also

dence

(a

affecting

correctly

is

in

model

it

for

is

with

widely

organization

and

to

changes

form

fact

/182/.

that

The of

its

links

or

inverted

5).

Every

such

direct

mix

of

application

there

will

be

a

The

need

new

types

schema.

of In

affected,

ers

for

base.

or

example,

There

since may

data

the

old

may

main.

Certainly,

ments

into

many

are

be

be

a

designed

a

many

least

s may

such,

or

indepento

Is

data

not

the data

of

a

affect

they

rely

to

which

the of

the

In

on

CRM

of

which

conceptual a

binary

programs Consid-

only new

data

read

and

new

model, relation

programs,

if

many

one

to

the

insert

the

data,

model.

information,

otherwise

of

conceptual

conceptual

the

update

programs

old

a

time,

additions

the

unaffected.

which

the

the

ad-

for

with

to In

domains

only

see stoP-

base

application

a

of read

In

data

due

changed

since

structures;

changes

remain

(projection)

(for

organization.

programs,

programs

some

mix

the data

depends

organization

existing may

internal

program

changes

domain

alter

constraint

even

the

data, of

implemented

The

arises

of

the

redundancy

internal the

of

best

storage

data

The

application

consequence

activities,

some

to

form

a

means

the

part

of

directly

other

generally,

changes

one

that

data

application

are

Since

affected,

Other

an

independence

between

are

to

that

independence,

absolutely

more

subview for

Is

internal

large

need

dependency

entered.

constraint

a

no Is

at

the

least

different

relations

the

files

addition

he

model

be

there

cannot

the

should

already

relaxing

at

still

no

paths

programs.

conceptual

while

internal

optimize

information,

general,

are

the

update

to

adapt

for

independence

implementation

to

need

data

i.e.

of

access

additional

attempt

given

data

conceptual

conceptual

of

performance

via

will

of

sense

respect

independence

is

section

requires

T

with

certain

there

example

mlnls±rator

~ence

a

which

transformations,

Of

invariant,

on

i.e,

many

the

data

stays

heavily

In~s

how

that

forms

in the

in

claimed,

internal

respect

before

programs

selection

extent

conceptual

transformations. of

the

programs

correctly

clear

the

to

or

existing

independence,

conceptual

recognized

run

the

of

data

internal

of)

makes

between

the

the

independence

independence

sometimes

internal

programs while

the

This

We distinguish need

after

consequence

as

of

which

effect

transformations. automatic

data

maximum

programs,

non-affected run

supports

This since

new new

do-ele-

Information for to

these

example, a

many

to

programs

constraint.


Support

of

capable

of

is

conceptual determining

affected

which

is

data

or not

trict

the

solvable.

data

each

This

solvable

mains

extensively

for

not.

independence

in

mapping This

applied

of

its

involves general,

languages requires

requires application

a

very

It

is

such

a

programs, decision

therefore

necessary

that

appear,

the

complex

of

type

/exceptions

that

the

decision

theory, in

Is

whether

It

problem, to

res-

problem

which

other

system

has

re--

not

contexts,

been

53

in

and

65/.

Support

of

Internal

1.

data

A data

the

definition

Internal

schema. al

any

nal

schema,

process

cess

to

all

of

in

section

following

be

there exploits

inversions

A more

of

the

lor~

and

such

languages

with

a a

called

a

given

by

the

conceptu-

supported

system paths

the

for

reduction

.

the

and

program,

external This

words,

in

ac-

optimizbut

what

results

capainter-

needs,

independent

other

which

be the

the

the

without

data

In

in

of

of

system

must

program's

execution

"logically"

purpose

Is

prac-neces-

performance

user.

query

data

a

user

(in

a

his

he

inversions

or

offered on

is

not.

the

Implementations to

a

"data

may

query

advantages burden

language

independence

relation

However,

any

the

n

A

independence

new

limited

base

specify

During

user

except

attributes

independently

execution

inversions

in

administration which

formulated

by

des-

degree

of

and

a

query

maintains

unavoidable

of The the

storage

/175/.

comprehensive

development

no

When

exist

overhead

specifies

allowed

access

during

access,

be

support

a

without

time

"optimally

paths

experimental 3

inverted.

system

may

the

the

meet

been

data

way:

whether

and

a

introduces

should

to

mappings

program

predefined

internal

serves

for

cribed the

has

the

is

Almost

which

form

independence

different

the

access

reduction

gains

data Of

following:

language,

conceptual

application

which

these

This

sary

of set

recognizing

exploit

role")

given

of

tically

mapping

every

the

schema,

ble

ing

the

requires

and

to

degree

is

conceptual

To

form

The

schema

2,

independence

approach

to data

flexible

data

slightly

modified

for

a

data

model

and

mapping

motivation, very

starts

independence

definition

close

Smith to

the

with

language. have

DBTG

the Tay--

developed model

/179, 169/.

As

enables

it

whole of

pointed to

/194/.

a

out

operate The

by

and

the

form

programs

operating

The

evaluation data

probably of

is

Data

processed

cesslng

in

practical mapping

another

Ramirez

in

et

al.

from

tioned

less

have

data

data

of

tion

has

tions

language nition

fop

graph,

a

whichT In iS

for a of

Desautels

oriented

towards

the

a

describe this

of

to

a

been

area

created pro-

combines

the

small

has

right,

allows

projects

of

full

the

own

which

remain

a fact

in

its

have

these

in

power

of

enough

to

This

be

plan

to

translation,

approach IS3/.

such

to

as and

data

is the Lam~

data

ori-

transla-a

negative descrlp--

developed

for

hierarchical

of

•

the

pro-

DBTG

data

have

implement

model

at

continuing

with

structure) between

is

men-

Smith

running

as

Shu

P,

during

/133/

and

D.

pro--

the

are

data

grammars Lum

their

Su

of

Merten

hierarchical

importance.

work

projects

form

of

activity

currently

both

(mapping

by

major

the

contextfree

conversion use

developed

126/.

and

and

makes

another

~ouseIT

of

/177~

generates work

in

computers,

an use

which

which

Fry)

Again

purpose

particular

illos-

in

This

in

evaluation

language

definition

of

data,

internal

used

to

a

application

languages

organization

This

Navathe

/95, 165/.

network

such

justification

models

CONVERT

translation

with

into

/108/.

(mapping

is

the

task,

built

as

by

have

all

language

still

used

Merten,

level

language

prototype

rarchies. version

Heller

record

DEFINE

a

is

investigated

and

the

and

tures) in

been

mapping

CRM

means

convert

complex

data

/142/.

data

of

as

by

group,

being

underlying

usefulness

Liu at

/

functions

The

resul±,

and

Michigan

increased

The

and

whole to

mapping

compiler,

language

a

very

a

projects

a

as

which

orientation

large

built

definition

University

totypes.

a

data

which

data

data

and

experimental

the

these

access

rewriting

of a

The

descriptions

Taylor's

ented,

than

a

also

to

with while

Similarly,

with

system.

languages,

is

conversion

system,

orientation

conducted

grams

the

one

language,

/166/.

experiments has

to

the

definition

to

a

converting

possibility

than

system

T which

such

collections,

date

data

is in

data

these

impetus

translation

translation

and

es

a

also

without

the

expensive

on

has

converting

of

management

given

data

data

of

without

more

of

base

SIMS

given

existence

standard

size

on

importance

description

trated

earlier,

these is

data

a

defl-struc-

languagthat

o~

a

decomposed

Into

ble--

ARPA

net,

data

con--

and

also

Schneider

translation

specifically


4.3.

Data

Though

Integrity and

the

recovery

problems,

are

increased

in

multi-user adequate system

by

solution,

The

notion

assertions

example,

state

require

that

different the which

a

person

a

A

allows

a

enforced

A straightforward

I~

be

data

the

a

the

a

own

supports

specify

with

the

as

stay

a

then

data

or, the

during

rules.

complex

more

its

Such

that

integrity

(for

mannummay

rule the

budget

sum

of

allocated

to

rules,

of of

complete

A

consistency

the for

notion

invariant

person,

exceed

be

collection

sense

ancestor,

not

may

system

consistency

a

without

responsibility

certain

known),

may

fact,

multi-user

viewed

which

about

are Its

base to

the

which

extent

are

sub--

system.

approach

Provide

2.

specifying

to

the or

the

whether

This

approach,

proposed

/66/

has

considered

to be

language

undecidable

tent.

Second,

it

base

checked,

is

has

modifications

transformed plex baser

into

consistency which larger

to a

the

for

such

rules

could

consistent

rules

may

can

range

from

data

bases

in

has

consist

of

been

user

base

/1~78/

and

First,

state.

hours processing

defined

for

a

small tame.

in

when

in

general

consistent

Third, access

it

are

must

before

require

modified, hold,

assertions

checking to

a

data

like

predicate

assertions.

still

in

carefully a

language

specify

to

caution.

the

be

general

base

example

since data

a

assertions

with

to

a

with

language

data

whether

a

data

user query

a

Whenever

checks,

for

known

In

called

in

In

and

enormously

following.

calculus

of

is

department

user

by

also is

birthdate

cannot

in

department.

sequently

the

and

expenses

it

are

are

responsibility

of

be

integrity

they

connected

may

contents

Information

something

address

some

closely

base

assertions that

whenever

name,

is

data

users. used

problems, over

to

systems,

under

schema

Systems

respect

concurrent

take

A

User

user

means

these to

data

with

many

schema.

the

These

may

with

integrity

the

about

processing.

to

data

and

MuI~I

single

traditional with

in

exist in

necessity

of

consistency

in

system

dealing

for

the

ber~

a

which

present

support,

has

rules

also

Recovery

large bases

the

system

recently is

also

for

a

general

themselves the

consls-

consistency

perform

a

state of

a

portion to

in

Is set of

several

of

number again

Of

com-

a

data weeks


The

first

problem

consistency tency

rules

of

the

certainly

The tlon

of

the

to

checking

The

of

end

of

can

take

third

of

a

the

to

for

the

be

enforced

like

the

assurance

of

analysis

of

the

an

analysis

The

problem

than do

one

not

end

isms

The

of

The

interfere system to

for

a

situation

have

with

part

is

the

without

be

is

at

increased

has,

each

the

and

in

other a

data

access

analyzed

during

is

cycle

free.

execution

time

in

a

technical

of

purpose ef~i-

situations

time

needed

for

necessary

compiled

query

addi-

and

most

time

A

to

approach

modification,

as

comparative course,

such

level.

concurrent to

access

ensure

by

that

are

a

time. well

users

To

user

this

exclusive

Basic Known

more

the

operations. gives

limited

or

same

easily

Of

user

in

birthdate

the

the

update

a

the

the

that

participating

contains

constraints.

which

for

an

is

allows

with

with

father

It

addition,

to

a

by

variation

state

In

since

source

locking

of

in

the

facility base

names

help

a

costly

may

change.

Of

Given

Illustrated

that

assurance.

sesup-

Be~in--

how

The

objects

if

a

consistency

one.

be.

serves

sense,

constraints may

first

however,

function

to

exclusive

under

linear

rule sony

base only

state. Now

rule

of

can,

integrity

provide

of

granting

systems

a

he

with

the

data

is are

information

rule

makes

with

system

must

IMS,

introduc-

which

determlngT

that

the

the

transaction

will

number

of

/176/,

integrity

user.

the

access

ating

not

the stored

This

by

queries

does

Ls

integrity

Stonebraker

consis-

like

user,

the of

may

labeled

birthdate,

without

by

the

the

complete,

consistency

algorithm

the

every

bound

is

to

rule

to

consistent

capable

a

n

rule.

proposed

which

that

control.

transformations

birthdate

for

processing

the

he

an

integrity" is

connected

If

the

a

user

way

edges

where

A one

transaction

consistency

only

person

enforcement

perform

so

in

systems,

lead

by

into

under a

has

66/.

base

must

requires

verified

/65,

relation,

cycle

"system

enough

and

state

base

a

n*#3,

previous

clently

the

some

father

precedes

language

Practical

data

are

system

relation.

every

the

whenever

data

rule

father

father

as

transaction of

containing

this

proportional

tion

a

in

the

the

simple

recognized

consistent

checking

subgraph

Checking

been

place

during

is

if

decidable.

transaction

is

Given

the

a

a

rule,

costs

example:

in

of

problem

checking

the

has

notion

only,

criterion.

transformations

consistency Its

solved

expressed,

this

transform

and

be

remains

problem

of

posed nlng

are

rules

satisfy

second

quence

can

from

mechanoper-

semaphores.

report

by

Eswaran

et

al


/65/. in

A

/65/,

being

complication

of

locking

is

to

lock

the

created

potentially

/30/. be

finite).

locks

such

the

formulation

Locking

has

systems

there

of

such

that

i.e.

taking

the

other

back

to

the

the

by

The

of

preempted did

record

internal

/83/.

This

method

is

noted

that

these

files

and

state

discussed are

hold

by

a

then

to

pre-

preemptions resources

be

is

i.e.

possi-

data

its

/29/.

to

positioned

This

during

systems

The

With

is

files,

al

operating

transactions

the

resources.

et

most

in

deadlocks.

user's

has

process

in

on

/65/.

As

give

Chamberlin

required

restrictions

Solution

checkpoint

of

for

with

the

infinite

decided

system

to

the

an

be

may

always

is

preclaiming.

second process

process

not

journals

which

one

Is

schedule

The

from

the

deal

/67/

can

can

from

which

with

it

deadlocks.

to

explained

exist,

objects

imposes

by

of

yet

objects

predicates

This

ways

of

that

also

not

created

handled

Everest

system

it

by

danger

appears.

which

help the

be

systems, may

number of

dictate

two

resources

in

set

overlap.

to

example

deadlock

state

with

ble

for

process. a

they

essentially

resources no

infinite the

the

base

which

described

consequence

away

an

predicates

are

proposed

claiming

be

data

requirements whether

of

as

are

may

predicates

in

objects,

(though

Performance

two

first,

There

created

Such

extension.

need

sets

execution

It

should

for

recovery

a

transaction

be pur-

pose.

Recovery

is

terminate

necessary

normally.

error

in

check

failure,

the

transaction livered

isolation data

be the

failures

second a

If

possible.

much

such

solutions placing

expected beginning

no

Bjork

of

as

this

MULTICS

to an

from

the

and

work

is

be

a

failure,

first an

to

deadlock,

error

the

all

This

described Sayani

deIs

propagate

in

which

that

is

been

by

base) recovery

such

162/ a n d

to

a

by

they large

Genton

/148/.

recovery

a

of

transactions

cause

laid

logical

consistency

data

not

does

to

failure

Of

the

have

a

a

the

appeared.

Edelberg been

or

(via

being

had

or

objective

restart

without

has

in

The

operating

/81/.

integrity

and burden

application the

that

for

exception

indirectly The

algorithms

unnecessary

an

or

failure

/iS/s

may

hardware

such

failure

Recovery

/50/,

this

objective

by

for

avoid

directly

impossible

zerodivide

transaction.

as

basis

All

v a

affected

Davies

is

for

example)

execution

/83/,

systems

a

A

cause (a

the

of

it

program

has

to

base.

been

extent

for

which

the

continue

The

user's

input

the have

whenever

end

of

recovery on

the

programmer transactions.

problems user.

is

The to

The

must most

inform

of

course

that

should

the

interactive

system

of

problem


solver as

should

far

were

as

the

As

with

at

the

for

not

the

only

data

be

required

query

language

user

to be

improvement

(a)

reduce

STRUCTURES

from

from

the

the

subsections

5.1.

Storage

of

the

search

with

Inverted

B-Trees

answers describe rithmic

searches list

of

indices

In

of

I151.

higher

there

of

allow

to

to

These

topics,

two

section

in

are room

particular,

derives

techniques otherwise

utilize

these

are to

records.

techniques the

cases

the

Its

which and

structures.

~ data

we

(b)

storage The

next

If

hierarchical known

as

Haerder

Index

and

quicker

and

Bentley

supporting describes

reduce

the

In-

allow

Finkel

trees

a

the

(B-Tree).

update

which

/112/.

to quad

of

file.

retrieval,

Indexes

llsts t } to

Iogamethods

storage

costs

/90/,

time

Lum

number

is

with have

d l v l s l o n I is

of

methods

hashing.

Hashing

its application recently

shown

in general

additional,

organization

These

an and

for

/77/.

acceleration

inverted

obtain

time

parameters (Ibit

the

an

McCreight

trees

search

and

is

of

complexity

in connection

by

help

and

binary

two

Ghosh

(hashing

applied

which

search

reducing

studied

splitting

this

section

required

multi-attribute

of

with

extensively

niques

if h e

significant

organization

time

inverted

compressions

meat

course,

of

certain

method

assumptions,

preceding

storage

Bayer

introduced

address

Of

by

extension

/113,

in

solutions,

programs

with

a logarithmic

of

Another

should,

as

performances

the

search

to

repeatedly

queries

an

of

employed

parameter

is

Lum

to

He act

ALGORITHMS

in

the

devoted

existing

SEARCH

algorithms

binding

described

allow

/9/.

AND

frequently

file

to

Structures

one

organization

able

is certainly

improved

enormous

are

transactions, be

discussed

There

and

discussed

of

without

about

problems

with

existence

existence

two

sert

the

sometimes

structures

the

understood, proposed

as

know

is concerned,

system.

functions

Independence

value

One

over

The

STORAGE

Data

the

independence,

beginning

In providing

5.

of

to

may

to data that

best

essentially

such

as

be

combined

links in

has

been

manase--

under

Their

basic

tech-

/8~/.

between

or

various

modlfI--

The


cations°

Storage

and

well

are

120,

135,

tures

to

I.~°

described

160/.

The

programs

to

which

structures

offer

the

without

the

responsibility

Attempts

to

solve

Reduction

Reduction

the

problem

the

relationship

tations

are I)o

during

given

is a

by

Reduction

too

of

to

Internal

in

99,

specific

storage

the

next

100,

118,

the

Struc-

case it

structures

in

past

structures)

In

organization)

discussed

the

of

richness

independence.

of

reducing

the

a

of

in

is

the

optimally. subsection.

like

or

evoke

and

is

loaded

the

objectives

an

external

to

a

to

application

with

program.

problems

form

the

secondary

expectations;

similar

internal represen--

conceptual

Woptlmizatlon"

accesses

unrealistic with

and

forms

an of

to

accesses

internal

these

number

query

not

external

between

something

reduce

~' s h o u l d

complex

/54,

this

programs

the

are

mappings

is

to

execution

"optimizatlon

the

Surveys offer

data

studied

Problem

is

objective

the

utilize

problem

where

(fig.

to

know

access)

mary

and

with

to

this

extensively

remains

binding

not

does

been

textbooks

structures

program

The

in

problem

system's

5.2.

have

as

pri-

storage The

the

term

problem

optimization

In

compiler.

Variations

in

handling

Of

intermediate

example,

the

expression

opl

(A

where

A, B,

C)

the

relational

two

intermediate

evaluate amounts liary

AB of

of

in

A v B)

data

execution.

this data sets

AB

the

=

A

far

of

(i.e.

opl

opl

indices)

By

and

also

D*

op3

CD

with

the

for

Consider,

are

an the

On of

be

D

and

to

enormous

amount occupied

other

hand)

queries can

oriented

can

op5

storage

some

and

C

addition

be

of

the

then

auxiby

the

there

are

during

built.

towards

consequently built

if

in

construct

enormous

the

size

operators

might =

in

inversions

primarily

modest

to

requires

evaluation

is

connected sets.

evaluation

exceed

and

temporary

area

base

and

accesses

C)

are data

D}

algorithm

by

some

over

straightforward

an

may

in

optimization

relations

A

storage

least

a

op3

large

Such

which

at

research use

(C

relations CD.

improvements

auxiliary query

op2

relations

evaluation

tive

are

secondary

underlying

the

D

op2

algebra.

storage,

drastic

B)

of

expressions

their Most

interac-

assumes

temporarily

of

for

that one


One

of

the

due

to

Palermo

queries by

earliest

no

lus

of

system

is

and

by the

consists

assume ies.

that

mentary

queries

has

not time

Into As

than

by

Greenfeld

implemented

version

and

Chamberlin

advantage

of

calculus of

/6/.

a

Their but

inversions,

intermediary

of

seDIAM

lists

attention,

search

earlier,

To

strategies

the

researchers

reduce are

problem

of

under

CPU

time,

access

module has

however, also

eleput--

the

assump--

less

dynamic

can per

be

taken

other

in

com-

appllca--

been

for

Taylor

assump-

This

which

approach

primarily, Conway,

the

system.

required,

or

compiler

Fernandez,

the

quer-

to

/195/.

perhaps in

module

the

form

They

elementary

according

essentially

standard

valid.

over

organized

bottleneck

always

to Wong/Chiang.

expression be

some

is due

or

reasons

125,

75,

44

180/.

should

respect be

research CPU

6.

Research

be

clear

that

described

as

long

are

the In

as

there which

Is

AND

in

area

no

of

are

a not

constraints

of

generally respected

deserve

"minimizing"

is very

number

generally

potentially

to

problem

to make which

recognition

addition

MODELLING

the

reduction has

structure,

Questions,

time

the

above

to system

so,

tectuPee

and

84/

in

construction

can

much

search

(Mehl,

/5,

the the

quantification

Astrahan

becomes

the

efficient

several

efficiency

algorithm

to

a CPU

Senko

of

achieved in c a l c u -

to

problem

base

into

is not not

variables for

and

type

is

applicable

a boolean

reduction

mentioned

proposed

is

data

interpretive

tlon.

reduction

received

however,

piled

It

and

of

This

is

inversions.

expression

CPU

As,

the

the

investigated

once.

principle"

taking

also

problem

growth

Another

by

this

algorithm

handling

547/.

into

the

domains

Ghosh of

primarily

the

that

less

/89,

of

the

"least

problem

query

usage

tion

and

to

case

a boolean

tlon

and

of

each

a

is described

merging

this

for than

reduction

involves

related

In

tlng

by

that more

Astrahan,

efficiently

algorithm

claims

restricting

A

at

look

investigations

accessed

applying

algorithm

{indexes}

CPU

and

described

expressions

paper

and

operations.

reduction

their

to b e

indices

Rothnie

problem

Palermo

has

expressions

quence

A

/140/.

tuple

building

comprehensive

more in

secondary

complex.

~very

assumptions valid. data

attention storage storage

with

This

has

base

archl-

in

future

requirements accesses.

ANALYSIS

of

modeling

and

analysis

has

as

Its

objective

to


learn

about

velop

simple

management changes

existing

probabilistic

system,

in a

management

Such

system

system

primary

have

data

management

of II

with

report

structed Their cesses plex

in

possible system

way

Data

base

itles

though

may

be

Tools.

the A

base

event

model

this

performed

the tool

tools

/91/.

organizations,

which

they

of

/132/, pro-

so

com-

critical.

comprehensive Is

con-

the

are

become

but

Nakamura have

package

processes

direction

to

a compara-

these

simulation,

may

of

lead

simulator

These

systems

follow-on

simulation

performance

in

of base

should

base

and a

using

system

development

step

data

and

driven

system.

simulation

and

storage

conventional

a

management

of

proposed also

they

of

tool by

mention

A

data

base

by

Rel-

proposed

a

is

they

server

of

complex

analysis

FOREM

in

~22/,

Yao

analytical

to

restrict

the

the

be

IS an

example

storage

in

/196/

modeling

analytically

themselves

of

analytically

IMS for

level,

treatable

therefore

to

well

of a deterministic,

structures.

and

activ-

The in

Wedekind

methods

/193/

are

tractable

developed "r~Ther

For

organization of

It

queueing

Lavenberg

general"

example,

Extensions

by

model

and

distributions, not

does

and

total

I/O

the

model

are,

their

explicitly is

of

Shedler

model

represent

represented however,

by

likely

the

/103/.

a

Is the

single to

make

necessary,

Perhaps

the

indices

to a

this

objects

too

studies

system,

of

storage

simulation

clearly

the

also

queue.

been

for

allow

gross

physical

also

deterministic.

component

Though

are

Cardenas

essentially

DL/I

have

Analytical

parts

analytical

at

of

detailed,

base

data

a

administrator

recently

simulation

base

systems

a whole,

defined

To

has

techniques

to evaluate

base

data

Influence of

colleagues

FOREM

data

help

simulating

his

to overall

fairly

out

system

a data

a

data

called

useful

the

questions

and

tool Haerder

designer

base

a

activities.

Senko

indexing

of

de-

to

/~4/®

ier

as

about

is

a

that

II are

respect

with

model

138/.

current

PHASE

et

by

analysis

/154,

of

limited

early

an

the

and

the

predict

data

of

analysis

al.

the

behaviour

components to

Thus

modeling

recognized

and

the

research

been

FOREM

design. more,

their

help

these

has

PHASE

even

for

may

in

for

tive

models

models

system

and

need

called

or

analyzing

interest

The

development

by

systems

problem

most

frequently

flat under

file,

investigated Authors,

varying

who

assumptions

question have are

Is

the

contributed Lum

and

selection to

Ling

research

/114/,

of on

Palermo


/139/, Yue

Stonebraker and

tigated

Data

Wong

/197/

the

question

may

be

tempt

to

have

position

in

data

the

Chen

and

have

given

/116,

Lum

21].

an

and

Chen's

al.

model

Into

response

The

second

%Istlcs~ Easton

finds

the

takes

in

sets

has and

heuristic

been

a

60/.

approaches

are

to

queueing

arm

of

given

Is and

no

and

of

the

to

an

at

target

and

by

data

and

Buzen

the

hierarchy

to

minimize

distri-

usage

recently

The

sets

allocation

their

for

storage

suitably

algorithmic

bounds

a

cost.

is

given Wong

best

some

Buzen

usage

effects

drives

Chandra

by There

ac-

as

minimal

to

Their

ARPA

within

a

have

the

data

contention

disk

well

at-

improve

(like to

function

an

given

network

allocation

disk

data

network

the

constraints.

T

in

a

as

inves-

categories:

the

cost

storage

considered

/31,

Wong

on

total

algorithm's

number

second

etal.

Their

minimizing

over

Lum

of

hierarchy

contention

of In

has

/164/.

variety

allocating

addition

given

a

time r

information

which

Shneiderman

storage

case

of

a

problem,

data

a

nodes

costs.

specify

under

over

/156/,

Schkolnick

levels

minimize

in

to

problem

consideration.

time

buting

line the

/71/.

access

to

third,

statistical et

and

devices

reduce

algorithm

levels

costs

/23/,

different

within

assigned

considered

hierarchy,

and

be

at

distributed

allocated

hierarchy, to

Cardenas

Stewart

size

or

to be

and

index

physical

have

cessibility

Farley

between

to

t98/,

Kins

of

to

balance

assigned

net),

and

allocated

first, data

be

/174/,

sta-

also

by

solution,

but

optimality

are

derived.

Casey

and

within 32/. al

Chang

a

have

simplified

Chang

has

function.

considered

network

extended

Both

the

of

Casey's

specify

third

computers linear

problem to

cost

algorithms,

of

reduce

allocating

line

costs

data

/26,

functions

to

a

attempt

to

minimize

which

more

27,

generline

costs.

With

the

open: how

analysis

what is

are

the

Nakamura

etal. of

the

(tO ences

and

data

can

be

reported

of

a

data

far,

base

their

simulation

model.

Answers

describe over

system)

collected

in

a

to

a

at

least

one

input

data?

system

statistically

raise to

operational

Hildebrand messages, base

so

characteristic

observing

user's

the

In

their

actually

Rodriguez of

The

workload

validity by

work

such

systems how

the

Oft

and

data

of

the

trace

of

physical

systems

other

words,

question

can

only

ranging

disk

of be

the

found

statistics.

application

/145/.

remains

characterized,

collecting

trace

operational

with

further

questions

relevant

question

from

program address

Lewis

and

a

log

calls refer-Shedler


derive

from

such

transactions process

In

a

(i.e.

a

be

the

model

Poisson

fit

mine

the

used

To

to

a

model

to by

Ghosh

/86,

model

blocks

on

Ghosh

with

for

also

and

Easton, to

Tuel

to

sequence

of

behaviour

determake

model

an

deter-

are

also

extension

references

09

of

and

again

has

a

large

data

of

the

certain

and

which

use

storage

the

/107/.

model

measurements,

secondary

between Poisson

rate)

relationships

proposed

the

with

and

linear

has

times

non-stationary

dependent

Tuel,

by

Easton

a

established

61/.

comparison

comparison

interarrival by

time

and

system

model.

reference

programs

this

a

theoretically

base by

the

with

data

data

coefficients validate

pllcatio,

a

empirical in

independent

dated

of

the

modeled

process

approach,

parameters

interactions

that

satisfactorily

semi-empirical

mine

the

observations

can

ap-

valibase

system.

It

is

clear

validated met. art

The of

next

that

reasons

data

analysis

a

7.

data

SUMMARY

least

we

AND

two

major

systems

our

opinion

by

integrity

system

has

ventional

to

The only but

goal

in

also

that

have

in

over

to

their

that

described continuation

models

been

be

state

summarized

this is

and

convincingly

current

research in

research

on

of

the

in

the

modelling

section

and

has

extremely

be

than

a

a

made

Important

the

base

of

of

part

system

tREes

the

complexity

of

Consider

past,

user, of

may

in

this be

that on

advantage

the of

the

at

data

on

integei--

In

a

the con-

independent

due

different

devices.

the

user's

same

(or

current

in

data

system

responsibility. device

of

the

base

is

language

goals

systems, a

structures

requires

that

programming

alone

conventional

storage

storage

a

complexity

the

program

different

system

In of

of

considered:

systems.

large

user's

activities

objectives

equivalent

data

the

clear

the will

its

independence.

take

of a

yet

with which

as

the

responsibility

independent

not

view.

operating

system,

Implementation

also

greater

data

the

is

that

of

are

or

and

remains

connected

systems

representative

has

general,

summarize

far

Implementation,

obtain

in

and

factors

base

to

CONCLUSIONS

to

Data

fly

base

point

try

are

it

progress,

practical

Before

this

research

However,

of

significant

objective

characterizations

for

base

section.

from

the

workload

program another)

structures

is

to

not

device, during


access base

where

administrator

The

area

ed

restructuring

only

of a

amount

base

years

time;

of

expensive

are

assessment, in

This

researchers

that

question,

of

not

a

should

cussing

of

one

A

of

promising

design

Into are

the

system

Data

With

respect

than

can

Research be

be

base the

model

data

be

have

under

driven

two

interface

branch

was

against

the data

[mpor--

are

but

the

research,

level

that

d{s-It

more

is the

other.

will

for

of

lan-

now

attitude:

and

by

research

programming

of

started

Imple-

reduced

around

models

top

requires

certainly

changed

on

man-machine

a

is

large

continue,

the

interac-

Investigations prototype

efforts

way.

by

to

storage

data

description

and

increased

power

with

structures

intelligently

handled

to

a

represented

reached

of

researchers

with

been

be

held

which

base

a

large

war"

taken,

between

have

to

amount

question,

start-

takes

such

"religious

data

another

investigated

put

more

there by

is

data

emphasis

languages,

of

languages.

already

base on

mapping

how

the more

available

management

systems.

these

structures

can

utilized,

systems

in

administrator research

into

a is

which

combined

his

solution

sometimes

is

models

a

has

fair is

a

selecting

the

In

has

in

the

activities

now

probably

as

can

of

Before

a

model

However,

be

efficiently

Modeling

data

of

aspects

translation, to

in

activities

problems

failure

It

engaged

evaluate

and

of

that

different

solver.

justified

continues

is

and

problem

user

Major

viability

risk

nature

question

how

tive

the

real

efforts.

example,

which

the

of

which

are

supported.

question

number

the

For

they

problem

much

of

new.

the

the

Justifies

similar

be

the

so

t

clarification.

and

is

implementation

models. %ant

research

Understanding

performed

spent

The

control

demonstrating

prior

guage

systems

ago.

prototype

mentations

under

role.

data

few

Is

has a

way in

that

its

already

useful

set

significant

beginning. been of

help

It

conducted

tools

results

will

for

take and

the

for

some

has

system

to

the

time, be

data

before

continued

designer

or

ad--

ministrator.

Comparing

first signed

the

a

for

primarily rent

obtained

difference

state

and

employed

designed of

of

art,

for it

results

emphasis, by the is

with

parametric interactive likely

industry

Systems

that

activities

like

users

while

problem research

we

Iris a r e research solver. changes

may

primarily systems With

the

priority

see

de-are cursome-


what

in

favour

described ningt

in

of

the

section

productive

parametric 6

is

user.

already

The

now

modeling

primarily

and

analysis

work

oriented

towards

run-

systems.

Conclusions

With what any

the

wealth

are

among

trends

tion?

Major

~iI

are

these

heartedly

research these

existing,

results

recognizable

What

answer

of

with

currently

we

the

to

major

are

becomes

major

respect

the

questions,

it

to

achievements?

a

change

problems?

well

meaningful

aware

of

While

that

Are

research we

the

are

ask: there

direc-

trying

reader

may

to

whole--

disagree.

results

I.

Model

One

Development

of

the

agreement shown

primary

on

in

deal

a

~®

at

internRtv

of

problem administrating)

data

base

administrator

2.

to

Multiple

Due

to

lem

solver,

been

the

%ures

ture

in

many

to

the

we

user

control his

to

roles

the

has

is

have

programming,

of

the

(conceptual,

different

that

and

in

over

his

storage

installation.

Logic to

the

record

power

and

commercially of

that

finally

research

in

is which

application

multiple

notions

solving

Storage

Time

of level

importance

views

interactive

at

a

time

use

of

has

the

~ea--

systems.

predicate

parametric

logic

flexibility available

and

prob-

locks data

In are

of

bases

as

to

more

gener-

use.

Structures structures

"what

textbooks

a

means

assume

performance

exceeding

the

problem

Storage ally

at

offered

similar

3,

Records

research structure,

information

users

and

orientation

particular

the

the

high

of

function

tune

developed

system this

levels

solving,

base

past

of

base

that

data

structures

data

three

external)

(parametric,

Independence

particular,

In

least

Data

achievements

type

fig.

with

for

can

like be

represent

research

found

the in

important

activities.

B--trees Knuth

or

Vol. results

to 3,

say chapter

and

are

it

6" basic

or

other to

fu-


Recognizable

I,

Trends

Data

After that the

find

models

area

of

the

the

data

base

this

area

has

respect

contain that

solutions men%

3.

Data

current

management

system

system.

ent

types

of

the

management

in

one

likely

to

and

to

functions.

It

and

data

in

can

of

solved experi-

which

need

in

recovery, be

Increased

systems,

the

problems

cannot

system.

sharing

of

arises

a large

is

more

data

into

number

a

consistency a

much

be

ex-

integrated

base

mann@e--

of

and

data

places

where

programs to combine

central among

simpler

descriptive

system

and

recognizable

offering the

operating

about

trend

ensuring

is

the

in

differ--

dictionary,

descriptive to

interface

des--

stored

these

data

base

the

data user

for

information.

~

Performance

constitutes felt

that

in

constitute research

the

sense

currently current

performance,

performing

and

many

makes

merge

descriptions

generally

Performance

ble

A of

maintaining

l0

and

within

apparent,

information

the

problem

problem

functions,

system

time

more

models

the OS

lead

research

even

different

that

management already

realized

Dictionary/Directory

the

thereby

even

increasingly

justification of

resource system

to operating,

criptive

and

Into

apparent

operating

further

systems

With

DBMS

operating

ence

pected

future.

made

outside in

the

scheduling,

classic

their

is

superimposition

in

has

of

have

it

coexistence

the

attention

research

i.e.

The

is called

Integration

the

controversies,

system.

more

Past

Major

Coexistence of

different same

system

2.

Model

years

of the

systems

do

though

this

alternatives: a bottleneck is

throughput major

necessary

In has in

not can

and

transaction

problem. offer

the

only

be

It

level

this

been area.

recognized

rate generally

of

proved

partlculavltha% not

is

achlewa-by

CPU In

better

time

may

the

past


2.

Integrity)

It

is

Data

necessary

system

can

phasis

be

handled

here

on

is

these

functions

users

installation

niques

which

desirable

3.

Concurrency

in

in

&

%ribu%ed

on

network

Design

Tools

today's

systems)

the

to

he

is

these is

Data

In

a

given

data

from

logical

time

order~

to ~

process

extreme-

schedul-not

have

bases)

so

in

been

increase which

or

how the

the

for

in

are

dls-

systems,

like:

to

how

select

current

of

helps

reported

development

to

the model

hardware

state

which

research

the

future

decisions

Information) of

for the

to

time=

such

deletion

and

and

the

in in

of

inevitable

clock) (The

which

The

degrades

and

ant)

making

section

6

tools,

~eload Is

range

A

solution avoids

order)

significant

large

interruption

time

a

is to

not data may

from the

interruption.

reorganize

type

of

does

performance

therefore

For

is which

and it

parts

general,

of

the

during

too to

utl--

in

which

become

affect

storage

available

hours

physical

not

is)

bases)

of

update

physically

reason

physical

use. the

%o

fragmentation)

which

normal

addition)

is

but

dump

reorganization). necessary

with it

reestablish

wholel

tolerable.

ks

tech-

are

problems

data

more of

With

storage

To

necessary

around

Similarly

the

computers.

even

much

system

like

llzation.

as

provide that

prevention)

These

with

number

Some

information

base

em-

Reorganization

disorder the

of

relevant

dynamic

stored the

a

decisions.

5.

so

failures

efficiency)

way.

and

Information)

certainly

benefit.

The

(to

trivial)

(deadlock with

representation. not

representations,

efficiency

from

specifying

efficiency.

is

has

of

data

with and

recovery

and

make

conceptual

physical

whole

systems

In

has

a

rapid

concurrency

4,

user

and

efficiency

satisfactory

a

Pules system

functions

connection

multiprocessing

possibilities

lacking,

of

again

solved

as

Recovery

more

the

mode

and

problems

by

ignoring

allow

ly

ing)

provide

integrity

enforceable

which

The

Independence, to

reorganization

this

are long

weeks

data

used to

be

for

a

problem

Acknowledgement

The

are

authors

Scientific Heights

grateful

Centerv and

San

Jose

Pope

and

North

they

are

grateful

of

preversion

a

8.

to

and

America ±o of

their

members to

Eo

F.

colleagues the

IBM

many

Codd

helpful

and

M.

E.

at

the

Research

representatives

for

this

of

from

IBM

Heidelberg

staff

at

Yorktown

Universities

discussions. Senko

for

the

basis

lh

~u-

Specifically a

critical

review

for

status

report.

BIBLIOGRAPHY

The

subsequent

report.

It

research

reader

the

list

is

hoped

results.

critical

in

entries

in

References

Definition

also

be

lists

II~

They

37

82

169

179

194

142

152

166

125

152

175

65

66

78

17

18

Independence

47

48

55

82

180

181

182

194

45

Data

Integrity

1

29

30

129

163

176

Data

1

Manipulation

3

6

Languages

13

16

a

should

iS

a

partially

author.

this

reference

to

are

intended

not

he

in

recent to

help

considered

as

Subsection

references

first

95

as

elsewhere,

cross

which to

value

annotations

found

of

according

of

the

Languages

35

Data

can

subsection

8.Io

ig

literature.

which

ordered

it

contains

presen%~

selecting

ordered

Data

references

that

references,

Cross

o~

Where

reviews,

alphabetically to

to

numbers annotated

I contains referring

list

of

19

20

25

28

35

36

40 72

42

46

59

68

69

70

73

74

79

87

93

101

102

105

106

109

110

119

123

128

131

136

141

143

147

149

155

158

173

183

184

185

194

198

Data

Model

17

Data

Equivalence

82

122

134

167

Models

1

2

4

7

8

14

20

34

35

38

39

41

43

52

56

57

58

63

68

69

70

79

121

124

133

151

15~

157

178

190

75

94

171

142

153

165

177

Data

Security

30

Data

44

Translation

95

108

Modelling

and

126

Analysis

-- General

-

24

61

85

~6

91

103

107

113

115

117

127

132

137

144

161

188

i~3

196

22

138

145

154

170

31

32

Tools

12

-- Optimization

21

23

Algorithms

26

2~

33

85

60

71

88

97

98

114

139

150

162

164

174

197

94

171

187

50

62

81

83

148

116

Privacy

76

Recovery

15

Resource

29

Search

S

Storage

Allocation

30

65

and

Scheduling

67

Algorithms

6

84

92

140

147

1~

Structures

9

10

11

51

77

80

90

96

111

112

130

146

155

182

186

189

Surveys

and

Textbooks

8

49

54

64

99

100

104

118

120

135

156

160

172

191

192

8.2

References

1.

Abrial, Work.

J.~°

sterdam

y

paper

ing

the

is

W.

cessoP !44

L~t

and

156

Deductive (1968)

is

the

terms

of

father

00.2200

6.

to

reduce

M°

l%hm

for

the

1974

ACM

Astrahan,

M~

scope

exceed-

advocates

a

on

Data

Base

1975.

binary

Associative

ACM

Natl.

Pro-

Conference,

relational

model

definitions

the tO

of

grandfather

deductive

Manipulation. T

as

in

a

relations a

function

capabilities.

The

Division

and

M.

a

Connection

Matrix

Poughkeepsie

v

TwRo

English

algorithm

employed

Co

of

W.

S.

is

P.

by

The

ACM,

in

RIL

Chamberlin~ Language. the

data

attributes, is

matrix

true

the

rows

and

with

techniques

it

A

SEQUEL to

make

accessing

Programmer

Search

Accessing New

described

query

Query

essentially

where

a

1

in

respect

to

have

be

To

requirements.

Gosh~

and

matrix,

represent

Sparse

Workshop~

M.~

binary

attribute

Independent

given

"minimization"

the

false.

algorithm to

a

columns

that

Data

Describes

Bachmann,

a

(e.g.

as

storage

SIGFIDET

heuristic

Structured

7.

Am-

and

Interpretive

accepts

leads

Data

the

otherwise

Astrahan,

path

of

represented

indicates

entity,

applies

a

Committee

1968

Development

entities,

position

cess

IFIP

1971 is

represent

An

It

which

of

System

June

Study

relations

Concepts

Information

A

of

Holland,

entities.

Newsletter,

TRAMP:

with

describes

Report:

Capabilities.

mother)

IBM

it between

SIGMOD

system.

other

and

R.

area.

implementation

in

Method.

5.

North

•

of

Ashany,

ACM

Sibley.

answering

the

Proc.

Management, 1974.

philosophical

relations

interim

question

a

Base

April

and

management

binary

Systems.

with

-

TRAMP

4.

base

with

ANSI/X3/SPARC.

Ash,

Data

Corsica

mathematical

data

model

Management

3.

Semantics.

Cargese,

1974.

The

data

2.

Data

Conf.

York, which

Path

Selection

Model

(DIAM)

Alger•

Proc.

!974. constructs

a

DIAM

ac-

(Fehder).

D. CACM

D.

Implementation

1By

5@0

Interpreter use

of

-- 5 8 8

and

a

(1975).

the

secondary

of

reduction

indexes

for

operations.

as

Navigator.

CACM

16,

653

-

658

(1973). C.

8.

Wo

Bachmann, Proc.

C.

vol. ape:

evolution

The

The

of

Rot

Large

and

Ordered

used

by

data

Lecture.

Management.

data

AFIPS

description

NCC

1975

(conceptual,

ANSI/X3/SPARC.

structured %0

the

introduction

of

Bayer,

Re

model

[graph~

understanding

new

hardware

Symmetric

Binary

structure

described

Bayer,

Storage

network)

of

to

the

vs

nature

support

data

%ual

Data 1,

are

a

and

290

189

search

the

has and

be-

ef~i--

method.

Structure -

of

(1972).

(B-tree)

Logarithmic

B-trees,

Bayer

-

173

of

Maintenance

306

and

Mainte-

(1972),

modification

of

the

storage

McCreight,

Characteristics

and

Processing

Methods

74,

440

for

--

444,

Searching North

and

Holland,

1974.

paper

access,

by

I~

characteristics

B--trees

and

organization

Informatica

Information

Amsterdam,

Informatica index

Binary Acta

Organization

structure.

are

Symmetric

Addressing.

Acta

storage delete

Algorithms,

R.

Eo

hierarchical

Insertl

nance

12.

as

Indexes.

a standard

clent

The

tripartite

McCreight,

described

come

11.

Base

Award

(1975).

contributes

The

Turing

algorithms.

Bayer,

The

3.

a

debate

model

data.

base

of

ACM

in Data

-- 5 7 6

external)

current

relational

1973

Trends 569

I.

20

10.

W.

44,

Trends

internal,

9.

famous

Bachmann's

contains

pseudo

a

discussion

random

of

access

hashing

(i.e.

B-trees

and

indexed

In

sequential)

random

and

vlr--

memories.

Bennet

t

Systems.

~e

To

Traditional

and

Kruskal,

appear

in

stack

large

average

large

number

gorithm

Tot

to

J°

Joof

handle

this

Processing

and

Dev.

algorithms

distances

distinct

Stack

Res.

processing

stack of

IBM

Vo

as

pages,

situation

they The

Data

Base

(1975).

are

inefficient

appear authors

with

for

in

the

describe

drastically

for

case a

of new

improved

a

al-

efficiency.

13.

Bergenv

Mot

Environment and

Its

Erbet for

R.t the

Application

Pistor,

P-t

Interactive in Computer

Schauer~ Evaluation Aided

U., of

Design.

and

Walch,

Go

Scientific Proc.

Workshop

An

Data on

data

for

bases

interactive

dams s editors), ble

14.

from

Biller,

ACM.

~®s

and

15.

North

Bjork,

L~

National This

16.

17.

paper

Eo

Formal

is

the

in

papers.

a data

Gamma--Zero

n-ary

of

and

J®

C.

Decker~ Data

a

View

on

74,

Proc.

DB/DC

papers

See

IBM

o~

and

[5--16~

J.

G®

1979v

Lln-

availa-

Schema-Subschema of

IFIP

System.

Con--

1973

ACM

T.

K.

describing Davies

L.,

Base

Interface: Report

level

for

query

recovery

the

Traiger,

Research

low

a

first

I.

L.

of

The

Specifications RJ

1200~

language

1978.

accessing

a

base.

An

(R.

two

system~

F**

Cleemput

(1973}.

of

Relational

data

R.

for a

Operations.

description

Systems

Scenario 142--146

Eo

relational

Base

1974.

base

A detailed

Bobrow~

Processing

second

Codd~

D.~

Objects

Jo

Amsterdam,

Proc.,

the

Bjorner,

Neuhold,

Recovery

concept two

/149/.

Holland,

A.

M.

September

Schauer

Information

Conf.

(W.

Canada~

also

See

Correspondence, gress~

design

Waterloo,

Experimental

Rustin

Data

editor),

Management

System.

Prentice-Hall,

In

Englewood

Data

Cliffs,

1972. The

paper

describes

It

contains

(hierarchy

18.

Boyce,

a or

R.

as

Management~

Proc.

of

1974,

North

Holland,

SQUARE

iS

a

Bracchi,

D.

IFIP

G.~

the fop

Fedeli~

System.

ettrotecnica~

vs*

D.~

implemented

discussion Codd's

King,

Work.

W.

Conf.

of

relational

F.,

Expressions:

Amsterdam,

on

system

excellent

Relational

syntactically

based

Management

but

approach

Chamberlin,

Chamberlin/Boyce

19.

brief

Queries

language

experimental

network)

F.~

Specifying

an

and

LISP.

the

EDMS

approach.

Hammer,

SQUARE.

Cargese,

In

MQ

Data

Corsica,

M.

Base April

1974.

terse,

so--called

set

oriented,

"concept

high

of

level

mapping"°

query

See

also

Date

Base

"SEQUEL".

A.~

and

Laboratorio

Poli%echnica

di

Paolini~ di

P.

A

Relational

Calcolatori,

Milano,

Internal

Instituto Report

di

No.

EI-

72--5,

1972. ~|ORIS is

a Codd

pulatlon

language.

hierarchical

relational The

structures

system

with

a

users

view

(i.e.

unnormalized

calculus

(external

oriented

schema}

data}.

may

manl-include

20.

Bracchi,

G.~

Model

for

Proc.

of

Fedeli,

Data

Base

IFIP

Holland,

A,~

the

ceptual

Conf.~

schema

P.

Systems.

Cargese,

A Multilevel In

Data

Corsica,

Relational

Base

Management~

April

1974,

North

1974.

binary

{hierarchical,

Paolini,

Management

Work.

Amsterdam,

Advocates

and

relational

and Codd

many

(graph

model

models

relational,

for

etc.}

the

as

for

model)

the

external

well

as

con-

schema

Internal

sche-

ma,

21.

Buzen~ ry

queuing Is

offers

a

Model program

and

A.

costs

also

and

Fo

CACM

E.

The

play land~ GADS

F. t

1,

System.

is

an

North

Hol-

sets of

in a

data

memory

sets.

hier-

The

paper

be

time

of

File

Organization

--

1973,

used

to

given

estimate

the

data

total

sto-

organization

Performance

of

Inverted

Data

Base

1975. Schkolnick

and

Yue/Wong

for

re-

J.

P.

Doubly

Modeling

Chained

and

Tree

Analysis

of

Structure.

In--

1975.

J.

L.t

Evaluation

Giddings,

of

an

Go

M.~

Interactive

Processing

74,

and

Mantey,

Analysis

1055

-

and

1061,

North

P. DisHol-

1974.

and

provides a

graphics intended a

data

in

ence

gained

with

GADS and

this

kind.

of

-$48,

263,

The

--67t

stored

system

271-27So

in M e m o -

subject.

data

a

Balancing

results.

may

Sagamang,

interactive

It

data

Selection

540

and -

Information

locations

grammers.

and 16y

Bennet,

and

Amsterdam,

graphic

and

57

P-T

Design

Chen's

access

this

Organization:

E.

to

Farley/Stewartt of

Data

Carlson,

74,

allocation

which

253

18,

A.

Systems

Load

specifications.

Cardenas,

Base

Processing

the

Analysis

CACM

King,

Optimal

access

of

average

related

treatments

form.

25.

the

described,

A.

P.-S.

Evaluation

System.

Structures.

24.

F.

is

Cardenas,

See

for

to analyze

generalization

device

cent

P.

1974.

model

and

a

A

Chen,

Information

used

Cardenas,

rage

23.

and

Amsterdam,

archy

22.

P.,

Hierarchies.

land, A

J.

variety

of the

system as

a

for tool

extraction files.

The

requirements,

data to

related

be

used

technique paper

by for

to

accessing

discusses

which

must

geo-

non--pro-

experibe

met

by

26.

Casey,

R.

Network.

27.

The

author

lem

of

The

costs

28.

1973

SJCC

gives

an

storing

G.

Design

Free.

D.

Query

of

Copies

1972

Proc.,

exact

and

data

of

Chamberlin, lish

Allocations

allocating

R.

Casey, NCC

G. AFIPS

sets

at

of

and

Tree

a File

40,

heuristic a

in

617

an

-- 2 2 5 ,

of

to

Networks

Distributed

-- 2 5 7 ,

D.~

and

Boyce,

for

the

prob-

computers,

between

251

Information 1972.

solution

network

transmission

42,

ACM

a

within

vol.

Language,

of

vol.

given

nodes.

Data.

AFIPS

1973.

R.

F.

SIGFIDET

SEQUEL

-

Workshop

a

Structured

1974,

ACM,

Eng-

New

York,

1974. SEQUEL

Is

SQUARE, Boyce/

29.

a

however,

D.

Free

Scheme

for

tion

Processing authors

processes ite

in

delays

zatlon NCC h

and

view

The

cussed.

North

case

to

Of

those

English.

See

deadlocks.

of

-

use

Traiger,

a

Data

In

Base

Holland,

A

Deadlock

System.

Informa-

Amsterdam,

deadlock--detection Their

L®

and

algorithm

1974.

backout

of

avoids

indefin-

Viswst

Authorl--

process.

D.~

Gray, in

44,

a

virtual

J.

a

425

can

the

in

343.

to

language

restrict

similar

to natural

and

Locking

propose

a

closer

F®,

340

Locking

Views

R®

Resource

vol. is

query

syntax

very

SQUARE.

Boyce,

D.~

D®

Proc.

a

for

semantics

74,

of

Chamberlin,

with

with

Chamberlin

Chamberlin,

The

30.

language

N.t

Traiger,

Relational -

430,

Data

I. Base

relation

derived

form

The

problem

of

be

for

authorization.

access

to

a

System.

1975

AFIPS

1975.

SEQUEL. used

L.

view

for

the

other

updating

relations

via

is

dis--

views

Locks

exclusive

temporarily use

of

one

user.

31.

Chandra, ment

to

disk one

32.

System.

S.K®

to

related

specJ

drives dlsk

algorithm

Chang,

Wong,

K.

C.

Worst

Storage

Case

Analysis

Allocation.

To

of appear

Place-

a

in

SIAM

Computing.

authors

of the

on

and

Ko~

algorithm

Journal The

A.

fy

such drive is

a

heuristic the

that

is

ACM SIGMOD

probability

minimized.

analyzed.

Data

algorithm

Base 1975

See

The also

Conf.

of

worst

allocate

data

simultaneous case

sets access

performance

of

Easton/Wong.

Decomposition Int.

to

in on

a

Mgmt.

Hierarchic of

Data~

Compute~ San

Jose,

1975. The

author

cost

33.

Chen,

P.

tem.

1973

A

34.

S.

Optimal

AFIPS

Casey's

results

the

hierarchy

CODASYL

Development

and

definition

section

sets

can

CODASYL

CACM

by

allowing

&

non-linear

of

an

n--tuples

CODASYL

also

2821

problem

Language

many

or on

the

taking

Structure

Sys-

queuing

Group.

An

In--

1962.

ideas. idea

which

Storage

1973.

Buzen/Chen.

190-204,

entity

Multilevel -

allocation See

fop

of

in 277

Contains~

that

then

fop

files

joins,

may

example T be

union

and

interInter--

performed.

from

original

St

source

Programming

Available The

be

42~

Committee.

Algebra.

as

vol.

considerations.

"oldtlmer"

the

Allocation

Proc.

into

preted

File

NCC

of

treatment

formation

36.

extended

effects

An

35.

has

function.

Language

Committee.

1971.

DBTG-Report.

ACM. DBTG

proposal.

Programming

Language

Committee.

DBLTG proposal,

Febru-

1973.

ary

Contains nltlon

the

COBOL

language.

data

The

manipulation are

languages

and

subschema

essentially

data

those

of

defi-

ref.

35.

37.

CODASYL

Data

Language. Essentially

38.

39.

CODASYL

the

Systems

Base

Management

from

ACM.

of

same

data

data

model.

F.

Relational

Codd,

The of

E.

CACM

paper

in

Feature

Systems.

compares

A 13~

377

which

Technical

commercially

Model

-- 387, Codd

Committee. June

definition

Committee.

a

network

Language

Development,

Primarily

Banks.

40.

description

Journal

language

Analysis Report,

available

of

Data

Description

1973.

Data

as

of

in

Generalized

May

1971.

systems,

for

35.

Available

contains

Large

Data

Shared

also

Data

1970.

introduced

the

(Codd)

relational

model

data.

Codd,

E.

F.

A Data

Base

Sublanguage

Founded

on

the

Relational

Calculus.

41.

E.

Codd, Model, Data

1971.

Fo

and

Base

Cliffs,

42.

Codd, Data

Further

Systems

E.

F.

Base

of

Information

F.

Amsterdam,

Recent

Base

Relational

Sublanguages.

Prentice-Hall,

R.

mentation 211

--

The

main

of

W®,

220,

In

Englewood

multiple

In

User.

Cargese,

Corsl--

1974.

are:

natural simple

data

dialogue,

choice

language model,

query

Pes-and

Interrogation

a

In Relational

74~

1017

-

Data

1021,

Base

North

Sys--

Holland 9

Codd's

relational

data

topics

sublanguage

including

types.

superimposition

needing

Maxwell,

model

W.

L.~

The

and

discussion

author

storage

lists access

investigation.

and

Measures

a

Morgan,

in

H.

L.

Information

On

The

Systems.

[mp!elSv

CACM

1972.

in

at

as

a

R.

File

W.~

The

Maxwell,

W0

by

file

which accesses

ve[llance

program,

automatic

functions.

which

contains

is

also

a

security

conscious

of

discussion

of

1972.

L.~

and

Morgan,

H.

Processing

L,

A Technique

74,

988

-

992.

1974. by

has

are %o

of

checking

approach

Information

Amsterdam,

Each

To perform

an

paper

implemented

implemented

declarations,

is

%ime"~

Surveillance.

technique

All

paper

resource.

Holland,

described.

this

compile

systems

Conway,

Casual

proposed

steps

Investigations

Security

idea

"once CPU

the

Conf.

clarification

query,

and

the

of

security

Work.

%o a

The

logic v

with

Amsterdam,

steps

performance,

among

Conway,

IFIP

of

Processing

normalization

gram.

Data Base

1971.

1974.

survey

concurrency,

A

the

York,

capability.

E.

for

New

Data

Rendezvous

system.

declarative

tems~

North

seven

Internal

Codd,

A brief

45.

North

answering

%atemen%~

to

Holland,

level

theory

Steps

Proc.

description

the

of

editor).

1974,

The

only

Rustin

ACM,

of

Completeness

Management,

definition

44.

Normalization

(Ro

Seven

April

of

Workshop,

1971.

question

43.

SIGFIDET

Relational

ca~

high

ACM

the authors

associated compiled

the

which

file

can

in

with

into

a

have

To

then

be

their £t

a set

file pass

used

system

to

of

ASAP

is

function

surveillance

pro--

through

suP--

perform

the

certain

46.

Dana,

Co~

and

and

Device

~o10

41t

The

paper

Date,

-

of

J.,

An

Information

Report

1116,

Structure

Generation.

AFIPS

for

FJCC

Data

Base

1972

Proc.

1972. high

describes

Co

Data

L.

Independent

1111

manipulation

47.

Presser,

level

elements

for

The

generation

and

reports.

and

Hopewell,

Independence.

P.

1971 ACM

Storage

SIGFIDET

Structure

and

Workshop,

ACM,

Physical New York,

1971.

48.

Date,

J.,

C.

and

Independence.

49.

Date, ley,

Co

J.

An

Reading,

Similar

Hopewell,

1971 ACM

Introduction

File

to

book~

one

to

introduction

Definition

and

Workshop,

ACM,

Data

Systems.

Base

New

Logical York,

Data

1971.

Addison--Wes--

1975.

Massachusetts,

To Wedekind's

prehensive

P.

SIGFIDET

of

data

the

first

base

attempts

systems.

of

Many

a com-

annotated

references.

50.

Davies, Natl.

C.

Together to

51.

52.

a

T.

Conf.

Recovery

Proc.,

with

Dearnley,

P.

System.

others the

Delobel, The

Theory

17,

374

Deals

-

as

Comp.

for

a DB/DC

System.

1973

ACM

1973. an

easy

To

of

a

Model

Self

20~

-- 2 1 0 ,

system

Journal observes

accordingly.

and

Casey,

of

Boolean

386,

1973.

the

R0

E.e.

original

G.

of

Into

17,

understand

patterns

Simulation

introduction

set

of

without

flat To

are

a

IBM J.

decomposition a

of

a

files

derive

allowing

Data

1974.

usage

of

Functions.

allowing

file

of

Organizing

results

Decomposition

Switching

problem

property, The

paper

Operation

redundancy

cover

Tion

The

with

{enormous) mal

A.

data

C.,

141~

concept.

Management

tures

-

Bjork's

recovery

Among

Semantics

136

and

Data

Base

Res.

Develop.

flat

the

same

further

and

with

file

having The

restruc-

reported.

mlnl-

Informa-

decomposition.

53.

DI

Paola,

Classes Santa The

of

R.

Monica,

paper

A.

The

Proper

Calif.

deals

with

Solvability

Formulas Technical the

and

of

the

Related

Report

solvability

Decision Results.

Problem

Rand

R-803-PR, August of

The

decision

for

Corp.,

1971. problem

of

class File.

54.

of

See

Storage.

55.

M.

Dl%fmann~

deP

E.

Annual

Press~

den

%0 be

Data

Structures

Review

L~

and

in Automatic

Rends

Relational

%help

Data

Representation

Programming,

Klassifizierung

System-Entwurf. Infomm~%ik-

A~

Grundstruktur

eines

notes

von

Technische

vol.

5,

in

PeP@a-

in

Des

Konzept

Darmstadt.

Berichte

DV75--[

des

Objektbeschreibungsbaumes

graphenorientierten

computer

fuer

Datenunabhaengigkeit

Hochschule

Forschungsgruppen

Doerrscheidt,

ture

by

1969.

E®

Berlin,

processed

Levien/Maron.

D'Imperio,

mon

questions

science

26,

als

Datenbankmodells.

532

-

541,

Springer

LecVerlagv

1975.

Describes

a

Typically

graph

oriented

data

model

based

on

LISP

ideas.

57.

Durchholz,

R.,

Systems.

Data

Corsica,

April

Influenced the

58.

to

J.

guages~

Acta

M.

s%Paints

on

the

Work.

Conf.

Feature

model

of

Management Cargese,

1974.

"CODASYL

data

Base

and

Data

Analysis"

schema.

Structures.

CACM

related theory

Level

the

C~

2,

to a of

Theory

formal

Data 293

and

data

Structures -

incorporation

like

of

string

structures

s[ml--

309, of

languages.

for

Programming

Lan-

1973.

relational

level

data

struc-

languages.

Wong,

the Minimal

Co

Cost

K.

The

of

a

Effect

of

Partition.

Capacity

JACM,

22,

Con-441

-

1975. algorithm

proposed,

Easton, IBM

of

Data

Amsterdam,

Understanding

Informatica

ALGOL

Easton,

449,

ideas

for

into

A new

61.

an

model

for IFIP

Proc,

Holland,

hierarchical

Relational

proposal

tures

Concepts

1971.

available

Earley,

A

data

Towards

some the

North

a

J.

Go

Management

discuss

-- 628,

Sketches

60.

the

14,

617

Richter,

1974.

Earley,

lap

59.

Base

by

authors

and

which

M.

C~

Research

to

the

accepts

Model Report

for PC

problem capacity

considered

Chandra/Wong

is

constraints.

Interactive 5050,

by

Sept.

Data 1974.

Base

Reference

String.

Describes

a

of

modification

which

describes

model

Is

measured

the

independent

behaviour

its analytical

well,

tractability

references

An

under

model,

advantage working

of

set

The

assump-

tions.

62.

Edelberg,

M.

SIGFIDET The

of

described,

Ehrich,

H.

D,

Informatica graph

The

and

which

201

--211, data

for

(i.e.

log)

data

blocks.

einer

Recovery.

1974

ACM

1974.

restores

Grundlagen

4,

York,

transfers

oriented

model

New

and

algorithm, which

processes

is also

A

an

data

Into

Contamination

ACM,

describes

set

pagatlon

63.

Base

Workshop,

paper

given

Data

a

given

determines

blocks

Theorle

A and

der

error

The

error

recovery reruns

and

a

pPo--

algorithm

processes.

Datenstrukturen.

Acta

1975.

model

are

investigated

W.

A

and

graph

from

a

Data

Base

oriented

more

schemata

mathematical

within

point

of

view.

64.

Engles, view

R.

in

Tutorial

on

Programming

Automatic

vol.

Organization.

part

7

It

Annual

Pergamon

Re-

Press,

1972.

65.

Eswaran, The

Ko

P.~

Notions

System.

IBM

paper

The

of

Research

defines

concurrency~ guage is

Gray,

and

presented

N.,

Lorie, and

Report

The

RJ

1487,

locks

determines

A.,

and

Traiger,

Locks

December

and

Their

is

I.

On

Base

within

consequences.

Two

L.

Data

consistency

proposed,

whether

in a

1974.

transaction,

specification

which

R.

Predicate

of

notion

predicate

predicate

for

J.

Consistency

and

such

an

A

lan-

algorithm

predicates

over-

lap.

66.

Eswaran, of

1601,

Po,

and

Chamberlin,

for

Data

a

rules

interpreted

are

data

Everest, rity.

Base

D.

D.

Specifications

Functional

Integrity.

IBM

Report

Research

RJ

1975.

Contains

the

67.

K.

a Subsystem

as

of

consistency

routines

to be

rules.

invoked

Consistency

after

changes

Of

base.

G.

Data

Cargese,

classification

C. Base

Concurrent

Corsica,

Preclaiming

of

Update

Management, April resources

241

1974. to

--

Control 270,

North prevent

and

Data

Base

Proc.

IFIP

Work.

Holland,

Amsterdam,

deadlocks

is

Integ-Cent.

1974.

advocated

by

the

68.

author.

rende

I,

Informal and

a

of

Springer

high

Falkenberg

the

from

T1

of

der

E.

language

The

of

Farley,

72.

also

Fehder, search The

73.

computer

a

data

model,

Management

Systems.

Informatik,

Internal

A

employee

of

B

manipulation

dimension.

und

Darstellung

Datenbankbenutzer

a

is

a data

data

closely

of

{and

von

Information

und

Detenbank--Man--

Stuttgart,

1975.

model

and

a data

related

to

concepts

though

graphically

It

an

manipulation in for

allows

are}

natural n--ary

interpreted

as

relations.

Cardenas

papers

DIAM

be

and

Stewart,

Relational

L.

Base

extends

is graph oriented

of Toronto,

P.

in

it.

example:

and

time

of

can

G.,

S.

Data March

for

The

Reports

for

fuer

University

are

model

for

University

Data

{for

to

zwischen

both

H.

J.

Selection

See

the

Thesis,

binary

in

Institut

time

with

which

Notes

1973.

language

relations

description

relations

Resultatspezlflzie-

"Gegens%andsmodell"T

Strukturierung

where

language.

71.

of

Schnittstelle

A detailed

Heidelberg,

the

manipulation

stored

to cope

J.

Lecture

1974.

dimension

agement--System.

Joins

of

Stuttgart,

07/74,

Falkenberg,

Schneider,

Time-Handling

to T2)

language

and

Datensystemen.

Verlag,

level

T E®

University

Adds

B.,

von

discussion

CIS-Report

70.

Meyer,

Handhabung

science

6S.

E=,

Falkenberg,

recent

A.

Query

Bases.

Technical

investigations

Independent RJ

describe

RIL,

the

Report

into

1121

(1972)

and

Index

CSRG-53,

1975.

Representation

RJ

Execution

and data

12Sl

this

subject.

Language.

IBM

~e--

to

the

i.

IBM

(1973).

manipulation

language

system.

Fehder, Research Describes

Pc

L.

The

Report a

RJ

query

Hierarchic

Query

1307,

1973.

Nov.

language

to

Language

operate

on

(HQL)

IMS

part

like

hierarchic

data.

74.

Feldman, Language.

The

high

J.

A.,

CACM

level,

and

12,

439

ALGOL

Rovner, -- 449,

like

P,

P.

An

ALGOL

based

Associative

1969.

programming

language

LEAP

is based

on

binary

associations,

which

are

implemented

using

a

hash

coding

P.

An

Author--

technique.

75.

Fernandez, ization Conf.

E.

B., Summers,

Model on

for

Mgmt.

of

Authorization data

76.

base

purer A

77.

governed

Ro

and

vol.

The

by

und

and on

Coleman,

Base.

C.

ACM

SIGMOD

1975

Intl.

1975.

predicates

enforced

26,

and

Jose,

over

primarily

at

applications

compile

Lecture

Gesellschaft.

and

time.

Notes

in

Com--

1975

discussions

A.,

Retrieval

C.~ Data

San

Datenschutz

of

Finkel, for

is

Science,

survey

Shared

Data,

contents

H.

Fiedler,

a

R.

on

Bentley,

privacy.

J.

L.

Quad-trees:

Composite

Keys.

Acta

of

trees

for

a

Data

Informatica

Structure

4,

1

-

9,

1974. A

generalization

binary

the

search

on

composite

keys.

78.

Florentin, nal

17,

J. 52

-

Consistency data

J.

Consistency

58,

of

Data

Bases.

Comp.

Jour-

1974.

rules

base

Auditing

are

contents,

predicate

Problems

calculus

of

their

expressions

over

implementation

the

are

dis--

cussed.

79.

Frank,

R.

L.,

University

Shows

detail

in

Frank,

R.

L.,

Access

Method.

Describes

81.

of

the

and

steps,

and

AFIPS a

the

users

Fraser,

A.

G.

Integrity

Journal

12,

C.

archical

Structure.

(GI

1975),

A

Report:

-- w o ~ k l n g

have

to

the

DBTG

A Proc.

Illustrative

paper to

made

-- 7, get

a

COBOL

approach,

Method vol.

oriented

be

An

for

a

43,

45

language

Generalized

Data

-- 52, 1974, to

tailor

access

of

a

Mass

Storage

Filing

System.

Comp.

1969.

System

Springer

NCC

DBTG

ISDOS

specifications.

Recovery

Frasson,

The

Ko

keyword

to

Ss

in

Yamaguchi,

1974

I -

H.

which

running

ideas

the

E.

Michigan,

methods

Describes

82.

Sibley,

program

application

80.

and

Example.

in

to

MULTICS.

Increase

Lecture Verlag

Notes

Data in

Heidelberg

Independence Computer s

1975.

in

Sciences

an

Hier-

vol.

34

Describes their

83.

Gen%on~

in

the

Recovery

Comp.

Journ.

Ghosh,

P.~

S.

Base

work

is

S.

P.,

It

is

al

best.

Data

IBM

path

and

S.

P.~

and

System

-

accessed

direct

126,

b{. E.

J.

Independent

of

Res.

Dev.

of

queries

An

algorithm minimum

V.

Y.

System

shown

Tuel,

that

W.

G.

Commercial

journaling

Path

1ST

is

-

Procedures

422,

access

given,

"path

in

is

of

a net-

claimed

to

Collision

by

division"

A

of

an

Design

when

Hashing

1975.

"hashing

IBM

1974.

paths

which

for

cardinality".

15 -- 22,

Performance.

Sys-

techniques.

Search

408

to

Analysis

I~

Access

1970. and

String

of

Lum,

Inform.

analytically

Base

123

for

checkpointing

reduction

access

Division,

Ghosh,

13,

considered.

an

Ghosh, by

the

be

hierarchy.

Senko,

Systems.

DIAM

yield

and

can

Procedures

elementary

Data

86.

A.

structures

Describes

Within

85.

IMS

position

tems.

84.

how

Research

is

in

Experiment Report

RJ

gener-

Model

to

1482,

Dec.

1974.

87.

88.

The

authors

ate

the

model

Goldstein,

1970

MacAims

is

C.~

and

early

{I.

e.

of

Strnad,

N.

1974

NCC

AFIPS

Galatll

is

transfer)

R.

J.

The

in

an

MacAims

ACM,

New

and

IMS

Data

York,

evalu-

system.

Management

1970.

system.

that a

Data

Base

Report

qC

of

Reorganization 5063,

clustering

way

Quantification

Proc.

vol.

optimization

LEAP

Feldman/Rovner).

ten.

A.

Go

in

Discusses

Haerder,

measurements

model

as

to

Oct.

records

minimize

for

a

1974. into

the

blocks

number

of

necessary.

Greenfeld,

(see

performance

Workshop,

IBM Research

considered

units

with

relational

and

Hierarchy.

problem

linearized

SIGFIDET

S.~

The

~

comparison

ACM

an

Gorenstein,

transfers

90.

by

R.

System.

Storage

89.

construct

T.

Die

Technische

Forschungsgruppen

43,

in 71

-

techniques

Implementierung Hochschule DV74-2.

von

a

Relational

75, for

a

relational

Zugriffspfaden

Darmstadt,

Data

System.

1974.

Berichte

system

durch der

like

Bitl[s-

InfoPmatik--

The

author

vestlgates of

91.

Haerder,

Hall,

T.

Zugriffszeitverhalten Datenbank,

of of

P.

Held,

G.

access

Ae

V.

D.~

Common

is a QUEL

IBM

UK

and

conventional

methods

der

Auswahl

In-

of

for

Saetzen Berichte

simulation.

Identification

M.

and

R.~

Includes

a com-

indexes.

UKSC0060,

1975

von

Darmstadt,

DV74--3.

help

Report

System.

relational as

of

rity

assurance

the

Hoffmann, tems.

Its

NCC

in General

Nov.

1974.

E.

INGRES

Wong,

AFIPS

L.

via

J.

B.

C.~

Proc.

Easllyo

ACM,

New

forms

a

-- a ~ela--

vol.

et

44,

4CS

--

Shut

Description

access at

and Los

N.

Pacific,

DEFINE

to

control

calculus

interesting and

preprocessing

Privacy

In

Angeles,

C.,

and

Language

of ACM

Computer

Sys-

1973.

Lum,

for

Integ-

time.

Vo

Y.

Defining

DEFINE: Informa-

San

Francisco,

April

1975,

graph

structures

to a linear

1975.

then

map

referenced

specification,

J.

Journ.

17,

Discusses inverted

by

written

(and in

processed the

according

language

CONVERT.

to)

a

See

Inverted 59 how

-

Indexes

63,

to

and

Multilist

Structures.

Comp.

1974.

use

multilist

structures

in

order

to

maintain

files.

R.

lutions

incorporate

Pov

with An

al.

Inglis,

Karp,

Data

D.

system

language.

modification

Company,

Proc.

iS

which

to

Publishing

language

translation

query

Security

Smith,

York,

Describes

is

query

management

level

(editor).

A Nonprocedural tion

data

high

authors

Melville

Housel,

The

organization

to

Hochschule

Subexpression

Stonebraker, Base

plan

Shu

the

structures

Systems.

bei

Technische

with

storage

Data

INGRES

port

index

1975.

based

97.

an

supePior

Informatlk--Forschungsgruppen

416,

96.

as

are

der

tlonal

9S.

lists

elner

AlGebraic

94.

lists

bit

aus

p~rison

93°

blt

when

indexing°

Analysis

92.

proposes

M.~ to

a

RC 4 7 4 0 v problem

McKellar~

A.

C. v

2-dimensional ~lso considered

and

Wong~

placement

%o a p p e a r is

the

in

SIAM

placement

C.

K.

problem. Journal of

Near--optimal

so-

IBM R e s e a r c h

~e--

of

Computing.

records

in

a

2--d|men--

100

slonal

storage

eonseeu%ive

98®

Kin~

W.

search See

99.

I00.

E.

D~

E~

~*T

tO2.

539. The

Center

North

is

Lavenberg~

iOS.

S.

Levlen~

relations

for

Ott~

volo

N.~

Report

1968,

3:

C.

and

Computing

1973.

and

ZoepprJ*z~

IBM

Germanyv

a

data

1975o

manlpula--

language.

Retrieval

Concepts

Sorting

75.08,007~

tO

natural

in

a

set-theor-

Practical

Symposium

Consldera1973~

531

-

1973.

natural

a "set

language

theore±ic"

S®v

and

Shedler~

G.

IBM

Research

Report

analytically data

D.

Re

E.~ and

Introduces

LsvIt%~

into

Fundamen--

Massachusetts~

designed to

P,

I:

like

query

langua@eT

In%ermedla%e

lanBuage

base

File

S.

A

tractable

Queuing RJ

Model

1561T

of

the

DL/I

the

pro-

1975.

queuln~

model

of

On-Line

Systems,

access.

Structures

for

Spartan

1969.

Execution

1060

a

vol.

Information,

IMS.

durin~

Books~

has

a~ea.

interpretation,

of

Lefkovi%z9

Base:

Ams%erdam~

system

simpllfied~

cesses

close

Lockemann~

Data

Holland,

~oP

Introduced

International

translated

Composent

i04.

of

th~s

H.~

Technical

and

Re=

IBM

M~ssachusetts7

Readlng~

Lehmann~

very

a File.

Programmlng,

Heidelberg~ is

for

Programming~

General

Strucutred

Proco

suitable

A

P=~

K®

in

Readlng~

Computer

D®)

is

two

between

1974.

Languages:

system

distance

Indices

Computer

of

At±

expected

research

Addison-Wesley7

described

which

of

Lat%ermannT

Kraegeloh~

tionsv

103.

The

lan~uage~whlch

etically

January ~ecen%

Art

of

Addlson--Wesleyv

interactive

tlon

the

mln~mized.

Selection

for

Specialty

Scientific

that

is

13411

The

Searching,

User

An

RJ

Cardenas

D.

Kogon~

so

the

Algorithms~

Knuth,

M.

On

Report

Knuth~

and

lot.

F.

also

±al

aP~ay~

~eferences

the [see

G.~

and Data

Maron~

E.

Re%rleval.

Relational also

Stewar±~

Interactive

M.

Data

Di

A Computer

CACM

Data

[0,

Filer

71S a

System

for

721,

1967.

-

system

based

IngePence

on

binary

Paola).

D°

H.~

Analysis.

and

Yormarkv 1974

AFIPS

B.

A Prototype

System

NCC

Proc=

43,

vol.

69

101

-- 6 9 ,

1974.

Describes

an

relying

on

graphics

107.

Lewis~

implemented

standard

and

P.

statistical

A.

Transaction po=%

108.

RJ

system

analytic

W. t

1629,

and

ShedlerT In

AuGust

The

cess

with

~

Llu)

S.)

and

Go

a Data

of

varying

Heller)

Translation

of

It

measurement

makes

data

heavy

use

of

S.

Statistical

Base

System.

transaction

stream

Analysis

IBM

of

Research

Re--

1975.

modeling

time

analysis

me%hods.

Processing

Describes

for

procedures.

J.

Model.

a

as

a

Polsson

pro-

Grammar

Driven

Data

ra%eo

A

1974

Record

ACM

OrIentedv

SIGFIDET

Workshop)

ACM~

New

York)

1974. Grammars

may

grammars

mapping

as

109.

a

string

P.

men%

for

764,

1967 •

111.

of

a

strin~s

to

equivalent

string

C.)

and Data

W.

D.

Acqulsi%Ion

%o

the be

to

a

tree.

7we

are

used

frees

specification.

KnuTsen~

may

A.,

A

and

problem assembled

Data

and

Symonds~

Ao

MultlpvoGramminG Analysis.

of

Environ-

CACM

measurement

10~

data.

communicating

via

75~

Base.

PrOCo

1970

RAM

-

relations

{in

some

Lo~te)

R.

Ao

Scientific January

Prefadata

sets

J.

A

ACM

Schema

for

SIGFIDET

Describing

a

Workshop~

Rela-

ACM~

New

a

data

XRM -

Center

base

sense

an

management

llke

LEAP

Extended

Report

G 320

of

system

(n--ary) -

2096)

based

on

binary

Peldman/Rovner}.

Relational CambridGe

Memory.

IBM

~ Massaehusetts~

1974. Implements

homogeneous

flat

files

on

top

of

RAM

(see

Lorle/Symonds)o

112.

113.

-

1970.

Describes

XRM

mapping

string

tars°

R.

tional York)

mappings

programs

parame

Lorle,

as

different

approach

earlier

and

Taken

Online

bricated

110o

to

Lockemann)

An

be

Lum)

V0

CACM

13)

Yo

MulTi--aT%rlbute

660

-

Lum)

Y.

form

Techniques)

Yo~

665)

Yuen)

P, a

Retrieval

with

Combined

Indexes.

1970.

S.

Tat

Fundamental

and

Dodd)

Me

Performance

Key Study

%o

Address on

Large

TransExist-

102

Ing

114.

Formatted a

plled

large

%0

V®

Yo~

of

Secondary

356,

±he

Cardenas

Vo

Y®

1973.

Lumv

V.

ented

117.

H.

An

Optimization

Proc,

1971

Performance Using

for

and

an

E®~

Data

techniques

as

ap-

ACM

Problem

NAT1.

on

Conf.~

the

vol.

Selec26,

349

into

the

problem

considered

by

Abstract

Wang~ Set

of

File

C.

P.~

Key--To--Address Trans-

Concept.

and

Allocation

the

algorithm

CACM

Ling~

in

H,

Storage

16,

603

A Cost

-

Ori-

Hlerarcbies,

cost

for

of

data

storage7 set

CPU~

allocation

channel is

e%c,

outllned~

cost.

Smith~

Memory

Analysis

197~.

this

and

Virtual

an

combining

minimizes

K.~

M.

322,

-

function

Maruyama~

Investi~tlons

Senko~

318

defined

for

hashin 8

others°

Algorithm

A cost is

1971.

of

Keys.

General

Y.~

18,

which

Ling~

Methods

612,

CACM

4,

sets,

earlier

and

forms%ion

116.

and

vol®

evaluations

1871.

Of

Lum,

and

data

tion

One

llS.

survey

Lum,

-

CACM t4~

Files=

Con±alas

S=

E,

Analysis

IBM

Indexes,

of

Research

Design

Report

RC

Alternatives 5087,

0ct.

1974, A

number

B-trees cally

118,

are

Surveys

McDonaldt

ACM,

New

and

7~

N®,

alternatives

resulting

York7

into

for

indexes

formulas~

which

oPganlzed

as

are

numeri-

is

system.

McGee~

W®

5 -- 1 9 ~

a

See

W.

C.

Hash

Table

Methods.

ACM

Comput--

1975.

M.

Conferencev

also

data

CUPID San

-- t h e

Friendly

Query

Francisco

t April

197Sv

File

volo

flow

Fi!e S~

687

P~ocessing.

Pergamon

Structures

Processing

dlagram-llke

language

%0

the

Held.

Generalized

Programmln~

Information

G.

StonebrakerT

Pacific

grahicy

C~

T.

1975o

INGRES

McGee~

Lewis,

and

ACM

CUPID

matic

121,

analyzed

D=~

Language.

120.

Implementation

evaluated.

Maurer~W. ing

119.

of

Press~

for 1233

Annual

Generalized

-- 1 2 3 9 ,

Review

in

Auto--

t96~.

North

Data

Management.

Holland,

Amster--

103

1968.

dam,

122.

Introduces

graphs

McGee,

Co

Data

W,

Base

April The

author of

McGee~

presents

W.

C,

ACM

SIGMOD

The

paper

125.

Go

H.

and

relations

Intl.

%he

Mehl,

J.

earlier

of

information.

Data

Conf.

Equivalence,

Cargese v CorsicaT

1974.

equivalent

organizations

organizations

at

on

Network

T Proc.~ and

on

Look

papers

between

W,s

and

in

and

of

the data

a

New

proposal

network

Data.

ACM,

Data

data

Prec.

Structures.

York,

fop

1975.

a data

manl-

structures,

AFIPS

1967

FJCC

525

-

New

proposal

to

the

York,

Ao %o

C. in

P,

G°s

the

compiled

and

A

Study

IMS Data

of

information

Order

Bases,

data

1974

as

sets

Transformations ACM

independence

routines,

appllcatlon

File

to view

SIGFIDET

of

Work--

1974.

increase of

be%wren

Merten~ proach

held

York~

proposing

sets.

Wangt

ACMT

program

F r y T Jo

Po

Translation.

which and

data

A Data

1974

ACM

supported

intercept

the

by

IMS

communi-

management.

Descrlp%ion

SIGF[DET

Language

Workshop~

ACM~

ApNew

1974,

Describes

the

idea

translation

Merten~

A.

Gos

New

MeyerT

York~

B.y

and

design

behind

%he

Of M i c b i @ a n

UnivePsi±y

Severance,

through

D.

G.

Modeling.

Performance Proc.

Evaluation

1972

ACM

Natlo

File Conf,t

1972

and

Technology.

and

project,

of Organizations

the

Work°

Operations

operating

shop~

cation

128.

STudy

(CRM}

Conference

Another

stored

organizations.

requirements

Structures

ACM,

of

flle

(DBTG)

Hierarchic

data

IFIP

for

1967. of

with

127.

197S

One

A

126.

flat

Level

outlines language

MealeyT S34,

File

the

Amsterdamv

a number

language

models

%o

Proc,

Holland~

homogeneous

pualtion

124.

A Contribution

North

description

123.

conceptual

Management°

19747

class

as

Schneider~

Course

H.

Notesv

Jo

Predicate

University

of

Logic

Berlinv

and

Data

available

Base from

authors.

Reviews interface

predicate llke

logic

in Coddes

and work

Its

use

and

in

as

a

model

natural

fop

man-machine

language

question--

104

answerin~

129.

sys%ems~

Minsky~

N®

Workshop~ The He

On

!nte~act~on

ACMT

author

New

discusses

proposes

a

Vlconsls%ent

wlth

YorkT

concepts~

cons±Ductive

operators"

Data

Bases.

[974

ACM

SIGFIDET

1974. integrity approach

to

be

used

rules~

%0

as

user

integrity

prlmi%ives

views

for

etc.

deflnlng

by more

complex

opera%ions.

130.

Mul!in~ Hashed

131.

J,

K,

An

Overflow.

Mylopoulos7

J.~

Relatlonal

Improved CACM

Index

15~

301

Schus%er~

System~

-

S.~

1975

Sequential 007,

and

AFIPS

Access

Me%hod

uslng

1972,

Tslchritzis7

NCC

Prec.

D.

A

Multilevel

vol.

44~

403

fhe

prototype

-

408,

197S. The

mechanism

ZETA/TORUS system

with

language ZETA

132.

used are a

on

as

an

Nakamuma~ Base

vol.

44,

±np

of

-

base

lower

level

a

I.~

of

rel~tional %0

data

define

a

prlmi%ives.

natural

and

Performance 463,

is

capabillfy

Yoshida~

System 459

development ZETA

"intelligent"

language

Kondov

high

TORUS

system

management level

is

query

bulit

on

interface,

H.

A

Evaluation.

Slmulation 197S

AFIPS

Model NCC

for Proc,

IS7S.

of expe~Iments

DescPiptlon data

%he

defini#lon

F.~

Data

in

descrlbed.

management

simulating

sysfem

in

a

the

processes

conventional

withln

slmulatlon

a

pack--

age.

133.

Nava%he~

cation 1975 The

S® of

paper

al

Mer%enT

Relatlonal

Eo

View,

when mo~e

J.

%hat

powerful

February

The

paper

of

G.

Investigation

to

Data

into

Translation.

the

Appll-

ACM

SIGMOD

-- 1 3 8 ,

Codd~s in

the

relational

model

context

data

of

".,®

poses

ser-

tr&nslaTlen

as

a

restruc%urln@".

Mapping:

University

10,

123

used

Data

A.

~odel

Proc,~

concludes

fop

Neuhold~

and

Conf®

pmoblems

vehicle

134.

the

Intl,

ious

Bo~

A

Formal

KaPlsruheT

Hierarchical

and

Relation-

Forschungsberlch±e~

Berlcht

1973. compares

formal

notation.

tional

model

is

hlePaPchlcal In

a

and

partIculart

special

case

It of

the

relational m~Mes

clear

hierarchical

da±a

models

%ha%

%be

model.

in

rela-

105

135.

136.

Blnary

Nlever@elt

v J.

Computing

Surveys

Notleyy UK-SC

M.

G,

Search

6v

The

3~

Trees

and

File

Organization.

ACM

1974.

Peterlee

IS/I

System.

IBM

UK~

Peterleey

Report

0018.

Describes

IS/It

one

of

the

earlier

Codd

~elationnl

implementa--

tions.

137.

Olsonl

C.

cessed

Records°

A.

Random

Access

Prec.

File

of

1969

Organization

ACM

Natl.

for

Confo

Indirectly ACMt

New

AcYork~

1969.

138.

Owensl

P.

Phase

J.

Information

II

--

Processing

a

Data

71T

827

Base --

Management

832T

North

Modeling Holland~

System. Amsterdamv

1972. Phase

II

is

management

138.

Palermo~

P.

Indexes. the

of

modeling

IBM

eamlier

designed

specifically

fop

data

Approach

Research papers

Report

on

index

RJ

to

the

0730~

Selection

July

selection.

of

Sec-

Cardenas

fop

1970.

See

results.

Palermo,

F°

RJ

July

I072~ paper

P.

A Data

for

queries

Petrlckt

S.

R.

Research

REQUEST

an

one

of

in

Search

RC

the

Problem.

earlier

predicate

Semantlc

Report

is

Base

IBM

Research

Report

1972.

contains

gorithms

IBM

tool

A Quantitative

ondary

The

141.

F.

One

recent

140.

a

evaluation.

optimizing

calculus

Interpretation 4457~

July

expe~tmental~

reduction

al-

form.

in

the

REQUEST

system,

1973.

natural

language

question

answering

system°

142.

Ramlrez~ tion

of

guage.

J.

1974

Describes to

D0

P*

Reisner~ Evaluation

Rln)

N.

Ao~

Conversion ACM

an

and

Prywes~

Programs

SIGFIDET

using

Workshopv

implementation

Smlth}v

translating

143o

At~

Data

of

which

a

ACM~ data

complies

N, a

S.

Automatic

Data

New

York~

Lan-

1974.

definition

data

Genera-

Description

language

definitions

{due

Into

data

programs.

P.~

Boyee~ of

R.

two Data

For

and

Base

Ch~mberltn

Query

t

Languages

Do

P, -

Human

SQUARE

Factors and

SE--

106

QUEL~

AFIPS

1975

NCC

A psychological

analyzed.

in

144.

Data of

data

the

performance

Models

models

64

-

447

show

the

of

Data

ternational

are

452,

1975.

subjects

is

a

but

slight

language,

which

Implemented of

sequences

described

and

statistically primarily

differ

at

Rothnie,

program the

J.

low

8.¢

a Paged

for

and

of

to

be

for

used

implementations.

A

ACM

Framework European

measurement of

levels

end,

and

a

data

for

Evalu-

Chapters

In-

evaluation

of

base

commands

involve

hgih

Lozano,

To

Environment. "multiple

for

allowing

the

levels the

~97S.

197S.

different

at

objective

D.

of

Representation.

May

and

disk

system

issued

address

in

is the

reference

end.

Memory

A combina%ion nlque

at

the

Prec.

Symposium

dlfferent

1554t

different

Hlldebrand~

Systems,

events

Storage

no.

wlth

of

framework

The

application traces

and

J.,

Base

Secondary Repo~t

designed

Computing

presented.

for MRC

evaluation

Rodriguez-Rosell,

in

on

Wisconsin~

The

ation

146.

with

nonprogrammers

dependency

A~

Rei%er,

An

44,

syntax~

University

145.

vol.

experiment

Only

significant

Pros°

a

Attribute

CACM

key

reductlon

Based 63

17,

hashing" of

the

File

-- 6 9 ,

and

Organization

1974.

inverted

number

Of

page

file

tech--

faults

for

multi--key--retrieval.

147.

Rothnie~

Jo

Relational vel*

148.

44,

employed

with

every

Sayanty

~.

burg.

and

To

attempts

Restart

U.

recovery emphasts

Ein

Messdaten.

Verlag,

Expressions

1975

utilize

the

AFIPS

in

NCC

a

Prec.

1975.

Processing

puts

Schauer, chef

423~

Retrieval

System.

to

for

the

and

Recovery

System.

purpose

1974

of

gained

optimization.

in a ACM

Information

Transaction

SIGFIDET

Oriented

Workshopv

ACM~

discussed.

]he

1974.

York,

ReStart

-

Inter--Entry

Management

tuple-access

Information

149.

Base

strategy

H.

author

Evaluating

417

The

New

B.

Data

appear

policies on

System IBM

as

~eidelberg.

are

defined

and

performance.

zur

Germany, Lecture

Interaktiven Informatlk Notes

in

Bearbeltung Symposium

Computer

umfan~rel--

1979t

Science,

Bad

~om--

Springer

107

Introduces

an

Interactive storage~ "query

150.

a by

brary

or

SchkolnlckT

See

151.

graphics

M.

Conf.

also

H.

tional

A.s

Data

Datay

San authors

ism

of

The

similar

ACM

See

to

data

language

{llke

an

also

open

ended

ll-

/13/.

ACM

Optimization, Jose~

comblnln~

relational

SlGMOD

1975

1975.

research.

J.

Re

SIGMOD

On

the

1975

Semantics

Intlo

Conf.

of

the

Rela-

on

M~mt.

of

1975.

are

Codd~s

access

Data t San

Swenson~

Model.

Jose~

The

world.

for

and

Index

of

Mgmto

system

a

manipulation

wlth

subroutines.

Secondary

base

{APL)~

data

Zloof)

FORTRAN

on

data

facilities

see

Cardenas

Schmld~

measurement

oriented

example"~

PL/I

of

Intern.

interactive

computational

concerned

relational authors

wlth

the

model

and

the

a

kind

of

employ

gap

between modelled graph

the

pure

part

of

model

formalthe

to

real

fill

the

gap,

152.

Schmutzt

H.

Germanyv

74.10.004t A

Oct.

special

schema

to

153.

of

context--free

hierarchical

mapping

Go

Language

31~

1975. authors

Senkor

as

M.

M°~

and

ARPA

E.)

a

Holland~

FOREM

is

evaluation

an

Senkot Data Journo

Mt

data

E.~

Structures 12~

30

a

is Pair

for

base

IBM Peport

describe

grammars

are

internal a

or

the

used

to

external

theoretical

J.

Creation

Information

for

to

treat-

systems°

E.

Networks.

used

and

model

data

language

V.

North

evaluate

model,

Deasautelst

Yo)

(FOREM)o

1968.

to

Relations.

Technical

data

of

Systems

translation

in

a

File

I,

a

25

-

network

network.

Lum)

Model

is In

for

propose

the

and

Centerv

conceptual

system problems

Schneldery

The

Languages

grammars

data

between

described

important

Evaluation

155.

of

Translation

such

154.

a

the The

ment

Regular Scientific

1974.

form

describe view.

Parenthesis Heidelberg

and

Amsterdam~ and

and -- 9 3 ~

E.

B0~

Accessing 1973,

P.

J.

A File

Processing

Organization 687

514

-

519~

1969.

simulation

management

Altman~

Owens9

Information

tool

specifically

designed

systems.

AstrahanT in

Data

M. Base

Mo~

and

Systems.

Fehder~

P,

L.

IBM S y s t e m s

108

This

paper

descrlbes

tem T one

of

research

156.

157.

159.

Senko~

M.

tities

~nd

Senko~

M.

E.

and

ideas

behind

app?oaches

comprehensive

E.

Data

Report

RC

An

Senko~

Me

~®

Report

RC

5263v

Senko~

M®

Eo

%he

DIAM

sys-

%0

data

base

3

I~

Description Oct.

-

Pela%ions~

--

13,

Setsv

En-

1975.

in

the

DIAM

II

wlth

FERAL

for

Lan@u~ge

Description

5073~

RecordsT

Systems

Context

of

FORAL.

a

Mul--

IBM

Re-

1973.

Introduction

%0

Users,

IBM

Pesearcb

1975.

Speclfiea%len

Results on

Sys±ems:

Inform.

Structured

Output

thoughts

Information

Things.

search

ence

%he

e~rller

systems,

tilevel

158.

%he

In

Very

DIAM

Large

of

II

Stored

wlth

Data

Data

FORALo

Basesw

Structures

Proc.

of

Bos%ont

the

1975~

and

Desired

In%.

Confer-

available

from

ACM. The

last

which

is

three

references

based

on

introduce

binary

DIAM

I[~

a

and

has

FERAL

assocla%ions

proposed as

sys%em~ Its

query

language.

160.

Severance~

161.

A

D.

scheme

164.

G.

is

A

[~

descrlbed~ a

set

Shneiderman~

B®

362

-- 3 6 5 ,

Optimum

Shnelderman~

B.~ -

566

~nd

577T

paper

describes

cem%aln

classes

of

B.

Model

The

3,

p~per

93 is

-

and

Gen--

Alternative

File

StPuc-

1975o a

special

Base

Scheuermann~

Of

IJCIS

Survey

"two

dimensional

space

including

well-knewn

of

case,

ReoPganization

Points.

CACM

A

103,

P°

S%ructuPed

Data

STructures.

1974.

The

Shneiderman~

of

A 1974.

1973.

CACM

17~

3t

organizations

as

Data

55, maps

data

organlzatlons

6~

Model --

51

which of

Mechanism:

Surveys

Parametric

Systems

~v t o

Search

Computing

conven±Ional

16,

163.

ACM

Inform.

parameters

162.

Iden%[fler

Model.

Severance~ tures.

O.

D.

erallzed

an

approach

data

for

to

deal

wl%h

integrity

in

case

sTruc%ures.

Optimizing

Indexed

File

Structures0

1974.

concerned

wlth

the

selection

of

index

size

at

dlf--

109

ferent

165.

Shut

N.

C.~

SlbleyT

paper

Ho,

E.

CACM

discusses

paper

Eo

two

"data

-

750

V.

for

On

the

structured" wlth

et

al.

R.

W.

759,

goals

Y.

CONVERT

Data

of

A

a

High

Conversion.

Level

CACM

18,

and

ACMv

Data

Definition

and

Mappin~

1973. a

data

definition

mapping

Equivalence

Workshop,

independence

A

Taylor~

philosophical

Sibley,

Lum,

Language

deflnitlon

H.

translation

the

16,

data

ACM S I G F I D E T

168.

and

The

The

and

to H o u s e l

Language°

Sibley~

C°,

1975.

lustrates

167.

B.

Deflnltion

-- S 6 7 ,

A companion

166.

performance.

improve

Housel~

Translation 5S7

%0

levels

New

of York~

or

"procedural" connection

and

il-

examples.

Data

Based

1~74

Systems.

1974o

directions~

its

by

language

"relational" (DBTG)

are

to d a t a

(Codd)

and

the

compared.

Also

data

restructuring

and

dafa

Dictionaries

for

Is d i s c u s s e d .

E.

H.y

and

Information

discussion

Sayanl,

Systems

of

the

H®

H.

Data

Interface.

need

for

and

Element

NBS-Report objectives

v 1974o of

a Data

Dictionary

capability.

169.

dissertation,

One

of

the

to

Data

Description

of

Pennsylvaniav

University

earller

data

definition

and

and

mapping

Conversion.

1971. languages°

See

Ramirez.

Smithy cal

P.

Approach

Do

D.

also

170 •

An

Smithy PHo

S.

Data

E.,

and

Base

Mommens,

Structures.

J. ACM

H.

Automatic

SIGMOD

Generation

1975

intl.

Conf.

of

Physi-

San

Jose~

from

des-

1975. A

criptive into

171.

172.

design a i d

prototype

input

account

S%ahl~

Fo

A.

AFIPS

NCC

Steel~

To

SIGMOD

1975

IMS

and

A Homophonic vol.

Data Intl.

described

physical

constralnts

Prec.

B~

is

42,

Base Conf.

data

for

- 568v

Standardization on

Mgmt.

generates

structure

objective

Cipher 565

which

of

def[nitlons

taklns

functions.

Computational

Cryptogvaby.

1973.

--

Datay

A

San

Status

Jose~

Report, 197S.

ACM

110

173.

Steuertt tem:

J. ~ a n d

Goldman~

A Perspective.

J®

!974

The

ACM

Relational

Data

SIGFIDET

Workshop,

RDMSv

system

Management ACM,

Sys-

New

York,

1874. An

in±roduc%ory

and

174.

175.

based

on

deecrlptlon

Codd's

Stonebraker~

M.

The

Indices.

IJCIS

See

Cardenas

else

3,

of

-- 1 8 8 ,

for

~esearch

Stonebraker,

M.

SIGFIDET

Workshop

The

paper

first

Partial

on

ACM,

the

unfortunately of

used

and

Inversions

%hls

View

Proe.~

analyzes

being

at

MIT

Combined

1974.

A Functional

which

It d e s c P l b e s

Choice

a

model.

167

ACM

approach,

of

relational

%ople.

of New

Data

problem ks

%he

types

~®

Implementation

not

data

Independence.

YorkT with

kept

1874

1974. a

promising

through

independence

up

%o

to be

form~l The

end.

provided

in

INGRES.

176.

Stonebraker7 Views San

177.

by

Jose,

The

also

et

Su~

Held

S.

1974

Y.

The

Data

tation~

which

Taylo~

W,

Constraints

1975

is

R,

have

it T which

their

Sharing

in

ACM~

York,

a

in

New

of

Intl.

Conf.

and Prec.,

in

more

detail.

Data

Base

Translation

a Ne±work

See

the

a

corresponding

deals

%o

the

of

IFIP

Work.

a conceptual datalogical data

Storage.

Arbor,

Conf.

1974. data

model

approach

forms.

hlanagemen%

Physical

Approach

Amsterdam,

internal

Base

Ann

data

of

with

Data

Infological

~olland~

a kind

Environment.

[974,

Proco

North

MichiganT

for

used

Semiautomatic

is

)(appln G

of

proposal being

See

inte~rity

Management

Generalized

and

a

to

1974.

approach

University

Contains

Integrity

SIGMOD

Foundation

Base

April

It m a y

R.

STructures

180.

Data

with

A

Data

Conceptual

Base.

associated

H.

Workshop,

[nfolo~ical

ments.

Lam,

Corsica,

Taylor~

approach

Achieving

philosophy.

t79.

and

B.

Cargese,

of

ACM

al.

SIGFIDET

Sundgren, to

INGRES

W.T

for

ACM

qodificaTlon.

19U5.

Describes

System

178.

Query

System Ph.

D.

Data

diseer--

1971,

definition

and

~[ichigan data

mapping

translation

languase, experi-

MeDten/Fry.

W®

Data

Administration

and

±he

DBTG

Report.

1974

ACM

111

SIGFIDET Among

others~

taln

181.

Workshop

data

Taylors Base

the

Ro

Cargese~

Wo7

TeichroewT Proc.

D,

of

ACMy

essential

of

about

slstance

183.

J°

Thomas~ by

given ple

185o

F. A

So

186.

Tslchritzls~

of

Turn~

R.I

Research

Development of

IFIP

of Work.

Amsterdamv a

data

in

on

paper:

the

J.

PJ

File

Data Conf,

1974°

base

at

a

user

Organtzatlon.

Informations

there

is

as

data

Storage

no a

and

have

absolutely

Proco

vol°

44~

to he

to be

439

with

of

made

of

as-

Query

197S.

subJects~ into

know-

wlth

Study

- 44ST

3S

translated

best

function

A Psychological

an experiment

of how

Interactions.

Van

der

417~

who

query

by

were exam-

Pool~

of

the

Toronto7 CoddWs

experimental

deverill~

R.

1969

Nail,

ACM

No

IBM

system

Framework

(i.eQ

o~

A.

Overview.

Technical

AFIPS

B° v and System,

UKSC

Peterleev

1975o

relational

Shapiroy

Dos±erty

Language

Technical

007S~

A Network

J.

Coy

1969.

A

UKSC

- Measures

der

-

networks

and

P°

Extensihle

PRTV:

P°

physical

Systems

188~

P°

description

University

187.

399

Report

Discusses

to

o~

B° v Lockemann~

J.

Technical A new

of

Changes

English

Rapidly

Proc*T

Todd~

In

NCC

of

oh--

to

Zloo~)o

REL:

Conf.

use

AFIPS

the

Proco

programsQ

this

Gould~

results

questions

Thompsont S.

and

On

on

Information.

computer.

preprocessor

a

1971~

in

future

use

Holland~

Symposium

Yorkv

the

197S

the

(see

SIGIR

the

C°t

Example°

Reports

184.

of

North evolution

Approach

New

W°

the

Impact

message

representation

D°

1974.

time.

1974.

is

1971

to

Managementv

April

An

the

Retrieval~

ledge

Base

Its

Yorkl

precomptle

Stemple~

~ concern and

New

proposes at

and

Corslca~

authors

The

author

Data

installation

182.

ACM)

Independence

Editions.

The

Proc.t

Eeport model

linked

Z.

for

Optimum

Relation

Implementation.

CSRG-49~

February

can

be

1975.

Implemented

on

top

structures)o

Privacy

Ef~ectlvenessy 1972

IS/1.

FJCC~

Storage

and

Security

Costs

and

vol.

41y

435

Allocation

in

Data

Bank

Protection--Intru-

444.

for

a

File

in

112

Steady

State.

Files

with

overflow

189.

Vose~

M.

Wang~

R.,

C.

Data

and

a

set

of

specify

minimal

without of

set

are

to

the

19,

cover

to

-- 7 7 ,

Inverted

Index

set

of

cover

which

relations

is

again

Given

third

Logical

calculates with

a

the

normal

in

1975.

algorithm,

given

CoddVs

rate

state.

Synthesis

71

dependencies.

in

with

steady

Approach

Segment

Dev®

a

and

overflow

!972.

minimal

tPansltive

melatlons

1973. {hashing)

for

An

H.B,

minimal

Each

S,

May

Res.

covers

38v

utilization,

given

J.

16,

J°

a

dependencies.

floss

Storage

Wedekind~ IBM

-

27

transformations

analyzed.

Bull.

and

17,

Dlv.

Richardson~

Design.

authors

tive

Res.

factors

Comp.

P.,

Base

The

are

relevant

Maintenance.

190 •

J.

key--to--address

areas

other

and

IBM

transl-

set

of

minimum

form

can

velacovert

easily

be

constructed.

H.

191.

Wedeklnd,

1£2.

Wedekind~

B.

Mannhelm,

1974.

193 •

Wedekind~ System. esev

W.

paperlS

tion

of

Wellis~

the

Base April

efficient

M.

E.~

Katke,

1117-

SIMS

is

interesting

data

normal

form

and

tion

been

paid

%o

Based

File

Organizations.

Each

query

queries. to

the

is In

Olsont

assumed

elementary

a

IFIP

in

Work.

Data

a

Base

Conf.,

Amsterdam,

analysis

J,t

number

and

Carg-

IB74.

for

Yang,

the

S.

System.

in

case

of and

mapped

determina-

C,

SIMS

AFIPS

to

the

reasons.

T.

a

FJCC

be

the

queries

a

14,

of

593

boolean

data and

high

language.

Canonical

CACM to

offers

-

an

1972,

ba@e access

and

597,

be

blgh Data hier-

atten-

programs.

In

Attribute

1971.

expression can

level

PartlculaP

data

Structure -

a

language.

conceptual

query

C.

I%

manipulation

transferability

Chiang~

this

and

Information

be

used

and

of

Holland~

mapping

go,

Paths

Access

Instltut

1872, for

may

1972.

paths.

definition,

archical

Wong,

North

W®,

1131,

files

has

of

Berlin,

Bibliographlsches

Proc.

modeling

User-Oriented

41~

on

1974.

access

vol.

level

Selection

is

Gruytem~

I.

Management,

concern

Integrateds

195.

On

Data

de

Datenbanksys%eme

Corsicav

The

194.

Datenorganlsatlon.

over

elementary

organized

according

becomes

essentially

the

113

problem

of

pu±%ing

a

boolean

expression

In%o

some

s%~ndard

~orm,

196.

Yao~

S.

±hrou~h

B.

Michlgan~

197.

Y u e 7 P. ondary also For

198.

The

basic

user

Wongt

C.

Selec%lon,

recen%

M.

-

frame

Op%Imlz~%ion Pho

D°

of F i l e

dlsser%a%ion~

Organization of

Universlty

K.

S%orage

IBM

Cos%

Research

Consldera%ions

Repot%

RC

5070~

in %o

Sec-

appear

IJCIS,

431

%hat

and

Index

other

M.

437,

userWs

and

Modeling.

1974.

C.~

in

Zloof,

Evalua%ion

Analy%ie

resul%s

Query

in

this

By

Example.

of

query

area

of

rese&~ch

197S

AFIPS

NCC

see

Cardenas,

Proc.

vol.

44,

1975.

features

pe~cep%ion

of

of

manipula%ing

of

reference

fills

da%a

example

processing

%ables

consis%ing

informa%ion.

by

in of

in

are %his

illustrated. query

a graphically table

skele%onsv

language

The is

pre--estebllshed in%o

which

%he

Fundamentals of the Storage Hierarchy

Claus Schünemann, IBM Böblingen

1. INTRODUCTION

The subject of this contribution is the concrete storage and addressing of data, assuming a hierarchically structured storage system. Where database aspects are touched upon, they are viewed from the standpoint of hardware implementation and mainly with regard to performance.

Today's computer storage systems are already largely hierarchical in structure. A distinction should be drawn between a hierarchy characterized merely by graded capacities and a strict hierarchy, in which random access is possible at every level and the data flow skips no level. The combination of main memory and buffer memory is a strict hierarchy, and it was this combination that first brought the notion of a hierarchy into general awareness [1]. The buffer memory (cache) is transparent to the machine architecture and matches the speed of main memory to the still higher speed of the processor. The sequence of main memory and magnetic disk storage can likewise be regarded as a strict hierarchy, even though this view (with the exception of program paging in the context of virtual storage) has so far not been in the foreground, and disk storage has been regarded more as an input/output device and treated as such by the machine architecture.

Because of its long access time (including tape mounting), magnetic tape storage can no longer be counted as part of the hierarchy in the strict sense. Approaches to integrating the large and cheap capacity of tape storage as a genuine topmost level of the data-flow hierarchy have become visible with the recent development of automatic tape transport systems, as for example in the IBM 3850 cartridge storage system. Here the tape storage could, for instance, be given the function of an archive and the disk storage the function of a large-capacity working store, with the contents of entire virtual disk packs being transferred automatically to the disk system on demand [2]. Figure 1 sketches the scheme of this hierarchy concept.

The weak point of the present storage hierarchy is the ratio of the access time of main memory to that of disk storage, which exceeds 1:10000, the so-called access gap. Inserting drum storage or fixed-head disk storage in between does not change the situation substantially. As is well known, one therefore tries to bridge the mismatch by switching between programs under multiprogramming. As processor and main memory speeds advance while the access time of the mechanically operating mass storage devices stays the same, the degree of multiprogramming, the main memory size, and the number of disk spindles must all keep growing. This moves the system away from the cost optimum; in addition, the demands on the controlling operating system and its complexity rise while its efficiency falls.

In what follows, an attempt is made to classify the storage parameters for the entire hierarchy spectrum from a uniform point of view and to discuss the performance of the hierarchy on the basis of such parameters, with particular attention to the problem of the access gap. The requirements of database operation are addressed briefly.
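The effect of the access gap on the required degree of multiprogramming can be illustrated with a rough sketch. It is not taken from the paper. It assumes a deliberately idealized model in which every program alternates between a fixed processor burst and one disk access, and in which disk accesses of different programs never queue behind one another. The function names and the numerical values (1 µs main memory access, 30 ms disk access, 3 ms processor burst) are hypothetical and chosen only for illustration.

```python
def access_gap_ratio(main_memory_access_us: int, disk_access_us: int) -> float:
    """Ratio of disk access time to main memory access time."""
    return disk_access_us / main_memory_access_us


def programs_to_hide_gap(cpu_burst_us: int, disk_access_us: int) -> int:
    """Programs needed to keep the processor busy (idealized assumption).

    While one program waits for its disk access, the processor needs
    ceil(disk_access / cpu_burst) other programs to execute, plus the
    waiting program itself. Disk queueing is ignored.
    """
    others = -(-disk_access_us // cpu_burst_us)  # integer ceiling division
    return 1 + others


if __name__ == "__main__":
    # Hypothetical values: 1 us main memory access, 30 ms disk access.
    print(access_gap_ratio(1, 30_000))
    # A processor twice as fast halves the burst, while the disk stays put.
    print(programs_to_hide_gap(3_000, 30_000))
    print(programs_to_hide_gap(1_500, 30_000))
```

Under these assumptions, halving the processor burst while the disk access time stays constant roughly doubles the number of programs that must reside in main memory, which is the trend the text describes for the degree of multiprogramming and the main memory size.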

2. TECHNOLOGY AND OPERATIONAL PARAMETERS

Numerous technologies are known which, by exploiting a wide variety of physical effects, lead to very different storage properties. The most widespread today are semiconductor technology for the fast electronic matrix stores with random access, and magnetic-film technology for the slower and cheaper mass stores, mainly in the form of disk and tape stores. A further group, which has not yet reached broad product maturity, is that of the optical and electron-beam stores [3,4]. The various shift-register technologies such as CCD (charge-coupled devices) [5,6] and magnetic bubbles [7] are likewise taking only tentative first steps in commercial use.

The specific modes of operation of the individual storage families are not discussed here; instead, the entire storage spectrum is described uniformly by a set of invariant technological and operational parameters (Table I). The two important operational parameters, mean access time and bit cost, stand in a roughly reciprocal relation to each other. They determine the position of a technology within the overall spectrum. Figure 2 shows typical present-day values as a function of the most important technology parameter, the number of bits per read/write station [8].

The access time is composed of the access time in the narrower sense, a kind of dead time before the first bit is transferred, and the data transfer time. The transfer time depends on the data rate, which is given by the clock frequency and the internal bit width, and on the block length chosen for transfer. Additional delays caused by the external transfer channel are included in the transfer time.

Modularity means the ability to subdivide a store, or a hierarchy level, into modules each with its own parallel access. This increases the access rate. The capacity for modular subdivision generally decreases as the technology parameter "bits per read/write station" increases. Where reading/writing is mechanically decoupled from data transport, the access rate can be raised further by overlapping. In the IBM 3850 tape cartridge store, for instance, the next cartridge is already being transported while the previous one is still in the read/write station. Further examples of asynchronous parallel operation are the configuration of several disk stores in one data processing installation and the division of main storage into independent modules working in parallel.

Bit cost, too, is determined primarily by the number of bits per read/write station. Apart from the specific technological and design factors, it depends on the general state of miniaturization. Figure 3, for example, shows the historical development of bit density in magnetic disk stores. Accordingly, the figures in Fig. 2 are to be understood only as relating to a particular point in time.
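The decomposition of access time into dead time plus transfer time, and the effect of modularity on the access rate, can be sketched as follows. This is only an illustration: the function names and the device figures in the example are assumptions, not values taken from the paper.

```python
def access_time(latency_s, block_bits, clock_hz, bit_width):
    """Access time = dead time before the first bit + transfer time.

    The data rate is clock frequency times internal bit width, as in the text.
    """
    data_rate = clock_hz * bit_width            # bits per second
    return latency_s + block_bits / data_rate


def max_access_rate(latency_s, block_bits, clock_hz, bit_width, modules=1):
    """Upper bound on accesses per second for `modules` independent modules."""
    return modules / access_time(latency_s, block_bits, clock_hz, bit_width)


# Hypothetical disk-like device: 30 ms dead time, 4096-byte block,
# bit-serial transfer at 8 MHz (all figures are illustrative assumptions).
t_disk = access_time(30e-3, 4096 * 8, 8e6, 1)
rate_one_module = max_access_rate(30e-3, 4096 * 8, 8e6, 1)
rate_four_modules = max_access_rate(30e-3, 4096 * 8, 8e6, 1, modules=4)
```

With these assumed figures the transfer time is a small fraction of the dead time, which is the situation the hierarchy exploits when it moves whole blocks.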

The relative positions, by contrast, should be largely invariant with respect to the general state of the art, since advancing miniaturization benefits all technologies. In a balanced configuration, the storage capacity per hierarchy level follows as a kind of reciprocal function of the respective bit cost.

A further operational parameter is the reliability of the store, i.e. the mean number of bits read per erroneous bit. This property is a function of the natural freedom from defects of the medium, of the degree of sorting for good units, and of the effort spent on deliberate redundancy with subsequent error correction. The defect density of the medium naturally decreases with homogeneity. Typical reliability values (after an appropriate sorting process) are, for a factory-new disk store, 10^9, and 10^12 after correction.

The physical nature of the storage process determines the degree of volatility of the written information. For a working store a certain volatility can be tolerated, with periodic refreshing; for an archive or journal store, permanent storage must of course be required.

Somewhat related to volatility is the property of on-line or off-line writing, the latter also known generally as ROM. For various applications, e.g. the storage of documents with a low rate of change, a ROM store can be quite sensible and, being correspondingly cheap, of interest. The PROM or EAROM (programmable or electrically alterable read-only memory) represents a transition between the normal writable store and the ROM. ROM stores are not treated further here.

The last operational parameter is the addressable unit, which together with the access time proper determines the complexity of the access method and the efficiency of data searching. A distinction is made between location addressing and content addressing. Location addressing is the dominant form at main-storage level: the physical location of every data element is defined by the program and is found directly via the address. This concept is no longer appropriate at the higher storage levels for locating records if the records are organized, for example, as a data base and are to be program-independent and available to many users. They must then ultimately be identified by their content, given by one or more attributes. Within a record the data are generally formatted again, i.e. their semantic meaning is determined by their relative position.

Today's search technique for content-addressed records makes use of index tables in which, for example, the main attributes are ordered numerically or alphabetically and the real storage address is directly assigned to each. If further (secondary) attributes are present, these can be listed in tables of their own, each entry again being assigned the storage addresses of all records containing that attribute. With these inverted lists, as is well known, searching on multiple attributes can be carried out quickly, i.e. without having to process all records sequentially. By means of the index tables, the content address of a record is thus converted into a location address.
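The conversion of a content address into a location address through an ordered main index and inverted lists can be sketched roughly as below. The records, attribute names, and addresses are hypothetical and serve only to illustrate the idea.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical storage: location address -> formatted record.
storage = {
    4096: {"name": "Braun", "city": "Bonn", "dept": "sales"},
    8192: {"name": "Adler", "city": "Ulm", "dept": "sales"},
    12288: {"name": "Meier", "city": "Bonn", "dept": "lab"},
}

# Main index: main attribute in sorted order, each with its storage address.
primary = sorted((rec["name"], addr) for addr, rec in storage.items())
primary_keys = [key for key, _ in primary]


def find_by_name(name):
    """Binary search in the ordered index; returns an address or None."""
    i = bisect_left(primary_keys, name)
    if i < len(primary) and primary_keys[i] == name:
        return primary[i][1]
    return None


# Inverted lists: (attribute, value) -> addresses of all records having it.
inverted = defaultdict(set)
for addr, rec in storage.items():
    for attr in ("city", "dept"):
        inverted[(attr, rec[attr])].add(addr)


def find_by_attributes(**criteria):
    """Intersect inverted lists instead of scanning every record."""
    sets = [inverted.get(item, set()) for item in criteria.items()]
    return sorted(set.intersection(*sets)) if sets else []
```

A query on several secondary attributes, e.g. `find_by_attributes(city="Bonn", dept="sales")`, touches only the two inverted lists involved and never the records themselves.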

The latter is then reached quickly and directly in stores with random access. Searching the index tables for the desired attribute is, in turn, itself a process with a sequential series of steps. A further step toward parallelism would be to hold the index tables in associative stores, with the following advantages:

- The numerical or alphabetical ordering of attributes is no longer needed. Updating thus becomes simple: indices are added or removed directly.
- Inverted lists are no longer needed, since association on multiple attributes is possible simultaneously.
- Searching is direct and simultaneous instead of sequential.

The peculiarity of associative stores, that they require the data to be formatted, would be no disadvantage in this case.

A special case of location addressing is addressing with pointers. This too decouples the user program from the data address. The drawback is the sequential traversal of the pointer chain.

The individual storage technologies differ in the size of the unit addressable by hardware. This is, for example, one byte for the (semiconductor) matrix store, about 10-20 Kbytes for the disk store, and millions of bytes for the conventional tape store. If this addressable unit is equal to or smaller than the block length to be transferred, we speak of random access. The disk store has only semi-random access, since its addressing unit (the track) is many times larger than a convenient logical record length or than a block length that is optimal for this hierarchy level. The specific block must then be searched for sequentially on the track.

The so-called access methods, i.e. the practical procedures for locating records, reflect the underlying technological addressing parameters in each case. One example is the index-sequential access method for "direct random" access to the disk store. Here the main attributes of the records are arranged in an index table in ascending key order. The table assigns to each group of records the corresponding track address on the disk. The records themselves are also arranged in the same key order, so that in the case of sequential access the long access time for each individual record is eliminated. As the disk rotates, the record keys read out are compared with the search key until they match. On updating, e.g. inserting a further record into a record sequence that may be physically without gaps, a pointer refers to a new track address on an overflow track. The method thus combines the search elements index table, sequential search, and pointer technique into a procedure adapted to the specific conditions of the disk store (Fig. 4a).

In another store that likewise has a homogeneous medium, the electron-beam store, the addressing unit is freely selectable between one and tens of thousands of bytes. The access procedure can be purely index-oriented and correspondingly simple: sequential searching is eliminated, and there is no overflow problem. Thanks to the short (electronic) access time proper, a sequential ordering of records can be dispensed with and a record stored at any location (Fig. 4b).

The larger addressing unit, i.e. the lower degree of "randomness", of the low-cost technologies is not in itself a fundamental disadvantage, since block transfer is used within a hierarchy anyway. A disadvantage of degree arises only when, as with the disk store, the optimal block length and the technological addressing unit do not coincide. This discrepancy then shows up in elaborate and time-consuming "access methods".
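The three search elements that the index-sequential method combines (index table, sequential search along a track, overflow pointer) can be illustrated with a small sketch. The track names, keys, and layout are invented for the example and do not reproduce any particular product.

```python
from bisect import bisect_left

# Hypothetical disk: each track holds records in ascending key order and
# may point to an overflow track that received later insertions.
tracks = {
    "T0": {"records": [(2, "rec 2"), (5, "rec 5")], "overflow": "OV0"},
    "T1": {"records": [(8, "rec 8"), (9, "rec 9")], "overflow": None},
    "OV0": {"records": [(3, "rec 3")], "overflow": None},
}

# Index table: highest key of each prime track -> track address, ascending.
index = [(5, "T0"), (9, "T1")]
index_highs = [high for high, _ in index]


def isam_lookup(key):
    """Index table -> track, then sequential search, then overflow chain."""
    i = bisect_left(index_highs, key)
    if i == len(index):
        return None                      # key beyond the last track
    track = index[i][1]
    while track is not None:
        # Sequential comparison of the keys read from the track.
        for record_key, payload in tracks[track]["records"]:
            if record_key == key:
                return payload
        track = tracks[track]["overflow"]   # follow the overflow pointer
    return None
```

A record inserted later (key 3 here) is found only after the prime track has been scanned and the overflow pointer followed, which is the extra cost the text attributes to the mismatch between track size and block length.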

3. STORAGE HIERARCHY

Besides storing data, the task of a storage system is to make the required data available to the processor within a sufficiently short time and in the requested quantity per unit time. By analogy with the system performance parameters response time and throughput, storage performance can be defined by the parameters access time and access rate. If a store permits only one access at a time, the access rate can be set roughly equal to the reciprocal of the access time. With several simultaneous accesses, i.e. modularity greater than 1, the maximum access rate increases accordingly. How far the maximum access rate can be exploited depends on parameters such as system control, program profile, degree of multiprogramming, number of parallel processors, etc.

In a hierarchy a certain basic modularity of the individual levels is desirable simply in the interest of simultaneous data traffic upward and downward. In terms of control this is achieved at main-storage level, for example, by the independent operation of processor and channels. For effective multiprogramming, sufficient modularity of the disk storage level is an essential prerequisite. The purpose of multiprogramming is to raise the resulting access rate, measured at the interface to the processor, and thus the system throughput.

As is well known, the bottleneck for the throughput of today's data processing systems nevertheless still lies in the access time and access rate of the disk store. Since further speed advances can certainly be expected for processors and semiconductor stores, while the disk access time can hardly be improved further, this problem is becoming ever more pressing. A solution by way of a further increase in the degree of multiprogramming, i.e. the number of programs operating concurrently, with a corresponding increase in main storage size and disk storage modularity, appears impracticable for reasons of cost and complexity. Moreover, efficiency suffers when the degree of multiprogramming is too high: system administration grows relative to useful work, the chance of covering several accesses with one disk arm position diminishes, and so on.

Another solution to this problem is the further development of the storage hierarchy concept, with a limited degree of multiprogramming. The ideal (unrealizable) store, i.e. the store with the access time of the buffer store and the cost of the tape store, can be approximated by a balanced hierarchy with sufficiently fine staging. Fortunately, technological development promises storage products that, in performance and cost, fill precisely the region of the "gap" and thus fit well into the spectrum. Possible technologies for the "gap" include the CCD shift-register store, the shift-register store with movable magnetic bubbles, and the electron-beam storage tube (Fig. 5). These technologies will be referred to below as electronic mass stores.

3.1 Hierarchy mechanism

The storage hierarchy thus consists of storage levels connected in series, with access time and storage capacity increasing with the level number. On a storage access the processor first tries to find the data on the lowest, fastest level. If this fails, the next level is accessed, and so on. When data are transferred to the next lower level, not just the requested word or byte but an entire block is transferred. On each lower level a part of the block is deposited.

With the block lengths chosen, the transfer time is usually small compared with the access time proper. The essence of the storage hierarchy is thus that, at the price of slightly more access time (namely including the transfer time), whole blocks of data or program are transferred, on the assumption that part of them will be requested for processing in the near future anyway. What takes place is therefore an anticipatory access (look-ahead) that exploits the transfer time being short compared with the access time proper. This mechanism is supported by the fact that data are often accessed several times within a short period, e.g. in program loops, but also when operating on frequently used working data such as index tables, catalogs, etc.

The hit rate, i.e. the probability of finding data on the level being accessed, increases with the storage capacity of that level and, in general, with the block length. Independently of this, it naturally depends on the particular data and program profile.

Freeing storage space on a full hierarchy level is, in the simplest case, self-regulating according to the customary algorithms such as FIFO or LRU (least recently used). This mechanism can of course be supported by keeping certain frequently used parts of the data resident in lower, fast levels, e.g. parts of the operating system in main storage. On the higher levels, where every access is implemented in software and enters the performance balance accordingly, the control is more "intelligent".

Conceptually, address control and the search for data on a level could be handled most simply through one address space encompassing the entire storage system. Each hierarchy level would then contain a table for dynamically mapping the virtual overall storage address to the local level address.

For historical reasons there are usually several address spaces in a storage hierarchy. At buffer-store level, the real main-storage address is mapped to a particular location in the buffer store. In main storage, the address, which today is usually virtual and thus already spans a larger address space, is mapped to the real address. On the content-addressed higher hierarchy levels, the index tables mentioned earlier take over the task of locating data: logical searching and hierarchy-specific searching become identical.

The mapping tables are stored either on the same level or on lower (fast) levels. For the buffer store, the table is held in a separate store that operates more or less associatively.

The entire data processing system can thus be pictured as the combination of an archive store that holds all data on-line, for example a magnetic tape store with automatic tape transport, and a processor system that in turn consists of the processor proper and a hierarchy of working stores. The various technology and control parameters, some of which were discussed in the previous section, vary along the hierarchy axis as sketched in Fig. 6.
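The mechanism described in this section (try the fastest level first, bring in a whole block on a miss, free space by LRU) can be sketched as a toy simulation. The class and function names, level sizes, and the address trace are assumptions chosen only to make the behavior visible.

```python
from collections import OrderedDict


class Level:
    """One hierarchy level that holds whole blocks and replaces by LRU."""

    def __init__(self, capacity_blocks, block_size):
        self.capacity = capacity_blocks
        self.block_size = block_size
        self.blocks = OrderedDict()      # block number -> True, oldest first
        self.hits = 0
        self.accesses = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the block containing address."""
        self.accesses += 1
        block = address // self.block_size
        if block in self.blocks:
            self.blocks.move_to_end(block)       # mark as most recently used
            self.hits += 1
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)      # evict least recently used
        self.blocks[block] = True
        return False


def read(levels, address):
    """Index of the level that served the access; len(levels) = archive."""
    for i, level in enumerate(levels):
        if level.access(address):
            return i
    return len(levels)


# Hypothetical two-level hierarchy in front of an archive store.
levels = [Level(capacity_blocks=2, block_size=4),
          Level(capacity_blocks=4, block_size=16)]
served = [read(levels, a) for a in (0, 1, 8, 16, 0)]
```

Because the upper level transfers larger blocks, address 8 is already present there after the first access to address 0, which is the look-ahead effect the text describes. The ratio `hits / accesses` of each level gives its observed hit rate for the trace.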

3.2 Performance considerations

The most important criterion of the storage hierarchy is the overall access time or overall access rate, both in absolute terms and relative to cost. These relationships are discussed below by means of a very simple model. The model is oriented toward "typical" values for the various parameters and extrapolates where data are not known.

As the technology diagram in Fig. 2 already indicates, there appears to be a naturally simple relationship between bit cost and the spectrum variable, access time. This relationship, and the assignment of hit rate and storage capacity to access time, are plotted in the model parameter diagram, Fig. 7. The capacity distribution curve is assumed to be a straight line (on a logarithmic scale), with the buffer store and the archive store as end points. The archive capacity chosen is 10^12 b, the buffer capacity 200 Kb. The points for main storage and disk storage lying on the straight line correspond roughly to real values. The capacity distribution curve is, of course, in itself freely selectable within the technologically available spectrum. With growing processor performance and data volume it will be shifted upward.

For the hit rate in multiprogrammed batch operation, some empirical data are available as a function of capacity and block length in the buffer-to-main-storage range [9]. Typical values from these were taken as the basis of the model curve. Toward the upper hierarchy levels the curve was extrapolated. The model does not take into account the mutual dependencies of block length, access time, hit rate, degree of multiprogramming, etc., but rigidly assumes typical values. The overall access time is

$$t_{ges} = t_1 + (1-h_1)\,t_2 + (1-h_2)\,t_3 + \dots + (1-h_{n-1})\,t_n \qquad \text{(Eq. 1)}$$

where $t_n$ is the access time of the n-th level and $h_n$ is the hit rate of the n-th level.

The maximum overall access rate, i.e. the access flow at the interface to the processor, is

$$\max z_{ges} = \frac{1}{\dfrac{t_1}{p_1} + \dfrac{1-h_1}{p_2}\,t_2 + \dots + \dfrac{1-h_{n-1}}{p_n}\,t_n} \qquad \text{(Eq. 2)}$$

where $p_n$ is the access parallelism on the n-th level. The access parallelism corresponds roughly to the modularity. It is assumed that 50% of the access parallelism in each case translates into a real increase in access rate through multiprogramming, i.e. $p_{eff} = 0.5\,p$; further, that below the disk storage level program switching is no longer worthwhile (p = 1); and finally, that single-processor operation applies. Eq. 2 is then modified accordingly. Some model results based on real technologies are compiled in Table II.
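Eq. 1 and Eq. 2 translate directly into two short functions. The sketch below assumes the effective parallelism per level is passed in by the caller; the three-level parameter values are invented for illustration and are not the paper's model values.

```python
def total_access_time(t, h):
    """Eq. 1: t[0] + (1 - h[0]) * t[1] + ... + (1 - h[n-2]) * t[n-1].

    t[i] is the access time of level i+1, h[i] the hit rate of level i+1.
    """
    return t[0] + sum((1 - h[i - 1]) * t[i] for i in range(1, len(t)))


def max_access_rate(t, h, p):
    """Eq. 2: reciprocal of the access times weighted by miss rate / parallelism.

    p[i] is the (effective) access parallelism of level i+1.
    """
    denominator = t[0] / p[0] + sum(
        (1 - h[i - 1]) / p[i] * t[i] for i in range(1, len(t))
    )
    return 1 / denominator


# Hypothetical three-level hierarchy (times in microseconds).
t = [1.0, 10.0, 1000.0]
h = [0.9, 0.99]
t_ges = total_access_time(t, h)                 # access time per Eq. 1
z_serial = max_access_rate(t, h, [1, 1, 1])     # no parallelism: 1 / t_ges
z_parallel = max_access_rate(t, h, [1, 1, 4])   # top level with p_eff = 4
```

With all parallelism factors equal to 1, Eq. 2 reduces to the reciprocal of Eq. 1, which is the single-access case mentioned in the text; raising the parallelism of the slowest level lifts the rate without changing the access time.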

Different storage access rates show up as different levels of processor utilization. A model processor with 2 MIPS (million instructions per second) and an average of 2 accesses per instruction was chosen. This processor can deliver its full performance only if the storage system permits 4 million accesses per second. The poor utilization of this 2-MIPS processor with today's configuration without multiprogramming comes as no surprise. Even with multiprogramming the utilization is only moderate. Only the introduction of the electronic mass store brings an improvement to a reasonable order of magnitude. With multiprogramming, the bottleneck for the access rate now shifts from the disk store (with its high modularity) to the tape store. This bottleneck could be overcome by a further increase in the number of hierarchy levels, specifically by inserting an intermediate level between the disk and tape stores.
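The relation between storage access rate and processor utilization used here is a simple ratio, sketched below for the model processor (2 MIPS, 2 accesses per instruction). The function name is an assumption, and the cap at 100% reflects that a faster store cannot push the processor beyond its own limit.

```python
def processor_utilization(access_rate_per_s, mips=2.0, accesses_per_instruction=2.0):
    """Fraction of full processor performance that the storage system allows."""
    demand = mips * 1e6 * accesses_per_instruction   # accesses/s at full speed
    return min(1.0, access_rate_per_s / demand)


# Access rates quoted in the paper for configurations with multiprogramming.
u_today = processor_utilization(0.85e6)      # today's hierarchy
u_six_level = processor_utilization(5.88e6)  # with additional intermediate level
```

For the 0.85 million accesses per second quoted for today's hierarchy with multiprogramming, the ratio comes to roughly a fifth of the 4 million accesses per second that the processor could absorb.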


Technologically, such a level is within sight, namely through a modification of the conventional disk store into a set of flexible disks with very high volumetric bit density [9]. The access rate of the hierarchy configuration then lies above 4 million per second.

The results in Table II raise the question of the optimal hierarchy staging with fixed end points. For this analysis a uniform staging is assumed, without reference to real technologies, and the number of levels is varied. Multiprogramming is not taken into account here. The results are plotted in Fig. 8. At about 16 levels the access rate (which in this simple case is the reciprocal of the mean access time) reaches a saturation value. This access rate is only about a factor of 2 lower than that of the buffer store level alone. Fig. 8 also shows the price/performance figure, namely access rate per total bit cost. Here the optimum lies at about 8-10 levels. The improvement over a 4-level hierarchy is greater than a factor of 6. On the basis of the more realistic data in Table II, the gain in going from today's 4 levels to the 6 levels examined is considerably higher still, since uniform staging was not assumed there.

A further advantage of finer hierarchy staging is the improvement in processor "efficiency": the number of accesses to the disk and tape stores decreases. With it, the number of instructions processed (by the access routines) per access to the storage hierarchy also decreases, and processor "efficiency" increases. Finally, the operating system can be kept simpler.

This model does not include the reliability aspect, which becomes more critical as the number of levels grows. Nor are the costs of the controls, address tables, etc. taken into account. The extrapolation of the hit rate curve is entirely hypothetical. All this notwithstanding, the model results may be taken as an indication that finer hierarchy staging still holds considerable performance potential.


4. STORAGE ASPECTS OF DATA BASE OPERATION

Data base operation, too, can in principle be fitted into the model considerations so far. The parameter that may change (toward less favorable values) is the hit rate, especially on the upper levels. Experience on this has yet to be gained, however, so the model values are retained here, particularly as a certain "neighborhood" relationship between queries is likely to be found in data bases as well.

In practical, concrete terms, a distribution of functions over the individual hierarchy levels could be pictured as sketched in Table III. Groups of data with a high professional access rate must be moved from the archive level to the disk storage level and kept resident there.

The specific data base performance parameter, besides the volume of data, is the permissible query rate. This should also rise with growing data base capacity. The following rough calculation may serve as an illustration. According to Table II, the model access rate for today's hierarchy with multiprogramming is 0.85 M/s. If we assume a program run of 100 K instructions on average per data base query (at 2 accesses per instruction, i.e. 200 K accesses per query), the system would allow 4.25 queries per second. This value is unlikely to suffice for a data base capacity of 10^12 b. After introduction of the electronic mass store, the query rate rises to 14 per second. With an additional intermediate level between the disk and tape stores it rises to about 30 per second, assuming corresponding processor performance of about 3 MIPS.

The question ultimately of interest, how many terminals can be connected to a data base of this size with satisfactory service, naturally depends on the mean query load per terminal. Assuming a mean load of one query per terminal per minute, the number of terminals works out at 30 × 60 = 1800. This connection capability per 10^12 b of data base capacity appears sufficient.
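The rough calculation above can be retraced in a few lines. The function names are assumptions; the inputs are the figures quoted in the text (100 K instructions per query, 2 accesses per instruction, one query per terminal per minute).

```python
def queries_per_second(access_rate_per_s,
                       instructions_per_query=100_000,
                       accesses_per_instruction=2):
    """Permissible query rate for a given storage access rate."""
    return access_rate_per_s / (instructions_per_query * accesses_per_instruction)


def terminals(query_rate_per_s, queries_per_terminal_per_minute=1.0):
    """Number of terminals that a given query rate can serve."""
    return query_rate_per_s * 60 / queries_per_terminal_per_minute


rate_today = queries_per_second(0.85e6)    # today's hierarchy, multiprogramming
rate_with_ems = queries_per_second(2.8e6)  # with electronic mass store
terminal_count = terminals(30)             # at about 30 queries per second
```

The same two functions can be used to see how the terminal count scales if the mean load per terminal or the instruction count per query is assumed differently.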

In conclusion from these considerations, it may be stated that the organization and technology of future storage systems have the potential to meet the performance requirements of widespread data base operation.


References

[1] C.W. Pugh, "Storage Hierarchies: Gaps, Cliffs and Trends", IEEE Transactions on Magnetics, Vol. MAG-7, No. 4, Dec. 1971
[2] C. Johnson, "IBM 3850-Mass Storage System", Nat. Comp. Conf. 1975, p. 509
[3] J. Kelly, "The Development of an Experimental Electron-Beam-Addressable Memory Module", Computer, February 1975
[4] W.C. Hughes et al., "BEAMOS, A New Electronic Digital Memory", Nat. Comp. Conf. 1975, p. 5-41
[5] G.F. Amelio, "Charge-Coupled Devices for Memory Application", Nat. Comp. Conf. 1975, p. 515
[6] W.S. Boyle et al., "Charge-Coupled Devices - A New Approach to MIS Device Structures", IEEE Spectrum, July 1971, p. 18
[7] A.H. Bobeck et al., "A New Approach to Memory and Logic: Cylindrical Domain Devices", Proc. AFIPS Conf., Vol. 55, 1969
[8] R.R. Martin et al., "Electronic Disks in the 1980's", Computer, February 1975, p. 24
[9] D.H. Gibson, "Considerations in Block-Oriented Systems Design", AFIPS Proc., Vol. 30, SJCC 1967, pp. 75-80

TABLE I  Storage parameters

Technology parameters
  - Storage medium (homogeneity, bit density)
  - Number of bits per read/write station (matrix/sequential arrangement)
  - Data transport

Operational parameters
  - Access time; transfer time = f(block length, transfer width, clock frequency)
  - Modularity (determines access rate)
  - Bit cost (determines capacity)
  - Reliability
  - Volatility
  - Addressable unit (byte/block addressing)

TABLE II  Model hierarchy performance parameters (processor 2 MIPS, 2 accesses/instruction)

Configuration                   max. z_ges [10^6/s]   Processor utilization [%]
P+H+SP+B                        0.11                  2.8
P+H+SP+B, multiprogr.           0.85                  21
P+H+E+SP+B                      1.87                  47
P+H+E+SP+B, multiprogr.         2.8                   70
P+H+E+SP+FP+B, multiprogr.      5.88                  100

Further columns: t [µs]/p_eff for each level, t_ges [µs], total cost [10^6 $].

P = buffer store, H = main store, E = electronic mass store, SP = rigid disk, FP = flexible disk, B = tape.

TABLE III  Distribution of functions in data base operation

Level  Technology                                  Typ. capacity     Function
1      BIP buffer store                            4-16K bytes       Fast working store for linking data with programs
2      FET main store                              10^5-10^7 B       Provision of programs and data for a foreseeable operating period
3      Shift-register or electron-beam store       10^7-10^9 B       Holding frequently used programs, e.g. the operating system, and working data, e.g. index tables, descriptors, catalogs, pointer networks, etc.
4      Disk store                                  10^8-10^10 B      Files for professional use, data backup
5      Tape store (automatic tape transport)       10^10-10^13 B     Document data base, data backup, archiving

[Fig. 1. Storage hierarchy today. Levels shown: buffer store, main store, disk store, tape store with automatic loading, tape store with manual loading, linked by control channels; the access times indicated range from 50 ns to minutes.]

[Fig. 2. Operational parameters as a function of the technology parameters. Horizontal axis: bits per read/write station (10^2 to 10^12), from matrix to sequential arrangement (BIP, FET, bubbles, electron-beam tube, disk, automatic tape). Curves: mean access time, addressable unit, bit cost (market prices, cts/bit). Also indicated: homogeneity of the medium and data transport (electronic/mechanical).]

[Fig. 3. Disk store bit density versus year (1960 to 1980), logarithmic scale. Data points: IBM 2311, 2314, 3330-001, 3330-002, 3340, CDC 9762. Series: areal bit density (bits/inch^2), linear bit density (bits/inch), track density (tracks/inch).]

[Fig. 4. Addressing systems. a) Disk store: index table, data track with sequential search, overflow pointer to an overflow track. b) Electron-beam store: index entries with direct addresses of the individual records.]

[Fig. 5. Technology overview (without optical technologies). Storage capacity in bits (10^2 to 10^14) versus access time in seconds (10^-8 to 10^2); regions shown include magnetic tape store (automatic), electron beam, and magnetic disk, with the "gap" marked.]

[Fig. 6. Parameter trend over the hierarchy spectrum. Data store levels 1 to 5 (BIP, FET, shift register, disk, automatic tape), each with its address table, above the processor system (processor, control, hardware). Arrows indicate the rising trend of each parameter along the hierarchy: homogeneity of the medium, mechanical data transport, addressing unit, block length, access time, control effort (software), capacity, and hit rate in one direction; data rate, clock frequency, bus width, modularity, bit cost, data volatility, and electronic data transport in the other.]

[Fig. 7. Model parameters. Bit cost (cts/bit), capacity (bits), miss rate 1-h, and parallel accesses plotted against access time in seconds (10^-8 to 10^2). P = buffer store, H = main store, E = electronic mass store, SP = rigid disk, FP = flexible disk, B = tape.]

[Fig. 8. Model results, uniform staging (on a logarithmic scale). Access rate and access rate per total bit cost plotted against the number of levels (2 to 18).]

System R: A Relational Data Base Management System

Morton M. Astrahan, IBM Research Laboratory, San Jose, California
Donald D. Chamberlin, IBM Research Laboratory, San Jose, California
W. Frank King, IBM Research Laboratory, San Jose, California
Irving L. Traiger, IBM Research Laboratory, San Jose, California

INTRODUCTION

System R is a data base management system which provides a high-level, non-procedural relational data interface. The system provides a high level of data independence by isolating the end user as much as possible from underlying storage structures. The system permits definition of a variety of relational views on common underlying data. Data control features are also provided, including authorization, integrity assertions, triggered transactions, a logging and recovery subsystem, and facilities for maintaining data consistency in a shared-update environment.

The relational model of data was introduced by Codd [1] in 1970 as an approach toward providing solutions to the various outstanding problems of current data base management systems. In particular, Codd addressed the problems of providing a data model or view which is divorced from various implementation considerations (the data independence problem) and also the problem of providing the data base user with a very high-level, non-procedural data sublanguage for accessing data. It should be stressed here that the relational model is a framework or philosophy for finding compatible solutions to these and other problems in data base management; the relational approach is thought to make solutions more elegant and perhaps simpler but the approach by itself does not solve these problems.

With this caveat in mind, our first purpose is to briefly describe a related set of data base problems which we are attempting to solve in a coherent way following the relational approach. Our solutions are embodied in an experimental prototype data management system called System R which is currently being designed, implemented, and evaluated at the IBM San Jose Research Laboratory. We wish to emphasize that System R is a vehicle for research in data base architecture, and is not available as a product. Furthermore, the ideas discussed in this paper should not be considered as having product implications.

To a large extent, the acceptance and value of the relational approach hinge on the demonstration that a system can be built which is operationally complete (can actually be used in a real environment to solve real problems) and has performance at least comparable to today's existing systems. With the present state of systems performance prediction, the only credible demonstration is to actually construct such a system, and to evaluate it in a real environment.

The point of this paper, then, is to describe the set of problems which are being studied in the System R framework, to discuss the objectives of the system (which amounts to a description or definition of the term operationally complete), and to describe the architecture of the system, including overall structure, interfaces, and functional design.

The System R project is not the first implementation of the relational approach; however, we know of no other relational system which is really aimed at an operationally complete capability. Other efforts have demonstrated feasibility in various related problem areas. For example, both the IS/1 system [2] and the Phase/0 SEQUEL prototype [3] were single-user systems. No concurrent sharing of data was permitted and hence data control, locking, and recovery issues were greatly simplified. The INGRES project [4] at U.C. Berkeley is also single-user oriented. In addition, each of these projects has an incomplete treatment of views, i.e., of providing various views of data to various users.

The next section describes the overall goals of System R and describes the list of capabilities which we believe to be necessary in an operational environment. The following section describes the architecture of the system, and describes in overview terms its major interfaces and the components which support these interfaces.

SYSTEM OBJECTIVES

System R is focused on five main goals:

1. To provide a high level, non-procedural relational data interface.
2. To provide the maximum possible data independence for the basic data objects (base relations).
3. To support derived relational views.
4. To provide facilities for data control consistent with the high level of the data interface.
5. To discover the performance trade-offs inherent in this type of data base capability.

First, each of these goals will be discussed and illustrated.

1. High Level Non-Procedural Relational Data Interface

The trend toward higher level languages has long been evident in the programming domain.

Set-oriented data sublanguages were introduced in 1962 in the CODASYL Information Algebra [5]. Codd's ALPHA language [6] and Relational Algebra [7] raised the level of data sublanguages by letting the user specify the properties of the data required without describing the access path or detailed sequence of operations to be used to obtain the data. This trend toward higher level non-procedural programming [8] is aimed at reducing the number of decisions the programmer must make in order to express his problem/solution, and at making the decisions more relevant to the solution (as opposed to being relevant to the programming of a specific computer). Halstead has examined two programs solving the same problem using his software physics techniques [9], one written in ALPHA and the other in DBTG-COBOL, and for this case found that the ALPHA solution required 30 times fewer mental discriminations than the lower level solution. This observation should be directly translatable into increased programmer productivity and ease of maintenance. Thus, human productivity is one strong reason for the goal of supporting a high-level, non-procedural data interface.

The other reason for moving in the direction of non-procedural interfaces is related to the optimization of the execution of the program. If the data base were dedicated to a single application, its structure could be optimized for that application only, and the application could be written in terms of that optimized structure. However, in an integrated data base environment, such local optimization is likely to be, on the aggregate, inefficient. Hence, the system must itself optimize the execution of each application on a data base whose structure is a compromise among the various applications. The non-procedural, high-level specification reveals the application intent better and hence is much easier for the system to use as a basis for optimization.

The available relational languages (ALPHA, Relational Algebra) were very formal and required considerable mathematical sophistication on the part of the user. In particular, the ALPHA language is based on the first order predicate calculus. The relational algebra introduces a collection of operators (selection, projection, join, division, etc.) which have relational operands and produce relational results. The need to discover more user-oriented, non-mathematical relational languages became apparent and is currently being pursued by several research groups [11,12].

The principal external interface of System R is called the Relational Data Interface (RDI), and provides relationally complete [7] facilities for data manipulation, data definition, and data control. To support high-level, non-procedural, set-oriented applications, the RDI contains the SEQUEL data sublanguage in its entirety. SEQUEL is documented in [10].
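The difference between stating the properties of the data required and spelling out how to obtain it can be sketched with a small runnable example. This is only an illustration under stated assumptions: it uses Python's sqlite3 module and present-day SQL as a stand-in for SEQUEL, and the EMP relation and its contents are invented here, not taken from System R.

```python
import sqlite3

# Hypothetical EMP relation, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (NAME TEXT, DEPT INTEGER, SAL INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?)",
    [("ADAMS", 48, 9000), ("BAKER", 50, 12000), ("CLARK", 50, 15000)],
)

# Non-procedural: the request names the properties of the data wanted;
# the choice of access path is left to the system.
set_oriented = [
    row[0]
    for row in conn.execute(
        "SELECT NAME FROM EMP WHERE DEPT = 50 AND SAL > 10000 ORDER BY NAME"
    )
]

# Procedural: the program fixes the access path itself (a full scan)
# and applies the tests one tuple at a time.
tuple_at_a_time = []
for name, dept, sal in conn.execute("SELECT NAME, DEPT, SAL FROM EMP"):
    if dept == 50 and sal > 10000:
        tuple_at_a_time.append(name)
tuple_at_a_time.sort()
```

Both forms return the same names; only the first leaves the system free to pick a different access path later without a change to the program.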

Of course, not all requirements can best be met through a non-procedural approach, and for this reason the RDI contains single-tuple-oriented operators (FETCH, INSERT, DELETE, REPLACE, etc.) in addition to the set-oriented capabilities of SEQUEL. We have designed the RDI to be used in two modes:

(a) Directly by an application program (e.g., a COBOL program) which uses RDI operators to access the data base.
(b) As the target of a translator program (a special case of an application program) which is emulating some other type of user interface.

2. Data Independence

Date [13] has defined data independence as the immunity of applications to change in storage structure and access strategy. Often, however, the notion is associated with the ability of a data base system to provide various logical views of the data base; for example, to make visible only selected records of a file, and selected attributes of each record. By view, informally we mean a relational window through which an application can access the data base. The term "window" is used to imply that the changes to the data base which affect the view are visible to the application. We wish to distinguish these two notions of data independence. In this subsection we address only the first notion of data independence; the second, which we call the support of derived views, is discussed in the next subsection.

Typically, data management systems permit two levels of data definition. The lower level, or "schema", describes the primitive data objects being managed by the system. In System R, these primitive objects are called base relations. The description of a base relation includes the relation name, attribute names, description of the units of each attribute, the domain of each attribute, the order of the attributes within a relation, the order (if any) of the tuples within a relation, etc. In particular, the definition of a base table does not include any information about physical storage or available physical access paths to the data. However, each base relation has a very direct physical representation, i.e., each tuple of the relation has a stored representation. Data independence implies that the base relation can be supported by a variety of physical structures and access strategies.

Clearly data independence is important if a system is to allow growth and meet the changing requirements of various applications. System R provides a rich set of access structures. Any of these can be used to support a given base relation.

3. Support of Derived Views

The higher level of data independence consists of the ability to define alternative views in terms of the primitive data objects. This notion appears in most contemporary data management systems, and the usefulness of such systems depends in large measure on the capability of the system to support derived views. The inability to support views which differ from the primitive views often leads to programs which are complex, because they are warped to use views which are not natural but can be supported, and which require extensive maintenance as the system changes over time.

As an example of the usefulness of derived views, consider a data base containing the following two types of records: CATALOG (PARTNO,DESC,PRICE) and SALES (SALENO,PARTNO,QSOLD). The CATALOG file is ordered by part number, and gives the description and price of each part. The SALES file is ordered by sale number, and gives the part number and quantity sold for each sale. Suppose we wish to print out all the SALES records for parts which have a price greater than $1000. We could write a program to scan through the CATALOG file, finding parts with PRICE > $1000; for each such part, a separate scan could be made through the SALES table to find all the corresponding records. This program would be highly procedural; it would require repeated scanning of the SALES table, and would give the system little opportunity to optimize the query by choosing among alternate access paths. However, if our system permits the specification of derived views, the user might specify a view consisting of the join of the two files, as follows: SALES-CAT (SALENO,PARTNO,DESC,PRICE,QSOLD). The program could then consist of a single scan through the SALES-CAT view. Besides being easier to write, this program would give the system flexibility to take advantage of new access paths which may become available (such as a PARTNO index on the SALES file) without requiring changes in the program.

A major goal of the System R project is to develop and investigate the technology of derived views. This problem has three distinct aspects, each of which is being studied:

(a) Exactly what set of operations on derived views is supportable? As an example of this issue, imagine a request to delete a tuple from the SALES-CAT view described above. Since this view is a join of two underlying files, it is not obvious what actions should be taken on the files to support the deletion. (Should we delete the SALES record but retain the CATALOG record?) For some kinds of view modification requests, there may be several possible actions which would produce the desired result; for other kinds of requests, there may be no possible supporting action. Codd [18] has described some examples of the latter phenomenon.

(b) How should the view be bound to the available physical structures and access paths? This aspect of the binding problem concerns the optimization of the view and accesses on the view in terms of available access paths, e.g., indexes, sequential scan, etc.

(c) When should binding be performed? For dynamic view definition, the binding must also be dynamic. In System R, we are investigating various binding-time strategies; dynamic binding will occur for dynamically defined views, but for certain often-used or very demanding views, the binding will be done statically with (hopefully) an increase in performance.
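The CATALOG/SALES example from this subsection can be sketched in runnable form. The sketch makes several assumptions that are not from the paper: it uses Python's sqlite3 module and present-day SQL rather than SEQUEL, the view is named SALES_CAT because a hyphen is not legal in an unquoted SQL name, DESC is quoted because it is a reserved word in SQLite, and the sample rows are invented. It compares the procedural nested scan with a single scan of the view, adds a PARTNO index without touching the view query, and shows that this particular engine refuses a delete on the join view, which is one possible answer to question (a) and not necessarily the one System R adopts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE CATALOG (PARTNO INTEGER PRIMARY KEY, "DESC" TEXT, PRICE INTEGER);
    CREATE TABLE SALES (SALENO INTEGER PRIMARY KEY, PARTNO INTEGER, QSOLD INTEGER);
    INSERT INTO CATALOG VALUES (1, 'PUMP', 1500), (2, 'BOLT', 2), (3, 'MOTOR', 4000);
    INSERT INTO SALES VALUES (10, 1, 2), (11, 2, 500), (12, 3, 1), (13, 1, 4);
    CREATE VIEW SALES_CAT AS
        SELECT S.SALENO, S.PARTNO, C."DESC", C.PRICE, S.QSOLD
        FROM SALES S JOIN CATALOG C ON C.PARTNO = S.PARTNO;
''')

# Procedural version: scan CATALOG, then rescan SALES for every expensive part.
procedural = []
for partno, price in conn.execute("SELECT PARTNO, PRICE FROM CATALOG").fetchall():
    if price > 1000:
        for saleno, sale_partno in conn.execute("SELECT SALENO, PARTNO FROM SALES").fetchall():
            if sale_partno == partno:
                procedural.append(saleno)

# View version: one scan of the derived view.
view_query = "SELECT SALENO FROM SALES_CAT WHERE PRICE > 1000 ORDER BY SALENO"
via_view = [row[0] for row in conn.execute(view_query)]

# A new access path becomes available; the program's query text is unchanged.
conn.execute("CREATE INDEX SALES_PARTNO ON SALES (PARTNO)")
via_view_indexed = [row[0] for row in conn.execute(view_query)]

# Question (a): SQLite rejects a delete against the join view outright.
try:
    conn.execute("DELETE FROM SALES_CAT WHERE SALENO = 10")
    view_delete_supported = True
except sqlite3.OperationalError:
    view_delete_supported = False
```

The view query is written once and keeps working when the index appears, which is the flexibility the text describes.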

4. Data Control Facilities

Data control includes those aspects of a data base system which control the access to and use of data. We distinguish four types of data control, each of which is being investigated in System R.

(a) Authorization. This form of control is the most common type, being present in almost all current systems. Authorization is the mechanism to permit or deny the creation and manipulation of data structures and views by various users. Any user of System R may potentially be authorized to create new tables and views, and to selectively grant authorizations for his objects to other users. The authorization mechanism of System R is described more fully in [14].

(b) Integrity. Integrity control provides a mechanism for enforcing that the data in the data base obeys certain rules or predicates which have been declared to the system. This form of control is typically not found in current data base systems but is left to protocols imbedded in various application programs. In System R, two main types of integrity control facilities are provided: assertions and triggers. Integrity assertions are expressed in the SEQUEL language as predicates about the data in the data base [15]. The system then guarantees the truth of these predicates. Exactly when the system checks an assertion is a function of both the type of assertion and the transaction boundary which caused the assertion to be checked. Triggers are actions that are invoked when some triggering condition or action is detected. For example, suppose that the DEPT relation contains an attribute NEMPS which represents the number of employees in the department. To maintain the validity of this value, we can declare triggers to update this field whenever an employee is hired, fired, or transferred.
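The NEMPS example can be sketched as follows. This is an illustration only, under assumptions that are not from the paper: it uses SQLite triggers through Python's sqlite3 module rather than the SEQUEL trigger facility of [15], a CHECK constraint stands in for a declared assertion, and the EMP relation, the trigger names, and the sample employees are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    -- The CHECK clause plays the role of a declared assertion on DEPT.
    CREATE TABLE DEPT (DNO INTEGER PRIMARY KEY,
                       NEMPS INTEGER NOT NULL CHECK (NEMPS >= 0));
    CREATE TABLE EMP (NAME TEXT PRIMARY KEY, DNO INTEGER NOT NULL);

    CREATE TRIGGER EMP_HIRED AFTER INSERT ON EMP
    BEGIN
        UPDATE DEPT SET NEMPS = NEMPS + 1 WHERE DNO = NEW.DNO;
    END;

    CREATE TRIGGER EMP_FIRED AFTER DELETE ON EMP
    BEGIN
        UPDATE DEPT SET NEMPS = NEMPS - 1 WHERE DNO = OLD.DNO;
    END;

    CREATE TRIGGER EMP_TRANSFERRED AFTER UPDATE OF DNO ON EMP
    BEGIN
        UPDATE DEPT SET NEMPS = NEMPS - 1 WHERE DNO = OLD.DNO;
        UPDATE DEPT SET NEMPS = NEMPS + 1 WHERE DNO = NEW.DNO;
    END;

    INSERT INTO DEPT VALUES (48, 0), (50, 0);
''')

# Hire three employees, transfer one, fire one; the triggers keep NEMPS current.
conn.execute("INSERT INTO EMP VALUES ('ADAMS', 48)")
conn.execute("INSERT INTO EMP VALUES ('BAKER', 48)")
conn.execute("INSERT INTO EMP VALUES ('CLARK', 50)")
conn.execute("UPDATE EMP SET DNO = 50 WHERE NAME = 'ADAMS'")
conn.execute("DELETE FROM EMP WHERE NAME = 'BAKER'")

# An update that would falsify the assertion is rejected by the system.
try:
    conn.execute("UPDATE DEPT SET NEMPS = -1 WHERE DNO = 48")
    assertion_enforced = False
except sqlite3.IntegrityError:
    assertion_enforced = True

nemps = dict(conn.execute("SELECT DNO, NEMPS FROM DEPT"))
```

No application program touches NEMPS directly; the count stays valid because the rule is declared once to the system instead of being repeated in every program.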

(c) Consistency. Integrity implies the static correctness of the data base, and consistency is concerned with the dynamic correctness. Suppose that one application program is transferring a set of employees from Dept. 48 to Dept. 50, while simultaneously another application program is giving raises to all employees in Dept. 50. The interaction of these programs may have the undesirable result that some but not all of the transferred employees receive the raise. Even worse, if the transferring program encounters a failure and backs out its updates, it may develop that a raise has been given to someone in Dept. 48.

In current systems the application would contain specific statements (e.g., "LOCK DEPT 50") to avoid these problems. A major goal of System R is to eliminate such defensive coding, which is not a part of the problem being solved but is related only to the fact that the solution is running in a certain environment. Since the user cannot know in advance the exact environment in which his application will run (perhaps no other users are currently updating employee records; in this case the lock is not needed), the system must provide the control needed to enforce consistency. The approach being pursued is to require that the user define the boundaries of a transaction, which is a sequence of statements to be executed as an atomic unit. The system then requests whatever resources it needs in the run-time environment to guarantee the atomicity. Furthermore, this same atomic unit is used as the unit of integrity, i.e., integrity may be suspended within a transaction but it is guaranteed at the transaction endpoints. If a transaction violates integrity at its endpoint, then the transaction is backed out.
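The rule that integrity may be suspended inside a transaction but must hold at its endpoint can be sketched as below. The sketch is an assumption-laden illustration, not System R's mechanism: it uses Python's sqlite3 module, the helper run_transaction, the EMP and DEPT tables, and the assertion (every employee belongs to an existing department) are all invented here, and it runs a single user, so it shows atomic backout only and not the locking needed for concurrent programs.

```python
import sqlite3

def run_transaction(conn, statements, assertion_sql):
    """Run the statements as one atomic unit; test the assertion only at the endpoint."""
    try:
        for sql in statements:
            conn.execute(sql)
        (holds,) = conn.execute(assertion_sql).fetchone()
        if not holds:
            raise ValueError("integrity violated at transaction endpoint")
        conn.commit()
        return True
    except Exception:
        conn.rollback()  # back out every update made inside the transaction
        return False

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE DEPT (DNO INTEGER PRIMARY KEY);
    CREATE TABLE EMP (NAME TEXT PRIMARY KEY, DNO INTEGER);
    INSERT INTO DEPT VALUES (48), (50);
    INSERT INTO EMP VALUES ('ADAMS', 48), ('BAKER', 50);
''')

# Hypothetical assertion: every employee belongs to an existing department.
ASSERTION = "SELECT NOT EXISTS (SELECT 1 FROM EMP WHERE DNO NOT IN (SELECT DNO FROM DEPT))"

# The assertion is false after the first statement and true again at the endpoint.
first_ok = run_transaction(
    conn,
    ["UPDATE EMP SET DNO = 60 WHERE DNO = 48", "INSERT INTO DEPT VALUES (60)"],
    ASSERTION,
)

# Department 70 is never created, so this transaction is backed out.
second_ok = run_transaction(conn, ["UPDATE EMP SET DNO = 70 WHERE DNO = 50"], ASSERTION)

final_state = dict(conn.execute("SELECT NAME, DNO FROM EMP"))
```

The first transaction commits even though the assertion is false between its two statements; the second leaves no trace because the assertion is still false at its endpoint.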

(d) Recovery. The fourth aspect of data control is concerned with preserving the integrity of the data if the system experiences a malfunction or if an application backs up either voluntarily or involuntarily (e.g., as in the case of deadlock). The recovery capabilities of System R include the usual checkpoint/restart functions as well as the ability to back up an ongoing transaction to user-specified points.

These capabilities are examples of functions which are required in order to have an operationally complete capability.

ARCHITECTURE AND SYSTEM STRUCTURE

We will describe the overall architecture of System R from two viewpoints. First, we will describe the system as seen by a single transaction, i.e., a monolithic description. Second, we will investigate its multi-user dimensions. Figure 1 gives a functional view of the system including its major interfaces and components. The RDI, as described previously, is the external interface which can be called directly from a programming language, or used to support various other interfaces. The Relational Storage Interface (RSI) is the access-method-like level which handles the access to single tuples of base relations. This interface and its supporting system (Relational Storage System - RSS) is actually a complete storage subsystem in that it manages devices, space allocation, storage buffers (one level store), transaction consistency and locking, deadlock, backout, transaction recovery and logging. Furthermore, it maintains indexes on selected attributes of base relations.

    Programs to support various interfaces:
    Stand-alone SEQUEL, Query By Example, etc.

    ------ Relational Data Interface (RDI) ------

    Relational Data System (RDS)

    ------ Relational Storage Interface (RSI) ------

    Relational Storage System (RSS)

Figure 1. Architecture of System R

With this brief description of the RSS facilities, we can return to the RDI and its supporting system (Relational Data System - RDS). The major functions performed by the RDS are authorization, integrity enforcement, and nonprimitive view support, which includes all the binding issues discussed previously. In addition, the RDS maintains the catalogs of external names, since the RSS uses only system-generated internal names. The RDS contains a sophisticated optimizer which chooses the best access path for any given request from among the paths supported by the RSS.

The operating system environment for this system is VM/370 [16]. Several extensions to this virtual machine capability have been made [17] in order to support the multi-user environment of System R.

ACKNOWLEDGEMENT

The authors wish to acknowledge many helpful discussions with E. F. Codd, originator of the relational model of data, and with L. Y. Liu, manager of the Computer Science Department of the IBM Research Laboratory. We also wish to acknowledge the extensive contributions to System R of Paul L. Fehder, who has transferred to another location, and Raymond F. Boyce, who served as one of the project managers until his untimely death in June of 1974.

REFERENCES

[1] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, June 1970.

[2] M. G. Notley. The Peterlee IS/1 System. IBM UK Scientific Center Report UKSC-0018, March 1972.

[3] M. M. Astrahan and D. D. Chamberlin. Implementation of a Structured English Query Language. Presented at ACM SIGMOD conference, San Jose, California, May 1975; to be published in Communications of the ACM, October 1975.

[4] G. D. Held, M. R. Stonebraker, and E. Wong. INGRES: A Relational Data Base System. Proc. AFIPS National Computer Conference, Anaheim, California, May 1975.

[5] CODASYL Development Committee. An Information Algebra. Communications of the ACM, April 1962.

[6] E. F. Codd. A Data Base Sublanguage Founded on the Relational Calculus. Proc. ACM SIGFIDET Workshop, San Diego, California, November 1971.

[7] E. F. Codd. Relational Completeness of Data Base Sublanguages. Courant Computer Science Symposia, Vol. 6: Data Base Systems. Prentice Hall, New York, 1971.

[8] B. M. Leavenworth. Nonprocedural Programming. IBM Research Report RC4968, IBM Research Center, Yorktown Heights, New York, August 1974.

[9] M. H. Halstead. Software Physics Comparison of a Sample Program in DSL Alpha and COBOL. IBM Research Report RJ1460, IBM Research Laboratory, San Jose, California, October 1974.

[10] D. D. Chamberlin and R. F. Boyce. SEQUEL: A Structured English Query Language. Proc. ACM SIGFIDET Workshop, Ann Arbor, Michigan, May 1974.

[11] N. McDonald and M. Stonebraker. CUPID: The Friendly Query Language. Proc. ACM Pacific Conf., San Francisco, California, April 1975. Available from Boole and Babbage, 850 Stewart Drive, Sunnyvale, California 94086.

[12] M. M. Zloof. Query By Example. Proc. AFIPS National Computer Conference, Anaheim, California, May 1975.

[13] C. J. Date. An Introduction to Data Base Systems. Addison-Wesley, 1975.

[14] D. D. Chamberlin, J. N. Gray, and I. L. Traiger. Views, Authorization, and Locking in a Relational Data Base System. Proc. AFIPS National Computer Conference, Anaheim, California, May 1975.

[15] K. P. Eswaran and D. D. Chamberlin. Functional Specifications of a Subsystem for Data Base Integrity. IBM Research Report RJ1601, IBM Research Laboratory, San Jose, California, June 1975.

[16] Introduction to VM/370. IBM Publication No. GC20-1800. IBM, White Plains, New York.

[17] J. N. Gray and V. Watson. A Shared Segment and Inter-process Communication Facility for VM/370. IBM Research Report RJ1579, IBM Research Laboratory, San Jose, California, February 1975.

[18] E. F. Codd. Recent Investigations in Relational Data Base Systems. Proc. IFIP Congress, Stockholm, Sweden, August 1974.

GEOGRAPHIC BASE FILES: Applications in the Integration and Extraction of Data from Diverse Sources

Patrick E. Mantey, Eric D. Carlson
IBM Research Laboratory, San Jose, California

Abstract

This paper addresses the development of integrated municipal data bases, with consideration given to political realities and to the sources of data now available in municipalities. First, the potential users and potential uses of a municipal data base are discussed, and an information system which would serve these users and uses is considered. Next, the "current" status of data bases in municipalities is reviewed, and it is concluded that there is a large quantity of data available in many municipalities, but that integrated data bases for supporting an information system are not usually a reality. The problem of building an integrated data base from the variety of data sources presented by local agencies is then addressed. A central ingredient for an integrated data base is the Geographic Base File (GBF), which permits integration of diverse source files via geographic references. The concept of extraction, for the construction of extracted data files from these multiple sources to support information system applications, is developed (and a prototype implementation is described in the Appendix). Using the GBF, and source data from various municipal functions, extracted data bases can be rapidly built to serve a variety of applications of an information system in the decision-making situations of municipalities.

I. APPLICATIONS OF A MUNICIPAL DATA BASE

Municipal governments are, in essence, created to deliver services to a geographical area. There is an unusual variety, in comparison to private industry, of services offered. Local government is often structured (or fractured) along functional lines into special districts, as well as by geography. Such structuring has precluded concentration of power, but it has increased the complexity of planning, resource allocation, or management.

Many of the problems in municipal government require decisions which are not routine. Rather, they require the professional insight and judgment of human decision makers who consider the specific conditions of each problem. Ideally, this insight and judgment would be aided and guided by appropriate information derived from a comprehensive data base. This is the objective of a municipal information system: to facilitate effective analysis and solution of specific problems by supporting human decision makers with data resources and analysis functions in readily usable form. Because a municipality provides services to a geographical area, much of the data relevant to decision making or problem solving in local agencies will have geographical attributes, and can be given spatial interpretation via maps. A key attribute of a municipal information system is the capability for displaying information in the form of maps.

Another requirement for such systems to be effective is that they support ready use by decision makers who know very little about computers. The system must help the decision makers develop their precise objectives or decision criteria for solving a problem. The solution process in such decision making requires exploratory analysis, selection of relevant data, and meaningful data presentation in an interactive environment. A system which provides such capabilities, called GADS (Geo-data Analysis and Display System), has been developed and evaluated in several applications, such as police manpower allocation and analysis of urban development policies [1-4]. The evaluations of GADS indicate that interactive analysis and display systems have a great potential in the operations, management, and planning of municipalities.

As an example, consider a municipality which maintains a computerized property information file (via the tax assessor function). Such a file would have data on each parcel, possibly including:

address
owner
zoning
improvements
date constructed
type construction
size (area)
current use
area centroid
assessed value

If this data were accessed via an interactive information system, a decision maker could readily obtain, for example, the address and assessed value of all residences constructed between 1960 and 1962 and having floor area between 1600 and 1800 square feet, on lots with 6800 to 7200 square feet. If the user of the system were a real-estate appraiser, this information in tabular form might be of value in determining if a particular home is fairly appraised (Figure 1). A histogram giving the distribution of assessed value of these homes would provide additional insight (Figure 2), and a map relating the average value of such homes in the city's planning areas to the city-wide average (Figure 3) would provide the appraiser with information in a spatial framework.
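The appraiser's query and the histogram can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the Parcel record and its field names loosely follow the attribute list above, the sample parcels and the 5000-dollar bin width are invented, and nothing in the sketch reflects how GADS itself is implemented.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Parcel:
    # Hypothetical subset of the property file attributes.
    address: str
    current_use: str
    year_built: int
    floor_area: int      # square feet
    lot_area: int        # square feet
    assessed_value: int  # dollars

def in_closed_range(value, low, high):
    """True when low and high bracket value, both ends included."""
    return value in range(low, high + 1)

def comparable_residences(parcels):
    """Address and assessed value of residences matching the query in the text."""
    return [
        (p.address, p.assessed_value)
        for p in parcels
        if p.current_use == "residence"
        and in_closed_range(p.year_built, 1960, 1962)
        and in_closed_range(p.floor_area, 1600, 1800)
        and in_closed_range(p.lot_area, 6800, 7200)
    ]

def value_histogram(rows, bin_width=5000):
    """Count homes per assessed-value bin, keyed by the lower edge of the bin."""
    return Counter((value // bin_width) * bin_width for _, value in rows)

# Invented sample data.
parcels = [
    Parcel("12 Elm St", "residence", 1961, 1700, 7000, 31000),
    Parcel("40 Oak Ave", "residence", 1960, 1650, 6900, 33500),
    Parcel("7 Pine Ct", "residence", 1958, 1700, 7000, 29000),
    Parcel("95 Main St", "commercial", 1961, 1750, 7100, 80000),
    Parcel("3 Fir Ln", "residence", 1962, 1800, 7200, 36000),
]
rows = comparable_residences(parcels)
histogram = value_histogram(rows)
```

The list in rows corresponds to the tabular display and histogram to the distribution of assessed value; the map display would additionally need the area centroid of each parcel.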

the

the

user

of the information system were an assessor concerned with determining

neighborhoods

computer-aided

which

could

appraisal,

be

considered

additional

data

equivalent

would

for

purposes

be required.

of

I f recent sales

records are the basis f o r c a l i b r a t i n g the assessment model, i t may be found that the assessor's fitting

data alone cannot be used to model variations in s e l l i n g price of houses

the description above.

selling

Showing on a map the

mean and variance

of

the

price of such houses by neighborhood w i l l o f f e r the assessor a visual means

for examining the q u a l i t y of the

assessment model.

The display

may cause

the

assessor to consider other factors to explain the v a r i a t i o n s ; e.g. crime rate, level of public f a c i l i t i e s and services (such as the influence

of

an

adjacent

regional

park [ 5 ] ) or the influence of other near-by land uses. In

making

decisions

related

to

residential

zonings,

pertinent

questions

and

information displays would relate to adjacent land use, the e f f e c t that the proposed development would have on the mix of housing stock available in the community, or to the e f f e c t these new residents would make on the area.

per-capita

park

acerage

in

the

The school o f f i c i a l s (and in some areas the local zoning authority) need to

evaluate the impact such a development w i l l have on the existing school Each of

these

questions,

facilities.

and many others, are of i n t e r e s t to d i f f e r e n t decision

makers involved in the area of community development. An example of the use of a municipal data base in resource allocation to

finding

the

neighborhood. relating

to

best

approach

to

One group may advocate these

burglaries

If

the

area

is

reduction

better

show they

probably offers greater promise. examined.

the

street

are

lighting.

day-time

Certainly the level of

found

to

base contains

police

to

school

roam about

the

queries

patrols

would

be

then

some suspicion

I f a large number of burglaries occur on school days, i t

off

However,

if

the

information so that i t can be determined that a school

adjacent to the neighborhood has f l e x i b l e scheduling and that free

If

have a high percentage of two wage-earner

may appear that the school-age children are not the perpetrators. data

relate

crimes, another approach

f a m i l i e s , and the majority of burglaries are during the week, w i l l f a l l on the children.

could

of burglaries in a residential

school

school

children

are

grounds, the source of the trouble may have been

identified.

As a last example, consider the application for a building permit for an automobile service station. With the recent wave of service station abandonments, many municipalities are reluctant to grant such permits. Similarly, financial institutions are probably wary about loaning money for such ventures. For all concerned, and for the public good, a careful analysis of such a proposal is required. Factors of interest would include location and number of existing stations, traffic access and traffic patterns at the proposed site, and an estimate of the automobile ownership and disposable income in the surrounding areas.

In these examples it has been assumed that the information system has access to a comprehensive municipal data base. However, such data bases are a rarity today. In the next section the current status of municipal data bases is discussed, and in Section III an approach toward the provision of integrated data is offered.

II. CURRENT STATUS OF MUNICIPAL DATA BASES

The importance of a comprehensive integrated data base to support decision-making in municipal government has been widely recognized. There have been several different approaches taken to the development of comprehensive municipal data bases. One approach,
which was popular in the 1960's, was the development of a comprehensive "data bank". This data bank was usually generated as a special collection and often was carried out as part of a funded census, transportation and land use, or comprehensive planning study. As a part of such studies, detailed field surveys of land use (at the parcel level) were conducted, and survey data also was gathered on employment, income distribution, and commercial activity (Figure 4). The data acquisition consumed a major portion of study resources, and did not provide any information relating to many municipal services (e.g. public safety). Although accurate data was often gathered in the development of these data banks, their value was short-lived because they were at best a snap-shot of the state of a very dynamic system, and no means were provided for updating and extending these data banks.

At about the same time, the computer was becoming a tool in the operations of local governments. The application of computers by municipalities in the 1960's can be characterized as a function-by-function approach, with data processing introduced into tasks involving high-volume routine transactions. The computer is generally utilized in those functions which have previously been computerized in private industry: payroll, accounting, billing, budget status reporting, personnel records, etc. Also, the 1960's saw wide-spread use of computers in the processes associated with elections and in the operations of law enforcement agencies. Usually these applications were isolated from each other, and no attempts were made to make this information available for use by other municipal functions.

The real property assessment function of local government in the late 1960's began to recognize the potential of computer applications. The court decisions in various locales requiring property to be appraised at current market value placed a significantly increased work load on assessment officials. In numerous areas, the assessor has turned to computer-aided appraisal to meet these demands.

If a model is to be built and calibrated to reliably estimate fair-market value of residential properties, a comprehensive real property data base is a must. Some property data bases were constructed, often by computerizing the data contained on the assessor's file cards which were maintained on each parcel. These data bases, besides the usual references to assessor map, book and page and to situs address, also added geographic data such as a "centroid". Computerized real property appraisal systems required the field acquisition of very comprehensive data. A work sheet for field surveys by appraisers in a California county is shown in Figure 5. Note the detailed land and building attributes used in this system. There was a significant increase in the amount of data used and required by the introduction of this computerized system, but much of the data on this sheet would not change from year-to-year, and the appraiser in the field would only need to correct those data items which had changed.

An additional requirement for computer-aided appraisal is data relating to current selling price of residential properties. This data provides the calibration information for the regression models, and is available, depending upon the state or local laws, from the registrar of deeds, from the collector of transfer taxes, or from title companies. Assessors in some locales obtain this data by questionnaires which buyers are required by law to complete and return (Figure 6). These sources and/or others can also be used to obtain financial data (e.g. mortgage terms) for the property. Clearly this data base is an integral part of a municipal data base for applications such as illustrated by the examples in Section I.

Another approach related to the development of municipal data bases is characterized by the USAC projects [6], particularly those of Charlotte, North Carolina, and Wichita Falls, Texas. These cities were funded by the Federal USAC project to build Integrated Municipal Information Systems, (IMIS). The concepts of IMIS are [6]:

"(1) Integrated data processing systems should inter-relate municipal processes.
(2) A fundamental analysis of municipal operations and identification of related data processing components is a precondition to the effective use of computers.
(3) A systems approach is required throughout the development process.
(4) The automation of municipal operations must exploit the full range of computer technology.
(5) Automation of routine municipal processes is a fundamental condition to the realization of an IMIS.
(6) An IMIS views the municipality as a basic building block for intergovernmental information systems.
(7) Municipal information systems are by-products of computer-driven, operations-based systems.
(8) Adequately designed data processing systems can be transferred from one municipality to another.
(9) The integrated approach to municipal systems development must proceed on the basis of a plan within which incremental installation may be achieved in accordance with the priorities and resources of any particular city."

The USAC efforts involve city governments, and were the consequence of studies such as the IBM/New Haven project [7] and the USC/Burbank project [8]. These groups sought to develop a methodology, via a "systems approach", for the application of computers by municipalities [6] but did not result in system implementation of
integrated municipal information systems.

The USAC approach wisely focused on operational sources to provide the current data required for municipal decision-making. In the implementations, which are still in progress, the cities have concentrated on building up operational uses of computers, and on implementing these applications on a central computer under an integrated data-base management system. The value of computers to these operational functions has been confirmed [9] but the applications of IMIS in the areas of management decision making are still to be demonstrated.

One of the difficulties that must be overcome in providing a comprehensive municipal data base, (even in cities with a fully integrated and operational IMIS constructed according to the USAC philosophy), is that complete integration, where all municipal functions use the same computer and data base management system, is not a likely prospect with current local governmental structure and with the limited resources of local governments. For example the data pertinent to decision making in a city may be gathered by another agency, such as the tax assessor (and conversely), and may reside on different computers, under different data management schemes and in different file formats. In addition, problems of data security, compatibility of files, and high processing costs may make complete integration unrealistic for many municipalities.

Special data collections, such as the U.S. Census, and data available from state sources, must also be readily incorporated into a municipal data base. With census data gathered according to blocks, block groups and census tracts, with assessor's property data coded according to assessor map, book and page, with public works data in state-plane coordinates, and school data gathered by school attendance area, the building and maintenance of a truly integrated municipal data base presents a
formidable task.

III. APPROACHES PERMITTING DEVELOPMENT OF INTEGRATED FILES

A completely integrated data base would have all data relating to any functions of a municipality residing on the same computer system, under the same data management system and organized and indexed to facilitate correlation. This ideal is not attainable, given present organizational structures and computing capabilities in most municipalities. However, if the computerized files of various municipal functions are "properly structured", it will be possible to achieve the same benefits as if there existed a completely integrated municipal data base. In addition, such an approach will not require re-implementation of current applications, but rather leaves the application data base in the control of the function responsible for its primary maintenance and use.

The approach taken is to make use of data, when data files are "properly structured", to develop effectively the results as if there existed a completely integrated data base without requiring that complete integration take place. This should not be construed as an argument against integration. If integration is politically and technically possible, provides required data security, and is economically attractive, it should be implemented.

Even with an integrated data base, there will always be decisions which require different groupings of data than those supported by the integrated data base. (There will also remain, in practice, data sources which cannot be integrated.) So, the problem of providing a comprehensive data base from multiple data sources is unavoidable and is not
completely solved by an "integrated" data base.

The "proper structuring" required to make data integration possible can be illustrated by example. If one is interested in information about burglaries, and wishes to relate this information to neighborhood conditions, data sources could include police dispatch files, criminal justice arrest files, census data and tax assessor files. Suppose the police dispatch data is used for burglary incidence, and that such data is available in terms of police beats, e.g. the number of burglaries in each beat for each day. If one wishes to use census data for socio-economic information, and if the census tracts and beats have few common boundaries, no small area information is obtainable relating these data sources. Alternatively, if the police dispatch data is captured by the street address of the call, and if a directory exists for the city which will permit identification of the census tract for each street address, then burglaries and socio-economic data can be related at the census tract level.

"Proper
structuring" of data files only has meaning with respect to potential uses of the data (i.e. data files are not an end in themselves). If the objective is to offer data to support decision making in a wide range of problem areas, then the data files must be as detailed as possible, within the constraints of economics, privacy and security. The detailed data can then make possible the development of the widest variety of data subsets and aggregations, and is more likely to permit development of the required set of integrated data for a particular decision-making context. An additional requirement is the existence of data elements in each file which will facilitate relating the data to that from different files. (In this paper, geographical references will be singled out as data elements serving this function in municipal files. Common references to account numbers, project numbers or personnel identifiers are other examples of data elements permitting the relating of data from different source files.) A set of files will be called "properly structured" if they contain information permitting the relating of data from different source files so that integrated subsets of data at the appropriate level of detail can be developed to support the requirements of problem solvers.

Because municipal government is a service delivery function, mutual references to geography can often be used to relate data from the diverse files available in municipalities. A powerful file in facilitating the relating of data, based on these common geographical references, is a Geographic Base File (GBF). Functionally, the GBF contains data to support the relating of data from other files to geographical location and also the display of this data on a map. The creation of a GBF for a municipality is a key requirement in the development of a municipal data base from source files. Several different approaches have been taken.
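As a rough illustration of how a common geographic reference lets separately maintained files be related, the burglary example given earlier in this section can be sketched in Python. The record layouts, field names, addresses and tract numbers below are hypothetical and chosen only for illustration; the paper does not prescribe any file format.

```python
from collections import Counter

# Hypothetical police dispatch file: one record per call, keyed by street address.
dispatch_calls = [
    {"address": "100 MAIN ST", "type": "BURGLARY"},
    {"address": "104 MAIN ST", "type": "BURGLARY"},
    {"address": "7 OAK AVE", "type": "BURGLARY"},
    {"address": "9 OAK AVE", "type": "NOISE"},
]

# Hypothetical address directory: street address -> census tract.
# This is the shared geographic reference that makes the files relatable.
address_directory = {
    "100 MAIN ST": "5001",
    "104 MAIN ST": "5001",
    "7 OAK AVE": "5002",
    "9 OAK AVE": "5002",
}

# Hypothetical census file: socio-economic data by tract.
census_by_tract = {
    "5001": {"households": 400, "two_earner_households": 260},
    "5002": {"households": 250, "two_earner_households": 75},
}


def burglaries_by_tract(calls, directory):
    """Count burglary calls per census tract; collect addresses with no match."""
    counts = Counter()
    unmatched = []
    for call in calls:
        if call["type"] != "BURGLARY":
            continue
        tract = directory.get(call["address"])
        if tract is None:
            unmatched.append(call["address"])
        else:
            counts[tract] += 1
    return counts, unmatched


def relate(counts, census):
    """Place crime counts and census variables side by side at the tract level."""
    rows = []
    for tract in sorted(census):
        record = census[tract]
        rows.append({
            "tract": tract,
            "burglaries": counts.get(tract, 0),
            "two_earner_share": record["two_earner_households"] / record["households"],
        })
    return rows


counts, unmatched = burglaries_by_tract(dispatch_calls, address_directory)
table = relate(counts, census_by_tract)
```

Had the dispatch file recorded only the police beat, no such table could be built unless beats and tracts shared boundaries, which is the point the example makes about capturing data at the most detailed level.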

The simplest GBF is a file sometimes called a Property Location Index (PLI) which contains a list of the valid addresses in the municipality and an x,y coordinate for each. This approach is the one used in Lane County, Oregon, and by the Assessor in Santa Clara County, California. To make this more useful, a list of public places and street intersections and their x,y coordinates is appended. With such a GBF it is then also possible to automatically convert addresses (in the police call file for example) to x,y coordinates. If the GBF also contains the police beat, census tract, and municipality for each address, then it is very simple computationally to count the number of calls in each beat. Evaluation of calls by census tract would permit consideration of socio-economic data with the crime data (of course, police officers could also encode calls by beat and census tract, but this approach is liable to significant errors and seems to be a poor use of police manpower).

The
most detailed GBF's contain digitized land parcel boundaries, easement locations, building outlines, utility placements, and even topographic information, along with street address information on all parcels and names of all public lands and buildings. This GBF is at the level of detail of surveyor's data, and is suitable for engineering applications and detailed map building. Ottawa, Canada's National Capital Commission [11] has pioneered in the development of this kind of GBF.

The most
common GBF at the present time is the result of work by the U.S. Census Bureau in conjunction with the 1970 Census. Using the Metropolitan Map Series, a massive feature labeling and digitization was performed for 200 major metropolitan areas in the United States. The resulting computerized maps were called the DIME (Dual Independent Map Encoding) files [12]. Each entry (record) in the DIME file represents a line segment (a portion of a street segment, railroad, creek, city limit, etc.). Figure 7 shows a sample record and the map data from which the record is derived. The segment has a "From" node and a "To" node, as well as a Left and Right side. Thus, each entry describes a segment, its two ends and its two sides. The nodes are labelled as falling on a particular map of the series, and given a sequential number within map and census tract. The description of the nodes includes x,y coordinates, produced by the Census Bureau's computerized map digitization. The description of a side actually describes the adjacent land. The census tract number, the block number, and the place (city) code are included for each side. High and low address ranges for the street segment sides are also given. The other data on each entry is feature identification for the segment and its sides. Each feature is identified by a prefix, name, suffix, type (e.g., North Army Southwest Street); only name and type are required (e.g., Coyote Creek). The records are ordered by feature name and low address (features without addresses have no secondary ordering). Administrative overlays (e.g. beats, census tracts) can be readily defined in terms of segments of this file. Used in combination with "point-in polygon routines", these overlays facilitate development of counts of events in areas of any specified overlay map.

As in the development of any large machine readable file, high startup costs, data errors, and poor standardization have hindered development of GBF's. But the key problem in the development and use of a GBF is editing (corrections and additions).* Because of the startup cost, accuracy, and standardization problems, editing is a key aspect of development. It is particularly important to verify the topological and coordinate accuracy of the file. Even if there were no developmental problems, "geographic" changes, such as new streets or changing area boundaries, make file editing essential to a useful GBF.

*The Census Bureau uses the word "editing" to mean topological verification, and uses "update" for what we define as editing in this paper.

The Census Bureau and related efforts have produced programs for off-line creation and batch editing of a DIME file [12-13]. These programs require a digitizer for data entry and take large amounts of computer and clerical time for editing. Although the procedures were used to create these 200 GBF's, there has been little editing, and hence little use, of these files. Some cities (e.g., [14]) have developed their own GBF's similar to DIME. These efforts are also characterized by the use of a digitizer and batch computer programs for file creation, and by cumbersome file editing procedures. There have been a few efforts to develop on-line digitization systems, (e.g., [15]), and there are experimental systems which could support on-line digitization with visual feedback [16-17]. Yet, none of these systems provide all of the capabilities required for effective GBF creation and editing.
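To make the DIME record described above more concrete, the following Python sketch models a segment with two nodes and two sides and matches a street address against the address ranges. It is an illustration only: the field names, the odd/even parity convention for the two sides, and the linear interpolation of a coordinate along the segment are assumptions made here, not details taken from the DIME file specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Side:
    low: int    # low address of the range on this side
    high: int   # high address of the range on this side
    tract: str  # census tract of the adjacent land
    block: str  # census block of the adjacent land


@dataclass
class Segment:
    feature: str                    # feature name and type, e.g. "MAIN ST"
    from_node: Tuple[float, float]  # x,y of the "From" node
    to_node: Tuple[float, float]    # x,y of the "To" node
    left: Side
    right: Side


def match_address(segments, number, feature) -> Optional[Tuple[Segment, Side]]:
    """Return the (segment, side) whose address range contains the number."""
    for seg in segments:
        if seg.feature != feature:
            continue
        for side in (seg.left, seg.right):
            # Assumed convention: a side holds addresses with the parity of its low end.
            same_parity = number % 2 == side.low % 2
            if side.low <= number <= side.high and same_parity:
                return seg, side
    return None


def approximate_xy(seg, side, number):
    """Assumed approach: interpolate linearly between the two nodes."""
    span = side.high - side.low
    t = 0.5 if span == 0 else (number - side.low) / span
    (x0, y0), (x1, y1) = seg.from_node, seg.to_node
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


segments = [
    Segment("MAIN ST", (0.0, 0.0), (100.0, 0.0),
            left=Side(101, 199, "5001", "101"),
            right=Side(100, 200, "5002", "201")),
]
seg, side = match_address(segments, 150, "MAIN ST")
xy = approximate_xy(seg, side, 150)
```

Because each side carries its own tract and block codes, a matched address yields its census geography directly, without any polygon processing.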

Conclusions drawn from the IBM study [10] regarding the requirements for interactive GBF editing and maintenance were:

1. There must be a capability for projecting hard copy maps and/or photographs onto the display screen. It must be possible to select arbitrary (contiguous) sections of the maps, and to produce a range of scales.
2. The display system must be able to handle multiple, non-rectangular geographic coordinate systems.
3. The display system must be able to produce both text and lines, with at least three colors for lines (in order to be able to distinguish two maps).
4. The display system must enable selection of any addressable point on the screen, whether or not anything is displayed at that point.
5. The creation and editing functions must include: digitization of base and overlay maps; labeling of points, lines, and polygons in the maps; moving and deleting points and lines; display of any section of the maps, and of specific points, lines and polygons; and checking for topological accuracy.

IV. DATA EXTRACTION

A. Philosophy and Operation

In the previous section, the combination of a GBF and properly structured source files containing geographic references were identified as the basis for offering a problem solver an effectively integrated data base to support decision making. Recent studies of interactive information systems applications in the solution of unstructured problems [4,18,19] have identified the need for reduced subsets of data for supporting the problem solving. Data reduction is required because:

a. the potentially useful data base will be much larger than the data actually used,
b. the user will want access to varying levels of detail in the data base,
c. the relevant subset of data will vary during the problem-solving process,
d. some data (e.g., census and event data such as police calls) may not be compatible at the detail level of the data captured in the source files.

Extraction is a process by which an integrated subset of data is developed from the source files relevant to a particular problem-solving application. Extraction thus provides the user with a capability effectively indistinguishable from a fully integrated data base, without requiring the development of such an integrated data base at the detail level of the source files, i.e. it provides a "virtually" integrated data base.

The extraction approach builds a data base subset from the source files according to a priori specifications for a particular application. Total integration of the source files, and dynamic aggregation and subsetting of the data at the time the data items are required is of course an alternative approach. This approach is not attractive in today's environment because:

a. for any application all the relevant source files would have to be on-line to support conversational interaction,
b. protection of the source files would be more difficult,
c. development of conversational information systems would require additional standardized data structures and codes for the dynamic aggregation and subsetting,
d. better conversational performance is possible when the problem solving accesses a smaller data base.

Clearly the development of a fully-integrated, on-line data base from the source files, solely for problem-solving applications, is not (currently) economical. Such an approach would also require special procedures for keeping the duplicate records current and consistent. With the extraction approach, the subset of data thought to be relevant to the particular problem is developed and made accessible to the problem-solving system in an extracted data base. The subset is an extract from the available source files at the level of detail desired by the decision maker for (that phase of) problem solving. This extracted data base may be thought of as a set of tables. Each table contains values for a set of variables extracted from the source files. For each variable there is one value in the table for each basic unit (e.g. zone, account, employee) used for the problem solving. New variables can be added directly to the extracted data base as an added column of the tables.

An example of an extracted data base is shown in Figure 8, for use in crime analysis. The extracted data tables are formed from: source files containing 10 years' data on crimes, land use, and population; a special purpose map of police beat-building-blocks (basic zones); and an extraction specification for computing 20 crime categories and selecting population and number of houses by year. The result is 10 tables (one for each year) giving crime by category, population, and number of houses for each basic zone.

The extraction approach leaves control of the operational source files in the hands of the originating application. The extracted data bases are "snapshots" which are current at the time of their development. The problem solver can re-invoke the extraction process at any time to get a more current extracted data base. This process decouples the data base used in problem solving from the operational files, and assures the problem solver that the data base upon which he makes decisions is under his control.
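The table-oriented view of an extracted data base described above can be sketched as follows. This is a minimal illustration under assumed record layouts; the field names, the zone identifiers and the `extract` and `add_variable` helpers are hypothetical and do not reproduce the GADS implementation.

```python
from collections import defaultdict

# Hypothetical source files, already coded by basic zone.
crime_file = [
    {"year": 1973, "zone": "Z1", "category": "burglary"},
    {"year": 1973, "zone": "Z1", "category": "robbery"},
    {"year": 1973, "zone": "Z2", "category": "burglary"},
    {"year": 1974, "zone": "Z1", "category": "burglary"},
]
population_file = [
    {"year": 1973, "zone": "Z1", "population": 1200, "houses": 400},
    {"year": 1973, "zone": "Z2", "population": 800, "houses": 300},
    {"year": 1974, "zone": "Z1", "population": 1250, "houses": 410},
    {"year": 1974, "zone": "Z2", "population": 790, "houses": 300},
]


def extract(crimes, population, zones, categories):
    """Build one table per year: one row per basic zone, one column per variable."""
    tables = defaultdict(dict)
    for rec in population:
        if rec["zone"] not in zones:
            continue
        row = {"population": rec["population"], "houses": rec["houses"]}
        row.update({category: 0 for category in categories})
        tables[rec["year"]][rec["zone"]] = row
    for rec in crimes:
        row = tables.get(rec["year"], {}).get(rec["zone"])
        if row is not None and rec["category"] in categories:
            row[rec["category"]] += 1
    return dict(tables)


def add_variable(table, name, func):
    """Add a derived variable as a new column of an extracted table."""
    for row in table.values():
        row[name] = func(row)


# The extraction specification: which zones and which crime categories to keep.
tables = extract(crime_file, population_file,
                 zones={"Z1", "Z2"}, categories=["burglary", "robbery"])
add_variable(tables[1973], "burglary_per_1000",
             lambda row: 1000 * row["burglary"] / row["population"])
```

The returned tables hold copies of the values, so later updates to the source records do not alter them; calling `extract` again yields a more current snapshot, which mirrors the decoupling described in the text.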

This user control of the extracted data base, and the potential performance advantages offered by access to the smaller extracted data set as compared to access to the total set of data, make the extraction approach attractive even in installations where an integrated data base exists (as with a complete IMIS as in the USAC approach described in Section II). Extraction is simplified with the existence of an integrated data base, because there are then no difficulties with file formats and data conversion.

B. Extraction System Architecture

The architecture of a municipal information system designed using the data extraction philosophy would have three major sets of programs and data bases (e.g. Figure 9). The first set would be the source data files and related programs for data entry, update, and other routine processing. These files should be "properly structured" as defined earlier in this paper. The data base management for these files may be an integrated system, such as IBM's IMS, or a more traditional system such as those provided by IBM's DOS.

The second component includes accurate reference files (indices), such as the Geographic Base File, programs for maintaining these files, and programs for providing the data extraction functions of data matching, subsetting, and aggregation. This component is the key to integrating the data base of source files. The GADS experience indicates that it is possible to develop general purpose programs for the data extraction functions. Essentially, these functions provide integration through user-invoked processing, rather than through the complicated data structures and accompanying processing overhead often found in integrated data base systems.

The data extraction programs are the interface between the municipal data base and the third component of the architecture, the extracted data bases and associated decision support system. The GADS analysis and display functions are an example of a decision support system for non-programmer users. A data extraction interface can provide multiple extracted data bases for a single decision support system, or for multiple decision support systems. For example there might be decision support systems for urban planning, budget preparation, cash management, computer-assisted appraisal, crime analysis, etc., all supported by a common extraction interface. The data management techniques for the extracted data bases should be tailored for each decision support system. However, the data access techniques may be the same as those provided for the source data files.

The details of the data extraction architecture and the implementation requirements are beyond the scope of this paper and there will be installation-specific comments. (An extraction implementation is briefly described in the Appendix). There is, however, one general requirement for any data extraction system. This requirement pertains to the data aggregation functions of extraction and can be described by considering examples of the data sources encountered in municipal governments and the kinds of extracted data to be developed from these sources. Consideration here is limited to data which can be related to points or areas. Data related to networks, part numbers, budget items, etc., should be handled in an analogous fashion.

1. Compatible data

This is the easiest, and fortunately the most frequent situation, if data are captured as specified in Section III. The data in source files which can be identified with geographic points (x,y) can be directly related. If the extracted data base is to be relevant to a study of slum dwellings, for example, and if health cases, fire alarms and building code violations are all data sources which are available at the event level, (i.e. by address) then an extracted data table showing incidence of each of these events for specified addresses can be directly developed.

Another frequently used extracted data base is the tabulation of such event data by geographical area, in terms of a specified map. (The extracted data base in Figure 8 is an example of this). Extracted data bases in such cases are obtained by matching coordinates of events to the corresponding map areas (via point-in-polygon processing of the event coordinates against the map boundaries specification).
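The matching of event coordinates to map areas mentioned above can be illustrated with a short Python sketch. The paper does not say which point-in-polygon method was used; the ray-casting test below is one common choice, and the zone names and coordinates are hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray from (x, y) toward +x crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tabulate_events(events, zones):
    """Count events per zone; events outside every zone are left uncounted."""
    counts = Counter({name: 0 for name in zones})
    for x, y in events:
        for name, polygon in zones.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts


# Hypothetical overlay map: two adjacent rectangular beats.
zones = {
    "beat_1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "beat_2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
events = [(2, 3), (5, 5), (15, 5), (25, 5)]
counts = tabulate_events(events, zones)
```

Points lying exactly on a shared boundary are assigned by the comparison rules of the test, so a production system would need an explicit tie-breaking convention.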

Figure 10 illustrates the extraction and aggregation to relate property (assessment) data and census data to support inquiries at compatible levels (e.g., blocks or block groups), and further aggregation to support transit planning models.

2. Non-compatible area data

If data is available by areas in the source files, and these areas are not compatible (i.e. one map is not a subset of the other), then the extraction process is more complicated. For a chosen set of variables from the source files, there is a minimum level of aggregation at which an extracted data base is possible. For example, school attendance areas and police beats (and therefore the associated data) may only be compatible at the census tract level, i.e. they may both be (different) finer partitions of census tracts. The extraction process should alert the user to the non-compatibility and display for the user the minimum level of aggregation necessary for compatibility of the data sources of interest, in the form of a map, and permit the user to specify further aggregation from this map as desired.

If the user desires an extracted data base at a detail level finer than is compatible with the data sources given, the user must supply additional information. For example, suppose the user is studying property values vs age distribution of inhabitants, with the age data on citizens available from the census only at census tract levels of aggregation. Compatibility exists at the census tract level.
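One way to find the minimum level of aggregation at which two area maps become compatible is to group areas that overlap, directly or through a chain of overlaps. The Python sketch below does this with a union-find structure. It is an assumption about how such a check could be carried out, not the method used in the paper, and it presumes the list of overlapping area pairs has already been derived from the two maps.

```python
def minimum_common_zones(overlap_pairs):
    """Group areas of two maps into the finest zones compatible with both.

    overlap_pairs holds (area_a, area_b) tuples meaning that area_a of map A
    and area_b of map B share territory. Areas linked by any chain of overlaps
    must fall in the same aggregated zone.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in overlap_pairs:
        root_a, root_b = find(("A", a)), find(("B", b))
        if root_a != root_b:
            parent[root_a] = root_b

    zones = {}
    for node in list(parent):
        zones.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in zones.values())


# Hypothetical overlaps: school attendance areas (map A) against police beats (map B).
overlaps = [("S1", "P1"), ("S1", "P2"), ("S2", "P2"), ("S2", "P3"), ("S3", "P4")]
zones = minimum_common_zones(overlaps)
```

In this example S1 and S2 interlock with beats P1 to P3 and so collapse into one zone, while S3 and P4 coincide and form a second; these zones play the role of the census tracts in the text.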

Any finer detailed extracted data base, at the city block level for example, could only be developed if the user is willing to make assumptions (such as homogeneity of the distribution of population ages in the census tract).

V. SUMMARY AND CONCLUSIONS

The development of information for decision-making in municipalities requires integration of data from the various operational files which are generated in local government. Even when an integrated municipal data base does not exist, it is possible to develop integrated data from properly structured source files in conjunction with a well-maintained Geographic Base File. The current sources of information developed in municipalities, in particular the property data of the tax assessor function and the operating files of various service delivery functions, provide a rich source of information, augmented by special collections such as the U.S. Census.

Data Extraction is the process of developing integrated data subsets from diverse source files to support interactive problem solving. Extraction provides the interface to large data bases of source files and provides data description, subsetting and aggregation functions. Our experience with GADS has shown that extraction is useful when the user or problem characteristics require access to varying amounts, detail, and selection of data, and conversational (rapid response) interaction with a decision support system. These characteristics are likely to be encountered when designing decision support systems for nonprogrammer, professional users working on unstructured problems.

The data extraction interface matches the functional and response time requirements of interactive decision support, can be implemented on a variety of computer system configurations, and can reduce the data operating costs of the decision support system. Because data extraction operations can produce multiple extracted data bases, with different structures, a single data extraction interface can support multiple decision support systems. In addition, existing decision support systems can be supported and enhanced by data extraction without major program revisions.
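The minimum compatible level of aggregation discussed before this summary can be illustrated with a small sketch. It assumes, purely for illustration, that each zoning (school attendance areas, police beats) is available as a mapping from a common base unit such as a city block to a zone identifier; the function name `common_aggregation` and the sample data are hypothetical and are not part of GADS. Two base units fall into the same compatible area whenever they share a zone in either zoning, directly or through a chain of overlapping zones.

```python
from collections import defaultdict


def common_aggregation(zoning_a, zoning_b):
    """Return the finest grouping of base units to which both zonings aggregate.

    zoning_a, zoning_b: dicts mapping a base unit (e.g. a block id) to a zone id.
    """
    if zoning_a.keys() != zoning_b.keys():
        raise ValueError("both zonings must cover the same base units")

    # Union-find structure over the base units.
    parent = {unit: unit for unit in zoning_a}

    def find(unit):
        while parent[unit] != unit:
            parent[unit] = parent[parent[unit]]  # path halving
            unit = parent[unit]
        return unit

    def union(u, v):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u

    # Merge all base units that share a zone in either zoning.
    for zoning in (zoning_a, zoning_b):
        first_in_zone = {}
        for unit, zone in zoning.items():
            if zone in first_in_zone:
                union(first_in_zone[zone], unit)
            else:
                first_in_zone[zone] = unit

    groups = defaultdict(set)
    for unit in parent:
        groups[find(unit)].add(unit)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: six blocks, with blocks 1-3 and 4-6 forming two census tracts.
school_areas = {1: "S1", 2: "S1", 3: "S2", 4: "S3", 5: "S3", 6: "S3"}
police_beats = {1: "P1", 2: "P2", 3: "P2", 4: "P3", 5: "P4", 6: "P4"}
compatible_areas = common_aggregation(school_areas, police_beats)
print(compatible_areas)
```

In this made-up case the school areas and police beats are different finer partitions of the same two tracts, so the two groups returned correspond to the tract level, which is the situation described in the text.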

APPENDIX: An Extraction System Implementation

A project in the IBM Research Division has developed an interactive Geo-data Analysis and Display System (GADS) as a vehicle for studying interactive problem solving [1-4]. GADS supports nonprogrammer users solving unstructured problems where the relevant data can be related to a geographic location. Examples of the problems for which GADS has been used include: land use planning, police manpower allocation, school districting, and commercial site location. The need for data extraction was recognized during the first studies of the police use of GADS. In particular, the need for data aggregated to a variety of geographic levels (e.g., block, beat, census tract, neighborhood), and changing data needs expressed by users indicated the inadequacies of the static, special purpose data base and the one-level, integrated data base approaches. GADS data extraction is configured essentially as shown in Figure 9.

The extraction implemented in GADS is limited to compatible event data. There is a requirement that each record of each file in the large data base contain a geographic code (such as an address, x,y coordinates or block number) so that extracted data can be related to points, lines, and polygons on a map. A utility program is provided to transform geographic codes into x,y coordinates if necessary for data extraction or display. A data base developed by extraction is a table; an example is shown on Figure 8. Adding another crime type, acres of commercial land use, or re-aggregating by census tracts would take only a few minutes.

In the GADS implementation the large data base management system is a special purpose one designed to handle fixed format files with no hierarchies or repeating groups. Simultaneous access to multiple files, and shared access to single files are not supported. Multiple file extractions are handled by consecutive extractions from the individual files. Sequential and direct access I/O are allowed. The entire large data base is stored on disk, and there is a utility for loading files from tapes. Character, fixed point binary, packed decimal, and floating point (binary) data representations can be used. The data description facility allows different formats to be used for the same file or the same formats to be used for different files (Figure 11a).

Figure 11 gives examples of the data description and subsetting capabilities of the implementation. The subsetting language includes constructs for: subsetting based on any arithmetic or logical combination of the items in a file, creating new items, conditional subsetting or creation (IF, THEN, ELSE), and function calls (Figure 11b). Seven data types are provided. Results from subsetting can be displayed as lists (Figure 11c) or as locations on a map (Figure 11d). Using the display capabilities, two dimensional subsetting is possible. That is, the user can draw a polygon on the screen, and select only those elements of a file whose location is within that polygon. This is much more user-oriented than algebraic specifications for subsetting, and other graphic subsetting operators would be useful (e.g., display all the crimes of the same type as the one being pointed at).

The aggregation operations in the implementation are restricted to forming the extracted data base for the GADS analysis and display functions. This data base is aggregated by areas of a map. It is stored on disk, and is accessed by column name.

GADS is implemented in FORTRAN, but the data extraction components were implemented in PL/I because of its larger set of data types, and better functional suitability for the extraction tasks. The combination system runs on the IBM S/360 or S/370 series under the Time Sharing Option (TSO). The combination requires 220K bytes of main storage. Separating extraction reduces the main storage requirement to about 120K. The user terminals may be IBM 2250s or storage tube display terminals.

The I/O time from the large data base, and the data rate to the terminal are the limiting factors in extraction response times (i.e. selection and aggregation times are negligible compared to I/O times). Although five minutes may be required to list or display an entire file, the user can see the results unfolding (e.g. the selected items are listed as they are selected). Thus users seem willing to wait during extraction. After all, the batch mode equivalent capabilities have response times of days, and manual methods have response times of weeks or months.
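The two-dimensional subsetting and the aggregation by map areas described in this appendix can be sketched as follows. This is only an illustration under stated assumptions: the record layout (dicts with `x`, `y`, and `type` items), the function names, and the ray-casting point-in-polygon test are choices made here, not a description of the PL/I components of GADS.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def subset_by_polygon(records, polygon):
    """Select only the records whose location lies within the polygon."""
    return [r for r in records if point_in_polygon(r["x"], r["y"], polygon)]


def aggregate_by_zone(records, zones, item):
    """Build an extracted table: one row per zone, counts per value of `item`."""
    table = {name: Counter() for name in zones}
    for r in records:
        for name, polygon in zones.items():
            if point_in_polygon(r["x"], r["y"], polygon):
                table[name][r[item]] += 1
                break
    return table


# Hypothetical event data and zones (two adjacent square areas).
events = [
    {"x": 2, "y": 3, "type": "burglary"},
    {"x": 5, "y": 5, "type": "theft"},
    {"x": 12, "y": 4, "type": "burglary"},
    {"x": 15, "y": 8, "type": "burglary"},
    {"x": 25, "y": 5, "type": "theft"},  # outside both zones
]
zones = {
    "zone 1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "zone 2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}

selected = subset_by_polygon(events, [(0, 0), (13, 0), (13, 10), (0, 10)])
table = aggregate_by_zone(events, zones, "type")
print(len(selected), dict(table["zone 1"]), dict(table["zone 2"]))
```

Re-aggregating to another set of areas only requires a different `zones` mapping, which mirrors the remark above that re-aggregating by census tracts is a matter of a new extraction rather than a new program.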

REFERENCES

[1] P. E. Mantey, J. L. Bennett, E. D. Carlson, Information for Problem Solving: The Development of an Interactive Geographic Information System. IEEE Int. Conf. on Communication, Vol. II. Seattle, Wash. June 1973.

[2] E. J. Cristiani, R. J. Evey, R. E. Goldman, P. E. Mantey. "An Interactive System for Aiding Evaluation of Local Government Policies," IEEE Transactions on Systems, Man & Cybernetics, Vol. SMC-3, No. 2, March 1973, pp. 141-146.

[3] E. D. Carlson, J. L. Bennett, G. M. Giddings and P. E. Mantey. "The Design and Evaluation of an Interactive Geo-data Analysis and Display System," Proceedings of the IFIP Congress 74, International Federation for Information Processing, Stockholm, August 1974. North Holland Publishing Company, Amsterdam, 1974.

[4] E. D. Carlson and J. A. Sutton, A Case Study of Non-programmer Interactive Problem-Solving, IBM Research Report, RJ 1382, IBM Research Laboratory, San Jose, California, April 1974.

[5] T. R. Hammer, R. E. Coughlin, E. T. Horn IV, "The Effect of a Large Urban Park on Real Estate Values," Journal of the American Institute of Planners, Vol. 40, No. 4, July 1974, pp. 274-277.

[6] "City Hall's Approaching Revolution in Service Delivery," Nation's Cities, January 1972.

[7] "Concepts of an Urban Management Information System," a Report to the City of New Haven, Connecticut, by Advanced Systems Development Division, IBM Corporation, Yorktown, January 1967.

[8] A Municipal Information and Decision System. University of Southern California, School of Public Administration, 1968.

[9] R. L. Stickrod and L. C. Martin. Data Processing: Analysis of Costs, Benefits, and Resource Allocations. Lane County, Oregon, Management Report, February, 1973.

[10] G. M. Giddings and E. D. Carlson, An Interactive System for Creating, Editing and Displaying a Geographic Base File. IBM Research Report, IBM Research Laboratory, San Jose, California, 1973.

[11] D. C. Symons, A Parcel Geocoding System for Urban and Rural Information, Ottawa, Ontario, National Capital Commission, 1970.

[12] U.S. Bureau of the Census, Census Use Study, The DIME Geocoding System, Report No. 4, Washington D.C., 1970.

[13] U.S. Bureau of the Census, Census Use Study, The DIME Editing System, Washington D.C., 1970.

[14] R. Juli, Geo-modeling: A Local Approach, Eugene, Oregon, Lane Council of Governments, 1972.

[15] R. D. Hogan, Remote Graphic Terminal and Urban Geographic Information System Demonstration, Gaithersburg, Maryland, IBM Federal Systems Center, 1968.

[16] R. D. Merrill, "Representation of Contours and Regions for Efficient Computer Search," Communications of the ACM, Vol. 16, No. 2, February 1973, pp. 69-82.

[17] B. V. Saderholm, "Paper 'Keyboard' Runs Experimental IBM System," IBM Research Division Press Release, Yorktown Heights, N. Y., March 8, 1973.

[18] R. M. Cyert, H. A. Simon, and D. B. Trow. Observation of a Business Decision. Journal of Business, 29, (1956), 237-248.

[19] D. M. S. Peace and R. S. Easterby. The Evaluation of User Interaction with Computer-based Management Information Systems. Human Factors, 15, April 1973, pp. 163-177.

Figure 1. Tabular display of assessment data (columns: parcel number, address, year of construction, floor area, value)

Figure 2. Histogram display of housing values

Figure 3a. Map display of relative housing values

Figure 3b. Simplified display of housing values

Figure 4. Land use/transportation planning data base [2] (zone variables include average and disposable income per household, accessibility to employment and to disposable income, available and occupied acres by land use, dwelling units, growth factors, employment by sector, households by income class, projected densities, population, reserved land, slope, travel time change factor, sewer service district, and flood control area)

Figure 5. Field work sheet for appraisal data (residential fact sheet with sections for record data, land attributes, immediate area, total property, topography, building data, and cost data)

Figure 6. Assessor's questionnaire for financial data (Office of the County Assessor, County of Santa Clara, California; the form asks for the full purchase price, cash down payment, deeds of trust with amount, duration and interest, whether a trade was involved, outstanding improvement bonds, personal property included in the price, and monthly gross income for income property)

Figure 7. 'DIME' geographic base file structure

Source map: Map 6, Census Tract 205, Place 42, with Blocks 1 to 10 and 22 segments (records).

Number of segments: Maple Av. 5, Elm Dr. 3, Coyote Cr. 6, 1st St. 4, 2nd St. 4.

Contents of a record: name, type, suffix; from node number, x, y, map number; to node number, x, y, map number; left tract, right tract, left place, right place, left block, right block; left low address, high address; right low address, high address.

Contents of a sample record: Maple Av. N.W. 2 1530000 31000 18 1530100 30100 205 205 42 42 5 100 198 101 199
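The record layout in Figure 7 suggests how a utility could turn an address into x,y coordinates and a block or tract code. The sketch below is an assumption-laden illustration, not the GADS utility: the class `DimeSegment`, the function `geocode`, the odd/even side test, and the linear interpolation along the segment are choices made here. The place and map number fields are omitted for brevity, and the right block value, which is not legible in the sample record, is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass
class DimeSegment:
    name: str
    street_type: str
    from_node: int
    from_xy: tuple      # (x, y) of the from node
    to_node: int
    to_xy: tuple        # (x, y) of the to node
    left_tract: int
    right_tract: int
    left_block: int
    right_block: int
    left_range: tuple   # (low address, high address) on the left side
    right_range: tuple  # (low address, high address) on the right side


def geocode(segments, name, street_type, number):
    """Return x, y, side, tract and block for a house number, or None."""
    for seg in segments:
        if (seg.name, seg.street_type) != (name, street_type):
            continue
        sides = (
            ("left", seg.left_range, seg.left_tract, seg.left_block),
            ("right", seg.right_range, seg.right_tract, seg.right_block),
        )
        for side, (low, high), tract, block in sides:
            # Assumption: each side carries addresses of one parity only.
            if low <= number <= high and number % 2 == low % 2:
                t = (number - low) / (high - low) if high > low else 0.5
                x = seg.from_xy[0] + t * (seg.to_xy[0] - seg.from_xy[0])
                y = seg.from_xy[1] + t * (seg.to_xy[1] - seg.from_xy[1])
                return {"x": x, "y": y, "side": side, "tract": tract, "block": block}
    return None


# Values taken from the sample record, except right_block (hypothetical).
maple = DimeSegment(
    name="Maple", street_type="Av.",
    from_node=2, from_xy=(1530000, 31000),
    to_node=18, to_xy=(1530100, 30100),
    left_tract=205, right_tract=205,
    left_block=5, right_block=4,
    left_range=(100, 198), right_range=(101, 199),
)

start = geocode([maple], "Maple", "Av.", 100)
end = geocode([maple], "Maple", "Av.", 199)
missing = geocode([maple], "Maple", "Av.", 300)
print(start, end, missing)
```

Once every event record carries coordinates and block or tract codes obtained in this way, it can be related to points, lines, and polygons on a map as required for extraction.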

Figure 8. Example of tables in extracted data base (one row per zone; columns for population, houses, crimes #1, crimes #2, and so on; one table per year, Year 1 to Year 10)

Figure 9. Interactive data extraction and problem solving system (components: presentation functions; data extraction with data description, subsetting, and aggregation; extracted data base management; large data base management; large data base)

Figure 10. Extraction and aggregation relating property and census data (property information system with added data such as public lands; census compatible property data; dynamically aggregated, compatible property and census data; transportation zones data base)

Figure 11a. Data description function (example: description of an employee file with street number, street name, street type, city, zip code, and x,y coordinate items)

Figure 11b. Data selection function (example: selection conditions on ZIP, X, and Y)

Figure 11c. Listing of data selected (file EMP: street addresses in San Jose, zip code 95125, with x,y coordinates)

Figure 11d. Map showing location of selected data

DATA BASE USER LANGUAGES FOR THE NON-PROGRAMMER

Peter C. Lockemann
Fakultaet fuer Informatik, Universitaet Karlsruhe
D-75 Karlsruhe 1

Abstract

In light of the necessary investments, commercially available data base systems usually offer comparatively general-purpose interfaces. These are suitable only for the data base specialist. In order for a data base system to attract non-programmer users, interfaces must be provided that approximate the special user terminology and conceptualizations. If, in particular, these users form a heterogeneous group, a variety of interfaces will be required. Questions of interest are then the extent to which user interfaces should be standardized, the techniques which allow rapid implementation of new more specialized interfaces, or the procedure for selecting the most suitable interface for a given problem. Based on the concept of hierarchy of abstract machines, the paper presents a possible approach to the solution of these questions. Three examples will be introduced to critically examine the concept and demonstrate some of its merits and shortcomings.

1 Introduction

The success or failure of a data base system, no matter how well-conceived it may appear to the author's mind, is ultimately decided by the users the system is supposed to serve. This aspect is often overlooked by planners who devote almost their entire effort towards analyzing the needs of the organization as such and the necessary improvements of it. From an analysis of the current informational status of the institution or organization, a number of requirements are derived, such as information flow within the organization, organizational structure, system characteristics, the extent of integration, the provision of resources and time, and the necessary adaptation of old structures or their relinquishment in favor of new ones. All too often, much less attention is being paid to the individuals who must use the system. They are simply expected to appreciate the new system and to adapt most

willingly to the new environment.

Human nature, however, is conservative. Human individuals will cling to the same terminology and methodology and try to solve the same problems unless and until one can make a most convincing point for reorientation. In many cases data base systems are not even introduced to solve new kinds of problems. Rather they are supposed to improve the solution to existing and already well-understood problems, or at least use these problems as a point of departure. Under these circumstances there is no reason why users should be burdened with radical changes in style.

Unfortunately, for the manufacturer of a data base system this is just one side of a coin. For him, the development and implementation of a general-purpose data base system represents a large investment which he can only justify by corresponding sales figures. This precludes him from attending to each of a large variety of individual user needs but compels him to offer general-purpose interfaces. On the other hand it is these interfaces that prove repugnant to many a potential user who has his own special terminology, conceptualizations and application problems.

In order to resolve the dilemma, techniques must be developed that permit the adaptation of a data base system to various user needs. In particular, solutions should address themselves to the following questions.

(i) How can user language interfaces be separated from the operational and management characteristics of the data base system?

(ii) Are there any techniques that allow, in a systematic way, for the rapid implementation of a user language according to given specifications?

(iii) To which extent is it economically feasible to construct and stockpile "off-the-shelf" user languages?

(iv) Given a set of language specifications, under which conditions can one build upon an already existing user language? Can one define a relation on user languages that formalizes these conditions and determines the amount of effort required?

To answer these questions we shall define a hierarchical relationship between user languages. The nature of the relationship will be discussed in some detail. A number of examples will be introduced to explicate the approach and to point out its merits as well as some of its present shortcomings. The discussion is intended basically for non-procedural interactive languages.

2 Hierarchies of user languages

2.1 Concepts

The hierarchy of language interfaces shall be defined as follows [Kr 75]:

- Each interface is defined in terms of a ("lower") interface, and may itself serve as the basis for definition of a ("higher") interface.

- There is exactly one interface which cannot be defined in terms of another interface and hence serves as the ultimate basis for all other interfaces.

Such a hierarchy of interfaces may be graphically represented in the form of a tree where each node corresponds to a particular interface.

figure 1 (tree of interfaces, with the root at level 0 (DBMS) and further nodes at levels 1, 2, and 3)
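The two defining properties of the hierarchy can be made concrete with a small data structure. The following is a minimal sketch, assuming each interface is identified by a name; the class `InterfaceHierarchy`, its method names, and the example interface names are hypothetical and do not come from [Kr 75].

```python
class InterfaceHierarchy:
    """Tree of interfaces: every interface except the root has one lower interface."""

    def __init__(self, root):
        self.root = root
        # Maps each interface to the interface it is defined in terms of.
        self.lower = {root: None}

    def define(self, interface, in_terms_of):
        """Add an interface defined in terms of an already existing one."""
        if in_terms_of not in self.lower:
            raise ValueError(f"unknown lower interface: {in_terms_of!r}")
        if interface in self.lower:
            raise ValueError(f"interface already defined: {interface!r}")
        self.lower[interface] = in_terms_of

    def definition_chain(self, interface):
        """Interfaces from the given one down to the root."""
        chain = [interface]
        while self.lower[chain[-1]] is not None:
            chain.append(self.lower[chain[-1]])
        return chain

    def level(self, interface):
        """Level 0 is the root; each definition step adds one level."""
        return len(self.definition_chain(interface)) - 1


# Hypothetical hierarchy with the DBMS as the single root on level 0.
hierarchy = InterfaceHierarchy("DBMS")
hierarchy.define("set-oriented language", "DBMS")
hierarchy.define("query language", "set-oriented language")
hierarchy.define("report language", "set-oriented language")
hierarchy.define("planner dialogue", "query language")

print(hierarchy.level("planner dialogue"))
print(hierarchy.definition_chain("planner dialogue"))
```

Because `define` only accepts a lower interface that already exists, every node other than the root is defined in terms of exactly one other interface, and the root remains the only interface without a definition.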

The hierarchy must be chosen such that it reflects a hierarchy of users. Level 0 corresponds to the data base specialist, while level 3 might cater to a user completely untrained in computer affairs. The previous questions can now be restated with a little bit more precision.

(i) Can all fundamental operational and management functions be solved underneath the basis on level 0?

(ii) What are the formal criteria that allow one to construct a hierarchy by defining new languages in terms of existing ones?

(iii) Up to which level in the tree should interfaces be standardized?

(iv) Suppose a given language specification is represented as a node. Can a path to an existing node be constructed, and the length of the path be "measured"? Can one determine the path with minimum length? If the path is too long, should intermediate nodes be introduced, and what would be their specifications?

At this point in time, "length" is no more than an intuitive notion for which a formal measure does not exist. However, a rough outline of the definition of one node in terms of another one may often give some insight into the amount of effort necessary and thus provide an estimate of the length.

Language hierarchies have long been mentioned in connection with programming languages, e.g.

- Low-level programming languages (e.g. Assembler)

- High-level programming languages (e.g. PL 360 [Wi 68], ESPOL [Bu 72])

- Very high level languages (e.g., set oriented languages [SI 74]).

However, except for macro languages these do rarely conform to the strict definition given above (e.g. COBOL is not defined in terms of a lower-level programming language), the reason being that this would entail inefficient compilation. The same argument does not hold for data base languages where language analysis is but a minor part of query processing [Kr 75].

2.2 Explications

The notion of hierarchy as introduced above is still vague and should be made more precise. Below several concepts known from the literature are introduced. Their usefulness as well as some of their deficiencies will be discussed in the remainder of the paper.

(1) Characteristics of the root.

There exist several data base schools that claim to provide the just and only basis for data base concepts. Before one may pass any judgment on these claims one ought to agree on the criteria that a basis would have to meet. It is commonly accepted that a data base is to be considered as the model of a certain reality. Hence a basis should be such that it provides concepts so primitive that any reality, be it physical or conceptual, could be adequately covered by it. Some authors [Ab 74, Su 74] have attempted to enumerate certain primitives: elementary objects, names, properties, relations, orderings, categories (or types), as well as sets of operators for creating, accessing, manipulating and deleting these. In addition, one might consider organizational questions such as parallelism and sharing of models by various users.

(2) Dependencies between successive nodes.

Since it is extremely general, the root is of little practical value to the average user. Users are invariably concerned not with all possible realities but with certain classes of realities, and wish their models to reflect the corresponding limitations. In other words, the modeling tools on level 1 will differ from those on level 0 by defining certain restrictions on the way the primitives may interact. The same obviously is true for level 2 vis-a-vis level 1, etc. These restrictions relate mainly to the manner in which objects may be composed into new objects, relations into new relations, and/or operations into new operations.

(3) Characterization of a node as an abstract machine.

Basically, the restrictions defined on the permissible compositions determine the dependencies between successive nodes. To make this a little bit more precise, the concept of abstract machine is introduced. An abstract machine is a set of object types, a set of operators for manipulating objects and defined on object types, together with a control mechanism that allows one to construct and execute sequences of operations. Each node is then described in terms of an abstract machine.

(4) Dependencies between abstract machines.

By assigning an abstract machine to each node, the following properties must hold between two successive nodes A_i and A_{i+1} [Go 73]:

a) The resources and the functions provided by A_i form the complete basis on which to build A_{i+1}. There is no way to use properties of A_{i-1} in building A_{i+1}. Hence every A_i is a complete interface description in the hierarchy.

b) Resources of A_i used in defining new resources of A_{i+1} can no longer be present in A_{i+1} (i.e. they may become resources of A_{i+1} only if they are not part of a definition for another resource of A_{i+1}).

Keeping these rules in mind I shall attempt, as a matter of illustration, a tentative classification of some results discussed in the literature [Ab 74, Co 7~, We 74, Kr 75, Wo 68, Wo 73, Gr 69, Col 68].

Figure 2 (node labels recoverable from the diagram: SQUARE, SEQUEL, Lunar, Pharmacy, restricted natural English, restricted natural German, relational model, predicate logic, semantic primitives, set theory)

2.3 Consequences

The concepts and rules introduced above impose a certain discipline on the design of user languages, on their application, and on the transition between them. Some of the consequences are outlined below.

(1) If we strictly keep to the rules above, a new interface must be defined in terms of its immediate predecessor and not any arbitrarily chosen predecessor, i.e. immediate predecessors must not be bypassed ("stepwise abstraction"). On the other hand, given certain specifications and a suitable node in a tree, intermediate nodes that hopefully are of general usefulness should be introduced on the intervening path whenever the path proves too "long" ("stepwise refinement").


(2) Given a path to the root, a user should be put into position - at least in principle - to formulate his requests in any of the languages that correspond to the nodes on the path. As a matter of fact, we found this an essential prerequisite for efficient system testing since system activities may be observed and controlled to any desired level of detail [Kr 75].

(3) Queries are stated on some level and must successively be translated between levels until the root has been reached. Definition (of an abstract machine) and translation reciprocate each other: The definition of the next higher level from a given one determines the rules that govern the translation of statements on the higher level to those on the lower level.

(4) Results are produced on the lowest level but must be presented to the user on a higher level. As a consequence, following the evaluation of a query a second ("reverse") translation must be invoked in order to propagate the results to higher levels.

3 Set theoretic basis

3.1 Motivation

The rules of ch.2 have been applied to the construction of the KAIFAS question-answering system and have proven highly useful there. Hence this system will be chosen as the first example to demonstrate the practicability of the rules. For a more detailed description of the system the reader is referred to the literature [Kr 75].

Restrictions with regard to the general basis are motivated by the realities that one wishes to consider. In the case of KAIFAS we presume that relations are exclusively binary relations or are exclusively of the property type (sets) and, more important, that objects are selected on the basis of given properties or relations which they meet or undergo, perhaps in logical combination. Indeed one can show that the set theoretic approach may be viewed as a generalization of the inverted file technique [Kr 75].


3.2 Set theoretic machine

Object types

I  Elementary objects (individuals), e.g. Hans Maier, Bonn, Aspirin
M  Sets, e.g. city, medication. List of individuals.
R  Relations, e.g. father, contraindication. List of ordered pairs of individuals.
Z  Numbers
D  Measures, e.g. 2 years, 4 tablets/day. Ordered pairs (number, unit expression).
F  Measure functions, e.g. age, dosage. Lists of ordered n-tuples whose last components are measures.
B  Truth values

Operators

On retrieval the machine is supposed to function in the following way. Set, relation, and function names refer to objects in permanent storage. In order to manipulate the objects they must be transferred into unnamed registers of which an unlimited number is thought to exist. Hence all operations except for the load operations are register-to-register operations.

Load operators

Mw, ev, en, ef    Load a set, a relation (ev, en), and a measure function, respectively.

Set operators

MU: MxM->M    Union
Mn: MxM->M    Intersection
Km: MxM->M    Relative complement    {x | x∈M1 ∧ x∉M2}
Kz: M->Z      Cardinality

Binary relation operators

Ko: R->R      Converse relation
Rb: RxM->R    Restriction            {(x,y) | (x,y)∈R ∧ x∈M}
Rp: RxR->R    Product                {(x,y) | ∃z: (x,z)∈R1 ∧ (z,y)∈R2}
RU: RxR->R    Union

Reduction of binary relations

Vo:  R->M     Domain                 {x | ∃y: (x,y)∈R}
Na:  R->M     Range                  {x | ∃y: (y,x)∈R}
Vg:  RxI->M   Individual domain      {x | (x,I)∈R}
Ng:  RxI->M   Individual range       {x | (I,x)∈R}
VgU: RxM->M   Restricted domain      {x | (x,y)∈R ∧ y∈M}

Reduction of measure functions

Fw: FxI->D (n=2)

Logical operators

e: IxM->B     Test on set membership
c: MxM->B     Test on set inclusion

In addition, the standard logical operators are available as well as the standard arithmetic and comparison operators for numbers and measures.

Control mechanism

Sequencing of operations

"Programs" for the set theoretic machine are expressed in a functional notation. Operations are performed from left to right and, for each nested argument, from inside out.

Example: A question such as "Are cities birthplaces of engineers?" would take the following form in the set theoretic machine:

c(Mw(Mcity), VgU(en(Rbirthplace), Mw(Mengineer)))
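The operator definitions above translate almost directly into operations on finite sets. The following is only a minimal sketch under stated assumptions: Python sets stand in for the unnamed registers, two dictionaries stand in for permanent storage, and all stored data (the cities, the persons, and the orientation of Rbirthplace as pairs (place, person)) are invented for illustration. Measure functions and the ev/ef load operators are left out.

```python
# Hypothetical "permanent storage"; every name and datum here is invented for illustration.
SETS = {
    "Mcity": {"Bonn", "Ulm", "Kiel"},
    "Mengineer": {"Hans Maier", "Eva Schmidt"},
}
RELATIONS = {
    # Assumed orientation: (place, person) means "place is the birthplace of person".
    "Rbirthplace": {
        ("Bonn", "Hans Maier"),
        ("Ulm", "Eva Schmidt"),
        ("Kiel", "Uwe Lang"),
    },
}

# Load operators: copy a stored object into a fresh "register" (a Python value).
def Mw(name):
    return set(SETS[name])

def en(name):
    return set(RELATIONS[name])

# Set operators.
def MU(m1, m2): return m1 | m2          # union
def Mn(m1, m2): return m1 & m2          # intersection
def Km(m1, m2): return m1 - m2          # relative complement
def Kz(m): return len(m)                # cardinality

# Binary relation operators.
def Ko(r): return {(y, x) for (x, y) in r}                    # converse
def Rb(r, m): return {(x, y) for (x, y) in r if x in m}       # restriction
def Rp(r1, r2):                                               # product
    return {(x, y) for (x, z1) in r1 for (z2, y) in r2 if z1 == z2}
def RU(r1, r2): return r1 | r2                                # union

# Reduction of binary relations.
def Vo(r): return {x for (x, _) in r}                   # domain
def Na(r): return {y for (_, y) in r}                   # range
def Vg(r, i): return {x for (x, y) in r if y == i}      # individual domain
def Ng(r, i): return {y for (x, y) in r if x == i}      # individual range
def VgU(r, m): return {x for (x, y) in r if y in m}     # restricted domain

# Logical operators.
def e(i, m): return i in m        # set membership
def c(m1, m2): return m1 <= m2    # set inclusion

# "Are cities birthplaces of engineers?"
answer = c(Mw("Mcity"), VgU(en("Rbirthplace"), Mw("Mengineer")))
```

With the invented data, Kiel is a city but not the birthplace of an engineer, so the inclusion test fails. Nested Python calls are evaluated from inside out, which mirrors the sequencing rule stated above.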

Loops

Loops are introduced by the use of bounded quantifiers which have three arguments:

1) An expression resulting in a set of objects (range).
2) An expression for the condition, resulting in a truth value (scope); it may be regarded as the loop body.
3) The name of a bound variable; each of its substitutions defines an invocation of the loop.

Important quantifiers are

AL: MxB->B    all, every
EI: MxB->B    some
DB: MxB->M    which
ZB: MxB->Z    how many

with the left-hand M the bounding set and the left-hand B the condition.

Examples:

DB(x, Mw(Mcity), e(x, VgU(en(Rbirthplace), Mw(Mengineer))))

with the meaning of "Which cities are birthplaces of engineers".

DB(x1, Mw(Mmanuf), ZB(x2, Vg(en(Rprod), x1), DB(x3, Mw(Mailment), e(x2, Vg(en(Rmedic), x3)))))

with the meaning of "How many products of which manufacturers are medications for which ailments?"

Expressions

Set membership of an arbitrary kind is expressed in the data base by including, in the representation of a set, arbitrary set expressions.

Example (in German):

Mrezeptpflichtig:
    ISpasmocibalgin
    Vg(en(RDerivat), IOxazolidin)      (1)
    IMorphin
    Mw(MOpiate)                        (2)
    Mw(MHypnotika)
    IMethadon
    Vg(en(RDerivat), ISuccinimid)
    Vg(en(RHeilmittel), IAgitiertheit)

where (1) indicates all derivates of Oxazolidin to be prescription drugs, (2) all opiates, etc. This concept is extended to relations and measure functions. Two of its advantages are:

- Since all objects are evaluated on request only, changes to the data base may be made locally without regard to any interrelationships that may exist.
- Expressions may be stored without regard for the existence of any individuals for it. Hence one could construct a data base consisting exclusively of higher-order relationships.

One consequence, however, is that the control mechanism must itself be defined recursively since it may be invoked on any load operation.
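The four bounded quantifiers can be read as loops over a range with a truth-valued body. The sketch below is an illustration only: the bound variable is modelled as the parameter of a Python function (an assumption of this sketch, not the notation of the machine), and the data are invented.

```python
# Invented sample data; (place, person) means "place is the birthplace of person".
CITIES = {"Bonn", "Ulm", "Kiel"}
ENGINEERS = {"Hans Maier", "Eva Schmidt"}
BIRTHPLACE = {("Bonn", "Hans Maier"), ("Ulm", "Eva Schmidt"), ("Kiel", "Uwe Lang")}

# Each quantifier takes a range (a set) and a condition; the condition is
# called once per substitution of the bound variable.
def AL(rng, cond): return all(cond(x) for x in rng)      # all, every
def EI(rng, cond): return any(cond(x) for x in rng)      # some
def DB(rng, cond): return {x for x in rng if cond(x)}    # which
def ZB(rng, cond): return sum(1 for x in rng if cond(x)) # how many

def engineer_birthplaces():
    # Restricted domain of BIRTHPLACE with respect to ENGINEERS.
    return {place for (place, person) in BIRTHPLACE if person in ENGINEERS}

# "Which cities are birthplaces of engineers?"
which = DB(CITIES, lambda x: x in engineer_birthplaces())

# Nested quantifiers: "For how many cities is some engineer born there?"
how_many = ZB(CITIES, lambda x: EI(ENGINEERS, lambda p: (x, p) in BIRTHPLACE))

every = AL(CITIES, lambda x: x in engineer_birthplaces())
```

Because the condition is an ordinary function, a quantifier may appear inside the condition of another quantifier, which corresponds to the nesting shown in the second example above.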

3.3 Natural language

Few users will feel at ease with the highly stylized language introduced in sec. 3.2. One possible step of abstraction, therefore, is the definition of a new abstract machine accepting natural language input. By necessity this is a highly restricted form of natural language since its semantics, and hence its syntactic forms, can be no more than what may ultimately be reduced to a set theoretic interpretation. Moreover, it must be considered more restrictive than the set theoretic interface because while one may nest set theoretic expressions to an arbitrary depth, those beyond a certain depth simply cannot be stated in natural language in any comprehensible fashion.

To speak of objects, operators and control mechanism in connection with natural language turns out to be highly unnatural, or rather impossible. It is possible, however, to define an abstract machine on that level in terms of the syntax of the interface which in turn may still be based on object types. This is in striking similarity to Very High Level languages vis-a-vis High Level programming languages: Very High Level languages are loosely described as languages used to specify what is to be done, rather than how it is to be done [SI 74].

In accordance with sec.2.2, the object types must relate to the ones of the set theoretic machine. In this case the relationship is straightforward as indicated by the following list:

N  proper names for the objects of the universe.
A  attributes (properties of an object of the universe).
R  references from one object of the universe to a second one (e.g. Thebacon is referred to by Morphium as its derivate).
M  references to measures.
D  numbers or measures.
S  sentences. These are of two kinds: sentences to be answered by yes or no, and sentences to be answered by counting or enumerating proper names.


Some examples from KAIFAS in which German was chosen as natural language interface:

Ist Psyquil rezeptpflichtig? ("Is Psyquil a prescription drug?")
    Psyquil: N, rezeptpflichtig: A

Betraegt die Tagesdosis von Chinidin 2 Gramm? ("Is the daily dose of Chinidin 2 grams?")
    Tagesdosis: M, Chinidin: N, 2 Gramm: D

Welche Derivate von Morphium sind rezeptpflichtig? ("Which derivates of Morphium are prescription drugs?")

The syntax of the interface is described by a grammar with the following general properties:

(1) Syntactical variables must relate to the object types, hence they cannot be based on the traditional grammatical categories such as noun, noun phrase, adjective, etc. but on categories that are essentially semantical in nature. The variables are NE (names), ME (attributes), RE (references), MF (references to measures), ZA (numbers), SA (sentences), QU (quantifiers).

(2) On the other hand, the traditional categories must be accounted for in some way, e.g. in order to reject incorrect inflections. As a consequence, each syntactical variable is indexed by a number of features. Examples:

MAS  masculine  )
FEM  feminine   ) gender
NEU  neuter     )
STR  strong declension
ATT  attribute apposition
NUM  number (singular/plural)
NOM  nominative )
GEN  genitive   ) case
DAT  dative     )
ACC  accusative )
ADJ  word class (adject./noun)

(3) Even for restricted natural language, grammars are known to be extremely complex because of the multitude of syntactic aspects to be observed. The application of features simplifies the grammar insofar as it can be arranged in two levels,

a) a context-free grammar in terms of the variables from (1);
b) a feature program to be associated with each production on level a).

Example: Typical productions of level a) are

ME -> ME ME
ME -> RE ME
ME -> RE NE
SA -> ME sind ME ?


The production ME1 -> ME2 ME3 refers to the following feature program (syntactic variables are numbered for reference).

Part 1: Test of right-hand features for acceptance (reduction takes place only if the condition is true).

test (ME2, +ADJ+ATT) ∧ test (ME3, -ADJ-ATT) ∧ meq (MAS,FEM,NEU,ME2,ME3) ∧ equ (NUM,ME2,ME3) ∧ meq (NOM,GEN,DAT,ACC,ME2,ME3)

Part 2: Assignment of features to the syntactic variable on the left-hand side.

-ADJ-ATT, cop (NUM,ME2), and (MAS,FEM,NEU,ME2,ME3), and (NOM,GEN,DAT,ACC,ME2,ME3)

The feature operators are test, meq, equ, cop and "and". For example, test is true when the features of the first argument meet the condition specified by the second argument, meq is true whenever at least one of the listed features agrees in both syntactic variables specified, cop copies the features of the syntactic variable specified.

3.4 Pharmacology

The natural language level is supposed to serve a variety of application areas. We postulate that these application areas are all served by the same natural language grammar since each must be explainable in terms of set theory. Consequently, these areas differ only in the vocabulary they assign to the object types. Level 3 is reached from level 2 simply by introducing names, and relating them to the object types. Below a few typical examples of assignment are given in the area of pharmacology.

proper names             medications, substances, companies, ailments, e.g. Thebacon, Morphium, CIBA, Angina pectoris
attributes               properties, e.g. Tablette, rezeptpflichtig
references               e.g. Indikation and Kontraindikation (from ailment to medication), Hersteller (from company to medication)
references to measures   e.g. Preis, Dosis, Haltbarkeit
numbers or measures      e.g. 5 DM, 2 Tabletten/Tag, ? Wochen
sentences                e.g. Welche Preise haben Praeparate, die bei Angina Pectoris indiziert sind und deren Kontraindikation nicht Glaukom ist? ("What are the prices of preparations that are indicated for angina pectoris and whose contraindication is not glaucoma?")


3.5 Translations

The path between adjacent nodes is traversed by translation (sec.2.3, (3) and (4)). We shall briefly illustrate this for the passage between natural language and set language. In this case translation consists of the three traditional phases: lexical analysis, syntactic analysis and code generation. The sentence

"Welche Firmen sind Hersteller tablettenfoermiger Medikamente?" ("Which companies are manufacturers of medications in tablet form?")

shall serve as an example.

Lexical analysis

Lexical analysis includes the mapping from the pharmacological to the natural language level, and proceeds for each word encountered, with a few exceptions, in three steps:

(i) reduction of a word to its word stem;
(ii) dictionary lookup resulting in a syntactical variable, some of its features, a morphemic class, as well as the set level name for the word;
(iii) assignment of further values of features on the basis of the morphemic class and the actual morphemic ending.

The lexical analysis of the entire sentence results in

word                 syn. var.  features                             int. name
Welche               QU         +MAS+FEM+NEU-NUM+NOM+ACC             DB
Firmen               ME         +FEM-NUM+NOM+GEN+DAT+ACC             M26
sind
Hersteller           RE         +MAS+NUM+NOM+DAT+ACC                 R23
                     RE         +MAS-NUM+NOM+GEN+ACC
tablettenfoermiger   ME         +MAS+NUM+NOM+ATT+STR+ADJ             M9
                     ME         +FEM+NUM+GEN+DAT+ATT+STR+ADJ
                     ME         +MAS+FEM+NEU-NUM+GEN+ATT+STR+ADJ
Medikamente          ME         +NEU-NUM+NOM+GEN+ACC                 M22

Note the syntactic ambiguities due to the different feature combinations for "Hersteller" and "tablettenfoermiger". Note also that lexical analysis by itself cannot always determine the case (as for "Firmen", all four cases are still possible), or the gender (as for "tablettenfoermiger").


Syntactic analysis

Syntactic analysis includes three phases: reduction (level a)), feature analysis (level b)), final code manipulation. For each production applied, reduction and feature analysis follow each other immediately. Hence a production is applied in three steps:

(i) Matching of input string and right-hand side.
(ii) Test of right-hand features for acceptance.
(iii) If true, reduction to left-hand side and assignment of features.

For example, the production and feature program from sec.3.3 result in the following when applied to the phrase "tablettenfoermiger Medikamente":

ME2 ('tablettenfoermig'):
    1) +MAS+NUM+NOM+ATT+ADJ              (rejected on meq)
    2) +FEM+NUM+GEN+DAT+ATT+ADJ          (rejected on meq)
    3) +MAS+FEM+NEU-NUM+GEN+ATT+ADJ
ME3 ('Medikamente'):
    1) +NEU-NUM+NOM+GEN+ACC
ME1 (result):
    1) +NEU+GEN-NUM-ADJ-ATT              (note the disambiguation)

The syntactic analysis of the entire sentence is illustrated in figure 3. Because of the possibility of ambiguities the result is a parsing graph rather than a tree (in this case the ambiguity of the sentence is due to "Hersteller"). The numbers adjacent to the syntactic variables refer to an associated list of features.

Final code manipulation is left to the final stages of code generation, but must be considered part of the syntactic analysis because without it context-sensitive or transformational rules could not be avoided.

Code generation

Whenever a production is applied, a semantic action associated with it generates a functional set expression. Its arguments point to other such expressions unless they are individuals.

Example:

(tablettenfoermiger Medikamente)
    Mw(M9)  (tablettenfoermig)
    Mw(M22) (Medikament)

Figure 3 (parsing graph of the example sentence; nodes are syntactic variables with feature list numbers)


WELCHE FIRMEN SIND HERSTELLER TABLETTENFOERMIGER MEDIKAMENTE ?

02300047 I0000001 15000000 01100033 04000032 16000000 15000000 01100025 14100025 15000000 15000000 15000000 16000000 15000000 15000000 16000000 15000000 16000000 16000000 16000000 26000000

15000000 01100025 140000C5 15000000 16000000 01200001 10000001 15000000 01100045 01200040 01100C30 05000027 01200044 01100033 04000033 01100033 04000026 16000000 16000000 16000000 00000000

Figure 4


On completion of the parse, the pointer structure corresponding to the syntactic variable SA is transformed into a linear string. This string must be submitted to a further string manipulation for two reasons.

(1) Completion of the syntactic analysis. Quantifiers do not yet appear in front of the expression. Moving them there is subject to a number of rules that govern their sequence.

(2) Optimization. In many cases quantifiers such as DB (whose evaluation may be time-consuming) can be replaced by standard set or relation operators.

The code resulting from translation of the sentence above is shown in the printout in figure 4.

Reverse translation

Set level names may immediately be translated into the pharmaceutical level simply by again invoking the dictionary. However, under certain conditions (empty sets) set expressions may themselves be part of a result. This requires a translation into both level 2 and level 3.

Examples:

Vg(R12, I14)   ->  Heilmittel fuer Psychosen
Mw(M9)         ->  tablettenfoermig
I2             ->  Verophen
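The three reverse translation examples can be reproduced with a dictionary lookup plus one phrase template per expression form. This is a rough sketch only: the dictionary entries for R12 and I14 are inferred from the first example, the "fuer" template for Vg is an assumption of this sketch (a real system would presumably choose the wording per relation), and the function name reverse_translate is hypothetical.

```python
import re

# Hypothetical excerpt of the dictionary, mapping set level names to vocabulary.
DICTIONARY = {
    "M9": "tablettenfoermig",
    "I2": "Verophen",
    "R12": "Heilmittel",   # inferred from the example Vg(R12, I14)
    "I14": "Psychosen",    # inferred from the example Vg(R12, I14)
}

def reverse_translate(expr):
    """Translate a set level name or a simple set expression back into words."""
    expr = expr.replace(" ", "")
    m = re.fullmatch(r"Vg\((R\d+),(I\d+)\)", expr)
    if m:
        # Assumed template: "<relation word> fuer <individual>".
        return f"{DICTIONARY[m.group(1)]} fuer {DICTIONARY[m.group(2)]}"
    m = re.fullmatch(r"Mw\((M\d+)\)", expr)
    if m:
        return DICTIONARY[m.group(1)]
    return DICTIONARY[expr]          # plain name: dictionary lookup only

examples = [reverse_translate(s) for s in ("Vg(R12, I14)", "Mw(M9)", "I2")]
```

Plain names need only the dictionary, whereas unevaluated expressions need a phrase pattern as well, which is one way to read the remark that such results require translation into both level 2 and level 3.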

4 Semantic primitives as a basis

4.1 Motivation

In order to study the adequacy of the rules of ch.2 and to determine whether they must be further refined or augmented it is helpful, short of constructing systems, to examine existing systems that are arranged in the form of layers. One of the oldest systems of this kind (though it was not conceived that way) is Woods' question-answering machine [Wo 68, Wo 73]. Like the set theoretic approach, the universe is composed of objects and interrelationships between them. Unlike the previous approach, these are not collected into mathematical sets and relations but treated as propositions to which a procedural approach is taken. This is probably due to an orientation towards explaining the semantics of natural language rather than manipulating concrete data bases.


4.2 Semantic Primitives

Object types

O   Elementary objects, e.g. Boston, AA-57, DC-9, 8:00 a.m.

Fn  n-ary functions (n ≥ 1), e.g. departure time (of flight x1 for place x2). These need not be functions in the strict sense. If a function may yield more than one value (e.g. officer of a ship) it is defined as a successor function such that

    officer(x,0)  = a1     (start)
    officer(x,a1) = a2
    ...
    officer(x,an) = END    (end)

Rn  n-ary relation (predicate) (n ≥ 1), e.g. jet (flight x1 is a jet), arrive (flight x1 goes to place x2).

Designators are either names of elementary objects or of the form Fn(x1,...,xn) where xi is a designator; e.g. departure time (AA-57, Boston) for 8:00 a.m.

Propositions Rn(x1,...,xn) where xi is a designator; e.g. jet (AA-57), place (Boston), arrive (AA-57, Chicago).

B   Truth values

Example: A set of semantic primitives for the flight schedules table (from [Wo 68]):

Primitive Predicates

CONNECT (X1, X2, X3)    Flight X1 goes from place X2 to place X3
DEPART (X1, X2)         Flight X1 leaves place X2
ARRIVE (X1, X2)         Flight X1 goes to place X2
DAY (X1, X2, X3)        Flight X1 leaves place X2 on day X3
IN (X1, X2)             Airport X1 is in city X2
SERVCLASS (X1, X2)      Flight X1 has service of class X2
MEALSERV (X1, X2)       Flight X1 has type X2 meal service
JET (X1)                Flight X1 is a jet
DAY (X1)                X1 is a day of the week (e.g. Monday)
TIME (X1)               X1 is a time (e.g. 4:00 p.m.)
FLIGHT (X1)             X1 is a flight (e.g. AA-57)
AIRLINE (X1)            X1 is an airline (e.g. American)
AIRPORT (X1)            X1 is an airport (e.g. JFK)
CITY (X1)               X1 is a city (e.g. Boston)
PLACE (X1)              X1 is an airport or a city
PLANE (X1)              X1 is a type of plane (e.g. DC-3)
CLASS (X1)              X1 is a class of service (e.g. first-class)
AND (S1, S2)            S1 and S2
OR (S1, S2)             S1 or S2
NOT (S1)                S1 is false
IFTHEN (S1, S2)         if S1 then S2
(where S1 and S2 are propositions)

Primitive Functions

DTIME (X1, X2)          the departure time of flight X1 from place X2
ATIME (X1, X2)          the arrival time of flight X1 in place X2
NUMSTOPS (X1, X2, X3)   the number of stops of flight X1 between place X2 and place X3
OWNER (X1)              the airline which operates flight X1
EQUIP (X1)              the type of plane of flight X1
FARE (X1, X2, X3, X4)   the cost of an X3 type ticket from place X1 to place X2 with service of class X4 (e.g. the cost of a one-way ticket from Boston to Chicago with first-class service)

Operators

To every function and relation there exists a programmed subroutine (procedure) which determines a value of a function or the truth of a proposition.

Examples (procedure names are capitalized):

JET (AA-57)               ->  true
ARRIVE (AA-57, Chicago)   ->  true
ARRIVE (AA-57, Boston)    ->  false
DTIME (AA-57, Boston)     ->  8:00 a.m.

Whereas the abstract machine of ch.3 was based on object types but specific operators, in this case the abstract machine is defined in terms of both object and operator types. Specific instances must be supplied by the user for both of them. However, with the advent of microprogramming, computer scientists should have little problem in adjusting to this kind of notion.
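The pairing of each primitive with a programmed subroutine, and the successor-function convention for multi-valued functions, can be sketched as follows. This is an illustration under stated assumptions: the small tables, the ship name and the sentinel END are invented here and say nothing about how Woods' system stores its data.

```python
# Invented sample data for illustration only.
JET_FLIGHTS = {"AA-57"}
ARRIVALS = {("AA-57", "Chicago")}
DEPARTURE_TIMES = {("AA-57", "Boston"): "8:00 a.m."}
OFFICERS = {"Santa Maria": ["a1", "a2"]}   # hypothetical ship and officers
END = "END"

# One subroutine per primitive predicate: it decides the truth of a proposition.
def JET(x1):
    return x1 in JET_FLIGHTS

def ARRIVE(x1, x2):
    return (x1, x2) in ARRIVALS

# One subroutine per primitive function: it yields the value of a designator.
def DTIME(x1, x2):
    return DEPARTURE_TIMES[(x1, x2)]

# A multi-valued function written as a successor function:
# OFFICER(x, 0) = a1, OFFICER(x, a1) = a2, ..., OFFICER(x, an) = END.
def OFFICER(x, previous):
    crew = OFFICERS[x]
    position = 0 if previous == 0 else crew.index(previous) + 1
    return crew[position] if position < len(crew) else END

# Enumerate all values by following the successor chain from the start value 0.
enumerated = []
value = OFFICER("Santa Maria", 0)
while value != END:
    enumerated.append(value)
    value = OFFICER("Santa Maria", value)
```

The caller never sees the underlying list, only the start value 0, the successor step and END, so the same loop works for any multi-valued function defined this way.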

Control mechanism

As in the preceding example, programs are expressed in functional notation, e.g.

TEST (CONNECT (AA-57, BOSTON, CHICAGO))

would stand for "Does AA-57 go from Boston to Chicago?". Likewise, queries of any appreciable degree of complexity are based on the notion of bounded quantifier as a representative for loops. The format for a quantified expression is

FOR <quantifier> <variable> / <class> : <restriction> ; <scope>

where

<quantifier>    a type of quantifier (EACH, EVERY, SOME, THE, nMANY).
<variable>      a bound variable.
<class>         class of objects over which quantification is to range. The specification is performed by special enumeration functions, e.g. SEQ, DATALINE, NUMBER, AVERAGE. Besides enumeration these functions may perform searches or computations.
<restriction>   restriction on the range.
<scope>         scope.

Restriction and scope may both be quantified expressions.

Unlike KAIFAS, where the result of the evaluation of an expression is automatically retranslated and displayed, this must be explicitly requested by commands such as TEST (test truth of a proposition), PRINTOUT (print the representation for a designator).

Examples:

(FOR EVERY X1 / (SEQ TYPECS) : T ; (PRINTOUT (X1)))

prints the sample numbers for all the lunar samples which are of type C rocks, i.e. breccias (T stands for "true").

(TEST (FOR 30 MANY X1 / (SEQ FLIGHT) : JET (X1) ; DEPART (X1, BOSTON)))

"Do 30 jet flights leave Boston?"

4.3 Natural language

As a general rule, the introductory remarks to sec.3.3 apply here as well: The level of the "English-like" query language provided on level 2 is influenced by the range of expressions possible on the previously discussed level 1. In contrast to KAIFAS, inspection of the data base is not limited to the evaluation of level 1 expressions but may take place during translation from level 2 into level 1, too. The semantic actions associated with a rule of grammar impose further restrictions, e.g. they make sure that the first argument of CONNECT is indeed an instance of the class FLIGHT.

This is illustrated by the following example. In a first step a syntactic analysis is performed and a phrase marker is derived (phrase marker diagram; its subject noun phrase is the proper noun AA-57).

Since verbs in English correspond roughly to predicates, and noun phrases are used to denote the arguments of the predicate, the verb in the phrase marker will be the primary factor in determining the predicate. In the example, the predicate will be CONNECT. For this it is necessary that the subject be a flight and that there be prepositional phrases whose objects are places representing origin (from) and destination (to).

The grammatical relations among elements of a phrase marker are defined by partial tree structures, e.g.

G1: [S [NP (1)] [VP [V (2)]]]                 subject-verb
G2: [S [VP [V (1)] [NP (2)]]]                 verb-object
G3: [S [VP [PP [PREP (1)] [NP (2)]]]]         preposition-object modifying a VP

Among the three structures, G1 and G3 both match subtrees of the phrase marker. Which of these is acceptable depends on additional rules, e.g.

(G1: FLIGHT((1)) and (2) = fly)

((1) and (2) are positional variables in the partial tree structure). This rule obviously is satisfied. More complex rules are possible; for example, the topmost S-node of the phrase marker is matched by the rule

1-(G1: FLIGHT((1)) and (2) = fly) and
2-(G3: (1) = from and PLACE((2))) and
3-(G3: (1) = to and PLACE((2)))
==> CONNECT(1-1, 2-2, 3-2)

4.4 Airline guide

The system under discussion was first applied to a flight schedules table. To illustrate the application interface, a few examples of queries shall be given below (from [Wo 68]).

Does American Airlines have a flight which goes from Boston to Chicago?
What is the departure time from Boston of every American Airlines flight that goes from Boston to Chicago?
What American Airlines flights arrive in Chicago from Boston before 1:00 p.m.?
How many airlines have more than 3 flights that go from Boston to Chicago?

4.5 Lunar geology

More recently the system has been applied to access, compare and evaluate the chemical analysis data on lunar rock and soil composition that was accumulating as a result of the Apollo missions [Wo 73].

Examples:

What is the average concentration of aluminum in high alkali rocks?
Give me all analyses of S10046!
How many breccias contain olivine?
Do any samples have greater than 13 percent aluminum?
What is the average model concentration of ilmenite in type A rocks?

4.6 Critique

(1) The possibility of inspecting the data base both on level 1 and during translation from level 2 to level 1 introduces a note of confusion. Since, according to sec.2.3, the translation process is directly related to definition, translation must make no reference to data in the data base. The lack of separation will have certain practical repercussions: Either changes on level 1 will necessitate changes in the translation rules of grammar, or parts of the control mechanism for level 1 must be duplicated for translation purposes.

(2) In Woods' system the subroutines do not appear to verify that their arguments are of the proper kind (e.g. ARRIVE does not check whether AA-57 is indeed a flight or Chicago a place), since this is done on translation. If one (correctly) leaves this problem to level 1, then primitive predicates and functions are related to each other. These interdependencies may be expressed by a set of axioms, or in the parlance of abstract machines by the concepts corresponding to data types that restrict ranges of arguments. As a consequence, abstract machines must account for relationships between structures within a machine as well, and not only for those between machines. (Note that the KAIFAS machine circumvents this problem by prescribing only unary predicates as categories for all operators.)

(3) Operators (subroutines) and objects are interdependent as well, albeit in a one-to-one fashion. In order to make sure that the requirements governing the relationship between abstract machines are met it suffices to treat a predicate or function and its corresponding procedure as two instances of the same resource.

5 Relational model

5.1 Motivation

One of the most widely discussed approaches to data bases is Codd's relational model [Co 70, Co 72, We 74] which lends itself particularly well to an interpretation in terms of abstract machines. Intuitively speaking, Codd supposes his users to explain their universe in terms of table-like structures. More exactly, a table consists of a number of entries that are ordered the same way: an entry is a sequence of fields that may be named. Field names are called attributes. Entries are not ordered but uniquely identified by a key, i.e. the contents of certain fields. More formally, a table is a relation and, consequently, an entry is an n-tuple.

A certain familiarity with the relational model is assumed on the reader's part. Only its interpretation by a machine will be examined here.

5.2 Relational algebra

Object types

A    attributes naming a set of objects (domain)
Rn   relations Rn(A1,A2,...,An) ⊆ A1 x A2 x ... x An

Example: SUPPLIER (SUPPLIERNR, NAME, LOC), KEY=SUPPLIERNR

SUPPLIER:   SUPPLIERNR   NAME       LOC
            1            Jones      New York
            2            Smith      Chicago
            3            Connors    Boston
            4            Thompson   New York

Key attributes are indicated; other keys may be composite. Hierarchical relationships are usually eliminated by normalization. Hence all relations can be assumed to be normalized.

Tn ∈ Rn   n-tuple.

Operators [We 74]

Standard relation operators

Rn1 ⊗ Rn2 -> Rn1+n2    Direct Product: {(Tn1 ⌢ Tn2) | Tn1 ∈ Rn1 ∧ Tn2 ∈ Rn2} (⌢ concatenation operator)
Rn ∪ Rn -> Rn          Union         )
Rn ∩ Rn -> Rn          Intersection  ) attributes must be "compatible"
Rn - Rn -> Rn          Difference    )

Special operators

Rn[A] -> Rm                  Projection: Relation Rn restricted to the attributes A={A1,...,Am}.
Rn1[AθB]Rn2 -> Rn1+n2        Join: {(Tn1 ⌢ Tn2) | Tn1 ∈ Rn1 ∧ Tn2 ∈ Rn2 ∧ Tn1[A] θ Tn2[B]} where A,B sets of attributes, θ one of {=,≠,<,≤,>,≥}. (Slight modifications, e.g. natural join, are possible).
Rn[AθB] -> Rn                Restriction: {Tn | Tn ∈ Rn ∧ Tn[A] θ Tn[B]} where A,B,θ as above.
Rn[A÷B]Rn -> Rm              Division: [Co 71], p.74.

Control mechanism

Since all operators have been defined as infix operators, "programs" are formed by nested expressions rather than by linear sequences of operators and operands. For an example see sec. 5.3.

5.3 Relational calculus

In place of relational algebra Codd proposes an applied predicate calculus (relational calculus), and proceeds to show that any expression in the calculus (alpha-expression) may be reduced to an equivalent algebraic expression.

Alphabet of the calculus:

Individual constants    a1, a2, a3, ...
Index constants         1, 2, 3, 4, ... (attributes are indexed per relation instead of named)
Tuple variables         r1, r2, r3, ...
Predicate constants     monadic: P1, P2, P3, ...; dyadic: =,≠,<,≤,>,≥
Logical symbols         ∃, ∀, ∧, ∨, ¬
Delimiters.

Simple alpha-expressions have the form

(t1, t2, ..., tk) : w

where
- w is a well-formed formula,
- the ti are distinct terms consisting of an indexed or non-indexed tuple variable,
- the set of tuple variables occurring in t1, ..., tk is precisely the set of free variables in w.

Example: Alpha-expression for "Find the name and location of all suppliers each of whom supplies all projects":

(r1[2], r1[3]) : P1r1 ∧ ∀P2r2 ∃P3r3 ((r1[1]=r3[1]) ∧ (r3[3]=r2[1]))

After reduction to relational algebra:

S1 = R1
S2 = R2
S3 = R3
S  = S1 ⊗ S2 ⊗ S3
T3 = S[1=6] ∩ S[8=4]
T2 = T3[1,2,3,4,5]
T1 = T2[(4,5)÷(1,2)]S2
T  = T1[2,3]

ALPHA is a language for alpha expressions that is slightly more appealing to the user than the predicate calculus. The example shown above may be reformulated in ALPHA:

RANGE SUPPLIER L
RANGE PROJECT P ALL
RANGE SUPPLY K SOME
GET W (L.NAME, L.LOC): (L.SUPPLIERNR = K.SUPPLIERNR) ∧ (K.PROJNR = P.PROJNR)

(order of quantifiers must be maintained!), or, equivalently

RANGE SUPPLIER L
RANGE PROJECT P
RANGE SUPPLY K
GET W (L.NAME, L.LOC): ∀P ∃K ((L.SUPPLIERNR=K.SUPPLIERNR) ∧ (K.PROJNR=P.PROJNR))

5.4 Higher levels

For reasons similar to the ones given in chs. 3 and 4, languages have been devised that do not rely on a user's formal training. One language of this kind is SQUARE [Bo 74], which has been shown to be reducible to the relational calculus. However, the view of relations offered by SQUARE is different from that offered by ALPHA:

(i) Scan columns of a table looking for a value or a set of values (as opposed to inspecting one row after another).
(ii) For each such value found, examine the corresponding row and elements of given columns in this row.

SQUARE statements are of a form such as ("disjunctive mapping")

B R A (S)    (read: "find B of R where A is S")

that defines a mapping such that R is a relation, A and B are sets of attributes (domain and range, respectively), and S is an argument that may itself be an expression. Other forms, e.g. for projection, conjunctive and n-ary mappings, have a similar appearance.

Example:

NAME EMP DEPT ("TOY")

stands for "Find the names of employees in the toy department".

More a recently attempts relational [Co ~4].
r2{3]): Plrl^~P2r2 After form t i distinct - the (rl[2], t~e and location oi all 209 = TI[2,3 ] ALPHA is a appealing language to the user may be r e f o r m u l a t e d I~ANGE S U P P L I E R RANGE PROJECT RANGE SUPPLY G~T ~ in A L P H A SUPPLIER PROJECT SUPPLY ~or ((L.SUPPLIEk~=K.SUPPLI~R~k) (order of q u a n t i f i e r s similar do of tnis to = K.SUPPLIERNR) A (K.PROJNR a have kind is SQOARE = P.PROJ~R) each such statements found columns However, of a table formal looking the been shown une to oe the view o£ [elatlens ~y ALPHA: for a value one row after examine have training, wnich has been from t~at offerea to inspecting value of given 3 an~ 4 languages [bo ?4] calculus. or columns (as opposed in cns. to rely on a user's is d i f f e r e n t column elements to the ones not the relational of values SQUARE A (~.PiO0~R=P.P~OONk)) be m a i n t a i n e d ! ) , L.LOC): Dy SQOAR~ (ii)For must L that (i) Scan as levels reducible offered more ine example P ALL reasons language is slightly above, L.LOC): (L.SOPPLIERNR devised that shown K SOME GET W (L.NAME, 5.4 Higher form K or, e q u i v a l e n t l y RANGE expressions F (VP) (~K) RANGE alpha L (L.~AME, RANGE for than the p r e d i c a t e or a set another). corresponning row anG in this row. are of a form suc~ as ("aisjunctive mapping") bRA(S) (read: is a "find B of R where A is S") relation, respectively), Other forms, a similar A S e.g. and is an B that defines a mapping are sets of a t t r i b u t e s argument for projection, that may conjunctive itself (domain such be an expression. and n-ary mappings, appearance. Example : ~iA~iggMP DEPI' ( "TOY ") stanGs for "FinQ the names of e m p l o y e e s that R and range, in ti~e toy aepartment". nave 210 ~ore a recently attempts relational [Co ~4]. 
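The operators of sec. 5.2 and the example query of sec. 5.3 can be sketched in Python as a reading aid. This is a minimal sketch, not Codd's reduction procedure: relations are assumed to be Python sets of tuples with attributes addressed by 0-based position, the helper names `project`, `join` and `divide` are hypothetical, and the PROJECT and SUPPLY rows are invented sample data (only the SUPPLIER rows come from the text). The algebraic route below uses division directly and is compared with a calculus-style formulation using the quantifiers of the alpha-expression.

```python
from typing import Callable, Sequence, Set, Tuple

Relation = Set[Tuple]

def project(r: Relation, cols: Sequence[int]) -> Relation:
    """Projection R[A]: keep only the listed attribute positions."""
    return {tuple(t[c] for c in cols) for t in r}

def join(r: Relation, s: Relation, a: int, b: int,
         theta: Callable[[object, object], bool]) -> Relation:
    """Join R[A theta B]S: concatenate tuples whose A and B values satisfy theta."""
    return {t + u for t in r for u in s if theta(t[a], u[b])}

def divide(r: Relation, keep: Sequence[int], a: Sequence[int],
           s: Relation, b: Sequence[int]) -> Relation:
    """Division: 'keep' values of r that are paired in r with every b-value of s."""
    needed = project(s, b)
    result = set()
    for x in project(r, keep):
        image = {tuple(t[c] for c in a) for t in r
                 if tuple(t[c] for c in keep) == x}
        if needed <= image:
            result.add(x)
    return result

# SUPPLIER rows are taken from sec. 5.2; PROJECT and SUPPLY are invented.
SUPPLIER = {(1, "Jones", "New York"), (2, "Smith", "Chicago"),
            (3, "Connors", "Boston"), (4, "Thompson", "New York")}
PROJECT = {(10, "alpha"), (20, "beta")}                 # (PROJNR, name)
SUPPLY = {(1, 100, 10), (1, 101, 20), (2, 100, 10),     # (SUPPLIERNR, PARTNR, PROJNR)
          (4, 102, 20), (4, 103, 10)}

# Algebraic route: divide SUPPLY[SUPPLIERNR, PROJNR] by PROJECT[PROJNR],
# then join with SUPPLIER and project onto NAME, LOC.
supplies_all = divide(project(SUPPLY, [0, 2]), keep=[0], a=[1], s=PROJECT, b=[0])
joined = join(supplies_all, SUPPLIER, 0, 0, lambda x, y: x == y)
algebra_result = project(joined, [2, 3])

# Calculus route: for all projects p there exists a supply k linking l and p.
calculus_result = {
    (l[1], l[2]) for l in SUPPLIER
    if all(any(l[0] == k[0] and k[2] == p[0] for k in SUPPLY) for p in PROJECT)
}
```

With the sample data both routes select the same suppliers, which mirrors the claim that an alpha-expression can be reduced to an equivalent algebraic expression.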
More recently, attempts have been reported that allow a user to engage in a dialog with a relational data base system founded on natural English [Co 74]. The approach differs drastically from the ones discussed in chs. 3 and 4 in that a truly two-way communication is envisioned.

5.5 Comment

It has been shown that both ALPHA and SQUARE are equivalent to the relational algebra, i.e. any query expressible in ALPHA or SQUARE is expressible in relational algebra and vice versa, and hence are themselves equivalent. Equivalence is a symmetric relation. The definition of hierarchy by a succession of abstract machines does not preclude equivalence, however: the succession relational algebra - ALPHA - SQUARE could still be a hierarchy from the point of view of user sophistication (in the direction of increasing user level). This indicates that a further refinement of the notion of hierarchy is necessary.

6 Conclusions

There are some striking similarities between the examples of chs. 3, 4 and 5:

- In each the lowest level has been well formalized.
- All rely on quantification as a means for building complex expressions.
- All tend towards a less formal and more natural language on their higher levels.
- All three systems have been implemented, at least one of them (ch. 5) with some application.

On the other hand, only the KAIFAS system (ch. 3) has so far attempted to provide a language on an intermediate level.

Experiences with a few examples do not constitute proof, of course, but they indicate that hierarchies of successive language levels will be found in some well-defined situations, and suggest that, at the very least, they meet the objectives mentioned in the introduction. The relationship between successive levels could perhaps be made much more precise, as indicated before. Furthermore, techniques to measure the efficiency of translations between successive levels must still be explored. Finally, the paper did not attend to the critical question what form the root should take; this appears to be a largely unsolved problem.

Acknowledgement

The author is grateful to G. Goos for carefully reading the manuscript and making helpful suggestions.

References

[Ab 74] J.R. Abrial, Data Semantics, in [Kl 74], 1-59
[Bo 74] R.F. Boyce, D.D. Chamberlin, W.F. King, M.M. Hammer, Specifying Queries as Relational Expressions, in [Kl 74], 169-176
[Bu 72] Burroughs Corp., B6700/7700 Executive System Programming Language (ESPOL), Information Manual, 1972
[Co 70] E.F. Codd, A Relational Model for Large Shared Data Banks, Comm. ACM 13 (1970), No. 6, 377-387
[Co 72] E.F. Codd, Relational Completeness of Data Base Sublanguages, in: R. Rustin (ed), Data Base Systems, Courant Computer Science Symp., Prentice-Hall, Inc. 1972, 65-98
[Co 74] E.F. Codd, Seven Steps to Rendezvous with the Casual User, in [Kl 74], 179-199
[Col 68] L.S. Coles, An Online Question-Answering System with Natural Language and Pictorial Input, Proc. 23rd Natl. ACM Conf. (1968), 169-181
[Go 73] G. Goos, Hierarchies, in F.L. Bauer (ed), Advanced Course on Software Engineering, Lecture Notes in Econ. and Math. Systems, vol. 81, 29-46
[Gr 69] C.C. Green, The Application of Theorem Proving to Question-Answering Systems, Tech. Rep. No. CS138, Stanford Univ. 1969
[Kl 74] J.W. Klimbie, K.L. Koffeman (eds), Data Base Management, North-Holland Publ. Co. 1974
[Kr 75] K.D. Kraegeloh, P.C. Lockemann, Hierarchies of Data Base Languages: An Example, Information Systems (in print)
[Su 74] B. Sundgren, Conceptual Foundation of the Infological Approach to Data Bases, in [Kl 74], 61-94
[SI 74] ACM SIGPLAN Symposium on Very High Level Languages, March 1974, ACM, New York 1974
[We 74] H. Wedekind, Data Base Systems I, BI-Wissenschaftsverlag, Reihe Informatik, vol. 16, 1974 (in German)
[Wi 68] N. Wirth, PL360, A Programming Language for the 360 Computers, Journ. ACM 15 (1968), No. 1, 37-74
[Wo 68] W.A. Woods, Procedural Semantics for a Question-Answering Machine, Proc. AFIPS Fall Joint Comp. Conf. 33 (1968), 457-471
[Wo 73] W.A. Woods, Progress in Natural Language Understanding - An Application to Lunar Geology, Proc. AFIPS Natl. Comp. Conf. 42 (1973), 441-450


A System for the Interactive Processing of Large Volumes of Measurement Data

Ulrich Schauer, IBM Deutschland GmbH, Wiss. Zentrum Heidelberg

Summary

In processing measurement data one must distinguish between a standard evaluation of the measurements, which is based on a particular model, and an analysis whose aim is to recognize logical relationships and to find an explanatory model. The standard evaluation can well run in batch mode with a data model tuned to the linkages that can be read off the model. For the analysis, an interactive system is desirable, with a data model that permits arbitrary linkages and with a data manipulation language that should be as descriptive as possible yet allow complex selection criteria. Available systems meet the requirements of analysis only in part; mostly the data manipulation language lacks facilities for computational data processing.

In the following, an experimental system for the processing of measurement data is described, which is being worked on at the IBM Scientific Center in Heidelberg.

1. INTRODUCTION

Large collections of measurement data can be exploited to the full only when the specialists responsible for the analysis (e.g. scientists, engineers, mostly without much programming experience) are enabled to carry out the processing themselves, without the help of programmers. This requires an interactive system that allows subsets of the data to be formed under complex selection criteria, to be fed into existing or newly written processing programs, and the results to be presented in tabular or graphical form.

Even the selection criteria may involve quite intricate computations, which are conveniently carried out with building blocks from a program library. The system can thus be adapted to particular fields by adapting the underlying program library. Since only a limited number of prefabricated programs can be available, data manipulation through a host language will often still be necessary. As a host language, APL is particularly well suited to the intended goal because of its high degree of interactivity, its adaptability to the user's programming experience, and its large number of operations for data manipulation. Figure 1 gives an overview of the system structure.
[Figure 1: System structure. Components: data management system, information system, data manipulation system, interactive host language (APL).]

The data base contains both problem data and descriptive data. The program library stands symbolically for a collection of programs that may be written in PL/I, FORTRAN or Assembler, that can be invoked from APL with data from the APL workspace or from the data base, and that deliver their results back into the APL workspace. User communication takes place via APL or via one of the systems embedded in APL for the manipulation of measurement data, programs and the associated documentation. The user station (terminal) is primarily a display screen and a typewriter.

Figure 2 gives an overview of the data components that the system has to manage.

[Figure 2: Data components: descriptive data (catalog processing), problem data and programs (measurement data processing).]

The system must manage three interrelated classes of data:

a) Algorithms (program library)
b) Problem data (measurement data)
c) Descriptive data (catalog of the data and of the programs)

In general, for a concrete processing task the user will first determine from the catalog information from which data aggregates (tables) the problem data are to be selected and which programs can be used in the processing, and only then specify the complete problem solution with the help of the data manipulation system.

2. OVERVIEW OF THE SYSTEM COMPONENTS

2.1 Algorithms

The tools available for processing the measurement data and for presenting results and relationships among the data fall into three classes:

a) Arithmetic and logical operations for expressing logical relationships (e.g. a > 5) and for data manipulation (e.g. x ← y ← z - 100), for numeric data and, apart from the arithmetic operations, also for non-numeric data. The use of APL as host language in particular also permits convenient manipulation of rectangular structures of numeric and text data (vectors, matrices).
b) Subroutines for the solution of standardized problems from fields such as mathematics (e.g. numerical integration and differentiation) and statistics (e.g. linear regression, test procedures, presentation of frequency distributions etc.).
c) Application-oriented standard procedures (e.g. analysis of ECG recordings, classification of fingerprints etc.).

The host language APL, with a multitude of available APL library programs and the possibility of initiating graphical displays from APL, already offers all facilities for data manipulation. Nevertheless classes b) and c) are necessary constituents of the system. Class b) permits falling back on subroutines written in FORTRAN, PL/I or Assembler, which can yield better computing times particularly for large volumes of data. Programs of class c) exist predominantly in FORTRAN or PL/I, because they are mostly developed for batch use.

2.2 Problem data

The system uses a relational data model; the data base consists of a collection of large tables that can be manipulated with easily understood operations (Codd /1,2,3/). Data attributes are permanently assigned to the columns of a table, as in the SEQUEL system (Boyce, Chamberlin /4,5/). Subsets of data from one or more tables are specified in a descriptive language oriented towards example entries in the tables in question, which lends itself equally to building subroutine calls into the program flow (Zloof /6/).

The data elements in a table column may be dimensioned data (e.g. vectors representing one series of measurements, or matrices that may represent several series of measurements or a function of two variables, etc.). The obvious ambiguity is resolved by an interpretation assigned to the table column.

a) Interpretation attribute: governs the meaning of a matrix, e.g. as values of a function of two variables at the points of an equidistant grid. The grid points

$(x_0 + i \cdot h,\; y_0 + j \cdot k)$, $i = 0, 1, \ldots, m-1$, $j = 0, 1, \ldots, n-1$

are defined by specifying $x_0$, $y_0$, $h$, $k$ and $m$, $n$.
b) Representation attribute: permits the specification of compaction mechanisms for data representations, supplementing e.g. 1, 2, 4 byte integer.
c) Storage attribute: most data are stored in the XRM data base (Lorie /7/). Large data elements (e.g. digitized pictures) may, however, also be kept in tape or disk files managed by CMS (Conversational Monitor System) and made known in XRM only by their file name and an access routine.

The system provides automatic conversion of physical units and automatic data conversion according to interpretation, representation and storage attribute, as well as observance of consistency rules, defined by logical conditions, on new entries or changes in a table.

2.3 Descriptive data

The system for manipulating the unformatted catalog information is an independent component with facilities for generation, maintenance and computer-aided retrieval of the relevant catalog entries on data and algorithms (Erbe, Walch /8/). Formatted data description is stored in the XRM data base and comprises one directory each of:

a) Conversion tables for physical units.
b) Methods with program identification.
c) Data attributes with table and column identifiers.
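As an illustration of the interpretation attribute of section 2.2, the following minimal sketch shows how the six parameters x0, y0, h, k, m, n suffice to attach coordinates to a stored matrix. The function names `grid_points` and `value_at` and the sample numbers are assumptions made for this example; they are not part of the system described.

```python
from typing import List, Sequence, Tuple

def grid_points(x0: float, y0: float, h: float, k: float,
                m: int, n: int) -> List[List[Tuple[float, float]]]:
    """Return the m-by-n grid (x0 + i*h, y0 + j*k), i = 0..m-1, j = 0..n-1."""
    return [[(x0 + i * h, y0 + j * k) for j in range(n)] for i in range(m)]

def value_at(matrix: Sequence[Sequence[float]], x0: float, y0: float,
             h: float, k: float, x: float, y: float) -> float:
    """Look up the stored function value at a grid point (x, y)."""
    i = round((x - x0) / h)   # row index belonging to x
    j = round((y - y0) / k)   # column index belonging to y
    return matrix[i][j]

# Hypothetical 3-by-2 matrix of function values on a grid
# starting at (0, 10) with spacings h = 0.5 and k = 2.
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
points = grid_points(0.0, 10.0, 0.5, 2.0, m=3, n=2)
sample = value_at(values, 0.0, 10.0, 0.5, 2.0, x=1.0, y=12.0)
```

Storing only the grid parameters with the column, rather than coordinates with every element, keeps the matrix itself free of redundant data.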
By means of b) and c), programs and tables can be identified quickly if the name of the method or the attributes of the table column in question are known.

3. THE DATA MANIPULATION LANGUAGE

For the time being, two language levels are provided.

3.1 Procedural language level

The following properties characterize procedural data manipulation:

a) Data access is by APL statements (Lorie, Symonds /9/).
b) Conversions between the external data representation in the XRM data base and the internal data representation (e.g. representation and storage) take place automatically.
c) Consistency rules are checked automatically on data insertions or changes.
d) The data are processed table by table or row by row.
e) The user is responsible for correct processing of the data with regard to physical units and interpretation.

3.2 Descriptive language level

The non-procedural language EQBE is an extension of QBE (Query by Example, Zloof /6/). It is also suitable for users with little knowledge of APL (experience in using APL as a desk calculator suffices) and without programming experience. The language is descriptive to a high degree. Relations and the subroutines available in the program library are represented as tables, and users formulate their data selection by making appropriate row entries, marking the output values and, where necessary, defining selection criteria by APL statements. EQBE is best explained by means of examples.

3.3 Examples of EQBE

    R | R1 | R2
    r | x  | y

is a schema for a table named R with two columns identified by R1 and R2. The values x, y represent one table row; r is an identifier for this row. r, x, y are entered by the user into the schema

    R | R1 | R2
      |    |

which is supplied by the system when table R is requested. The data variables x, y can take on all tuple values stored in R.
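To make the role of the data variables concrete, here is a minimal sketch that assumes table R is held as a Python set of (R1, R2) pairs. The sample rows are invented, and the helper `u` merely mimics the predicate notation u(x, ) used with the examples of this section; none of this reflects how EQBE itself is implemented.

```python
from typing import Set, Tuple

# Hypothetical contents of table R: each pair is one row (x, y).
R: Set[Tuple[int, int]] = {(1, 10), (1, 20), (7, 3), (9, 9)}

# The data variables x, y range over all tuples stored in R.
all_rows = {(x, y) for (x, y) in R}

def u(x: int) -> bool:
    """True if some row of R has x as its first component."""
    return any(x == x1 for (x1, _y) in R)

# Selecting a column: the set of x values for which a y exists.
projection = {x for (x, _y) in R}

# A restricting condition in the host language, here x > y.
restricted = {x for (x, y) in R if x > y}
```

Duplicates vanish in `projection` because the result is a set, which matches the set-valued reading of a query result.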
In set notation these are the tuples $\{(x, y) \mid (x, y) \in R\}$.

1. Selection of a column (projection)

Output entry: ⎕←x

Specifying a row identifier is not necessary. ⎕← is to be understood as the symbol for output. The query reads: wanted is the set of x values from R1. One possible formulation in predicate calculus would be

$\{x \mid \exists y \, (x, y) \in R\}$

Naturally the domain of y extends only over values from the R2 column of R. In the following we also write this more briefly as

$\{x \mid u(x, )\}$

and regard $u(x, )$ as a predicate that is true if a tuple exists in R whose first component equals x.

2. Simple query with restricting conditions formulated in the host language

    R | R1 | R2 | R3
    u | x  | y  | z

    x > 5 + y×y
    (z < 25) ∨ (z > 50)
    ⎕←x

$\{x \mid \exists y \, \exists z \; u(x,y,z) \wedge (x > 5 + y \times y) \wedge ((z < 25) \vee (z > 50))\}$

3. Intersection

Row entries r(x, y) in R and s(x, z) in S, with

    x > y
    z = 10
    ⎕←x

If the constant value 10 is entered in S instead of z, the APL statement z = 10 is unnecessary.

$\{x \mid \exists y \, \exists z \; r(x,y) \wedge s(x,z) \wedge (x > y) \wedge (z = 10)\}$

4. Union

    x > y
    z = 10
    ⎕←x

$\{x \mid \exists y \; u(x,y,) \wedge (x > y)\} \cup \{x \mid \exists z \; v(x,,z) \wedge (z = 10)\}$

or

$\{x \mid (\exists y \; u(x,y,) \wedge (x > y)) \vee (\exists z \; v(x,,z) \wedge (z = 10))\}$

5. Difference

    ⎕←x

$\{x \mid \exists y \; r(x,y) \wedge \neg s(,x)\}$

Naturally every data variable that occurs in a negated tuple variable must also occur in a non-negated tuple variable (or be known as a global variable).

6. Cartesian product

    ⎕←x,y,x1,z

$\{(x, y, x_1, z) \mid r(x,y) \wedge s(x_1,z)\}$

7. Equijoin (restriction in the Cartesian product)

    ⎕←x,y,z

$\{(x, y, z) \mid r(x,y) \wedge s(x,z)\}$

8. Generalized join with subsequent projection

    R | R1 | R2      S | S1 | S2
    r | x  | y       s | x1 | z

    x ≥ y
    ⎕←z

$\{z \mid \exists x \, \exists x_1 \, \exists y \; r(x,y) \wedge s(x_1,z) \wedge (x \geq y)\}$

In place of the ≥ operator there could be an arbitrary Boolean function.

9. Division

    R | R1 | R2      S | S1 | S2      T | T1 | T2
    r | x  | y       s | x  | z       t | .y | z

    ⎕←x

$\{x \mid \exists z \, \forall y \in .y \;\; r(x,y) \wedge s(x,z) \wedge t(y,z)\}$

.y stands for $\{y \mid \exists_x \, \exists z \; r(x,y) \wedge s(x,z)\}$, where $\exists_x$ is to mean that x is to be held fixed, and the occurrence of .y in t is to be understood such that $\forall y \in .y \;\, t(y,z)$ holds.

10. Grouping

$\{x \mid \forall y \, \exists z \; r(x,y) \wedge s(x,z) \wedge t(y,z)\}$

cannot be formulated so far. A device is needed for stating dependence between variables. With the convention that y.z is to mean a z chosen in dependence on y, the corresponding entries in R, S and T are r(x, y), s(x, y.z) and t(.y, y.z), with the output ⎕←x.

We are now in a position to carry out every operation of the relational algebra. The completeness of QBE in the extended form presented here is thus established for simple queries comprising only one operation of relational algebra. It also follows for arbitrarily composed operations: every QBE query establishes, on its definition, a logical data view that corresponds to the result table. The result table comes into being only on execution of an APL program that a query processor generates from the logical data view. A new query can be built on the logical data view of queries already defined, and a complex query can thus be broken down into single steps.

3.4 Discussion of the extensions of QBE

The extensions described below allow the treatment of quite complex queries, as are to be expected with measurement data, without impairing the simplicity of elementary queries.

a) Algorithms held in a program library (APL functions, FORTRAN subroutines, PL/1 procedures or Assembler routines) can be used for data evaluation or data selection within a query.
b) Arbitrary APL statements can be used within a query for data selection and evaluation.
QBE allows, apart from the comparison operations, only a limited number of built-in functions such as COUNT, SUM etc.
c) The result table of a query can be presented in the most varied ways by specifying format-describing forms, also in graphical form and repeatedly with changing forms.
d) Every query defines a logical data view, which can be used to decouple complex queries into a sequence of simpler queries.
e) Every query can be executed repeatedly. The values of global variables may be changed from one execution to the next. For users experienced in APL this opens up interesting possibilities for data processing with adaptable building blocks.
f) The decoupling effect of QBE, namely that the row entries can be made in any order, has been strengthened further (use of the grouping facility).
g) Through the grouping facility, queries that elude treatment by QBE can also be handled without decomposition into successive steps.
h) As the counterpart of QBE's ALL D operator (all different), EQBE uses a prefixed dot; correspondingly, for the ALL operator (all, with repetitions), a prefixed dot and the tuple identifier given in parentheses. A pseudo-variable such as .y or .x(r) can be used in APL statements and stands for a range of like values.

4. MEASUREMENT DATA PROCESSING

4.1 The data processing system

APL is outstandingly suited to the interactive analysis of measurement data that fit into the APL workspace (Schatzoff /10/). With large volumes of data APL loses its attraction, because for reasons of space data selection from tables can then no longer be expressed in APL style by one operation over a dimensioned array, but only by an iteration rule running over all table rows.
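The contrast drawn here can be sketched outside APL. The following minimal Python sketch assumes a table of numeric pairs; `read_rows` is a hypothetical stand-in for sequential access to a stored table and is not an XRM call. The first function needs the whole table in memory, the second holds only one row at a time.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[float, float]

def select_whole_table(table: List[Row],
                       keep: Callable[[Row], bool]) -> List[Row]:
    """One operation over the complete table, which must fit in memory."""
    return [row for row in table if keep(row)]

def select_row_by_row(rows: Iterable[Row],
                      keep: Callable[[Row], bool]) -> Iterator[Row]:
    """Row-by-row rule: only the current row is held at any time."""
    for row in rows:
        if keep(row):
            yield row

def read_rows() -> Iterator[Row]:
    """Hypothetical stand-in for reading a stored table sequentially."""
    for i in range(10):
        yield (float(i), float(i * i))

def condition(row: Row) -> bool:
    """Example selection criterion on the second column."""
    return row[1] > 20.0

in_memory = select_whole_table(list(read_rows()), condition)
streamed = list(select_row_by_row(read_rows(), condition))
```

Both variants return the same rows; they differ only in how much of the table must be resident while the selection runs.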
A procedural language level with APL as host language is therefore not yet fully satisfactory.

A further consideration with measurement data is that a measurement often stands for the aggregation of many individual values (e.g. a digitized measurement curve). For the processing of such measurements it is desirable to be able to call, from the host language APL, programs that were developed in another language (FORTRAN, PL/I, Assembler).

Other experimental data base systems that use APL as host language are mostly designed for small volumes of data only (Palermo /11/, Klebanoff, Lochovsky, Tsichritzis /12/) and either do not allow the use of programs not written in APL at all, or only with inefficient data communication (via external files).

With the architecture described in Figure 3 we obtain a system for problem solving with

- data base access on two language levels (procedural and descriptive),
- the possibility of using prefabricated programs from an easily extensible program library (FORTRAN, PL/I or Assembler programs),
- tools for managing the documentation on data and programs,
- automatic conversion of data into the desired physical units,
- automatic data conversion, as far as required by implementation, representation and storage,
- support for graphical input/output devices,
- availability of programs for graphical display,
- an interface for easy substitution of input/output devices.

[Figure 3: System architecture. VM/370 with the Conversational Monitor System (CP/CMS commands); information system (data, methods); non-procedural language level (EQBE); procedural language level (DB service); file access and spooling; XRM data base system; program library (FORTRAN, Assembler, PL/I); interface for auxiliary processors and input/output devices (menu technique etc.); user station with display screen and typewriter.]

4.2 Examples of data processing

The following two examples are meant to illustrate the problem-solving capabilities. The first example shows the connection with programs from a program library, the second among other things the use of global variables.

1. Which material recorded in the data base has a mean reflectance (between 250 and 300 nm) greater than 60?

    MATERIALSPEKTREN | MATERIALNAME | REFLEXIONSSPEKTRUM
                     | material     | reflexion          (start value 150 NM, step width 5 NM)

    SIMPSONREGEL | output: INTEGRALWERT | input: INTEGRAND | input: GRENZEN
                 | integral             | reflexion        | x1 x2

    x1 ← 250
    x2 ← 300

together with a comparison against the value 60.

The above entries in the table schema of the material spectra and a schematic representation of the program SIMPSON-REGEL, together with some APL statements, define the result list of the materials sought. If, as an additional condition, the product of specific weight γ [kg/dm³], specific heat c [cal/(grad·g)] and thermal conductivity λ [cal/(cm·grad·sec)] must be greater than 0.5 for the material sought, the above schema is to be supplemented as follows:

    MATERIALWERTE | MATERIALNAME | SPEZ. GEWICHT | SPEZ. WÄRME | WÄRMELEITFÄHIGKEIT
                  | material     | gamma         | c           | lambda

    0.5 < gamma[KG÷DM*3] × c[CAL÷GRAD×G] × lambda[CAL÷CM×GRAD×SEC]

With this formulation the existence of an entry in the table MATERIALWERTE is assured. A contradicting entry could exist in addition (if MATERIALNAME does not have the key property). With the following modification, the additional condition is either satisfied or not decidable (because no entry of material values exists):

    MATERIALWERTE | MATERIALNAME | SPEZ. GEWICHT | SPEZ. WÄRME | WÄRMELEITFÄHIGKEIT
                  | material     | gamma         | c           | lambda

    0.5 ≤ gamma[KG÷DM*3] × c[CAL÷GRAD×G] × lambda[CAL÷CM×GRAD×SEC]
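The first example above can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the mean reflectance is taken to be the Simpson integral over [x1, x2] divided by the interval width, the spectra are invented sample data, and `simpson` and `mean_reflectance` are hypothetical names, not the SIMPSONREGEL library program of the system described.

```python
from typing import Dict, List, Sequence

def simpson(samples: Sequence[float], step: float) -> float:
    """Composite Simpson rule for equally spaced samples."""
    n = len(samples) - 1                      # number of intervals
    if n < 2 or n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of intervals")
    odd = sum(samples[1:n:2])                 # interior points with weight 4
    even = sum(samples[2:n:2])                # interior points with weight 2
    return step / 3.0 * (samples[0] + 4.0 * odd + 2.0 * even + samples[n])

def mean_reflectance(spectrum: Sequence[float], start: float, step: float,
                     x1: float, x2: float) -> float:
    """Mean of a sampled spectrum over [x1, x2] (assumed: integral / width)."""
    lo = round((x1 - start) / step)           # index of the sample at x1
    hi = round((x2 - start) / step)           # index of the sample at x2
    return simpson(spectrum[lo:hi + 1], step) / (x2 - x1)

# Invented spectra, sampled from 150 nm in steps of 5 nm (41 values each).
spectra: Dict[str, List[float]] = {
    "material_a": [70.0] * 41,
    "material_b": [40.0 + 0.5 * i for i in range(41)],
}

x1, x2 = 250.0, 300.0
selected = [name for name, spectrum in spectra.items()
            if mean_reflectance(spectrum, 150.0, 5.0, x1, x2) > 60.0]
```

The start value and step width play the same role here as the column attributes in the table schema: they translate the wavelength limits into positions within the stored vector.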
2. Let the data base contain records on the effect of various kinds of treatment, as well as data on the persons treated. To obtain a first overview, a frequency table is wanted that shows the relationship between effect and kind of treatment for a particular group of test persons (smokers, male, older than 40 and overweight).

    BEHANDLUNG | ART | WIRKUNG | PERSON
    b          | x   | y       | name

    PERSON | NAME | GEWICHT | GRÖSSE | MÄNNLICH | ALTER | RAUCHER
           | name | g       | h      |          | a     |

    40 < a[JAHR]
    1 < g[KG] ÷ (h[CM] - 100)
    I ← x(b)
    J ← y(b)
    P[I;J] ← P[I;J] + 1

Appending (b) to x and y has the effect that all pairs x, y occurring in b are taken into account (including repetitions). The global variable P, which is formed with the values x(b), y(b), must be initialized before the query is executed, e.g. by P ← 5 10 ρ 0 if 5 kinds of treatment and 10 levels of effect are distinguished.

If continuous values were present for characterizing kind of treatment and effect, an interval partition would in addition have to be specified for x and y, e.g.:

    x < x1, x1 ≤ x < x2, ..., x3 ≤ x < x4, x4 ≤ x
    y < y1, y1 ≤ y < y2, ..., y8 ≤ y < y9, y9 ≤ y

by giving the number sequences IX ← x1, ..., x4 and IY ← y1, ..., y9, which can likewise be passed as global variables. In place of I ← x(b) and J ← y(b) there then appear I ← IX INDEX x(b) and J ← IY INDEX y(b). Here INDEX is an APL function that determines the interval number:

    ∇ I ← IX INDEX X
    [1] I ← 1 + +/X ≥ IX
    ∇

By changing the global variables IX, IY, P a completely new result table can be produced without recompiling the query.

4.3 Possible applications

The system presented is designed primarily for the processing of measurement data in science and engineering. Applications can certainly also be found in the commercial field, e.g. for interactive data analysis aimed at recognizing relationships that allow meaningful predictions. Computer-aided design (CAD) as a special application area of this system is being investigated at the Heidelberg Scientific Center (Kantorowitz /13/).

As Figure 4 shows, the system can be characterized, depending on the user's point of view, as:

- an extension of APL (data base access, use of compiled programs, information system on methods, programs and data),
- an extension of a data base system into a problem-solving system (interactive data manipulation language with the possibility of using a collection of programs, information system on methods, programs and data),
- an extension of a collection of programs into a problem-solving system (data base access, interactive data manipulation language, information system on methods, programs and data),
- an extension of an information system on methods, programs and data into a problem-solving system (data base access, interactive data manipulation language with access to the program library).

    Function                                               Component
    Data management system                                 XRM (Extended Relational Memory)
    Interactive host language                              APL
    Procedural language level (interface to XRM)           APL functions
    Descriptive language level (with translation to APL)   EQBE (Extended Query by Example)
    Program library                                        Scientific Subroutine Package (user programs)
    Program documentation                                  Method bank (implemented in APL)
    Command language for program execution                 EQBE
    Interactive graphical data manipulation                GRAPHPAK
    Sequential files                                       CMS/APL
    Interface APL - PL/I                                   Auxiliary processor
    Support of input/output devices                        APL functions

FIGURE 4: Components of the problem-solving system

None of the components mentioned above is a novelty in itself, except perhaps the information system on data, methods and programs.
Im Zusammenwirken entsteht ein System zur interaktiven Bearbeitung umfangreicher Megdaten, bei dem der Probleml~ser selbst (ohne Zwischenschaltung yon Programmierern) die f~r ihn wichtige Information aus einer Datenbank unter anwendungsspezifischen Auswahlkriterien abrufen und f~r gleichfalls anwendungsbezogene Berechnungen nutzbar machen kann. Bei unserem experimentellen System wird APL als Implementierungs- und Tr~gersprache verwendet, um entsprechend implementierte Teilsysteme leicht einf~gen zu k6nnen. Durch die Wahl der Komponenten bedingt bleiben manche Aspekte eines Datenmanagementsystems zun~chst unber~cksichtigt. Dennoch kann das System helfen, Erfahrung zu sammeln ~ber die Forderungen, die f~r die Bearbeitung yon Me~daten an Datenbanksystem, Datenmanipulationssprache und Auskunftssystem zu stellen sind, um ein Probleml6sungssystem zu erhalten, das auch ffir Nichtprogrammierer attraktiv ist. 23t Literatur [ 1] E.F. Codd, "A Relational Model of Data for Large Shared Data Banks", CACM, Vol. 13, No. 6, June ]970, pp. 377-387 [ 2] E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial", Proc. 1 9 7 1 A C M SIGFIDET Workshop [ 3] E.F. Codd, "Interactive support for Non-Programmers: The Relational and Network Approaches", Prec. 1974 ACM SIGFIDET Workshop [ 4] R.F. Boyce, D.D. Chamberlin, "SEQUEL: A Structured English Query Language", Proc. 1974 ACM SIGFIDET Workshop [ s] R.F. Boyce, D.D. Chamberlin, "Using a Structured English Query Language as a Data Definition Facility", IBM Research Report RJ 1318, Dec. 1973 [ 6] M.M. Zloof, "Query by Example", IBM Research Report RC 4917,July 1974 [ 7] R.A. Lorie, "XRM an Extended (n-ary) Relational Memory", IBM Technical Report 320-2096, Jan. 1974 [ 8] R. Erbe, G. Walch, "An Interactive Guidance System for Method Libraries", IBM Germany, Wissenschaftliches Zentrum Heidelberg, Technical Report 75.O4.OO1, April 1975 [ 9] R.A. Lorie, A.J. 
Symonds, "A Relational Access Method for Interactive Applications", Courant Computer Science Symposia 6, "Data Base Systems", 1971, Prentice Hall
[10] M. Schatzoff, "Interactive Statistical Data Analysis - APL Style", IBM Technical Report 320-2079, April 1972
[11] F.P. Palermo, "An APL Environment for Testing Relational Operators and Search Algorithms", Proc. APL 75
[12] J. Klebanoff, F. Lochovsky, D. Tsichritzis, "Teaching Data Base Concepts Using APL", Proc. APL 75
[13] E. Kantorowitz, "A Computer Aided Design Front End for the Measurement Data Base", IBM Germany, Wissenschaftliches Zentrum Heidelberg, Technical Note 75.07, July 1975

DATABASE ORGANIZATION AT HOECHST AKTIENGESELLSCHAFT

Otmar Saal, Diplom-Volkswirt, HOECHST AKTIENGESELLSCHAFT

Summary

Within a general framework, the paper first shows the conditions and considerations from which HOECHST proceeds when planning database systems. Using the requirements of highly integrated accounting and processing systems as an example, selected questions from the practical application of database and data communication systems are then discussed.

DATA PROCESSING IN THE SYSTEM NETWORK

To make the framework of the later remarks understandable, it seems useful to begin with a brief overview of the structure of our company. For the 1974 financial year HOECHST AG presented worldwide consolidated accounts covering more than 400 domestic and foreign companies in which the company holds a stake of at least 50 %. Worldwide sales amounted to DM 20.2 billion. With about 50,000 different products, the product range covers almost the entire field of chemistry.
The HOECHST enterprise as a whole is viewed in three groups, namely HOECHST World, HOECHST Group and HOECHST AG. The following remarks refer mainly to the parent company, with a total of 13 domestic plants and sales of DM 9.7 billion in 1974.

If we look at how the individual units of HOECHST AG (plants, group companies, foreign companies) interact with the functional areas (departments, divisions, corporate management) and at the data processing tasks that arise from this, it becomes obvious that the data needed for accounting systems, processing and disposition systems, and information and planning systems can only be captured, supplied, uniformly prepared, stored and evaluated by comprehensive data processing systems. Accordingly, a system of data processing facilities matched to the respective central and/or decentralized tasks contributes, in a quasi-hierarchically organized interplay, to capturing and processing the required data and/or making them available for a further stage of processing in the overall system.

Under the name Systemverbund HOECHST (HOECHST system network) we are working on the realization of a concept intended to help solve the problems of central and decentralized data processing as efficiently and reliably as possible. Our multi-computer network of 3 large computers at headquarters is usefully supplemented by a network of suitably sized decentralized computers or local terminal intelligence. The emphasis here is less on distributing the system than on using the most suitable equipment for local data capture and processing, with as direct a communication link to the central system as possible.
As far as possible and sensible, the central resources are also used from the local computer by means of "Remote Job Processing", for which a corresponding workstation program is available. In addition, this network enables the plants and branch offices to satisfy their own plant-related information needs, which could normally be met there only by a data processing installation exceeding the economically sensible size.

This overall data processing system in our company cannot, however, be limited to the locally or functionally separate units passing data on for the higher-level central data processing tasks. Particularly in the area of data storage and information evaluation it requires a uniform architecture. It was therefore almost inevitable that data processing at HOECHST, in consistent pursuit of the concept of integrated data processing, had to arrive at a database organization that can ensure the general usability of the data both across the individual application areas and beyond the physical boundaries of a single computing center.

Like every company that began using data processing very early, HOECHST at first aimed mainly at the integration of data capture and an efficient flow of data between the application areas. This by no means implies that, when the central files were built up, the integrity of the planning and attention to the overall context were neglected.
In the beginning, the projects at hand were predominantly mass data processing projects in the accounting and administration areas. They were built first of all for the department concerned and, because of batch-oriented data processing, their file organization was also mostly geared to the needs of the departments immediately involved. Data going beyond this and not typical of the department served mainly integrated data capture and the most appropriate forwarding of data that had been checked as comprehensively as possible. From a purely technical point of view, we had at that time no means whatever of achieving a comprehensive, yet adaptable and easily handled integrated database organization. Only the technological developments in data processing, namely suitable external storage with random access and the new forms of communication within real-time processing, allowed us from about 1967 onward to approach the planning and realization of more comprehensive application systems that were more strongly integrated in their file organization.

TASKS OF THE DATABASES IN THE OVERALL INFORMATION SYSTEM

For about 6 to 7 years the nature and structure of our applications have been changing considerably. The pure accounting and administration systems could now be extended, through database concepts and direct access via teleprocessing facilities, into disposition and information systems. Data storage can be transferred from its hitherto predominantly inactive state on magnetic tape into an active form on magnetic disk storage that users can address at any time. The data capture and administration systems for the departments themselves thereby became much more efficient in their mechanical execution through real-time processing, and also more informative through the direct linkage to other databases. The information needed by a department could now be built up more easily into effective partial information systems by accessing the database organization of other departments or centrally maintained databases. In addition, the integrated database organization permits direct retrieval of data from department-related subsystems for processing in key systems spanning several functional areas, or even in central information systems.

There is a reason why we use the term "central information system" or "central reporting system" instead of the widely used name "Management Information System". MIS seems to us to be tied too strongly to obtaining information only for higher management levels, which gives the impression that the corresponding system of data processing and data storage was designed primarily from this point of view. We are rather of the opinion that the information needs from the operational level up to the highest management level must without exception be covered from shared, centrally maintained databases. These databases themselves can then be created by various partial information systems and serve first of all to handle the tasks of systems built for the operational level. That these databases must in addition be able to meet the requirements of higher-level information systems is essentially a question of a planned and flexible database structure.

Data storage that is planned and directed at meeting all central information needs requires, first of all, a uniform system of data definition and data coding. Accordingly, HOECHST has from the beginning of data processing attached great importance to the central definition, development and maintenance of all key terms and ordering criteria, which are laid down and extended in a central data processing key book that is binding for the entire company.

Given a clear data definition and coding, it is then in principle no longer a very difficult problem to provide, condense, link and select the data according to the requirements of the information systems. If a suitable database management system is available, a universal structure of the databases allows a much more elastic and immediate reaction to changing information needs of management than would be possible within the rigid framework of an MIS that was thought out once and fixed in its file contents.

At HOECHST we are therefore clearly taking the path of first building very comprehensive partial information systems and of keeping the actual MIS, as a quasi higher-level "central reporting system", informative at all times through the common, central organization of the databases.

DATABASES IN AN INTEGRATED PARTIAL INFORMATION SYSTEM

I would like to make the general statements so far somewhat more concrete by means of practical examples. To make the overall context understandable here as well, I will first give an overview of an extensive and closely interlocked system in the area of sales and production, which in turn consists of a number of subsystems that are each effective on their own.

At present the closest interlocking in the data network exists between the subsystems for order processing, inventory management and disposition, production data capture, purchasing disposition and processing, shipping disposition and current-account bookkeeping. In their essential parts all these subsystems work in real-time mode and rely to a very large extent on central databases.
Order processing from order acceptance through the various disposition and processing stages up to the posting of the invoice in the current account can only be carried out fully automatically if the relevant and truly current data from the other subsystems affected can also be made available by direct access. Accordingly, even the subsystem for order processing and invoicing needs comprehensive access to information from the areas of customers, products, inventory management, production planning, means of transport and others.

This mutual provision of data from and for other areas of work must by no means take place only within the narrow framework of a locally oriented system. It has to take account of the overall network of processing and accounting across individual company units. Customer orders can thus be defined both at the parent company and at the branch offices, and delivery is possible from external warehouses, from central warehouses or from the various production sites in Germany.

Using a few greatly simplified questions and tasks from this area of order processing at HOECHST, I will now try to illustrate the considerations that led to the corresponding database organization. For this purpose I single out three strongly integrated databases, namely for:

A = Orders (warehouses, plants, internal deliveries)
B = Inventories (actual stock, disposition stock, production plan, purchase order)
C = Customers and suppliers (open items, purchase orders)

First, the definition of an order is to create a new entry in database A. Even at order acceptance, however, this requires the following to be established:

- Can the goods be delivered at all at present, and on what terms, or if not, when and from which warehouse or production plant can delivery be made again?
- Can the customer still be supplied with regard to his credit limit, or, if the customer is also a supplier, what is the difference between the customer's liability (obligo) and our payables or order values?

To answer these questions, information from file B must be available for the first question and data from database C for the second.

As far as plant stocks are concerned, changes to the inventory file (B) are brought about by activities caused not only by sales but equally by receipts and issues in the production process, because in order processing at the production warehouses the customer order is only one of many status-changing events. Constant readiness for disposition requires a number of other data as well. Depending on whether production is order-oriented, stock-oriented, continuous or discontinuous, the corresponding disposition systems must also be able to access current inventory data, order data, production plan data, requirements from production and purchase order data.

Starting from a simple question about readiness to deliver a product, we can thus already recognize a considerable interlinking of different subsystems, all of which have a common point of integration in the inventory database.
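The two checks made at order acceptance can be illustrated with a minimal sketch. It is not HOECHST's implementation: the record layouts, the field names and the rule that our payables to a partner are offset against his obligo are assumptions made purely for illustration. `StockRecord` stands in for an entry of database B, `PartnerRecord` for an entry of database C.

```python
from dataclasses import dataclass


@dataclass
class StockRecord:
    """Hypothetical entry of database B (inventories)."""
    product: str
    warehouse: str
    available: int  # quantity free for disposition (assumed field)


@dataclass
class PartnerRecord:
    """Hypothetical entry of database C (customers and suppliers)."""
    name: str
    credit_limit: float
    open_items: float      # customer obligo: what the partner owes us
    payables: float = 0.0  # what we owe the partner as a supplier


def find_source(stock, product, quantity):
    """First question: return a warehouse that can deliver now, or None."""
    for rec in stock:
        if rec.product == product and rec.available >= quantity:
            return rec.warehouse
    return None


def within_credit(partner, order_value):
    """Second question: net exposure including the new order must not exceed the limit."""
    net_exposure = partner.open_items - partner.payables + order_value
    return net_exposure <= partner.credit_limit


def accept_order(stock, partner, product, quantity, order_value):
    """Return (accepted, warehouse); only an accepted order would become a new entry in A."""
    warehouse = find_source(stock, product, quantity)
    accepted = warehouse is not None and within_credit(partner, order_value)
    return accepted, warehouse
```

Even this toy version needs current data from two different databases before a single order record can be written, which is the dependency the text describes.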
In answering the other question, about the creditworthiness of the customer, we likewise recognize dependencies on incoming payments, on orders accepted by dispatchers of other sales areas and finally on our own purchase requisitions and the settlement of our supplier invoices, if this customer is at the same time a supplier to us.

In my remarks so far I have deliberately shown this organizational environment and, in outline, the functional relationships between the subsystems, in order to make it clear that file organization must be seen above all from the point of view of the overall context. Database organization must differ from the previously prevailing file organization in that the data can be used universally for a large number of applications, even beyond the primary application area.

If, in addition, one wants to fulfil the well-known postulates for the use of databases, namely currency of the data for all users, freedom from redundancy and access to the data by different criteria, then this was exactly the starting point of our considerations when, at about the beginning of 1968, we approached the system planning for our first real-time order entry and order processing system and looked around for suitable database software. Our requirements for the database software were not limited to an instrument for database management proper. We were looking for a system that was flexible and extensible as a whole, and whose further development was assured.
Since our first database applications could only be designed as subsystems, and even within the individual systems can only be realized in development phases, the database system to be used likewise had to be adaptable and extensible in its structure, and its database management part had to allow further application programs to be integrated easily and safely.

IMS as Database Software

Even before we became aware of IMS, we tried at the beginning of 1968, with the knowledge and possibilities of the time, to develop a database processor of our own. Starting from the file organization of the bill-of-materials processor, the necessary structuring of the files and, via appropriate macros, universal access to the data elements were to be realized. While this database processor was still under development, however, we received advance information about the Information Management System (IMS) and, after an extensive system test, decided to use its database part (DL/I).

What spoke in favour of organizing databases with IMS was above all the flexible structure provided by the tree structure, with a maximum of 256 segment types and a variable number of segments on 15 different levels. In addition, through the program-independent database descriptions and the facility to make only the required segments accessible to each user program as sensitive, IMS offered what seemed to us an ideal way of building databases that are independent of programs and adapt well to a growing degree of integration.
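The idea that a program sees only the segment types for which it has been declared sensitive can be illustrated with a small model. This is a sketch in Python and not DL/I: the class `Segment`, the functions `sensitive_view` and `segment_types` and the segment names are hypothetical, and the rule that a hidden segment also hides its dependents is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One node of a hierarchical (tree-structured) database record."""
    seg_type: str
    data: dict
    children: list = field(default_factory=list)


def sensitive_view(segment, sensitive_types):
    """Return a copy of the tree restricted to the sensitive segment types.

    Assumption of this sketch: a segment that is not sensitive hides
    its whole subtree from the program.
    """
    if segment.seg_type not in sensitive_types:
        return None
    visible = [sensitive_view(child, sensitive_types) for child in segment.children]
    return Segment(segment.seg_type, dict(segment.data),
                   [v for v in visible if v is not None])


def segment_types(segment):
    """List segment types in hierarchical order (top-down, left to right)."""
    result = [segment.seg_type]
    for child in segment.children:
        result.extend(segment_types(child))
    return result
```

Because each program works on its own restricted view, a new segment type can be added to the stored tree without affecting programs that are not sensitive to it, which is the program independence the text values.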
While in the first two years of use we could still be content with the strongly linear structure of our IMS databases, the growing degree of integration brought about by different application systems and the increasing need for information made it necessary to be able to structure the databases logically as well, independently of their physical storage. This became possible from IMS Version 2 onward.

Also with Version 2, the data communication part of IMS was adopted as our central system for accepting, controlling and managing the messages of what are by now 140 terminals, so that today a System /370-168 already has to be used almost exclusively for IMS operation. Moreover, because of its central database management and message control, IMS is now also used in conjunction with GIS and is linked with STAIRS. HOECHST therefore regards IMS today as a comprehensive database management and message control system.

Because of the central importance of the databases organized with IMS, and because of a rather high message volume with a large share of transactions that change the databases, we had to attach great importance to speed and safety, particularly in database design, in program structure and in the choice of access calls. IMS leaves the user a great deal of latitude in organizing and structuring the databases. Their design, however, has a decisive influence on the processing speed of the associated application programs and can also have a noticeable effect on total throughput in the IMS system.
When we began using IMS, no experience was available, nor was there any method for simulating timing behaviour under different database structures. Much basic knowledge therefore had to be gathered by us in special test investigations and then supplemented and adapted in practical operation. In the course of time, however, some of the rules gained in this way had to be reconsidered and changed because of substantial changes in hardware and/or software.

Thus the availability of cheaper disk storage with considerably improved capacity on the one hand, and the increasingly costly communication between user system and management system under the new operating systems on the other, have led us to move away again from the concept of deeper structuring with fine segmentation. This inevitably means working with larger units of information (segments), which are often not fully used and accordingly need more external storage space. The additional effort of providing a larger rather than a smaller segment to the application program is negligible compared with communicating twice between the application program and the control program.

Similar savings are made possible by the syntax improvements introduced as IMS was developed further. Whereas formerly several segments usually had to be requested and the selection had to be made in the application program, Boolean combinations of different criteria in the search statements now allow the system to select the desired information with fewer calls.
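The saving can be illustrated by counting calls across the boundary between application program and control program. The sketch below is a toy model in Python, not DL/I syntax: `ToyDatabase`, its methods, `find_by_scanning` and the field names are hypothetical and only mimic the two styles of access described above.

```python
class ToyDatabase:
    """Toy stand-in for the control program; counts calls made by the application."""

    def __init__(self, segments):
        self.segments = list(segments)
        self.calls = 0

    def get_next(self, position):
        """Older style: one call per segment, selection is left to the program."""
        self.calls += 1
        if position < len(self.segments):
            return self.segments[position]
        return None

    def get_qualified(self, predicate):
        """Newer style: one call, the system applies the combined criteria."""
        self.calls += 1
        for seg in self.segments:
            if predicate(seg):
                return seg
        return None


def find_by_scanning(db, predicate):
    """Application-side selection: fetch segment after segment until one matches."""
    position = 0
    while (seg := db.get_next(position)) is not None:
        if predicate(seg):
            return seg
        position += 1
    return None
```

With a criterion such as `lambda s: s["plant"] == "B" and s["qty"] > 0`, the scanning variant needs one call per segment inspected, while the qualified variant needs a single call regardless of where the match lies.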
In the organization of the IMS files we now predominantly aim at methods in which finding a record no longer requires the costly search through index tables. Instead, a direct address is determined from the sort key by a conversion procedure. It is often difficult, however, to find a procedure that distributes the records evenly over the given area. This applies particularly to so-called meaningful ("speaking") keys, which take no account whatever of storage organization. Although the sort sequence is generally lost with these conversion procedures, the faster access for real-time processing matters considerably more to us than the increased effort for occasional sequential processing of these data.

Today about 60 % of all online files at HOECHST are organized by direct access methods. The remaining files could not yet be converted because of their key structure and frequent sequential processing.

A problem with the index-oriented methods is new entries, because IMS forms overflow chains for them, which lowers performance considerably. In an online system in particular, new records are checked, processed and passed on in several phases, each of which requires a costly read. Every night we reorganize most of these files, producing the required evaluations and statistics at the same time and obtaining a copy as a backup.

VSAM, as an improved index-oriented access method of the operating system, is currently being tested by us. Productive use can only be considered, however, once we are convinced that it functions safely and without error in combination with IMS.

Practical experience has also brought about a certain change in the design of large central databases, especially when different users place quite different demands on the content and volume of the data to be stored.
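The conversion of a sort key into a direct address, described earlier in this section, can be sketched as follows. The division-remainder method used here is one common choice and is an assumption: the text does not say which conversion procedure was used. The key format (two digits of product group followed by a serial number) is likewise invented, and `naive_address` is included only to show how a meaningful key can defeat an even distribution.

```python
from collections import Counter


def key_to_address(key: str, slots: int) -> int:
    """Convert a sort key into a slot number by the division-remainder method."""
    value = 0
    for ch in key:
        # Fold every character of the key into one number.
        value = (value * 31 + ord(ch)) % 1_000_000_007
    return value % slots


def naive_address(key: str, slots: int) -> int:
    """Poor conversion for a meaningful key: uses only the leading two digits."""
    return int(key[:2]) % slots


def occupancy(keys, slots, convert=key_to_address):
    """Count keys per slot; a very uneven result means long chains of synonyms."""
    return Counter(convert(k, slots) for k in keys)
```

Comparing `occupancy(keys, 10)` with `occupancy(keys, 10, naive_address)` for keys that all belong to one product group shows the difficulty the text mentions: the naive conversion sends every record to the same slot. Neither variant preserves the sort sequence of the keys.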
The information that central master databases such as the customer and supplier database must store for individual special processing or accounting programs can often go beyond what will ever be of importance to the remaining users. We had this problem above all with typical industry-specific data in our customer file, where, for example, pharmaceutical customers led to a completely different way of filling the uniform structural framework than customers of the industrial sector. In addition there are often differing requirements regarding the updating of the information, responsibility for maintenance and the inclusion of new data. Despite all the convenience of database management systems, this repeatedly causes unrest even among those user groups of such databases that are not primarily affected by the extension or by a minor change in the structure.

For predominantly pragmatic reasons we have therefore, in some cases, adhered less to the pure theory of a universally usable and redundancy-free database concept and have placed flexible and user-friendly handling, together with safe behaviour of the database in its system environment, in the foreground of our considerations. For this reason some databases that were already quite complex were no longer extended to the extent that newly added application areas would in themselves have required. Instead, we increasingly adopted the principle of moving special data out into dedicated files.

Logically, these subfiles remain dependent on the master database to a certain extent, because the master part of the information is fed in from there. On the other hand, it must be possible for changed data from the subfile to be taken over by the master database.
Physically, these subfiles are kept completely independent of the master database and are represented there, in the root segment, only by notes on opening, change notices or deletions. Logically the two thereby remain dependent on each other, because a subfile record cannot be opened without a master part already opened in the master file. Likewise, base data in the master file may not be deleted as long as corresponding data in the subfiles are still needed.

Such divisions of a database into master file and subfiles are not only made for organizational reasons. They must also be considered because of the behaviour of the hardware and software systems, since with comprehensive databases serving very many application areas the frequency of access to the same disk pack can by itself lead to considerable bottlenecks. The temporary locking of the databases during update runs in batch mode also causes us some timing problems, especially when the database changes are sensibly carried out on a different computer. By dividing the data into dedicated subfiles, however, we make it possible for different application programs to work largely undisturbed and concurrently on the mutually independent data of one logical overall database. Synchronization points and a coordinated system of reference notes in the individual files then ensure that the overall coherence of the database is preserved.

Another advantage of dedicated sub-databases, not to be underestimated in practical work, lies in the considerably better ability to react to error situations or other disturbances in the system. Within this sub-database system we therefore prefer, where necessary, to accept a certain data redundancy rather than have important programs wait too long, in the event of an input/output error or other technical obstruction to access, for extensive recovery measures to be carried out on the overall database.
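The two dependency rules between master file and subfiles stated above can be sketched in a few lines. This is a toy model under stated assumptions, not the HOECHST design: the class `MasterFile`, its method names and the representation of the root-segment notes as a set of subfile names are all hypothetical.

```python
class MasterFile:
    """Toy master file whose root entries carry notes about dependent subfiles."""

    def __init__(self):
        # key -> names of subfiles that currently hold data for this key
        self.roots = {}

    def open_root(self, key):
        """Open the master part for a key (no effect if it already exists)."""
        self.roots.setdefault(key, set())

    def open_sub_record(self, key, subfile):
        """Rule 1: a subfile record needs an already opened master part."""
        if key not in self.roots:
            raise KeyError(f"no master record for {key!r}")
        self.roots[key].add(subfile)  # note the opening in the root entry

    def close_sub_record(self, key, subfile):
        """Record that the subfile no longer needs the base data."""
        self.roots[key].discard(subfile)

    def delete_root(self, key):
        """Rule 2: base data stay as long as any subfile still needs them."""
        if self.roots[key]:
            raise ValueError(f"{key!r} still used by {sorted(self.roots[key])}")
        del self.roots[key]
```

Only these small notes tie the files together, so programs working on different subfiles do not have to touch, or lock, each other's data.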
The advantage of dedicated sub-databases in fault situations holds in principle even when the update and reorganization runs of central databases, which are carried out at night in batch mode, are disrupted and the databases cannot be made available in time for the start of real-time operation. Associated sub-databases, by contrast, often remain completely unaffected by such disturbances or can continue to be used even without the pending database changes.

Dedicated subfiles can also reduce the effects of program crashes on an overall database system. It is then not necessary to stop the overall database and with it all application programs affected. Because of the low mutual dependencies, more targeted measures can be taken to remedy the error situation quickly.

The manufacturer still pays far too little attention today to the problem of avoiding susceptibility to disturbances, whether caused by the user systems or, as before, by software and hardware. What advantages can a database organization bring that is theoretically very sensibly structured and equipped for all information needs if it is not designed to be absolutely user-friendly and reliable? The aim of integrated database and information systems must rather be that, while the overall context is preserved as far as desirable, the subsystems remain operational as far as possible according to their special function and unimpaired by disturbances in connected systems.

Since the effects that individual application programs have through the structure and use of databases, on system behaviour as well as on the load on the overall system and on basic availability, can only be judged from a higher-level point of view, a special coordination office was set up at HOECHST. Its task is, already during the planning phase, both to take account of the overall database aspects and, by recommending suitable structures, to influence the individual database design and to contribute in good time to an execution of the application programs in the overall system that is favourable in timing and sound with regard to safety. In addition, these specialists work out generally valid standards and application examples for database design and make them available in suitable form to the users of database systems through leaflets and information seminars.

TOOLS FOR DATABASE DESIGN AND ADMINISTRATION

Whereas in the past we felt our way toward the final structure of a database very tentatively and with test runs that were in part quite costly, today we endeavour in design both to apply knowledge that has been generally confirmed by practical experience and to use suitable tools for effective database modelling.

On the one hand, a number of utility programs help us to reach this goal. In purely organizational terms they make the structure of the database and its contents more transparent and, as a modelling aid, allow necessary changes in the design to be made very simply. On the other hand, programs for simulating the databases can additionally be used, which help to find a suitable structure with regard to data volumes, access frequency and the ability to combine the data elements.
sagen, dab wit umfassende Simulationen wegen der sehr aufwendigen Vorarbeit f~r die Beschaffung der quantitativen Angaben und 244 wegen des Aufwandes fur die Beschreibungen der vielf~ltigen Zugriffsfunktionen seitens der Benutzerprogramme bisher noch nicht durchgef~hrt haben. In der Zukunft jedoch werden zuverl~ssigere Planungen unter Anwendung verbesserter Simulationsverfahren schon deshalb unerl~Blich werden, well die Datenverarbeitung nicht nut wegen des allgemeinen Kostendrucks, sondern auch wegen der %IberhShten Systembeanspruchung durch spezielle Anwendungsprogramme nicht st~ndig das Gesamt- system erweitern kann. Andere, ganz dringend notwendige Hilfsmittel, sowohl for die Design-Phase als auch fiir die laufende Verwaltung der Datenbanken, program2ne. Ohne derartige, sind geeignete Dokumentations- im englischen Sprachraum mit "Data Dictionary and Directories" bezeichnete Systeme kann man eine effektive Datenbankplanung und einen laufenden Oberblick %iber Struktur, Querverbindungen im Daten- und Benutzer- bereich sowie flber den jeweiligen Status der Datenbank nicht mehr zuverl~ssig erreichen. Umso mehr m~ssen w i r e s als Anwender komplexer Datenbanksysteme bedauern, da~ bis- her vom Anbieter der Datenbanksysteme dieses schwierige Problem der DatenbankDictionary-Systeme so sehr schleppend bearbeitet wurde und die Benutzer meist eigene und wegen des hohen Aufwandes oft unzureichende Tei!16sungen fur ihre Datenbankdokumentation und Administration erarbeiten mugten. In dieser, f~r eine weitere und gesicherte Fortentwicklung von Anwendungen mit Datenbanken so entscheidenden Frage mOssen wir an IBM die dringende Aufforderung richten, die Kunden in ihrer Datenbankverwaltungsarbeit dutch ein umfassenderes und benutzer- freundliches "Data Dictionary System" zu entlasten und zu einer weitgehend maschinellen Dokumentation der eingesetzten Datenbanken beizutragen. 
Besides these administrative aids for easier design and management of database systems, the permanent use of utility programs for observing the running system and evaluating statistical indicators is also indispensable, so that both the working of individual programs and the behavior of the system as a whole can be assessed and adjusted. For this we can draw on sufficient data from IMS and SMF and use suitable monitors to evaluate them.

With IMS, a low load from individual applications leads to better overall performance, because the processes inside the online control region are interlocked. It is therefore necessary to check both the behavior of individual programs and their interplay. For this purpose we use programs based on the evaluation of log tape records. The daily statistics show the activities of a program over a whole day. From them one can see whether individual programs have an above-average rate of database accesses in relation to the messages they process. The daily statistics give no impression of the behavior of a program within a particular environment (e.g. at peak times), nor of the sequence and duration of individual activities and database accesses within a program run. For these purposes there is the DC Monitor, which records the corresponding information on special request. System specialists evaluate the resulting listings and can prompt the responsible programmer to use more skillful database calls, or issue general guidelines. In this way we have already been able to achieve substantial improvements in the running of the programs. In an online system in which about 40 application programs compete with one another, it is of course hardly possible to repeat identical constellations or execution sequences. The influence of small changes can therefore usually not be expressed in exact figures. The yardsticks for evaluation are thus at best the total number of accesses, the total computer time consumed, the throughput rate of messages at particular times of day, and so on.

EXISTING AND REQUIRED FORMS OF COMMUNICATION

It will have become clear from what has been said so far that the decisive need to build up a comprehensive database organization was triggered by the real-time projects. Real-time processing, which opens data processing directly to the workplace of the actual system user, who is to communicate with the system and its databases interactively, requires the use of a comprehensive message control and management system. The main applications presented here are, by type, so-called transaction-driven systems (Teilhabersysteme), in which the individual transactions triggered by the user activate permanently assigned procedures within a given application system. For this type of message control and program control, IMS with its data communication part offers the complement to the database part that an information system needs. The safeguards available for the IMS system as a whole, with extensive logging both of the messages and of the activities that affect the databases, and with an effective checkpoint and restart procedure, further strengthen our view that in IMS we have, in principle, the right software product for our information systems.

Rightly, the system control of IMS is designed quite comprehensively, so that under its control program we now run not only the transaction-driven "Message Control Programs" but also batch-oriented teleprocessing programs, GIS, and the "Information Retrieval System" STAIRS with extensive documentation databases. At present, however, GIS is still of entirely minor importance and is to be included in future planning only after a successful trial period. Nevertheless, the first trial applications already show that it can be an interesting complement to the database systems. Whether it is fully suitable for letting end users themselves quickly formulate sporadic queries against existing databases still seems uncertain. In our opinion it would be better to have a simpler query language for end users, one that can be formulated in German, and in return to develop GIS further for experienced users, so that, for example, convenient queries can be directed at IMS databases through field indexing and variable indexing as well. It would also be an advantage if the IMS file description were generated automatically from the existing IMS database descriptions, or if IMS and GIS were served jointly by a higher-level database management.

STAIRS, also mentioned here, is used at HOECHST as a comprehensive documentation and information retrieval system. Under the message and program control of IMS, STAIRS has so far been used for unformatted databases in medical literature documentation, research documentation, and patent documentation. All queries and searches in the documents, which are currently stored on 16 IBM 3330-11 magnetic disk units, take place in real time via display terminals, directly in a dialog between system and user.

The comprehensive use of IMS as an overall control system, as described so far, does however bring a disproportionate rise in CPU load through increased internal administrative overhead. We are therefore now asking ourselves up to what point IMS, given our planned increase in the number of terminals, will still be able to serve all requests in one system, even with a /370-168. If IBM does not decisively change the performance and the present mode of operation of IMS, the only remaining course would be a comparatively uneconomical division of the IMS applications between two systems. Clear limits are set to this, however, both by the database management of IMS and by the application systems. A considerably more sensible path, it seems to us, lies in realizing a concept that allows suitable functions to be moved out to intelligent terminals equipped with dedicated files. In this way the central system can be relieved of avoidable transactions, and the operability of the system can also be improved because independent work at the peripheral terminals is possible at least temporarily. Concepts pointing in this direction have indeed been announced, for example with the SNA systems IBM 3790 and 3770. In that case, however, we also expect IMS, both in its database concept and in message control, to integrate the capabilities of these terminal computers fully, in the sense of a truly hierarchically structured system network. What would be desirable is the full inclusion of the data held by the front-end computer in the database management system of IMS, so that every relevant file change in the parent database is immediately triggered for the dedicated file as well, and vice versa.

If we consider the usual definition of a transaction-driven system (Teilhabersystem), in which the various participants depend on one another and are connected through a common information system, in the light of a comprehensive database organization in our company, then much of the work carried out from the plants by Remote Job Processing within the system network belongs to transaction-driven systems rather than to independent-user systems (Teilnehmersysteme). Although data transmission is batch-oriented, and as a rule completely independent programs are called, there are nevertheless many applications in RJE operation that access common central procedures and databases. At present, however, this work can only be handled within the RJE procedures on our ASP system, which on the one hand runs on an entirely different machine from IMS and on the other hand uses a completely different transmission technique, so that not even shared use of lines by IMS and RJE is possible, although both systems are components of the same overall information system and of the common database organization. Here it is urgently necessary, for both economic and organizational reasons, to bring about without delay a common line control, and if possible also a common database management, for IMS and RJE. Such a package of procedures for uniform data transmission control has meanwhile been announced by IBM as "System Network Architecture".
However, this designation seems to us, at least so far, to be a promising slogan that still has to be filled with a great deal of content, especially as regards the word "Network". For within the system network in our company we need not only a uniform concept for data transmission control and hierarchically ordered communication between a computer and subordinate terminals; above all, for reasons of increased security and availability, we expect a network system between systems of equal standing, with program libraries and databases that can be used jointly.

I hope that this paper, although it was kept largely general and was oriented only toward practical experience and concrete planning approaches at HOECHST, has nonetheless given the scientists and software architects present some confirmation of their views, or some suggestions, on the subject of databases and data communication.

Use of Databases in the Non-Scientific Area of a University

Eckhard Edelhoff, Universität Dortmund

Summary

The aim of the talk is to show to what extent and for what purpose databases can be used in the administration and library of a university. The questions relevant to data processing in a university library are dealt with in more detail, in particular the effects of different data processing techniques.

Contents

1. The Dortmund comprehensive university area
2. Projects in the area of the libraries and administrations
3. The library project
3.1 State of automation in libraries
3.2 Book processing in a conventional library
3.3 Book processing using an on-line system
3.4 The display screens of DOBIS
3.5 The database in the Dortmund library system
4. The administration project

1. THE DORTMUND COMPREHENSIVE UNIVERSITY AREA

The Dortmund comprehensive university area (Gesamthochschulbereich Dortmund) consists of the Universität Dortmund, the Fachhochschule Dortmund, the Pädagogische Hochschule Ruhr, and the Fachhochschule Hagen. It is the intention of the legislature that these institutions be integrated into one comprehensive university (Gesamthochschule).

Since 1973 the computing center at the Universität Dortmund has used an IBM/370-158 to supply data processing capacity and the associated services to the institutions of the Dortmund comprehensive university area and to the Universität Bielefeld, and since 1975 to the Fernuniversität in Hagen as well.

From 1976, after moving into a new building, the libraries in the comprehensive university area will be combined into one unit. This means that from 1976 the library system in Dortmund will consist of one central library and about 25 departmental libraries.

The individual institutions in the comprehensive university area have independent administrations.

2. PROJECTS IN THE AREA OF THE LIBRARIES AND ADMINISTRATIONS

In 1971/72 it was decided to include the libraries and administrations of the Dortmund comprehensive university area in the services of the mainframe that was to be procured. The reasons for this were:

- There was no prospect of procuring, alongside a mainframe, a computer dedicated to libraries and administrations.
- Given a sufficiently powerful mainframe, the applications of the libraries and administrations, together with the applications from research and teaching, lead to balanced utilization.

As a consequence of this decision,

- in 1972 a project group on the "organization of work procedures in the libraries with regard to data processing" was founded jointly by the libraries and the computing center, with the participation of IBM staff, and
- in 1974 a project group with a corresponding mandate was founded jointly by the administrations and the computing center, with the participation of staff from IBM and from Hochschulinformationssysteme GmbH.

3. THE LIBRARY PROJECT

The task "organization of work procedures with regard to the use of data processing" needs explanation. Processing library items so that they can be used in a university takes place in numerous stages:

- selection of literature,
- holdings check,
- ordering,
- processing of receipts,
- invoice processing,
- subject cataloging,
- alphabetical cataloging,
- binding,
- final check,
- information service,
- lending.

For orderly processing and unambiguous identification of library items, conventional libraries have to maintain numerous catalogs, card files, registers, and lists, among other things for statistics. The redundancy of the data kept at each workplace is considerable.

The aims of introducing data processing into the libraries must therefore be:

- to link the individual work steps with one another,
- to make manual maintenance of the various finding aids superfluous, to eliminate their redundancy, and to reduce the walking involved in reaching the places where they are kept,
- to ease and improve the library's communication with suppliers and with users,
- to automate the tracking of vouchers and the printing of the required notices and statistics.

For this purpose the following data processing facilities, among others, were held out in prospect:

- magnetic disk storage for holding and retrieving all data that arise during the processing of literature,
- display terminals with which the stored data can be made available on-line at the workplace at any time, and can be supplemented,
- use of the IBM monitor CICS.

The following considerations were taken into account in this decision:

- Applications from the libraries and administrations are input/output-intensive complements that fit into the task profile, determined by teaching and research, of the mainframe to be procured.
- The library's tasks arise spread over its entire opening hours.
- The stated aims of introducing data processing into the library can be achieved only if the computer is available at the workplace at all times as a source of information.

3.1 State of automation in libraries

Attempts to rationalize library work with the help of electronic data processing have a history of more than ten years. Among the systems to be mentioned are those based on batch methods (off-line systems):

- the library of the Universität Bochum,
- the library of the Washington University School of Medicine,
- the library of the University of Illinois,

and, more recently, the systems and experiments based on real-time methods (on-line systems):

- the library of Stanford University,
- the Ohio College Library Center,
- the library of the Universität Bielefeld,
- the IBM laboratory in Los Gatos.

Apart from the data processing technique, these systems differ greatly in approach and scope: some are designed to automate part of the work steps, others cover all of them.

The Dortmund library system was built on the experience gained with the system developed at the IBM laboratory.

Off-line systems

Regardless of the technology used, off-line systems are not able to change or ease the procedures of a conventional library fundamentally. Essentially, this technique can rationalize only those work steps whose course can be adapted to the turnaround time of the computer concerned:

- Maintaining the catalogs, card files, and registers can be made easier as far as the necessary changes are concerned. The librarian still needs them at the workplace.
- Reuse of data once collected is possible in principle, but as a rule it depends on external data media such as punched tape and punched cards. Corrections, additions, and extensive category schemes lead to awkwardness, difficulties, and duplicated work.
- There is no way, beyond conventional procedures, of keeping information available on the current processing state of an item.

On-line systems

On-line systems are able to do away with the complicated conventional procedures of book processing:

- The librarian physically needs no card files, registers, or catalogs at the workplace.
- The reusability of data once captured is ensured, as are changes and additions to them.
- There is a simple way of obtaining information on the processing state of a library item. There are no information gaps.

The two kinds of system exploit the technology of the available computer in completely different ways. Whereas on-line systems need a mainframe as their carrier because of their greater software requirements, off-line systems can in principle also be run on mid-range computers (mittlere Datentechnik) specialized for this purpose.

3.2 Book processing in a conventional library

As a basis for the following sections, the work steps required during book processing in a conventional library are described here. The account is necessarily incomplete and limited to monographs.

o Book selection
- The subject specialists select the literature on the basis of bibliographies, prospectuses, the weekly lists of the Deutsche Bibliothek, and the like, taking user requests and the lending statistics into account.
- Along with the order data to be specified by the subject specialists, such as type of order, number of copies, location, and subject group, the marked documents mentioned above normally go to the acquisitions department.

o Book ordering
- In the acquisitions department the purchase proposals of the subject specialists are checked.
  In the course of this, bibliographic searches may have to be carried out, and afterwards a holdings check has to be made against the alphabetical catalog, the interim file, and the order file.
- The order is placed according to the result of the holdings check. The following data, among others, are required:
    author (for works with an author)
    title
    publisher
    place, year of publication
    edition
    type of order
    series title (for series)
    volume
    budget item
    supplier
    source
- Copies of the order are filed in the order file and in the bookseller file.

o Receipt of books
- In the acquisitions department the book is checked for damage or misprints. A routing slip is enclosed for monitoring the workflow and for entering data that arise along the way, such as the call number and subject cataloging data.
- A holdings check is carried out for unsolicited consignments sent on approval. Unsolicited consignments and on-approval orders are submitted to the subject specialists for a purchase decision.
- For all books with a positive purchase decision, the bookseller file, the order file, and the interim file are brought up to date.
- The books are accessioned. The subject specialists receive the books for further processing.
o Subject cataloging
- The subject specialists classify the books by content and determine the content of the list of new acquisitions.
- The books are passed on for title cataloging for the alphabetical catalogs.

o Title cataloging
- The data relevant to identifying the books are recorded according to the rules for alphabetical cataloging. This is a renewed recording of data that in part already arose during ordering.
- The cards for the alphabetical catalogs and the subject catalog are produced and filed in the central alphabetical catalog, the central subject catalog, and the shelf list. For books with a special location, cards are additionally required for the alphabetical catalogs of the departments, the reading room catalog, the catalog of the textbook collection, and the catalog of the reference collections (Handapparate). Special locations essentially hold only current stock. Books have to be transferred to the central library on a large scale. When the location of a book changes, the corresponding corrections have to be made in all catalogs.
- The books are passed on to the binding section.

o Binding
- Where necessary, the books are forwarded to the bookbinder. A bookbinder file is required for this.
- Die B~cher w e r d e n nach D u r c h f ~ h r u n g der B u c h b i n d e r a r b e i t e n an die Beschriftungsstelle weitergeleiteto 257 o Beschriftung Es werden die Signaturen und die R~ckentitel aufcebracht und die - Stempelungen gemacht. - Die B~cher werden der SchluBkontrolle zugefHhrt, o SchluBkontrolle - Die SchluBkontrolle ist eine formale Kontrolle z~m Zweck der Uber- pr~fung des Gesch~ftsganges anhand des Laufzettels und zur Uber- prHfung Yon Buchdaten und Daten auf den Katalogzetteln, o Benutzung - Die B~cher werden in den Standorten aufgestellt, - F~r die Zwecke der Ausleihe werden Benutzerkartei und ein Couponregister gef~hrt, 3,3 Buchlauf unter Ausnutzun9 eines 0n~line-Systems , , , ~ in d±esem Ahschnitt sollen - heschr~nkt auf Monographien ~ m~gliche Ver~nderungen in der Handhahung des Buchlaufes unter Einsatz eines On-lineSystems dargestellt werden. Hier, ebenso wie im vorangegangenen Abschnitt, bleibt die Darstellung wegen des m~glichen Detailierungsgrades st~ndig. unvoll- Wesentliche Merkmale organisatorischer Ver~nderungen sind; - Einmal erfaBte Daten k~nnen aufbereitet jederzeit zurHckgewonnen werden. - Das System h~it jederzeit Informationen Hber den Bearbeitungsstand aller erfaBten Objekte bereit, - Merkmalsgebundene ~berwachungen und uberprHfungen k ~ n n e n a u t o m a t i s c h durchgefHhrt bzw. unterstHtzt werden. Die Auswirkungen dieser Ver~nderungen auf den Buchlauf sind nach~olgend an einigen Beispielen dargestellt: - Daten, die in der Erwerbungsabteilung fur zu bestellende bibliothe- karische Objekte ermittelt werden, kSnnen fur die aiphabetische Katalogisierung genutzt werden. Hierdurch ergibt sich eine Verlagerung der Katalogisierungsarbeiten in die Erwerbungsabteilung, Dies wird zus~tzlich beg~nstigt durch die M0glichkeit, fur die 258 Erwerbung und die Magnetb~nder - Nicht der D e u t s c h e n gleichzeitig verschiedenen regelm~Sig !agen. Daten den. 
Kata!oglsierung Anforderungen Bereichsbibliotheken zur m e h r f a c h e n Mit Hilfe eines Konventionell Die B e a r b e i t u n g rung v o r g e n o ~ m e n Eine Titels die einmal GrOnden vonder Bearbeitung Katalogisie- ist ~berfl~ssig. unterst~tzt u,~. k~nnen Auf den B u c h l a u f sichtigter und Bestellungen, individuell durchgef~hrt bezogenen Katalogdaten~bernahme den v o r l i e g e n - (yon u,a, aufg~mndbeab- von F r e m d b ~ n d e r n automatisch nicht d u r c h g e f ~ h r - erste!it werden, von Katalogisierungsarbeiten Erwerbungsabteilung f~hrt die v o l l k o m m e n e Buchlaufes schnelleren zu einer Buchbinderauftr~ge entsprechend Erinnerungslisten mit der V e r l a g e r u n g durch werdeno ter K a t a l o g i s i e r u n g s a r b e i t e n ) k @ n n e n Zus a ~ n e n Der V o r - werden von F r e m d l e i s t u n g e n . vorgenommener automatisch den G e g e b e n h e i t e n wer- d u r c h die kann w i e d e r u m bzglo erhobenen kann nach von D u b l e t t e n Ausnutzung System n i c h t m~glich. zur A n s i c h t - S e n d u n g e n zus~tzliche f~r die wiederverwendet zur V e r m e i d u n g - Die M a h n u n g e n die von B e s t e l l u n t e r - gang der K a t a l o g i s i e r u n g die a u t o m a t i s i e r t e - k~nnen Aufwand d u r c h den F a c h r e f e r e n t e n werden. z~B, im k o n v e n t i o n e l l e n aus t e c h n i s c h e n von u n v e r l a n g t e n eines Ausfertigung zus~tzlichen ist dies der K a u f e n t s c h e i d u n g f[hren manuellen On-line-Systems ohne wie B i b l i o t h e k r auszunutzen. eintreffende in der Regel Erwerbung Fremdleistungen~ in die Durchsichtigkeit Verf~gbarkeit des des b i b l i o t h e k a r i - schen Objekteso - Im B e r e i c h der A u s k u n f t k~nnen Aussagen die V e r f ~ g b a r k e i t bibliothekarischer jeweils Standes aktuellen - In der A u s l e i h e erhebungen Standes loka! 
[...] reservations that have been placed, which can also be made non-locally [...] blocks imposed because of overdue notices or fees, in each case according to the current [...]

3.4 The DOBIS screens

The Dortmund Library System (Dortmunder Bibliothekssystem, DOBIS) was developed with the aim of using the computer as the store for all data that arise in the library while a book passes through processing and that can be used for that processing. Display terminals are available for entering and retrieving data. Three kinds of processing operations are to be distinguished:

- Operations that make no use of data processing. Bibliographic searches, among others, belong here.
- Operations that use the on-line functions of the system. These are: index search, ordering, accession, serials processing, cataloguing, invoice processing, binding processing, circulation, interlibrary loan.
- Operations that use the off-line functions of the system. These are the printing of, among other things: orders, catalogues, overdue notices, monitoring lists and statistics.

For the librarian at the workplace, the availability of terminals replaces the manually produced catalogues, indexes and card files used until now. The work flows converted into screen dialogues do, however, require a large number of different screen displays. The difficulties that could arise from this are avoided by having all displays follow a uniform layout: data of the same type always appear at the same place on the screen. The screen is divided into three parts with different functions:

Header
Area for changing information
Instruction part

The header contains information on the DOBIS function addressed, on the sub-function currently running, and a characterization of the screen mask displayed.

The area for changing information contains different data according to the current work step, e.g. information on a particular document or data within an index, on suppliers, etc.

The instruction part contains hints on how the operator can continue the dialogue. In a text line the operator is prompted for input, e.g. to call a further function or to take the next action of the function that is running.

As has been stressed several times, the data of a given object need to be entered only once. From then on they are available at all workplaces. A consequence of this principle is that all data that are essential for retrieving a library object in the system, such as

ISBN, ISSN
titles
persons, corporate bodies
shelf marks
numbers, abbreviations
subject catalogue data

must be checked carefully for errors. This avoids storing the same data in different forms or versions, and the processing errors that would follow. For every item that leads to an entry in, or a change to, one of the indexes named above, the operator is therefore forced to consult the index concerned. This is done either by the system carrying out the index search automatically or by prompting the operator to enter a search term.

Many library data require the use of particular abbreviations or standardized forms. To make it easier for the operator to deal with these formal rules, they are stored in the system and can be called up by the librarian for consultation.

3.5 The database in the Dortmund Library System

Under the premises already named in chapter 3, 'Library project',

- replacement of all catalogues, card files, etc.
- re-use of data once they have been captured
- transparency of the book's passage through the library

it is necessary to make the entire book stock, i.e. the content of all conventional information and documentation tools, available on-line in integrated form. For a stock of 1 million volumes this means that secondary storage in the order of 6-8 x 10^8 bytes is required.

The high storage requirement on the one hand is set against very complex data structures and query requirements, determined by manifold rules and constraints, on the other. Since libraries exchange data closely with one another, nationally and internationally, one cannot deviate from these rules without harming oneself.

The database in the Dortmund Library System comprises the following files, distinguished by their content:

- Main files holding, among other things: bibliographic information, order information, circulation information, invoice information, print queues and deadline queues.

The bibliographic information is stored in two main files. The logical records in these files are numbered consecutively. Records that correspond to one another are linked. Every physical bibliographic unit corresponds to at least one logical record in each of the two files. In keeping with the complexity of bibliographic entities, the logical records within one file are also linked with one another. The following are to be distinguished:

Monographs
Monographs with appended works
Multi-volume works with and without their own piece titles
Series
Periodicals

Monographs are the normal case. Each copy can be assigned one logical record in each of the two main files, without links to other records. In all other cases several logical records per physical unit, or links to other logical records, are required. An example of a complex structure from the area of series is shown below:

[Figure: tree diagram of a series (Schriftenreihe) with piece titles 1 and 2, a subordinate series with an independent title and its own piece titles, dependent sub-series A ("Chemie") and B, dependent parts, a part with an appended work, and the individual volumes.]

Note: abridged example from the publication cited under no. 12.

- Access indexes holding, among other things: persons/corporate bodies, titles, subject headings, shelf marks, publishers, ISBN/ISSN, user names, suppliers.

On the one hand the access indexes serve to retrieve the bibliographic units. From this point of view they offer considerably more possibilities than the conventional information tools, owing to their great variety and, for example, to the possibility of extending the title index by permuted titles with corresponding cross-references. On the other hand the access indexes serve data entry. For example, the first two indexes store not only the heading form (for filing) but also the form found in the item (descriptive). Together, the two indexes also serve as a tool from whose sort order the alphabetical catalogue can be produced.

All main files and access indexes have a VSAM-like index that can comprise several index levels. This ensures fast access to the physical record, or the variable-length logical record, that is needed.

- Files holding code tables. These tables serve, among other things, to save space in the main files.
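The multi-level index mentioned above can be illustrated with a small sketch. It is not the DOBIS or VSAM implementation; the block size `FANOUT`, the function names and the key values are assumptions chosen for illustration. Each higher level stores only the highest key of every block on the level below, so a lookup reads one block per level.

```python
from bisect import bisect_left

FANOUT = 10  # entries per index block (illustrative value, not from the paper)

def build_index(keys, fanout=FANOUT):
    """Return the index levels, top level first; the last level holds the keys."""
    current = sorted(keys)
    levels = []
    while len(current) > fanout:
        blocks = [current[i:i + fanout] for i in range(0, len(current), fanout)]
        levels.append(blocks)
        # The next higher level stores the highest key of each block.
        current = [block[-1] for block in blocks]
    levels.append([current])
    return levels[::-1]

def lookup(levels, key, fanout=FANOUT):
    """Descend one block per level; return True if the key is present."""
    block_no = 0
    for depth, blocks in enumerate(levels):
        block = blocks[block_no]
        pos = bisect_left(block, key)
        if pos == len(block):          # key is larger than every stored key
            return False
        if depth == len(levels) - 1:   # lowest level: compare the key itself
            return block[pos] == key
        block_no = block_no * fanout + pos
    return False

record_numbers = list(range(0, 1000, 3))   # 334 hypothetical record keys
index = build_index(record_numbers)
```

With 334 keys and ten entries per block the sketch builds three levels, so any record is reached after three block reads.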
4. The administration project

As stated at the outset, a project group was founded in 1974 with the task of 'organizing the work flows in the administrations with regard to data processing'. It differs from the library project in the following respects:

- Because of the amount of work involved, the various functions of the university administrations cannot be organized simultaneously. Their weaker internal cohesion, moreover, makes a later integration possible, provided that a suitable framework has been created.
- Because of the situation of the universities, the individual administrations are to be able to work independently of one another once the system, or parts of it, has been completed.
- The organizational work is made more difficult by the different ways in which administrative procedures are handled in the individual universities.

The personnel area and the student area were tackled as the first function. The standardized terms and codes developed by Hochschulinformationssysteme GmbH and the statistical offices were taken into account.

The following data processing facilities are available to the project group:

- The IBM database system IMS.
- Disk storage to the extent required.
- The use of the computer in the individual functions is to be planned so that the applications can be run predominantly in the second and third shifts.

References

1 Elektronische Datenverarbeitung in der Universitätsbibliothek Bochum. Ed. by Günther Pflug and Bernhard Adams. Bochum 1968.
2 Alexander, R.W.: Library Management System (LMS): Descriptive specifications for an on-line, real-time integrated system. Los Gatos, Cal.: IBM n.d.
3 Experimental Library Management System (ELMS): Librarian's User Manual. Los Gatos, Cal.: IBM 1972.
4 Datenerfassung und Datenverarbeitung in der Universitätsbibliothek Bielefeld. Ed. by Elke Bonneß and Harro Heim. Pullach bei München 1972. (Bibliotheksstudien. Bd. 1A.)
5 Bibliographic Automation of large library operations using a time-sharing system (BALLOTS): Phase I. Final Report. Stanford, Cal. 1971.
6 Bibliographic Automation of large library operations using a time-sharing system (BALLOTS): Phase 2, part I. Final Report. Stanford, Cal. 1972.
7 First annual report of the BALLOTS project to the National Endowment for the Humanities. Stanford, Cal. (1973).
8 The Shared Cataloging System of the Ohio College Library Center. Frederick G. Kilgour, Philip L. Long et al. In: Journal of Library Automation. Vol. 5, No. 3, 1972.
9 An automated on-line circulation system. Ed.: Irene Braden Hoadley and A. Robert Thorson. The Ohio State University Libraries 1973. (Proceedings and Papers of an Institute Held at The Ohio State University Sept. 13-14, 1971.)
10 Ohio College Library Center. Annual Report 1973/1974.
11 Bibliotheksautomatisierung in den USA und in Kanada. Ed. by Walter Lingenberg. Pullach bei München 1973. (Bibliothekspraxis. Bd. 10.)
12 Deutsche Forschungsgemeinschaft. Bibliotheksausschuß. Maschinelles Austauschformat für Bibliotheken (MAB 1). Berlin 1974.
13 Empfehlungen für den Einsatz der Datenverarbeitung in den Hochschulbibliotheken des Landes Nordrhein-Westfalen. Ed. by the Planungsgruppe Bibliothekswesen im Hochschulbereich NRW. Düsseldorf 1974.
14 Jedwabski, Barbara: DOBIS - ein integriertes On-line-Bibliothekssystem. In: 10 Jahre Universitätsbibliothek Dortmund. Published for 1.6.1975, ed. by Valentin Wehefritz. Dortmund 1975.
15 DOBIS. Anwendungsbeschreibung. (GAP. Application Guide.) IBM 1975. (To appear at the end of 1975.)
16 DOBIS. Systembeschreibung. (GAP. Systems Guide.) IBM 1975. (To appear at the end of 1975.)

Use of a Database System at the Hessisches Landeskriminalamt

Rolf Heitmüller, 62 Wiesbaden, Am Hochfeld 12

1. THE HESSIAN POLICE

Since 1 January 1974 the Hessian police has been a state police force, i.e. all police institutions are maintained by the state of Hesse. Until 31 December 1973 there were, besides the state police, municipal police offices in certain cities.

In the field of crime fighting, the Hessisches Landeskriminalamt (state criminal police office) has the special statutory mandate to collect and evaluate the information needed for fighting crime and to communicate the findings to all interested and authorized bodies. Besides this function, the Hessisches Landeskriminalamt also has executive duties for certain special tasks (for example combating and clearing up economic crime, and investigating the causes of fires). In addition, the Hessisches Landeskriminalamt has the task of operating data processing centrally for the Hessian police.

2. A LOOK BACK AT THE DEVELOPMENT

The introduction of EDP in the Hessian police cannot be seen in isolation from the corresponding considerations in other federal states and in the federal administration. As early as 1964, and in a very vague form even before then, first thoughts were given to how police information could be processed with the modern working tool EDP. These considerations ended in the demand that the identification of offenders must be automated first. In the following years this demand was virtually elevated to a precondition for introducing EDP in the police of the Federal Republic. An analysis of police work had not yet been carried out; its place was taken by demands to automate individual areas.

A working group set up at federal level produced first approaches to an analysis. The value of individual areas of information, in part even of individual terms, was discussed.

For the Hessian police, the first talks with manufacturers of EDP equipment and with scientific institutions took place during this period. The result was rather discouraging. After a rough description of the task to be solved, all those consulted declared that a generally valid, ready-made DP solution for police tasks did not exist and could not readily be developed from existing procedures.

Even then, the problems in the foreground were the storage of mass data with direct access, the fast retrieval of stored data, the fast updating of the information and, not least, the high availability that the police demand of DP facilities.

The result of this period was the insight that introducing EDP in the police first required an intensive analysis of police work flows, and that a police DP system would have to be developed on the basis of that analysis.
This insight was taken into account at various levels:

At the Bundeskriminalamt an "Arbeitsgruppe EDV" (EDP working group) was set up on 1 January 1968, to which every federal state was to send a suitable member of staff. The task of this working group was to develop a uniform police DP system for the entire police of the Federal Republic. The working group was to collect and evaluate all considerations on police data processing made so far in the Federal Republic, in order to develop a uniform system on this basis.

The Arbeitsgruppe EDV was active until early 1970, with changing membership. The results of its work can be summarized as follows:

- The working group was given too little authority to fulfil its mandate even approximately.
- The required procedure for the automatic identification of offenders could not be produced within the given limits of staff and time.
- In the view of the working group it seemed more sensible, because it was easier to solve and could be kept within manageable limits of time and scope, to begin with a system that could provide the information held on a known offender as quickly and comprehensively as possible on request. The areas and volumes of information required for this were compiled systematically. This system was also to include a part of the search for wanted persons, the so-called office search (Bürofahndung). This is a procedure for preparing search requests in such a way that a printing shop could produce index cards for the daily updating of wanted-persons card files, and a monthly wanted-persons book could be printed.

The survey and analysis of the existing situation carried out in the Hessisches Landeskriminalamt, in parallel with the work of the "Arbeitsgruppe EDV", produced interesting results. Although it had been claimed again and again that the card index system for identifying offenders was the most important police collection of information, it turned out that of about 60 card files only 20 managed without the ordering criterion "personal particulars". Information retrieval was therefore carried out mainly via the personal particulars of the offenders, and not via other descriptive characteristics.

This finding agreed in essence with the considerations of the Arbeitsgruppe EDV at the BKA.

For building an information system for the Hessian police this led to fundamentally new lines of thought, which can be summarized as follows:

- The centre of all police activity in the field of crime fighting is the case, as the very reason for taking action. It followed that a system that did not take case information into account would be not only incomplete but wrongly constructed.
- Since the personal particulars of the offender play an essential part in information retrieval, they should have a prominent place in the system, but (and this was new) only once for each person. The relationship between case and person should be representable by links.
- The system should make it possible to add extensions of subject matter at any time.

A closer look at these requirements showed that the entire problem could not be solved in a single development step. The effects on police work would probably have been so severe that it seemed at least doubtful whether the work would not have been paralysed. This was reason enough to build the Hessian police information system (Hessisches Polizeiinformations-System, HEPOLIS) in stages and to implement in the first stage only what would have the least drastic consequences for police work.

Various possibilities were considered for updating the stored information. The best solution seemed to be to decentralize data preparation and data capture.
Studies of the performance of a data transmission network, which at first was planned only roughly, showed that its capacity would allow decentralized updating of the data in addition to the inquiry service.

In summary, the basic requirements for the system to be built were:

- The first stage is to be centred on inquiries about persons;
- Information is to be transmitted by remote data transmission;
- The information is to be stored centrally, but prepared and entered decentrally;
- It must be possible at any time to update the data from the place where the information is obtained; and
- the following police-tactical requirements must be satisfiable: information must be available on request, in whole or in part, quickly and at any time; it must be reachable from as many places as possible, simple to handle and as comprehensive as possible.

On the basis of these requirements an invitation to tender was issued, asking for offers for hardware and software. IBM was awarded the contract because, besides the most favourable price/performance ratio, it could offer the best balance between hardware and software.

3. THE IMPLEMENTATION PHASE

After the contract negotiations with IBM had been concluded, intensive work began on the configuration of the DP installation. The requirement for high availability led to the use of two computers (IBM /370-145 with 768 KB of main storage) that are identical in equipment and capabilities, so that either one alone can keep HEPOLIS running.

The attached external units (magnetic disk storage with 16 IBM 3330 disk drives holding a total of 2.2 billion characters in direct access, magnetic tape units, high-speed printers, card readers and data transmission control units) are designed so that each computer can access each of these units at any time.

The data transmission control units are installed in duplicate, just like the computers. All other units are present in sufficient numbers for technical failures to be bridged without long waiting times.

The operating systems used are OS MFT II, currently in Release 27.7, and OS-VS. IMS 2 Level 4 is used to manage the stored data and to operate the teleprocessing facilities.

IBM 3270 terminals are used exclusively as data stations. They are installed as single stations or as multiple stations, as required, and all have a buffer of 1,920 characters. These relatively large screens were chosen because HEPOLIS was to be as user-friendly as at all possible, and smaller screens would inevitably have led to restrictions in the organization of the screen layout. Only with the large screen was it possible

- in almost all cases to build up a unit of information within one screen without losing clarity,
- to address every single data field with a field label of 10 characters, and so in most cases to make statements without abbreviations, and,
- what is very important for the user, to make the screen formats for the inquiry service and the update service almost identical.

It may be regarded as a waste of space that almost every line of the screen contains only one data field. Nevertheless, such a layout serves clarity and makes the system user-friendly.

The data held under IMS control are at present divided into three databases. These are:

- person-related data
- case data
- vehicle (KFZ) data.
The programs for the vehicle data area have not yet been written, but the area is already mapped in the IMS system.

All information on one object (person, case, vehicle) is held within these files in a single record. Every record can be addressed by an ordering key, a record-specific number. In this way the information on an object can be retrieved if the corresponding number is known. Since in police practice this number is not always known in every situation, it was necessary to be able to reach the desired information by other routes as well. In general this is possible only by inversion. Since IMS 2 offers no facility for automatic inversion, the IMS concept of the "logical database" was taken up and the necessary inversion lists are created by application programs.

There are now the following ways of accessing records with identifying characteristics, without knowing the record-specific number:

- with name and date of birth, or
- with the phonetically coded name, or
- with an offence code

for the person-related data;

- with the authority and its file reference

for the case-related data;

- with the official registration number,
- with the chassis number,
- with the engine number,
- with a combination of both, or
- with the personal particulars alone

for the vehicle-related data.

In addition, links have been established between

- person-related data and case-related data,
- person-related data and vehicle-related data,

in both directions in each case, and, within the person-related data, between the personal particulars and the wanted-persons data.

These far-reaching links permit almost any evaluation of the whole data stock that is of interest to the police. It goes without saying that in the first stage, which has now been implemented, by no means all of the theoretical possibilities can be used.

A further problem of police data processing that could not readily be solved with IMS 2 was that of variable-length data fields. By using the technique of dependent segments this problem, too, was solved satisfactorily: a segment is given a size that can hold the amount of data to be expected in a statistically reasonable number of cases. If the actual amount of data is larger than the capacity of the segment type addressed, the data that no longer fit are placed in a first dependent segment. If that is still not enough, any number of segments of this dependent type can be filled.

A short description of the inquiry service and the update service will show how the system is used in practice:

- Inquiry service

A police officer needs information about a person whom he is checking. Using the means of communication suited to the situation, he contacts the nearest data station, gives the operator the necessary identifying characteristics and explains that he wants to know whether the person so described is wanted. The operator keys the message code AFO2 and the identifying characteristics passed to him into the first line of the empty screen and presses a function key. On the basis of these values HEPOLIS tells him within 10 seconds whether or not this person is wanted. HEPOLIS outputs the complete personal particulars including the alias data, special remarks about the person and all wanted-persons data. With a different message code, other data could be called up in the same way. The screen content displayed can be printed if required.

- Update service

Besides the functions of changing and deleting existing data and adding further data to existing records, the update service also includes the function of entering new data. The latter always involves opening a new record. This function meets the requirement for decentralized data capture, since in principle all functions of the update service can be carried out from all data stations.

Two principles have been applied in the update service:

- With the exception of entering new data, the functions of the update service are possible only with the appropriate message code and the record-specific number. This is necessary to ensure that a change is made to exactly the record to which it is meant to be made.
- All functions are format-controlled, i.e. every function is preceded by the request for an appropriate update format. For the functions "change" and "delete", the output format is filled with the data addressed by the record-specific number. For the function "supplement existing data", a format request can be made beforehand in which the kind and number of data fields to be added can already be specified. If this is not wanted, the system outputs a standard format in which every data field that may be added has room once.

The function "enter new records" differs somewhat from the other update functions. The user can in any case behave as if he were the first to enter data on a particular object. The format request needed for this is made with the message code and the appropriate identifying characteristics. If the system finds no existing data under these characteristics, an empty format is output; if data exist, they are displayed in full. The user can then decide whether or not to assign his data to an existing record. The opening of a new record can be forced by a special function.

The functions of the update service are more difficult to handle than those of the inquiry service. The reason is that the DP system is meant to relieve the user of as many formalities as possible, while at the same time a high degree of security must be achieved in the update service. All programs of the update service have a control function that checks the authorization to make changes. In this way only the owner of the data, whose identifier is stored with the data, can make changes. Attempts at changes by non-owners are rejected under program control.

Inquiry service and update service can be carried out from the same data station at any time and in any order. The programs and data needed for this are available in HEPOLIS at all times.

4. THE LINK TO THE POLICE INFORMATION SYSTEM (INPOL)

INPOL denotes the interconnection of police data processing systems. INPOL is needed to connect the police information systems built up at state level with one another and with the system of the Bundeskriminalamt. In addition to running its own information system, the Bundeskriminalamt takes on the function of a central message switching centre. Data are exchanged in a star network over leased lines. Since computers from different manufacturers, with different hardware and software, are operated within INPOL, certain agreements were necessary for networked operation.

- Data transmission procedure

Here an agreement based on the DIN standards was reached, which all participants considered entirely feasible. The procedure is proving satisfactory in daily operation.

- Data interchange record

Certain message formats were laid down for data exchange, which the network participants have to apply uniformly. In addition, an acknowledgement procedure was developed with which the network partners exchange both receipt acknowledgements and error acknowledgements. In this way the sender of a message can be alerted to errors in the transmitted record, and errors noticed when messages are sent or received, such as incorrect content or an incorrect state of a record in the database, can be reported.

- Message header

Every interchange record is preceded by a message header that is processed by application programs. This header contains information about the sender and the receiver of the message as well as details of its type and length. The receiver returns the message header to the sender in the acknowledgement message. The data needed to match the acknowledgement to the message are likewise contained in the message header.

In HEPOLIS, network control and the conversion of data to be sent and data received into the required format are handled by a special program that is permanently resident in the computer under IMS control. Within this program a table module, which works in both directions, converts the data into the required format. That is, a single table entry serves for the translation from the HEPOLIS format into the send format and from the receive format into the HEPOLIS format.

About 2,000 messages a day in total, plus the necessary acknowledgements, are exchanged over the network in both directions. These are exclusively update messages.

For reasons of security, and to speed up inquiries, certain areas of data are stored in parallel in the systems connected to INPOL.
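The message header and the two-way table module described above can be sketched as follows. This is only an illustration under stated assumptions: the field names in `FIELD_TABLE`, the station names, the status strings and the way `length` is computed are all hypothetical, since the paper does not give the INPOL record layout.

```python
from dataclasses import dataclass

# Hypothetical field names: (HEPOLIS name, interchange name).
# One table entry serves both directions of translation.
FIELD_TABLE = [
    ("FAMILIENNAME", "NAME"),
    ("GEBURTSDATUM", "GEBDAT"),
    ("AKTENZEICHEN", "AZ"),
]
TO_EXCHANGE = dict(FIELD_TABLE)
TO_HEPOLIS = {ext: local for local, ext in FIELD_TABLE}

@dataclass(frozen=True)
class Header:
    sender: str
    receiver: str
    kind: str      # type of message, e.g. an update
    length: int    # length of the message body

def translate(record, mapping):
    """Rename every field of a record according to the given direction."""
    return {mapping[name]: value for name, value in record.items()}

def build_message(sender, receiver, kind, record):
    body = translate(record, TO_EXCHANGE)
    length = sum(len(value) for value in body.values())  # assumed length rule
    return Header(sender, receiver, kind, length), body

def acknowledge(header, ok=True):
    """The receiver returns the header so the sender can match the reply."""
    return {"header": header, "status": "RECEIVED" if ok else "ERROR"}

record = {"FAMILIENNAME": "MUSTER", "GEBURTSDATUM": "01011940"}
header, body = build_message("HEPOLIS", "BKA", "UPDATE", record)
reply = acknowledge(header)
```

Deriving both dictionaries from the same table keeps the two directions consistent: translating `body` back with `TO_HEPOLIS` yields the original record.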
To ensure that the data stored in parallel are in fact identical, reconciliation runs are carried out at certain intervals. For this purpose the data are excluded from updating at a given point in time and unloaded. When this has been completed, the unloaded data are compared with one another. Discrepancies are logged, examined for their cause and eliminated.

It goes without saying that the safeguard against unauthorized updates mentioned earlier also applies within the network.

5. HEPOLIS IN DAILY OPERATION

The system was loaded with its initial data in spring 1974. The loading was not carried out as an "initial load"; instead the individual records were brought into the system under program control. Every new entry was checked against the data already present by means of the search lists that were being built up in parallel. This procedure served to recognize duplicates from the manually kept card files, which had not been cleaned up until then, and to keep them out of the system. During this initial load about 390,000 person records were stored, and duplicates were recognized in about 15,000 cases. The data built up in this way were soon released for the inquiry service.

In November 1974 the current wanted-persons file was taken over from the Bundeskriminalamt for parallel storage and inserted into the data by the procedure described above. Of 150,000 records taken over, about 12,500 met an existing entry. In these cases only the data still missing were added to the existing record.

Since January 1975 HEPOLIS has been running fully in 24-hour operation with online update and online inquiry service. The number of work orders handled via the data stations currently averages 9,500 a day, with peaks of around 12,200 a day.
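Returning to the load procedure described above, the checking of each incoming record against a search list can be sketched as follows. The choice of name and date of birth as the matching key, the field names and the function name are assumptions for illustration; the paper does not state which characteristics were compared.

```python
def load_records(records, key_fields=("name", "birth_date")):
    """Load records one by one, merging those that meet an existing entry."""
    store = {}         # record-specific number -> record
    search_list = {}   # identifying characteristics -> record-specific number
    hits = 0
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        number = search_list.get(key)
        if number is None:
            # No existing entry: open a new record and index it.
            number = len(store) + 1
            store[number] = dict(rec)
            search_list[key] = number
        else:
            # Existing entry: add only the data that are still missing.
            hits += 1
            for field, value in rec.items():
                store[number].setdefault(field, value)
    return store, hits

incoming = [
    {"name": "A", "birth_date": "1", "alias": "X"},
    {"name": "B", "birth_date": "2"},
    {"name": "A", "birth_date": "1", "alias": "Z", "offence": "Y"},
]
store, hits = load_records(incoming)
```

In this sketch the third record meets the first one, so no new record is opened; the missing field is added and the existing alias is left unchanged.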
In the update service every work order involves 2 IMS transactions, so the number of transactions to be processed averages 12,000 a day and reaches 16,000 a day at peaks, plus the transactions of the network. The heaviest load so far was 1,200 work orders, or about 1,600 transactions, in one hour.

The system handled this workload with a mean response time of 3 seconds in the inquiry service and 5 seconds in the update service, with operation running in 3 IMS regions. The work orders of the update service require 2 IMS transactions because building the input mask and carrying out the subsequent update are handled by different programs. This was planned deliberately in order to avoid conversational programming, which would also have been possible. At present 78 TP programs and 8 BMP programs with a total of 172 transaction codes are in operation.

All programs use the same plausibility-checking module and the same error-handling routines. This ensures that the data are as correct as possible and that errors are displayed to the user on the screen in a uniform way. The checking logic is designed so that every incoming message is checked for errors right to the end. At the end of the check, the errors found are displayed in an error format. If processing is possible in spite of the errors, it is carried out and the result is displayed. If processing is not possible, a corresponding note appears in the error display.

For reasons of safety the system is shut down once every 24 hours. This is necessary in order to keep restart times after an abnormal end as short as possible.

Including these planned shutdowns, the mean downtime of the system is currently 2.2% of the available time (based on 24 hours a day). Restart times after an abnormal end are between 45 minutes and 2 hours, depending on the severity of the error.
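The workload figures above imply a split between inquiries and updates. The following sketch works this out under the assumption, not stated explicitly in the paper, that an inquiry order needs exactly one transaction while an update order needs two; the function name is hypothetical and the source figures are themselves rounded.

```python
def split_jobs(jobs, transactions):
    """Solve q + u = jobs and q + 2u = transactions for (q, u)."""
    updates = transactions - jobs
    queries = jobs - updates
    return queries, updates

average_day = split_jobs(9_500, 12_000)    # (7000, 2500)
peak_day = split_jobs(12_200, 16_000)      # (8400, 3800)
peak_hour = split_jobs(1_200, 1_600)       # (800, 400)

# 2.2% downtime of a 24-hour day, expressed in minutes.
downtime_minutes = 0.022 * 24 * 60
```

On these assumptions roughly a quarter to a third of all work orders are updates, and 2.2% downtime corresponds to a little under 32 minutes per day.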
Restarts have been speeded up considerably by saving a complete copy of the data base on magnetic disk every week, so that long restore runs from the backup tapes, which are also kept, are avoided. As regards error recovery in general, the restart and recovery routines of IMS have fully proved themselves in practice.

So far the data base has had to be reorganised only once. The reorganisation took 92 hours in all and, after initial difficulties, ran smoothly. Because the update service had to be suspended during this time, and the information given by the inquiry service became increasingly out of date as time went on, priority is currently being given to developing a set of programs that will allow the update service to continue during a reorganisation.

In general it can be said that HEPOLIS, despite the short time it has been in use, has already been accepted by its users, who look forward with anticipation to the further stages of expansion.

6. EFFORT

In conclusion, some remarks on the effort that was required to build the first stage of HEPOLIS. It should not be overlooked that the building of HEPOLIS was the first use of electronic data processing by the Hessian police. This means that the data processing staff had to be recruited and trained. This process is still continuing. Since the system could not be built with in-house staff in the time available, external staff had to be brought in on a considerable scale. In the last weeks of 1974, as many as 26 external organisers and programmers were employed at times. In total the effort so far amounts to 75 to 80 man-years. This includes planning, analysis, programming, training and data capture, but not data preparation. Measured against what has been achieved, the effort appears to be within reasonable bounds.

Relational Data Dictionary Implementation

I A Clark, IBM United Kingdom Scientific Centre, Peterlee, UK

Abstract

The paper presupposes a team of application developers using an application generator served by a relational database (RDB). The application grows by including not only routines for input/output, but by accumulating new relations, the latter representing data-definition activity by the developers. A data dictionary (DD) is needed (1) to interrelate relations, (2) to relate these to routines, input streams and reports, and (3) to produce auditing reports and clerical procedures manuals. The benefits and technical problems of maintaining the DD itself as a RDB are treated.

INTRODUCTION

This paper assumes a development team using an application generator served by a relational database (RDB). The application grows not only by adding I/O and processing routines, but also by accumulating new relations. Such relations may be derived from already existing relations in the database, as well as being inserted independently as a set of tuples.

We do not want to argue here why we consider an application generator together with a relational database to be an attractive combination for use by a team of non-data-processing professionals. By 'non-DP professionals' we mean a group of highly skilled individuals who wish to innovate within their own discipline by making use of the computer, without being diverted from their true purpose by considerations of a purely technical nature to do with data-processing. In particular these individuals will not wish to be deflected by questions of choosing the best data pathways or the best data structure for their particular operation (hence the relational database), nor get involved with the wide choice of techniques for doing standard programming (hence the application generator). Suffice to say that we believe their situation to be essentially similar to that of a research team.

We shall consider one relational database, called PRTV (1), a prototype developed and used at the IBM UK Scientific Centre, Peterlee. The chief feature of PRTV for our purpose is that of deferred materialisation: that is, a new relation derived from existing relations is not materialised into a set of tuples until these are called for, eg, to open the relation as a read-only file, or to find out how many tuples it contains.

A new relation can be defined during a terminal session by entering an expression which names existing relations, acted on by the relational operations:

    UNION  INTERSECTION  DIFFERENCE  PROJECTION  SELECTION  JOIN

The result is a named entity within the user's workspace, which we shall call a '(relational) value'. It is not our purpose to describe how this is implemented. Suffice to say that a relational value is a character string which specifies, briefly but conveniently, the tuples of the relation and just how to go about materialising them, whether or not those tuples in fact exist. It contains substrings which are either the names or the values of the relations from which it was derived.

However, note that by the term 'derived relation' we shall mean specifically one whose relational value contains the name of another relation, say 'A', rather than just the value of A. This is because, in PRTV, there is no way of effectively recognising that, say, B has been obtained from A in the latter case. If for instance relation A were bulk-loaded from cards, next relation B created and simply assigned the value of A, there would be nothing inherently different about A and B. Indeed, in PRTV as it stands there would be no way of telling which came first. Moreover either A or B could be reassigned another value, leaving the other unchanged. This is clearly not the case if B were derived from A. Then whenever A changed its value, B would change correspondingly.

Since relational values are relatively small entities compared with the large sets of tuples they can potentially represent, one must not think that a computer process which forms new relations at run time out of existing relations is necessarily going to be extravagant. Thus PRTV allows one to formulate as much of one's application as one cares to in a relational algebra, which on the face of it performs set-theoretic operations upon whole sets of tuples. However, the operations are really performed on the relational values we have just described, with the result that the operation of forming the union, say, of two large sets of tuples is deferred until one actually lists a relation, or opens a sequential file based on that relation and scans the file. We are going to formulate, in a relational algebra, processes which experienced programmers would not consider handling in terms of elementary operations which combine entire sets of tuples, or as they would see them, sets of records.

In place of a relational algebra, a relational calculus may of course be used, eg the ALPHA language of E F Codd (2). PRTV does not yet support ALPHA, nor any such relational calculus. However, as Codd has shown elsewhere, it is in principle feasible to translate from one to the other in a natural way. An ALPHA expression resembles a theorem in the Propositional Calculus. To a logician, this represents a natural and general way of making an assertion about a given computer process. Other professionals have their own languages within their own disciplines. Whether or not they can understand a Propositional Calculus expression does not matter: their own languages are likewise amenable to machine translation into the relational algebra.

Consider an application which accepts a batch of input and produces reports (invoices, cheques, etc). It is conceivable in principle to load the input straight into a number of relations, then print out the reports directly from relations derived from the input relations. How far one progresses towards this limit depends in practice on whether it appears easier to implement a given step using the relational algebra, or a conventional programming language. A non-DP professional is unlikely to be predisposed towards the programming solution, particularly if provided with an application generator which constructs the relational algebra for him out of more familiar specifications.

The main problems which the application generator will have to handle are those of making the work of one team member available to another in an orderly fashion, and of stopping them unsuspectingly cutting the ground away from under each others' feet.
This can so easily happen if the result of one individual's work, embodied in a relation, is passed to another, who incorporates it into a derived relation which is in turn passed on. It becomes a heavy administrative task to keep track of what changes to the original relation are safe, permissible, or are nonsense in terms of the real-world application. Note that with this remark we do not distinguish between application development and operational running of the application.

One possible way of coping with this task is for the application generator to administer a data dictionary. Since the task involves much cross-indexing, and the application generator is already served with a relational database, it is attractive to investigate maintaining the data dictionary itself as a relational database.

A range of tasks may be undertaken by the data dictionary, from the simplest to the most ambitious. Examples are:

(1) reporting upon all relations which are affected by updating a given relation,

(2) preventing or otherwise qualifying an order to destroy a relation upon which further relations are defined,

(3) enforcing semantic constraints imposed by the nature of the application at either application development time, eg, to prevent the insertion of 'nonsense' relations into the database, or at run-time, eg, to ensure that tuples are not inserted into a given relation without corresponding tuples being present in another relation,
(4) producing listings of all routines and reports relating to a given database relation,

(5) for auditing purposes, maintaining an up-to-date clerical procedures manual. This often requires cross-referenced lists of fields on input documents, reports, and domains in the database.

These tasks are presented in order of increasing severity. We shall treat the first three only, discussing some theoretical and technical problems which the data dictionary has to face. The remaining two topics, although ambitious in practice, are theoretically much simpler than the first three.

(1) REPORTING UPON UPDATE DEPENDENCIES

For the moment we are primarily concerned with update dependencies between relations in the course of application development. The other sort of update dependency, that between records, or tuples in our case, will be treated later under the heading of 'semantic constraints'.

This facility is straightforwardly achieved by maintaining a DD-relation, call it RDEPEND, on the domains RELID1, RELID2, DEPTYPE. By 'DD-relation' we mean 'data dictionary' relation, to distinguish it from the relations belonging to the application itself. DD-relations may or may not be kept in the same database as application relations: for research convenience the former is recommended, due to the facility for bootstrapping the data dictionary; the latter is however advisable for security.

Note that we require some means of referring to distinct occurrences of the same domain within the component list of a relation. We do this here by postfixing 1, 2, etc, to the domain name (eg, RELID1, RELID2, etc, for the domain name RELID).

RDEPEND contains a tuple for each derived relation (RELID1), stating what relation it depends on (RELID2) and in what capacity (DEPTYPE). Where a relation is derived from a number of other relations, that number of tuples is present in RDEPEND. Furthermore, if the relation uses another in more than one capacity, more than one tuple for that pair of 'RELIDs' occurs.

Now comes the advantage of using a relational database for the data dictionary. The relation RDEPEND is transitive in a logical sense. Thus by joining it to itself repeatedly we recover a relational value which carries a tuple for all the implicit dependencies, as well as those appearing explicitly in RDEPEND.

Let us introduce a notation in order to present an example. This notation is based on the relational algebra, ISBL, used by PRTV, although we modify it freely in order to better illustrate our points.

In PRTV a user manipulates relations within his workspace by commands of the form:

    C = A * B
    C = N!A * B

The first command would construct a named entity 'C' (the 'value' introduced earlier) with a RELID equal to 'C' and a value equal to the 'join' of the values of 'A' and 'B' (no tuples are accessed as yet). The second command would incorporate the symbolic RELID 'A' into the relational value formed for 'C', instead of the value of A. 'N!A' should be read as 'name-A'.

Suppose we have defined a relation 'F' by the following sequence of commands:

    C = N!A
    D = N!B
    E = N!C
    F = N!E * D

Then RDEPEND would contain the following tuples:

    RDEPEND ( RELID1   RELID2   DEPTYPE )
              C        A        N
              D        B        N
              E        C        N
              F        E        N
              F        D        V

In order to obtain tuples for every dependency of F one might join RDEPEND with itself repeatedly until no further tuples appeared (detected by testing its cardinality). The type of 'join' required is one called an 'equi-join'. This means that the tuples from each relational operand which are to be concatenated are chosen by collating values equal within certain specified components. It is a matter of notational design to specify the required equi-join elegantly. Here we show an 'overlap' operation by placing component names beneath each other. Thus:

      RDEPEND   RELID1    RELID2    DEPTYPE
    * RDEPEND             RELID1              RELID2    DEPTYPE

represents a relational value with five domain occurrences. Each tuple in the set so defined is formed by taking a pair of tuples from RDEPEND for which RELID1 in one tuple equals RELID2 in the other. There is a combined tuple for all such pairs.

We may further join to this a relation, DTRANS, which contains a tuple matching each pair of values of DEPTYPE which turns up in the above relational value. Each tuple of DTRANS contains a third value from the domain DEPTYPE, representing the resulting (ie, transitive) dependency. After that, we can project out just those domains we wish to see, renaming them in the process.

Note that in a relation, all duplicates of a given tuple are suppressed. A relation simply records that, say, three given objects are related in a given way. The ordered set of these three objects is what comprises the 'tuple' (3-tuple, or 'triple' in this case). Thus it makes no sense to talk about more than one 'occurrence' of this tuple. The three objects are either related, or they are not.

We may thus construct the relational assignment statement:

    RR = RDEPEND   RELID1    RELID2    DEPTYPE
       * RDEPEND             RELID1              RELID2    DEPTYPE
       * DTRANS                        DEPTYPE1            DEPTYPE2  DEPTYPE3
       %           RELID1                        RELID2              DEPTYPE

The resulting relation RR has precisely the domains and domain-IDs of RDEPEND (the final 'project', %, has seen to that), but relates RELIDs once-removed. Thus RR contains the following tuples only:

    RR ( RELID1   RELID2   DEPTYPE )
         E        A        NN
         F        C        NN
         F        B        V

The relation DTRANS can be visualised as a function with two arguments, DEPTYPE1 and DEPTYPE2, returning the corresponding object in the domain DEPTYPE3. Indeed in PRTV it can be implemented either as a PL/I function or as an ordinary relation, with a tuple for every pair of values of DEPTYPE1 and DEPTYPE2. Thus DTRANS might contain the following tuples (among others):

    DTRANS ( DEPTYPE1   DEPTYPE2   DEPTYPE3 )
             N          N          NN
             N          NN         NNN
             NN         N          NNN
             NN         NN         NNNN
             N          V          V
             V          N          V

Note that the last two tuples say, in effect, that if A depends on the name 'B', and B has the value of C, then A has only a current-value connection with C. If C is changed, this connection will be lost: A will not change. On the other hand, if B depends on the name 'C', then assigning the current value of B to A is effectively assigning the (current) value of C to A. This is a matter of choice of convention.

RR can be incorporated back into RDEPEND (eg, by the expression: RDEPEND = RDEPEND + RR) and the process repeated until the cardinality of RDEPEND grows no more. On the other hand it may be better to derive a new relation, FULL_RDEPEND, by this process each time it is called for, so that RDEPEND may be maintained more easily by simple insertion and deletion of tuples.

When the owner of the catalogued relation, F, wishes to modify it, the command:

    List FULL_RDEPEND : RELID1 = 'F'

might be issued. The relational operator ':' stands for 'SELECT'. This lists just those tuples of FULL_RDEPEND such that RELID1 is equal to 'F'. Thus:

    FULL_RDEPEND : RELID1 = 'F'
        ( RELID1   RELID2   DEPTYPE )
          F        E        N
          F        D        V
          F        C        NN
          F        B        V
          F        A        NNN

(2) QUALIFYING AN ORDER TO DESTROY A RELATION

This might be considered to be a special case of enforcing semantic constraints, imposed by the application model upon the developers themselves, which is a very general topic. However it can also be viewed as a basic facility to be expected of a system which claims to inhibit the members of an application development team from cutting the ground from under each others' feet. There is a temptation to build such a facility rigidly into the system itself. This ignores the possibility that what is satisfactory for one application development team may not be so for another.

The simplest such 'qualification' is of course to refuse to destroy any relation from which another relation has been derived, ie, upon which there is a name-dependency, until those dependencies have been eliminated.

(3) ENFORCING SEMANTIC CONSTRAINTS IMPOSED BY THE APPLICATION MODEL

To the theorist this is probably the most interesting use to which a relational data dictionary might be put.

One objection to the use of relational databases stems from the fact that certain properties of conventional files, such as demanding a unique value in the key field, or being hierarchical, are absent. In conventional programming, these 'structural' properties are exploited to enforce certain semantic constraints arising out of the application model, such as a particular child segment having a single parent. However the skills of a database specialist are often needed to exploit such restrictions inherent in the available structures.
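
Before going further into constraints, the dependency bookkeeping of sections (1) and (2) can be made concrete with a small sketch. The Python below is not PRTV or ISBL: it models RDEPEND as a plain set of triples and evaluates everything eagerly, and the helper names (`dtrans`, `once_removed`, `full_rdepend`, `may_destroy`) are hypothetical. It assumes that DTRANS concatenates name-dependencies and yields 'V' as soon as a value-dependency appears anywhere in a chain, which agrees with the six DTRANS tuples listed in section (1) but is otherwise a guess at the convention. It also assumes that derivations are acyclic, and that a name-dependency at any remove blocks destruction.

```python
# RDEPEND for the example of section (1):
# C = name-A, D = name-B, E = name-C, F = name-E joined with the value of D.
RDEPEND = {
    ("C", "A", "N"),
    ("D", "B", "N"),
    ("E", "C", "N"),
    ("F", "E", "N"),
    ("F", "D", "V"),
}


def dtrans(dep1, dep2):
    """Assumed DTRANS rule: any value link gives 'V'; name links concatenate."""
    if "V" in dep1 or "V" in dep2:
        return "V"
    return dep1 + dep2


def once_removed(left, right):
    """Equi-join on left.RELID2 = right.RELID1, apply DTRANS, project to 3 domains."""
    return {
        (r1, s2, dtrans(d1, d2))
        for (r1, r2, d1) in left
        for (s1, s2, d2) in right
        if r2 == s1
    }


def full_rdepend(rdepend):
    """Union in once-removed tuples until the cardinality stops growing."""
    full = set(rdepend)
    while True:
        grown = full | once_removed(full, rdepend)
        if len(grown) == len(full):
            return full
        full = grown


def may_destroy(relid, rdepend):
    """Section (2): refuse while any relation still name-depends on relid."""
    return not any(
        r2 == relid and "N" in dep for (_r1, r2, dep) in full_rdepend(rdepend)
    )


RR = once_removed(RDEPEND, RDEPEND)
FULL = full_rdepend(RDEPEND)
F_ROWS = sorted(t for t in FULL if t[0] == "F")  # the selection RELID1 = 'F'
```

Under these assumptions `RR` holds the three once-removed tuples and `F_ROWS` the five dependencies of F given in section (1), while `may_destroy("A", RDEPEND)` is false because C still depends on the name 'A'.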
It is up to the database specialist to ensure that his model of the application in terms of key fields and segment deletion rules behaves like the real-world counterpart: yet it is often rather hard for a business to find a man with intimate knowledge of both realms.

Thus it is attractive for our purpose that the traditional restrictions of key-fields and many-one mappings have to be modelled explicitly in PRTV, since the problem of enforcing semantic constraints can then be split off from that of providing a structure capable of holding the data in the first place.

How can one use the relational algebra here discussed to model these sorts of update constraints?

Suppose we have a standing relation, X, in the database, and a transient relation, UPD_X, holding today's new additions to X. We want to insert into X just those tuples of UPD_X whose values of the key-domain, KEY, do not already occur as values of KEY in X.

X % (KEY) is a relational value, with just one domain, of current keys occurring in X. By joining it to UPD_X we express just those tuples of UPD_X whose keys already occur in X:

    X % (KEY) * UPD_X
         KEY     KEY

By forming the 'DIFFERENCE' of this expression with the original UPD_X we express all those tuples of UPD_X whose keys do not already occur in X. We now simply 'UNION' these with X to get NEW_X. Ignoring the special domain-overlapping notation, NEW_X is given by:

    NEW_X = N!X + (N!UPD_X - (N!UPD_X * (N!X % KEY)))

Note that we have made NEW_X a derived relation by quoting the names of relations (N!) instead of their current values. NEW_X, upon being materialised, will contain the desired set of tuples, which may be used to replace the current value of X in the database. We must then ensure that X is only ever updated in this way. A crude way of doing this is to have the data dictionary keep a list of permissible assignments into given RELIDs, so that the application generator will not accept a command changing X except those, explicitly catalogued, which assign NEW_X into X.

Clearly a similar technique can be used to insert only those tuples into X whose KEYs occur in another relation, W. The tuples which fail to get into X can of course be recovered in the expression: UPD_X - X.

It is a critical business designing facilities for an application developer to impose constraints upon himself or his colleagues. It presupposes that both he and we know what sort of security we are supposed to be offering him. If this question is not resolved, it is so easy to end up with a security system which neither deters deliberate abuse, nor protects adequately against accidental misuse, but appears designed solely to encumber lawful operations.

Our intentions with the data dictionary are primarily to reduce the incidence of subsequent mis-modifications to an application. As time goes on, or one gets further away from the designer of a particular component, the modifier of the component is unavoidably less well-informed as to the side-effects such modifications may have. On the other hand we make no attempt as yet to protect against deliberate wrecking.

Compare this with the 'facility' sometimes found in database packages which simply refuses to accept data with duplicate keys.
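
By way of contrast with outright rejection, the key-respecting update of X described above can be sketched as follows. This is illustrative Python rather than ISBL: relations are modelled as sets of (KEY, payload) pairs, everything is evaluated eagerly instead of being deferred as in PRTV, and the names `project_key`, `join_on_key` and `new_x`, as well as the sample data, are hypothetical.

```python
def project_key(rel):
    """X % KEY: the set of key values occurring in rel."""
    return {key for (key, _payload) in rel}


def join_on_key(rel, keys):
    """rel * keys: those tuples of rel whose KEY occurs in keys."""
    return {t for t in rel if t[0] in keys}


def new_x(x, upd_x):
    """NEW_X = X + (UPD_X - (UPD_X * (X % KEY)))."""
    clashing = join_on_key(upd_x, project_key(x))
    return x | (upd_x - clashing)


# Invented sample data: key 2 is already present in X.
X = {(1, "alpha"), (2, "beta")}
UPD_X = {(2, "beta again"), (3, "gamma")}

NEW_X = new_x(X, UPD_X)
REJECTED = UPD_X - NEW_X  # corresponds to UPD_X - X once NEW_X has replaced X
```

The clashing tuple is not lost: it remains available in `REJECTED` for inspection. Note that, like the expression in the text, the sketch only compares UPD_X against X; it does not guard against two new tuples in UPD_X sharing a key with each other.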
The first non-DP user of a package with such a 'facility' is inevitably engaged in research, even if he does not know it, and is in the typical research predicament of having a file of grubby data he wishes to load up, precisely to use the sophisticated query facilities the database package may offer to report on such things as duplicate keys. He quickly has to learn a few DP skills, like how to manoeuvre around the trap for duplicate keys.

The relational data dictionary approach allows just that structure to be put up first inside the database, which suffices to store the raw data, and adequate protection to be devised later for the use of the various relations of the application.

Compared to the task of defining relations to hold and manipulate the data of the application, as exemplified by X, it is much harder to write a satisfactory UPD_X to constrain its use. The latter is as demanding as writing a foolproof macro, which is really what UPD_X is. Later work might concentrate on providing relations like UPD_X 'off the peg', so to speak, that is as a result of some non-procedural specification by the application developer, rather than require him to develop the skill to construct them himself. Such 'constraining' relations may resemble the facilities available with CODASYL/DBTG (3), or IMS. Alternatively the sort of 'semantic constraint' which would be useful in practice might be thought out completely afresh.

Our approach contrasts with the CODASYL/DBTG approach of submitting the update constraints as part of the 'data definition', to be carefully separated from the 'data manipulation' activity. In the environment described it is difficult to distinguish between the two.

SUMMARY

We have tried to indicate a relational approach to many well-known problems of so-called data definitions. The key to this approach is the use of a data dictionary, itself maintained as a relational database.

Many details of this relational data dictionary clearly remain to be finalised. However the essence of a relational data dictionary is that it can be implemented even before such questions need be resolved. Thus the basic structure of, and facilities offered by, such a data dictionary can be changed extensively without the need to reload the stored data. This allows considerable experimentation within a particular project.

REFERENCES

(1) S J P TODD: PRTV Overview, IBM UK Scientific Centre report No 75, 1975.

(2) E F CODD: A Database Sublanguage founded on the Relational Calculus, Proceedings of the 1971 ACM SIGFIDET Workshop on Data Description, Access and Control.

(3) CODASYL DBTG: Data Base Task Group Report, April 1971. Available from ACM, New York.

Data Base System Evaluation

Harry L. Hill, IBM

The evaluation of data base systems embraces four very significant fields. The first is the design of resource management, which is needed to build into the product the performance attributes that make that product or system an attractive saleable item. The second is the prediction of performance for a given configuration and workload.
The third is the ability to measure the performance and confirm or deny the expectation obtained from the predictive process; and finally the ability to tune the system to accommodate changes made either in the configuration that exists or the user workload that is currently presented to the system.

To cover these four elements of data base evaluation, I have chosen to describe within this paper these topics:

1. Concepts of system performance
2. Performance and the development process
3. Predicting and measuring system performance
4. System performance tuning

1. CONCEPTS OF SYSTEMS PERFORMANCE

Let us look at some of the basic concepts behind system performance. The key question is one of systems performance sensitivity: the problem is always to find what is in the critical path. Fig. 1 describes clearly the approach that is taken, given that one can identify the bottleneck in the system; the key question is that if I remove that bottleneck, at what point and under what conditions do I hit the next one (because there is always a next one).

When we talk about the goodness of performance, i.e. how well a system performs, it is necessary to establish measures of goodness. We talk about performance in the following ways, as shown in fig. 2: in terms of throughput, jobs per unit time, system data rate, number of accesses per second to a storage device, etc. There are perhaps more sophisticated and better ways of describing performance. For example, throughput per rental, dollars per second per access to a storage device, cost per job, cost per transaction. These latter measures of performance tend to be more revealing of the 'value to the user' as we sometimes call it, i.e. the cost performance trade-off.
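
The difference between the raw and the cost-based measures can be illustrated with a small calculation. The Python sketch below uses invented figures for two hypothetical configurations; the field names, the `measures` helper and the assumption that each system sustains its transaction rate around the clock for a 30-day month are all made up for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes continuous operation, 30-day month


@dataclass
class Configuration:
    name: str
    transactions_per_second: float
    monthly_rental: float  # dollars per month; invented figure


def measures(cfg):
    """Return (throughput per rental dollar, cost per transaction)."""
    monthly_transactions = cfg.transactions_per_second * SECONDS_PER_MONTH
    throughput_per_rental = cfg.transactions_per_second / cfg.monthly_rental
    cost_per_transaction = cfg.monthly_rental / monthly_transactions
    return throughput_per_rental, cost_per_transaction


small = Configuration("small", transactions_per_second=2.0, monthly_rental=20_000.0)
large = Configuration("large", transactions_per_second=3.0, monthly_rental=45_000.0)

small_measures = measures(small)
large_measures = measures(large)
```

With these invented numbers the larger configuration wins on raw throughput, while the smaller one wins on both cost-based measures, which is the kind of trade-off the latter measures are meant to expose.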
It should be o b s e r v e d , as in f i g . 3, that there are some v e r y s i g n i f i c a n t trends in performance evaluation, In the e a r i y days when we d e s c r i b e d performance in terms of component o r device p r o d u c t i v i t y , you w i l l recall the measures of CPU goodness w e r e in terms of add time, s u b t r a c t time, m u l t i p l y time, etc. We have emerged from that somewhat p r i m i t i v e measure of performance and today we talk about performance in terms of systems p r o d u c t i v i t y , w h e r e the system is the sum of the h a r d w a r e , the software and the w o r k l o a d effects. T o m o r r o w I am confident that we w i l l be t a l k i n g about systems p e r f o r m a n c e not so much in terms of j u s t the system but in terms of the user r e l a t i o n s h i p to that system. I call that ~people p r o d u c t i v i t y t, w h e r e peoplers p r o d u c t i v i t y is geared to maximise the objectives of a g i v e n e n t e r p r i s e o r business. computing system is then but one key element in meeting a business objective. The This is p a r t i c u l a r l y i m p o r t a n t for live t e r m i n a l systems w h e r e the business of a company may be t o t a l i y dependent on the a v a i l a b i l i t y and u s a b i l i t y of the total system, System performance is r e a l l y best d e s c r i b e d in terms of the management of time spent w a i t i n g for systems resources. Fig. 
4 describes a representation of systems resources because that is what performance is all about, the management of resources within a system allocated to a given profile of work. Every single system that has been constructed to date behaves in this way. The element of work is offered to the central processing unit or work engine and that work is executed by merging data with a program to a point where more data or programs are required. At that point in time the processing ceases and a request is queued in front of a storage device (i.e. a resource) in order to obtain additional data or programs to continue or complete the processing. When that work is completed, the processing engine proceeds on to another task. What we have is a serial processing engine operating on elements of work whose data and programs are queued in parallel against system resources. By placing a 'meter' in the line between the storage and the queue for processing one can get a measure in terms of transactions per second or system data rate.

Fig. 5 shows a plot of system performance against the number of tasks, that is, the depth or level of multiprocessing and the consequence on the system of these tasks executing work. Notice that as you increase the number of tasks, the system performance increases to the point where a bottleneck is reached and I have chosen in this case to show the channel as the first bottleneck. If I were to add channels to the system I would relieve that bottleneck within the system and I would hit the next one which I have, in this case, shown to be storage devices. So performance progresses through 'ceilings' or bottlenecks.

Work that is presented to a computing system does not represent a constant load on all resources. In fig.
6 I have shown diagrammatically a time varying workload effect on the system where the height of each pedestal represents 100% utilisation of that resource; notice that I am showing only 3 resources, a channel, a CPU and a drive device. The point is that not all of the time is any one resource the bottleneck, but the bottleneck changes from resource to resource depending upon the demand of the time varying workload placed against it. When that resource is 100% utilised, it clearly forms a black mark on top of the pedestal, so by removing that bottleneck, that is, by putting a more powerful CPU in or a larger number of channels, this serves to improve the overall system performance. Clearly we are seeking an economic design where the number of black marks on top of the pedestal is reasonably balanced, that is, resources are not wasted. Fig. 7 depicts a system transaction rate versus a time varying workload, and a similar argument applies.

All transaction-based systems tend to behave in a similar way and fig. 8 shows a three dimensional plot of response times versus real storage versus transaction traffic rate. Notice that as the real storage available for processing is decreased, the response time increases. Similarly, as the transaction traffic rate increases, the response time increases and all systems tend to behave this way. It should be realised that in virtual operating systems the decrease of storage causes an increase in paging rate. Under these conditions the CPU utilization generally decreases and the system gradually becomes I/O bound.

2. PERFORMANCE AND THE DEVELOPMENT PROCESS

As data base systems have grown and become sophisticated, it is necessary to achieve not only good performance, but predictable performance.
This has to be built into the development process of the product. I should like to take as an example the development of storage which is a key resource in any data base system. Fig. 9 shows a typical development process which, in the early days of the computer industry, started off with the research and development of what I would describe as the basic parameters of the storage device. These parameters were offered to engineering groups who designed them into products and we developed on that basis the well-known disk drive. The drives were offered to the CPUs and were integrated with software systems which in turn were offered to industries to configure and use on behalf of that industry, and those industries designed those systems together with their applications to generate useful data processing facilities. The point is that in the early days we started off with the basic technology and we did what is described as a 'bottom-up' design - that is how the technology of the industry grew up.

If we look today at the basic relationship of the direct access storage device (fig. 10) you will see that only certain combinations of those basic parameters are of interest to the systems designer, such as data rate and access times; areal density is frankly not very significant to the system designer. Similarly as block size decreases data rate becomes less important than access time.

The consequence of this 'bottom-up' development process has been that we have decreased in a rather dramatic way the effective cost to the user of storage. The decrease in storage cost as seen by the user is shown in fig. 11, i.e.
the relationship between dollars per megabyte per month for a variety of products versus the year of announcement. In fig. 12 you will also notice the access rate characteristics where the accesses per dollar and the accesses per second are shown for the same range of products.

If we are to look now at fig. 13 we will see that the storage technology spans a range of access times, storage capacities and cost per bit. This figure is interesting - observe the gap in the continuum of storage devices. This gap occupies the same time domain as task switching in several of the medium and high speed processors.

The technology for storage and data base systems is rich - rich in function and rich in performance and in cost choices. There is in fact sufficient technology to reverse the process and instead of doing a 'bottom-up' design, to take the requirements of modern applications and do a 'top-down' design (again see fig. 9), that is, to define the systems and the applications that are required in a business or enterprise and to map them into the technology.

3. PREDICTING AND MEASURING SYSTEM PERFORMANCE

The timely development of performance tools forms an essential part of developing a computing system. It has two major characteristics. One, it is important to be able to predict the performance of a complex data base/data communications system prior to either the hardware or the software being in existence and two, having predicted it and built it, it is important to be able to measure it and validate the prediction. The learning process is being able to describe differences.
The essential objective in developing performance tools is to be able to establish a discipline both for developers and subsequently for users of avoiding surprises in performance, since late discoveries are hard to correct. Fig. 14 describes this objective and describes the methods that are generally used to achieve it, that is, to develop models, to validate those models, to be able to track the instruction path length within a system and, as knowledge is gained, to be able to document that experience and construct a vocabulary that communicates both the predictive and the measurement processes. Fig. 15 shows the process. There are really two types of predictive capabilities, one is analytic and the other is simulative. In the measurement area there are two types of facilities required to produce the data necessary for measurement; one is hardware and the other is software monitors.

Measurement is both time consuming and expensive, therefore there has been significant emphasis and progress placed upon the development of models in order to determine the performance of a system, while measurement techniques are increasingly used to validate these models so that performance information and guidelines can be generated spanning a range of applications, configurations and workload demands. It should be recognised, however, that multiple sub-systems operating within one operating system are often hard to handle by conventional analytic means, and one is forced to consider hybrids of analytic and simulative techniques.
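The loop of predicting with a model and then validating it against measurement can be sketched in a few lines. The analytic model used here is the textbook single-server queue formula R = S / (1 - U), which is an assumption made for this illustration and not a model taken from the paper; the service time, arrival rate and "measured" value are likewise hypothetical.

```python
def predicted_response_time(service_time, arrival_rate):
    """Analytic prediction for a single-server queue (assumed model).

    Utilisation U = arrival_rate * service_time; response time R = S / (1 - U).
    """
    utilisation = arrival_rate * service_time
    if utilisation >= 1.0:
        raise ValueError("resource is saturated; the formula no longer applies")
    return service_time / (1.0 - utilisation)


def relative_error(predicted, measured):
    """Difference between prediction and measurement, relative to the measurement."""
    return abs(predicted - measured) / measured


# Hypothetical inputs: 0.1 s of service per transaction, 5 transactions per second.
predicted = predicted_response_time(service_time=0.1, arrival_rate=5.0)

# Hypothetical value standing in for a monitor reading.
measured = 0.22
error = relative_error(predicted, measured)

print(round(predicted, 3), round(error, 3))
```

If the relative error stayed small over a range of workloads, the model could be taken as validated for that range; a large error is the kind of difference the learning process has to explain.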
It is most important that the developer or user of a model has clearly in his mind the question he wants the model to answer. Rarely is a general purpose model sensitive to questions that were not known at the time the model was developed.

It is perhaps useful to examine a data base/data communication system from a performance standpoint, and for this I have chosen IMS/VS and have constructed a flow chart for the main processing blocks of that system. Fig. 16 shows the flow of such a transaction; notice that it divides itself into three major parts. The communication part where message switching and message queues are handled; the processing of that message against program and data and the multiple calls to that data base for that particular transaction; the completion of that transaction and the generation of the output message in the message queue, and the handling of that message through a terminal access method to a terminal. That is, if you like, the life of a transaction; it is born at the terminal where it enters the system and it dies at the terminal when the transaction is completed. If we were to place 'meters' in the lines joining those functions to queues and libraries, etc., we could in fact measure the activity that is going on with the system.

As we pass multiple messages into such a system, we see that the problem of performance resolves down to the allocation of resources, CPUs, channels, programs and data to handle the requirements of each different transaction. The job, then, is to define algorithms for using resources and for waiting for resources.
These algorithms start with what priorities are associated with each transaction type and must include recovery strategies in the event that a resource, a data path or a queue discipline fails. Availability and performance are becoming increasingly dependent upon recovery schemes designed into the product.

There are really only two ways of improving the performance of a data base/data communication system. One is to shorten the transaction path length and the other is to provide either faster or parallel processing resources. It is thus often desirable to be able to calculate the number of instructions executed on behalf of an IMS transaction. Fig. 17 shows a typical approach to such a problem, where T is the total instructions executed for the IMS transaction, K1 through K5 are coefficients representing various IMS and VS releases; Q, U, N and C represent major parameters of most importance and significance in terms of overall systems performance. Now if we were to take these transactions and were to apply values to those parameters, it is conceivable that one could divide the instruction processing capability of the machine by the path length of the transaction and come up with a theoretical maximum number of transactions per second that that resource could process, given that the processing unit was in fact the major bottleneck in the system. This has been done in fig. 18, which shows the difference in transactions per second processed for an 85% utilised 158 and 168. It should be clear that these are not measured values, they are predicted values, and are shown merely to demonstrate the sensitivity of system performance to changes in the key parameter values that affect it.

Fig. 18 is, then, designed to show the sensitivity of a system to changes in the major parameters that affect the system performance.
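The calculation described above can be sketched as follows. The path length follows the general form of fig. 17, with the fixed term folded into a single constant K1 as a simplification. All coefficient values and the instruction rate of the machine are hypothetical; they are not the values behind fig. 18 and serve only to show the arithmetic.

```python
def path_length(k, q, u, n, c):
    """Instructions per transaction: T = K1 + K2*Q + K3*U + N*(K4 + C*K5).

    q, u: fractions of inquiry and update transactions
    n:    data base calls per transaction
    c:    data base I/Os per call
    """
    return k[1] + k[2] * q + k[3] * u + n * (k[4] + c * k[5])


def max_transactions_per_second(instructions_per_second, path, utilisation=0.85):
    """Theoretical ceiling when the processing unit is the only bottleneck."""
    return utilisation * instructions_per_second / path


# Hypothetical coefficients (instructions); real values depend on the IMS and VS release.
K = {1: 60_000, 2: 10_000, 3: 30_000, 4: 8_000, 5: 6_000}

T = path_length(K, q=0.5, u=0.5, n=5, c=2.0)

# Hypothetical processor executing one million instructions per second.
tps = max_transactions_per_second(1_000_000, T)

print(T, round(tps, 2))
```

Varying n or c in this sketch shows how strongly the ceiling depends on the data base call structure, which is the sensitivity the figure is meant to demonstrate.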
Again this is not a measured environment, this is a predicted environment, and it is probably not possible to accurately reproduce this in a measurement environment without rigorously defining several other important system and user dependent factors. It does, however, also show on the same theoretical basis the difference in path length between an MVS system and an MVT system. Traditionally, it is thought that the systems that have higher sophistication have longer path lengths and whereas in general this is true, it is clear that in the MVS system, as the data base call structure becomes more complex, the difference in path length diminishes significantly in favour of MVS.

Independent of the investment made in developing and using models of the system, it is essential to measure the real thing as rapidly as possible. One method used in IBM is shown in fig. 19, where a simulated network is represented in both hardware and software and a data base is constructed to represent the application and system data bases. The simulated network is programmed to generate scripts at a given interval and with a given think time, or range of think times, such that the system under test appears to be loaded with transactions as though they were coming from real terminals. By the application of suitable hardware probes and suitable software probes, we are able to measure the utilisation of resources occurring within the system under a variety of transaction rates, types and call structures. A typical measurement is shown in fig. 20, in this case an IMS/VS 1.0.1 system running under VS2 release 2.
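How a simulated network of terminals translates into a transaction rate can be illustrated with the interactive response time relationship: each terminal cycles through thinking and waiting, so N terminals offer N / (think time + response time) transactions per second. This relationship is a standard operational result assumed here for illustration and is not stated in the paper. The figure of 300 terminals is taken from fig. 20; the think time and response time are hypothetical.

```python
def offered_rate(terminals, think_time, response_time):
    """Transactions per second generated by terminals that each cycle
    through one think period and one response period per transaction."""
    return terminals / (think_time + response_time)


def think_time_for(terminals, target_rate, response_time):
    """Think time a script needs so that the terminals offer target_rate."""
    return terminals / target_rate - response_time


# 300 simulated terminals, hypothetical 59 s think time and 1 s response time.
rate = offered_rate(terminals=300, think_time=59.0, response_time=1.0)

# Inverse question: what think time loads the system at 5 transactions per second?
think = think_time_for(terminals=300, target_rate=5.0, response_time=1.0)

print(rate, think)
```

A script driver could use the second function to pick think times for each point on a transaction rate axis such as the one in fig. 20.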
Notice the linear CPU utilisation as transaction rate goes up on this 158 CPU with 2400 Baud lines and 4800 Baud lines. The measurement in question is designed to explore the sensitivity of system performance to line speeds. Note that in the 2400 Baud lines case, with ten lines, the line utilisation became a significant bottleneck in the system and this is evidenced by the response times starting to rise rather rapidly, whereas at 4800 Baud line speed, the response time is well contained.

System performance can be viewed in two ways and fig. 21 shows that we are either using a resource or we are waiting for it. Let us now take the flow chart (fig. 16) that we developed to show the life of an IMS transaction. Let us look at that flow chart with respect to the time we spend waiting for a resource, that is, waiting for a line, waiting for buffers, waiting for a processing region, waiting for an application program to be brought in, waiting for I/O, that is, storage accesses to bring data or programs into the system, waiting for lines to handle the output message and waiting for services to transmit that message to the terminal. Let us also look at the amount of time using the resources. Fig. 22 shows this, and it is drawn to scale: if this were 8 inches long, the response time from beginning to end would be 1 second, making 3 loops around the DL/1 call. It is also clear, as we approach a 100% utilised system, the units of processing occupy a smaller and smaller portion of the total response time. This chart shows the waiting time and processing time for only one transaction within a 75% loaded system.

4. SYSTEM PERFORMANCE TUNING

The goodness of performance then, of a data base/data communication system is balancing or tuning two things. It is balancing the supply of resources with the demand on them, because we are either waiting for that supply or we are using that supply. Fig. 23 shows this balancing scheme. If we have a high supply with respect to the demand, then we are wasting resources. If we have a high demand with respect to the supply of resources we are going to suffer poor response times. In general, performance is a user option since it requires the addition of resources and these generally cost money; but that is not always the case. In some cases, it is necessary and possible that the resources be tuned to meet the demand of the workload. Performance tuning is concerned primarily with the elements shown in fig. 24, being data base profiles, transaction profiles, profiles of the IMS system, of the processing requirements of the region, of the hardware and software configuration, of the overall teleprocessing configuration, and importantly, the use of tools to measure these resources.

Fig. 25 shows the primary factors affecting the performance and the design of the system. The number of transactions per second is typically in the range of 1 to 50, although within the next five years I am confident that you will see that range grow towards 200 transactions per second. In terms of EXCPs per call, we are looking today in the range 0.1 to 5 per data base call. In terms of calls per transaction, we typically find anywhere from 5 to 50 calls with several transaction types exceeding 50 and reaching close to 100 calls per transaction, so the data base designer is faced with designing a system of resources which can efficiently and economically accommodate the range of performance critical factors.
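The three factors just listed multiply together, which gives a feeling for how wide the design range is. The following sketch simply combines the typical ranges quoted above into an I/O request rate; treating the product as the load placed on the storage subsystem is a simplifying assumption, and the function name is hypothetical.

```python
def io_requests_per_second(transactions_per_second, calls_per_transaction, excps_per_call):
    """I/O requests (EXCPs) per second implied by the three primary factors."""
    return transactions_per_second * calls_per_transaction * excps_per_call


# Low end of each typical range: 1 transaction/s, 5 calls, 0.1 EXCPs per call.
low = io_requests_per_second(1, 5, 0.1)

# High end of each typical range: 50 transactions/s, 50 calls, 5 EXCPs per call.
high = io_requests_per_second(50, 50, 5)

print(low, high)
```

The two ends of the typical ranges differ by more than four orders of magnitude in I/O rate, which is why the profile of a particular installation has to be quantified before any tuning is attempted.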
The tuning of data base systems is clearly a complex matter involving firstly an awareness of utilisation of resources, and secondly the understanding and knowledge about the sensitivity of changing the resource allocation to achieve an overall system performance level. The objective then is shown in fig. 26 - minimise the transaction path length and/or invoke parallelism of key resources. The method recommended is firstly to quantify the profiles of the transaction and of the system; to understand the behaviour of the system in response to changes in the workload; to use software monitors to quantify that behaviour and resort to hardware monitors which do not interfere with the processing characteristics of the system; to define experiments to uncover and order the bottlenecks; and to make changes, one at a time, to the system and measure the effects. Only by measurements do we really get smart.

Performance tuning can be an iterative process because what one is trying to do is to optimise the utilisation of resources and match them against the workload. Frequently that workload is changing and one's job is not done until one has resolved the differences between what one expects, that is the expectation of performance, and what one has actually got. If there are significant differences between those two elements, then clearly there must be an explanation which always seems to lie in better understanding of what the system is doing.

I mentioned the complexity of tuning a data base/data communication system. It is certainly not true that every one behaves differently.
There are some typical causes of bottlenecks which are frequently uncovered and those really fall into three categories, as shown in fig. 27 - resources of a teleprocessing network - balancing of those resources and the selection of buffer sizes and message format buffers; region resources, that is the amount of program loading that is done; the structure and the size of application programs; the structure and the size of the data base; the use of extended function within that data base structure; and lastly, the CPU resources, where its use is determined largely by the amount of system and user I/O and the use of bufferpool services.

Finally, I should like to discuss trends within data base/data communication system performance. Those trends really fall into three broad areas - trends in prediction, trends in measurements and trends in tuning. I think that over the next five years we are going to see generalised use of analytic tools for dedicated systems and some guidelines based on analytic tools for mixed systems. We are going to see the specific use of simulation and hybrid tools for mixed or complex systems. We are also going to see the availability of tools at an early point in the design of systems to help users choose amongst different configurations which have different price performance characteristics.

In terms of measurement trends, we are going to see integrated software performance monitors, because basically performance is a user option and it is proper that the user understands what the system is doing and what choices he has to change it.
Where a software monitor impacts the basic behaviour of the system, we are going to see integrated hardware built into the product to facilitate measurement and so be able to monitor the performance with little or zero overhead. We are going to see selective performance report generation, and we are going to see dynamic performance information and monitoring of key resources, so that information can be made available to a user to permit him to manage his system in line with some overall strategic direction that has known cost performance trade-offs.

Lastly, in performance tuning, I believe that we are going to see a family of tools available for the design of major components. That is, the design of TP networks, of data bases, of multiprocessing systems to permit the designer at an early stage to become familiar with the behaviour of those elements of the system that are likely to be a system bottleneck. We are going to see system-managed performance generation reports, and tuning controls that are made available on an open loop basis. It is conceivable that in the next five to ten years many of the tuning controls can be architected into a closed loop system so that the system is able to tune itself, and at this point I refer to tuning of the system in terms of allocating resources in accordance with a predetermined set of performance strategies. Some of these can be determined by the manufacturer and some will be determined and selected by the end user.

This concludes my presentation on the Evaluation of Data Base Systems.

Concepts of System Performance Sensitivity
The Problem: Find What's in the Critical Path, i.e., What's the Bottleneck
And ...
What's the Payoff When I Remove That Bottleneck and Hit the Next One.
Because ... There Always is a Next One.
Fig. 1

Performance Measures of Goodness
How Can We Talk About Performance?
  Thruput (Jobs/Unit Time)
  System Data Rate
  # Accesses/Sec
  # Terminals Supported
  Terminal Response Time
Or Perhaps:
  Thruput/Rental
  $/Sec/Access
  Cost/Job
  Cost/Transaction
Fig. 2

Trends in Performance Evaluation
Notice the Trend from:
  Component or Device Productivity
  To System Productivity (System = Hardware + Software + Workload)
  To People Productivity (People Productivity = Maximized Enterprise Objectives)
Fig. 3

[Fig. 4: A Representation of System Resources. Diagram of work demand queued against system resources, with a meter in Transactions/Sec. Key: CH - Channel, D - Device, Q - Queue.]

[Fig. 5: A Way to Think About Bottlenecks. Plot of system performance (e.g. system data rate) against the number of tasks n, with successive resource ceilings.]

[Fig. 6: Systems Performance vs Time for a Time Varying Workload. Regions marked CPU bound and channel bound.]

[Fig. 7: Transaction Rate vs Time for a Time Varying DBDC Workload. Axes: resource utilization and transaction rate (T/sec); resources shown include DASD and CPU.]

[Fig. 8: DBDC Performance Relationships. Three dimensional plot; axes include real storage and transaction traffic rate.]

[Fig. 9: The Development Process. A View of the Development Process: Research and Develop (parameters: bits/inch, tracks/inch, access time, rotation speed, capacity), Engineer (products), Integrate (CPUs and operating systems), Configure (applications).]

[Fig. 10: DASD Parameter Relationships. Diagram relating density, capacity, rotation period, data rate and access time to data.]

[Fig. 11: The Cost of Attached Storage. $/MB/month against year of announcement (1954 to 1974) for the 305, 1311, 2314, 3330, 3340 and 3330-11.]

[Fig. 12: Access Rate Characteristics. Accesses/$ (x 10^3) and accesses/sec to 1200 bytes against year of announcement (1954 to 1974) for the 1311, 2311, 2314, 3330 and 3340.]

[Fig. 13: Present Storage Technologies. Cost (cents/bit) and storage capacity (bits) against average access time (10 ns to 10 s).]

OBJECTIVES AND METHODS
Objective
  - Don't create surprises in performance. Late discoveries are hard to correct.
Method
  - Design tools (models) to ask/answer questions in a disciplined way
  - Do it early to influence designers
  - Specify and track path lengths
  - Validate models and measure to get smart
  - When you're smart, document it
Fig. 14

[Fig. 15: Performance Tool Development. Prediction uses models (analytic and simulative); measurement uses monitors (hardware and software); the two are validated against each other to produce performance information and guidelines.]

[Fig. 16: Main Processing Blocks of a Transaction, IMS/VS. Flow chart of message handling, message queues and processing.]

IMS PATH LENGTH ANALYSIS
How many instructions are executed on behalf of an IMS transaction?

  T = (K1 + K11) + (K2 x Q) + (K3 x U) + N [K4 + (C x K5)]

  K1 ... K5 are coefficients representing various IMS and VS releases.
  Q = fraction of inquiry transactions
  U = fraction of update transactions
  N = number of data base calls/transaction
  C = number of data base I/Os/call
  T = total instructions executed for one IMS transaction
Fig. 17

IMS PATH LENGTH ANALYSIS
IMS transaction path length (instructions x 10^3)

  Calls/transaction      3     5    10    30     3     5    10    30
  I/Os/call            3.3   2.0   1.0   0.3   3.3   2.0   1.0   0.3
  Inquiry              0.5   0.5   0.5   0.5     0     0     0     0
  Update               0.5   0.5   0.5   0.5   1.0   1.0   1.0   1.0
  Path length, VS      154   160   176   239   162   169   185   247
  Path length, MVT     114   124   148   243   114   124   148   243

[Bar chart: IMS/MVS transactions per second for 85% CPU utilization on 158 and 168, for the same eight cases.]
Fig. 18

[Fig. 19: Performance Measurement Environment. System under test with DASD (system and data base) driven by a simulated network of M control units/line and N terminals/control unit.]

[Fig. 20: IMS Performance Measurement, Line Comparison. IMS/VS 1.0.1 on VS2/2, 10 lines, 300 terminals, 2400 and 4800 baud lines, 158 CPU. Utilisation and response time against transactions per second (1 to 7).]

WHAT IS PERFORMANCE
A system of resources (CPU, channel, DASD, TP, storage, program, queue, locks, ...)
  Use of resources (utilization)
  Waiting for resources (wait/response time)
Define what you mean by performance
Fig. 21

[Fig. 22: Timing an IMS Transaction. IMS/VS 1.0.1, 370/158, 4800 baud, 3270. Elapsed time by IMS function. Input: wait for T.P. line, input message handling (message queue, MFS, log). Processing: wait for MPP, preparation of application program (ACB, application program library, message queue), processing per DL/1 call (data bases, dynamic log), 3 times. Output: wait for T.P. line, output message handling (message queue, MFS), output terminal.]

[Fig. 23: DBDC Performance Tuning. Balance of resource supply (CPU, TP, storage, devices) against demand (applications, transaction rate, DB design, DB calls); excess supply gives wasted resources, excess demand gives poor performance. Tuning => balance resource supply and demand.]

DBDC PERFORMANCE TUNING
Primarily concerned with:
  - Data base profiles
  - Transaction profiles
  - IMS profiles
  - MPP processing requirements
  - Hardware configuration
  - Operating system profile
  - Teleprocessing configuration
  - Other
and the use of tools to measure critical parameters
Fig. 24

PRIMARY FACTORS AFFECTING PERFORMANCE/DESIGN
  Parameter          Typical values
  # Transactions     1 - 50
  # EXCPs/call       0.1 - 5
  # Calls/trans      5 - 50
Fig. 25

A DBDC TUNING APPROACH
Objective
  - Minimize the transaction path length
  - Invoke parallelism of key resources
Method
  - Quantify profiles - transactions, system configuration and performance goodness
  - Understand system behavior in response to workload
  - Use software monitors to quantify behavior (4 TIME), maybe hardware monitors and detailed trace
  - Define experiments to uncover and order bottlenecks
  - Form improvement hypothesis, make change, measure effect
  - Document experiment and results. Get smart.
Result
  - Optimum utilization of system resources to match workload
  - Resolve difference between expected and actual performance
Fig. 26

TYPICAL CAUSES OF DBDC RESOURCE BOTTLENECKS
TP resources
  - Balancing network loading
  - Size of TP buffers
  - Size of message format buffers
Region resources
  - Amount of program loading
  - Structure and size of application programs
  - Data base structure and # calls
  - Use of extended IMS functions
  - Amount of I/O
CPU resources
  - Amount of system and user I/O
  - Use of buffer pool services
Fig. 27

Data Security in Data Base Systems

Hartmut Wedekind, Technische Hochschule Darmstadt

Summary

The terms "data protection" (Datenschutz), "data security" (Datensicherheit) and "data integrity" (Datenintegrität) are distinguished from one another in the introduction. The first main part deals with the security measures that relate to technical and organisational matters.
The discussion centers on the processes of identification and authentication, on the organizational formation of layers, compartments and authorization matrices, and on cryptographic methods. The second main part deals with security models. By a security model we mean the linguistic formulation of the security conditions so that they can be brought into a data management system. A database comprises all stored data; a data management system comprises all procedures for handling them. We distinguish descriptive (non-procedural, declarative) security models, which have been proposed for relational database systems, from procedural models such as those provided in the DBTG of the CODASYL group for hierarchical database systems.

1. Introduction

The pair of terms data protection and data security on the one hand, and the pair data security and data integrity on the other, lie so close together that the terms must be delimited from one another before details are treated.

Under the heading "data protection" the question to be answered is "What is to be protected, and against what?" (15). This discipline works out legal norms and organizational rules that lay down what, for ethical, social, economic or intelligence reasons, should not be accessible to everyone or must not be entered into a database at all. Data protection is particularly important for personal data. However, there is also an interest in protecting company data (e.g. customer master data, data that can be patented or licensed) and data of public administration (e.g. data on the planning of building land). In America a data protection law has been enacted; in Germany there is a bill drafted by the Federal Government.
Data security is concerned with the question "How is it to be protected?" In this paper we restrict ourselves to the question of guaranteeing access security, by which unauthorized accesses are repelled. What counts as an unauthorized access is laid down by data protection rules. We exclude physical data security. This covers structural measures in computing centers, the distribution of locks and keys, the fitting of write rings on tapes, fire protection and the recovery of destroyed files. Physical data security deals with safeguarding against loss of data.

Data integrity concerns the accuracy of the data. The data must satisfy integrity conditions, which in the simple case refer to data fields with a data type declaration; in complicated cases an integrity condition spans many files of a data flow chart. For example, the part numbers in an order file must be a subset of the part numbers listed in the parts master file. The reconciliation checks of commercial practice are integrity conditions that can be of a very complicated nature. Integrity conditions are quality conditions of the database.

Date (9) rightly calls data security and data integrity twin problems. Data integrity is the demand that the file be free of errors; data security, by contrast, is oriented towards access. Both problems require additional conditions to be formulated and brought into the system. Whereas the conditions for data security are derived from the abstract norms of data protection, the integrity conditions take into account the data model used and the concrete specifications of the users' mini-world. Data integrity also covers the problem of a possible loss of integrity through concurrent updating access (shared access).
2. Security Measures

2.1 Identification and Authentication

If a user wants access to a database (DB), he must first identify himself, i.e. he must tell the system who he is. This identification must be checked for correctness; it must be authenticated. A widespread means of identification, and also of authentication, in DB systems is the password. Every user is given a password. The passwords are managed by the DB system in a password table. The table can contain further personal master data such as personnel number, name and the date on which the password was issued. The table can also record which terminals may be used, so that it can be checked whether the user is allowed to work at all at a previously identified terminal. Today the identification process is simplified, and also made more secure, by the fact that some terminals require machine-readable identity cards. In the system, identification of the person goes hand in hand with identification of the terminal. The system thus learns where the terminal session is taking place.

If one can assume that the passwords and their storage remain secret, the identification process is at the same time an authentication process. To make this assumption more realistic, special methods of issuing passwords can be introduced. A password that can be used only once (one-time password) is very secure, but also expensive. Petersen and Turn (20) point out that even this way of issuing passwords does not protect against intruders who insert themselves with a terminal of their own into the terminal traffic between the authorized user and the system (infiltration between the lines). The password table must in any case remain secret unless it is enciphered.

A further way of letting identification and authentication coincide would be the checking of fingerprints, machine analysis of the voice, or signature verification. Economically justifiable procedures require separate identification and authentication processes. Authentication must exploit information that is known only to the person whom the user claimed to be in the identification procedure. For example, the user may have to supply a second password. Simply by someone looking over the user's shoulder, however, this password too can become generally known. It is therefore more appropriate for the system to ask the user who wants access a question that only he can answer.

Following a suggestion by L. Earnest, Hoffman (16, p. 92) recommends proceeding as follows. At log-in the user identifies himself; the system then offers him a pseudo-random number, which we call x. By a simple transformation T, which the user has to carry out in his head, a number y is determined. The result y = T(x) is entered. The system likewise carries out the transformation T(x) and checks whether the result is in fact y. A potential intruder can at most see x and y. He can hardly find out the transformation T if the procedure is protected inside the computer. The procedure is protected, however, because only someone who has been authenticated has access to it. For T, the following transformation, which a third party can hardly determine, may be proposed as an example:

$$T(x) = \Bigl(\sum_{i\ \text{odd}} (i\text{-th digit of } x)\Bigr)^{2} + (\text{hour of the day})$$

That is, the digits in the odd positions are summed; the sum is squared and added to the hour of the day.

The method of authentication described is very simple and inexpensive. It has the further advantage that the password table may be known to everyone, since it is needed for identification.
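The question-and-answer procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `transform`, `issue_challenge` and `verify` are hypothetical, digit positions are assumed to be counted from the left starting at 1, and the challenge is drawn with Python's `secrets` module rather than with whatever generator a real system of the period would have used.

```python
import secrets


def transform(x: int, hour: int) -> int:
    """T(x) = (sum of the digits in the odd positions of x)^2 + hour of the day."""
    digits = [int(d) for d in str(x)]
    # Assumption: positions are counted from the left, starting at 1,
    # so the odd positions are list indices 0, 2, 4, ...
    odd_sum = sum(digits[0::2])
    return odd_sum ** 2 + hour


def issue_challenge(n_digits: int = 6) -> int:
    """Return a random challenge x with exactly n_digits decimal digits."""
    low = 10 ** (n_digits - 1)
    return low + secrets.randbelow(9 * low)


def verify(x: int, y: int, hour: int) -> bool:
    """The system repeats the transformation and compares it with the user's answer."""
    return y == transform(x, hour)


# Worked example: x = 123456 at 14:00.
# Odd positions hold 1, 3 and 5; (1 + 3 + 5)^2 + 14 = 95.
print(transform(123456, 14))   # 95
print(verify(123456, 95, 14))  # True
```

An eavesdropper who sees the pair (123456, 95) learns one input and one output of T, which is why the text requires that T itself stay protected inside the computer.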
Only T(x) has to remain secret.

Further methods of authentication are proposed by Evans et al. (10) and Purdy (21). The two procedures are very similar and build on results developed within cryptography (the study of secret writing or ciphers; kryptos = secret). Because of the great importance of cryptography for the security of DB systems, we treat cryptographic methods in a separate section. The procedures of Evans and Purdy mentioned above, which build on the "one-way cipher" method of Wilkes (27), are not discussed further within the scope of this paper.

2.2 Stratification, Compartmentalization and Authorization Tables

There are three simple structures for organizing a security system: stratification of access authorization, compartmentalization (or sectioning), and arrangement in an authorization table.

In stratification the users are divided into groups according to their access authorization in the sense of a hierarchy. The layers of the data, or the user groups, are given, for example, the following designations from top to bottom:

top secret               layer 1
secret                   layer 2
strictly confidential    layer 3
confidential             layer 4
unclassified             layer 5

A person who has access to top secret data, for example, in order to view, delete or change them, also has access to data in the layers below. If, on the other hand, a person can access only strictly confidential data, the layers "top secret" and "secret" remain inaccessible to that person. In general: a person may access only the data of the layer for which he has been classified and the data in the layers below it. The stratification of persons and data for reasons of security originates in the military domain.
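The general stratification rule stated above amounts to a single comparison. The following Python sketch shows it under the assumption that layers are numbered as in the list, with 1 as the highest layer; the names `Layer` and `may_access` are hypothetical and chosen only for this illustration.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Layers numbered from top (1) to bottom (5), as in the list above."""
    TOP_SECRET = 1
    SECRET = 2
    STRICTLY_CONFIDENTIAL = 3
    CONFIDENTIAL = 4
    UNCLASSIFIED = 5


def may_access(clearance: Layer, data_layer: Layer) -> bool:
    # A larger number is a lower layer, so access is allowed to the
    # person's own layer and to every layer below it.
    return data_layer >= clearance


# A person classified for "strictly confidential" data:
person = Layer.STRICTLY_CONFIDENTIAL
print(may_access(person, Layer.SECRET))        # False
print(may_access(person, Layer.CONFIDENTIAL))  # True
```

The sketch makes no distinction between viewing, deleting and changing; within a layer the rule described in the text grants all three.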
In civilian security systems this form of security organization is uncommon. Even in the military domain, however, stratification is frequently combined with compartmentalization. In compartmentalization the data are split into disjoint subsets. A subset, or compartment (section), is assigned to a person or to a group of persons. Data may be present in exactly one compartment only. The security system must guarantee that there are barriers between the compartments that cannot be broken through. Martin (19, p. 151) regards compartmentalization as a vertical partitioning of the data; he calls stratification a horizontal partitioning.

Within a vertical compartment, horizontal layers are also conceivable for groups of persons. This form is frequently found in military security systems. Here too the aim is that a person has access only to the data that he really needs. Friedman (14, p. 269) calls compartmentalization an implementation of the military postulate of "need to know": everyone should know only what he really requires.

A very well-known application of compartmentalization is the storage-protected partitioning of main storage among individual programs in multiprogramming. The operating system guarantees that a program cannot address the storage area of another program. The operating system is often supported in this by hardware in the form of bounds registers (base limit registers). A register of this kind holds a base address and the length of the area. By comparing the program addresses with the register contents, a violation of the area bounds can be detected.

The third of the simple structures for a security system is the authorization table. The table contains the password, the personnel number and an n-bit field for the authorization. If the i-th bit is 1, access to the protected object D_i is permitted; if it is 0, access is denied. The table is also frequently called the user security profile. Its disadvantage is that the binary rule "either access or no access" applies. In database systems this table is often developed into a matrix in which the rows represent the users and the columns the protected objects.

An authorization matrix is explained by an example that can also be found in similar form in Conway et al. (8, p. 212). We start from the relation PERSONAL (PNR, LSTG, GHT, LMB, VST), which in addition to the personnel number (PNR) contains the very sensitive personnel data performance (LSTG), salary (GHT), latest medical findings (LMB) and previous convictions (VST). The password table already discussed serves to identify the rows.

Password    PNR    LSTG   GHT    LMB    VST    Remark
13 C 151    R      R,W    R,W    R      R,W    Head of personnel
74 Q 028    R,W    R      R      N      N      Head of organization
43 F 974    R      N      N      N      N      Programmer
14 Z 234    R      N      N      R,W    N      Physician
28 R 862    N      R      R      R      R      Statistician

R = reading permitted, W = changing permitted, N = neither reading nor changing permitted

So that the authorization matrices for sets of entities do not take up too much storage in a system, it is recommended to form zones and categories (19, p. 6). A zone is the combination of at least two sets of protected objects. The sets sales part, purchasing part and production part become the zone part. A category is the combination of several attributes. Customer name, place of residence and turnover can form the category "customer information". A further reduction of the storage requirement is achieved by forming user groups. All members of a user group have exactly the same access rights.
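A lookup in an authorization matrix of the kind just described might be sketched as follows in Python. The sketch uses only the rows for the programmer, the physician and the statistician from the example; the names `ATTRIBUTES`, `MATRIX` and `is_allowed` are hypothetical, and a dictionary keyed by password is merely one convenient way to model "the password table identifies the row".

```python
# Columns of the relation PERSONAL, in the order used by the matrix rows.
ATTRIBUTES = ("PNR", "LSTG", "GHT", "LMB", "VST")

# One row per password; each entry is "R", "RW" or "N".
MATRIX = {
    "43 F 974": ("R", "N", "N", "N", "N"),   # programmer
    "14 Z 234": ("R", "N", "N", "RW", "N"),  # physician
    "28 R 862": ("N", "R", "R", "R", "R"),   # statistician
}


def is_allowed(password: str, attribute: str, operation: str) -> bool:
    """Return True if the row for `password` permits `operation` on `attribute`.

    `operation` is "R" (read) or "W" (change).
    """
    if operation not in ("R", "W"):
        raise ValueError("operation must be 'R' or 'W'")
    row = MATRIX.get(password)
    if row is None:
        return False  # unknown password: no access at all
    entry = row[ATTRIBUTES.index(attribute)]
    return operation in entry  # "N" contains neither "R" nor "W"


print(is_allowed("14 Z 234", "LMB", "W"))  # True: the physician may change LMB
print(is_allowed("14 Z 234", "GHT", "R"))  # False: but may not read the salary
```

The decision depends only on the user's row and the attribute's column, never on the stored values themselves, which is the sense in which such a matrix expresses value-independent conditions.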
To make the distinction easier to express, Friedman (14, p. 269) calls a user group that is formed for security reasons a "clique". Zones, categories and cliques are three very memorable terms.

Compared with stratification and compartmentalization, the authorization matrix already allows considerably more subtle security conditions to be represented. The security conditions must not, however, be value-dependent. A condition is value-dependent if attribute values are needed to formulate it. For such conditions a tabular representation in the form of the data schema "matrix" is possible only with great difficulty, and one has to move to a linguistic formulation of the security conditions. Value-dependent security conditions of arbitrary complexity can be stated. The rules of data protection often demand only simple value-dependent conditions, such as the personnel-record condition (i.e. everyone may read only his own personnel master record). Value-independent conditions can be checked at compile time, value-dependent conditions only at execution time. Value-dependent conditions are very time-consuming.

2.3 Circumvention of the Security Precautions

This section describes methods of circumventing the security measures. At the same time it deals with the question of which security measures are needed to repel intruders who act with intent. The intruders' methods are pieces of "villainy" that strike the "naive" and "upright" user as very esoteric. Against many of the intruders' attacks, enciphering the database in the sense of cryptography is an effective countermeasure. The attacks on a data processing system and the defensive measures are excellently presented in the much-noted paper by Petersen and Turn (20).
The aims of a deliberate intrusion into a DB system can be:

1) obtaining information,
2) finding out what information a user is interested in,
3) changing and destroying information,
4) using resources of the system free of charge, or using system resources at someone else's expense.

Petersen and Turn (20) divide the methods of deliberate intrusion into the system into two categories. One speaks of passive infiltration when the intruder connects himself into the data processing system in some way in order to learn what is going on. Active infiltration is present when the intruder either wants to use system resources or wants to obtain, change or destroy information in a targeted way.

The methods of passive infiltration are the tapping of transmission lines (wiretapping) between system and terminal and the attachment of probes (electromagnetic pickups) in the CPU and the various storage devices. The transmission lines are regarded as the most vulnerable part of the overall system (20, p. 291). According to Petersen and Turn, the best defense against both of these attacks is encipherment. The intruder is then burdened with "cracking" the cipher. If the effort (work factor) of breaking the cipher is greater than the value of the information gained, these attacks are not worthwhile. This statement is very abstract, since the effort of breaking the code can be estimated, but the value of the information to an intruder cannot.

With regard to active infiltration the following methods can be listed:

1) "Masquerading". The intruder has obtained a user's password, for example by tapping the line, and now "masks" himself with it. Encipherment can prevent the intruder from learning the password by tapping.
Enciphering and deciphering must, of course, take place at the terminal.

2) "Browsing". The intruder is a legitimate user who passes the identification and authentication process successfully. He tries, however, to read or change data that he is not allowed to access. A well-functioning access control is the best defense against this kind of infiltration.

3) The intruder inserts a terminal of his own into the transmission line between user and system. While the legitimate user is sitting at the terminal, the following can happen:
a) The intruder deletes the "sign-off" command and continues to operate the terminal in the user's name. The user believes that the terminal session has ended.
b) While the legitimate user's terminal is inactive, the intruder cuts in to work with the database ("between the lines").
c) The intruder picks specific information out of the traffic between the legitimate user and the system, changes it, and has the modified, erroneous information transmitted to the user's terminal ("piggy-back entry").
Encipherment is the best defense in these three cases.

4) Theft of a removable data medium. In addition to physical protection by specially lockable rooms, encipherment is to be recommended here as well.

5) The intruders are system programmers with detailed knowledge of storage protection, of programming in privileged mode and of the operating system. Intruders of this type are by nature the most dangerous. They can deliberately build leaks into system programs (trap doors) or have main storage printed out from time to time.
Systems today are so complicated that only a team of intruders can work successfully, which offers a certain protection, since a whole team will only rarely let itself in for a piece of "villainy" of this kind. By logging certain operations, such as the printing of main storage, inadmissible interventions can become known after the event. Petersen and Turn call these measures "threat monitoring"; they attach great importance to them. Attacks that come about through software modifications, e.g. through changes to compilers, precompilers, text editors and so on, are particularly hard to detect. Since almost all programs are processed by other programs, Bayer (1, p. 78) rightly states the principle: "No program is more secure than the programs by which it is processed."

System programmers with detailed knowledge can also use the method of the "planted record" to break the cipher more quickly. The intruder places plaintext fragments in the file and then tracks down their cipher in order to break the code. Bayer (1) in particular has drawn attention to this procedure.

2.4 Cryptographic Methods

Cryptography is the study of producing a secret text from an original text and of recovering the original text from a secret text. The first process is called enciphering; its inverse is called deciphering. Other terms for "original text" and "secret text" are "plaintext" and "cryptogram" or "cipher". A cipher is an unintelligible sequence of characters. The security of an enciphering method is judged by the resistance or effort (work factor) it presents to the unauthorized intruder. In a cryptographic code that is not a cipher, parts of the character sequence can be understood by a third party, but not interpreted correctly.
Words and whole phrases are exchanged fairly arbitrarily. How this exchange is to be carried out is recorded in a dictionary called the code book. As a rule, cryptographic codes are also meant to achieve data compression. Because of the storage required for the code book, cryptographic codes are out of the question for DB systems. We need algorithmic methods of enciphering and deciphering, not tabular ones. Three kinds of algorithmic methods can be distinguished:

a) substitution methods,
b) transposition methods,
c) block cipher methods.

Cryptography has a long history. Lovers and thieves have always concealed their communications as well as they could, remarks Feistel (13, p. 21), alluding jokingly to pre-scientific cryptography. Only around the middle of the nineteenth century did cryptography slowly become a science. Secret communication nevertheless remained confined to pencil and paper well into the twentieth century. The computer then gave cryptography a new and hardly expected impetus. All historical remarks made in the course of this presentation are taken from Kahn's famous book "The Codebreakers" (18).

a) Substitution methods

In this cryptographic method a character of the plaintext is replaced by a character from the alphabet. In contrast to the simpler transposition method, the identity of a plaintext character is not preserved in the cipher. The substitution can be carried out by means of a table or algorithmically, which in this context means algebraically. Within database cryptography the additive substitution methods have gained particular importance because of their high performance on large volumes of data. Of the additive methods we single out here the "addition modulo q" methods, or Vigenère-Vernam ciphers.

In the early 16th century the Benedictine monk Trithemius published the very first printed book on cryptography. In this book Trithemius describes a square matrix with letters as elements, whose rows are each shifted by one position from top to bottom. He used this matrix as a key for enciphering in an additive substitution method. In the late 16th century this idea was taken up again and improved by Blaise de Vigenère. The method presented below is called the Vigenère method, which is historically not quite correct.

                 Plaintext
                 0   1   2   3   ...  25
                 A   B   C   D   ...  Z
Key   0   A      A   B   C   D   ...  Z
      1   B      B   C   D   E   ...  A
      2   C      C   D   E   F   ...  B
      ...
      25  Z      Z   A   B   C   ...  Y

Key matrix for the Vigenère cipher

The column labels (A, B, C, etc.) apply to the plaintext. The row labels are needed for the key. The cipher character is found at the intersection of row and column. If a D in the plaintext is paired with a B in the key, we find E as the cipher character. Deciphering proceeds in the reverse direction.

We demonstrate "classical" enciphering with the Vigenère method by an example. Let the plaintext be "KEIN VERRAETER" and the key "KAISERBALL":

Plaintext:  K E I N V E R R A E T E R
Key:        K A I S E R B A L L K A I
Cipher:     U E Q F Z V S R L P D E Z

If we assign the numbers 0 to 25 to the letters A to Z, as has been done in the figure, the enciphering process can be regarded as addition modulo 26. For the example above we then find, without having to use the matrix:

K + K = 10 + 10 = 20 = 20 mod 26 = U
E + A =  4 +  0 =  4 =  4 mod 26 = E
I + I =  8 +  8 = 16 = 16 mod 26 = Q
N + S = 13 + 18 = 31 =  5 mod 26 = F
...
R + I = 17 +  8 = 25 = 25 mod 26 = Z

In the nineteenth century the Vigenère cipher was called "le chiffre indéchiffrable". Tuckerman (23) is able to show that breaking the cipher with computer methods is no great problem if enough ciphertext is available. A ciphertext cannot be broken if the key is used only once and is produced by a random number generator that does not produce periodic pseudo-random numbers. The longer the key, the harder it becomes to break the cipher.

In 1917 the American communications engineer Gilbert Vernam applied the Vigenère cipher to a digitized data stream for the first time. Vernam was faced with the task of enciphering an alphabet that consisted of $2^5 = 32$ characters and was presented in a digital code for teleprinters. The result of his investigations was an addition modulo 2.

Example

Enciphering:
Plaintext:  0 1 0 0 1   1 1 1 0 0
Key:        0 1 0 1 0   1 0 1 0 1
Cipher:     0 0 0 1 1   0 1 0 0 1

Deciphering:
Cipher:     0 0 0 1 1   0 1 0 0 1
Key:        0 1 0 1 0   1 0 1 0 1
Plaintext:  0 1 0 0 1   1 1 1 0 0

Addition modulo 2 and the logical operation "exclusive OR" are identical. The exclusive OR has a unique inverse; deciphering is again done with the exclusive OR. Vernam encipherment plays a special role in database cryptography.
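Both additive substitutions, addition modulo 26 on letters and addition modulo 2 on bits, are short enough to write out. The Python sketch below is an illustration only; the function names `vigenere` and `vernam` are hypothetical, the plaintext is assumed to contain only the capital letters A to Z (the blank in "KEIN VERRAETER" is dropped), and the key is simply repeated cyclically.

```python
import string

ALPHABET = string.ascii_uppercase  # A..Z correspond to 0..25


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Add (or, for deciphering, subtract) the key letters modulo 26."""
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        k = ALPHABET.index(key[i % len(key)])  # key is repeated cyclically
        out.append(ALPHABET[(ALPHABET.index(ch) + sign * k) % 26])
    return "".join(out)


def vernam(bits: list[int], key_bits: list[int]) -> list[int]:
    """Addition modulo 2 (exclusive OR), bit by bit; it is its own inverse."""
    return [b ^ k for b, k in zip(bits, key_bits)]


cipher = vigenere("KEINVERRAETER", "KAISERBALL")
print(cipher)                                        # UEQFZVSRLPDEZ
print(vigenere(cipher, "KAISERBALL", decrypt=True))  # KEINVERRAETER

plain = [0, 1, 0, 0, 1, 1, 1, 1, 0, 0]
key = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(vernam(plain, key))                            # [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
```

Applying `vernam` a second time with the same key returns the plaintext, which is the sense in which the exclusive OR serves for deciphering as well as enciphering.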
The Vernam cipher is a special cipher of the "addition modulo q" kind, which Tuckerman calls the Vigenère-Vernam system, or V-V system for short.

b) Transposition methods

When transposition methods are applied, the identity of a character is preserved; only its position changes. Children practise one of the many transposition methods, for example, when they write their name backwards. A common method for a transposition transformation is to divide the plaintext into blocks and then arrange them in a matrix. A few operations are applied to the matrix. The resulting text is then brought back into linear form according to a fixed rule. Since transposition methods applied on their own offer too little security (a frequency analysis of the characters can already lead to success), we do not describe them in more detail.

c) Block cipher methods

An enciphering method that converts n information bits into n cipher bits is called a block cipher method. The bitwise Vernam transformation is a block cipher method in this sense. If, however, the term "block cipher method" is used today without further qualification, one has in mind a method with the following properties:

1. Substitution and transposition are applied one after the other in stages. Since the "cascading" of processes amounts to a "multiplicative" combination, block cipher methods are also called product cipher methods.

2. In contrast to the "bit by bit" or "letter by letter" substitution methods, in which there are no dependencies in the cipher between the individual bits or letters, the block of n bits is treated as a whole. There is a symbol dependency in the cipher. If one symbol in the cipher depends on another, a transmission error in a single symbol can make a whole block impossible to decipher.

3. The "substitution" stage is a nonlinear transformation.

We return to these properties in more detail below. Block cipher methods for computer use were developed in particular by Feistel (13), (12) and (11). They are distinguished by great security, i.e. the effort of breaking a block cipher is considerable. Block cipher methods were already used under the name ADFGVX by the German army in the First World War (18). Between the world wars, cipher machines were then developed that worked with a pseudo-random key. Only computer applications revived interest in block cipher methods in the mid-1960s (13, p. 99). We present the block cipher methods here as if the transformations were performed by devices, and begin with the description of nonlinear substitution.

[Figure: Nonlinear substitution after Feistel (13, p. 25). Device S with n = 3 inputs and outputs: a transformer from base 2 to base 8 ($2^3 = 8$ lines), a wiring of the 8 lines, and a transformer from base 8 back to base 2.]

It is assumed that the plaintext is entered in blocks of n = 3 bits. Unlike in the Vernam method, a zero or a one is not added to the input digits; instead a whole block of input digits is replaced by an arbitrary block of output digits. The substitution device (device S) consists essentially of two base transformers. The input transformer accepts a block in base-2 representation and converts it into an octal digit. The output transformer does the reverse. For the n input bits there are $2^n$ substitution characters, which are found in a nonlinear way by base transformation. Base transformation is also known as a good method of generating hash addresses. Before the substitution takes place in the reverse direction, a transposition is carried out, which can be thought of as realized by simple wiring. One possible wiring is shown in the figure. Altogether there are $(2^n)! = 8! = 40320$ such wirings (hardware) or tables (software).

For n = 3 or 4 a substitution device with an arbitrary wiring can still be realized. By entering certain "trick messages", however, the intruder can break the cipher (13, p. 26). This is no longer the case if, for example, n = 128 is chosen. With 128 inputs and outputs the intruder would have to enter $2^{128} \approx 10^{38}$ different blocks in order to explore how the device works. That is not feasible. This advantage is offset by the decisive disadvantage that the method cannot be technically realized for large n, e.g. for n = 128. This is why several smaller substitution devices, e.g. with n = 3, were arranged in parallel in one stage. Since a substitution stage with several S devices can still be overcome very easily, a transposition device was connected after it. Since a transposition device performs a permutation of symbols, it is also called a permutation device (P device).

A cipher is hardly breakable when several P devices and S devices are arranged one after the other. The "scrambling" of the bits by a complicated transformation H is so great that an inversion can hardly be reconstructed by anyone who is not initiated. The figure shows how a single 1 can give rise to an "avalanche" of ones through repeated nonlinear substitution and permutation (transposition). In deciphering, the stages are passed through in the reverse direction.
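To make the alternation of S devices and P devices concrete, here is a toy substitution-permutation cipher in Python. Everything specific in it is an assumption made for illustration: the 12-bit block, the single 3-bit table `SBOX` (one of the 40320 possible wirings), the bit permutation `PERM` and the three rounds are hypothetical choices, not values taken from Feistel's design, and the sketch is not a secure cipher.

```python
BLOCK_BITS = 12  # four S devices with n = 3 bits each (hypothetical size)

# Hypothetical wiring of one S device: a permutation of the octal digits 0..7.
SBOX = [3, 6, 0, 5, 7, 1, 4, 2]
INV_SBOX = [SBOX.index(v) for v in range(8)]

# Hypothetical P device: output bit i is taken from input bit PERM[i].
PERM = [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
INV_PERM = [PERM.index(i) for i in range(BLOCK_BITS)]


def substitute(bits, box):
    """Apply the same 3-bit S device to each group of three bits."""
    out = []
    for g in range(0, BLOCK_BITS, 3):
        value = bits[g] * 4 + bits[g + 1] * 2 + bits[g + 2]  # base 2 -> octal digit
        s = box[value]                                       # the wiring
        out += [(s >> 2) & 1, (s >> 1) & 1, s & 1]           # octal digit -> base 2
    return out


def permute(bits, perm):
    """Rearrange bit positions; the bit values themselves are unchanged."""
    return [bits[p] for p in perm]


def encrypt(bits, rounds=3):
    for _ in range(rounds):
        bits = permute(substitute(bits, SBOX), PERM)
    return bits


def decrypt(bits, rounds=3):
    # The stages are passed through in the reverse direction.
    for _ in range(rounds):
        bits = substitute(permute(bits, INV_PERM), INV_SBOX)
    return bits


block = [0] * 11 + [1]  # a single 1 in the block
cipher = encrypt(block)
print(cipher, decrypt(cipher) == block)
```

Printing `encrypt(block, rounds=r)` for r = 1, 2, 3 lets one watch how far the single 1 spreads across the block, which is the "avalanche" the text describes; how quickly it spreads depends entirely on the assumed tables.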
To make the block cipher method even harder for the adversary, different keys can be specified for each of the S devices, so that we can distinguish between the devices $S_1, S_2, \ldots, S_{20}$.

[Figure: Block enciphering after Feistel (13, p. 100): alternating stages of P and S devices producing the cipher.]

3. Security Models in Data Base Systems

3.1 Descriptive Models

Relational DB systems differ from other DB systems in that several design levels are clearly recognizable. Three levels can be named that must at least be present in a relational DB system:

1. the logical level,
2. the level of access paths,
3. the level of physical storage on the devices.

In DB systems based on the relational model of data, it is assumed that security constraints belong to the miniworld of the user and that they must be formalized descriptively at the logical level. Security constraints, and integrity constraints as well, are in principle treated exactly like user queries. Security and integrity constraints can be very complex. No security organization in the form of layers, domains, or authorization tables is presupposed.

Let a relational data base with n relations $R_1, R_2, \ldots, R_n$ be given. With a data manipulation language (DML) based on the relational model (such as ALPHA (7), SQUARE (3), or SEQUEL (4)) it is possible to formulate logical conditions such that the result of the qualification is the answer to a query. In ALPHA the answer to a query was called the target list. Following Chamberlin (5) and Boyce (2), we call the result of a query a "view". A view is a relation that is not stored but is only defined by logical conditions. "Views" are virtual relations. When the base relations $R_1, R_2, \ldots, R_n$, which are the ones actually stored, are changed, the derived virtual relations change as well. In the sense of Chamberlin (5), a "view" is a dynamic result of a query in which built-in functions such as COUNT, SUM etc. may also be used.

The term "view" originates from Boyce (2). It is introduced in particular to present the language SEQUEL also as a data description language (DDL). In the sense of a DDL, a view is complete only if all descriptive parameters of a base relation are declared. Views in the sense of a DDL are illustrated using the base relations

    LAGER_GUT (NR, BEZ, MENGE, PREIS)
    LIEFERUNG (NR, LNR, DATUM)

(LAGER_GUT = stock item, LIEFERUNG = delivery). Here NR = number of the stock item, LNR = supplier number, BEZ = designation. The formal description of LAGER_GUT reads:

    DEFINE LAGER_GUT TABLE AS:
        NR    (SCOPE=POSINT, REPR=DEC(6))
        BEZ   (SCOPE=ALPHA, DOMAIN=NAME, REPR=CHAR(~))
        MENGE (SCOPE=REAL, DOMAIN=SCHUETTGUT, UNITS=TONNE,
               REPR=FLOAT DEC(15,4))
        PREIS (SCOPE=REAL, DOMAIN=GELD, UNITS=DM PRO TONNE,
               REPR=FLOAT DEC(8,2))
        KEY=NR, ORDER=ASCENDING NR
        INDEX, BEZ

    DEFINE LIEFERUNG TABLE AS:
        NR    LIKE LAGER_GUT.NR
        LNR   LIKE NR EXCEPT (REPR=DEC(8))
        DATUM (SCOPE=POSINT, REPR=DEC(3))
        KEY NR, LNR
        ORDER=DESCENDING NR, ASCENDING LNR

A table is first described by its table name, the names of its columns and, if necessary, by the ordering of its rows. A column (attribute) can be characterized by a name, a value range (SCOPE), e.g. positive integer (POSINT), a comparability (DOMAIN), which states whether two values are comparable, a unit of measure (UNITS), e.g. tonne, and a representation (REPR). The terms SCOPE, UNITS, and REPR are self-explanatory.
It should only be noted that certain standard values such as POSINT, REAL etc. for SCOPE and, say, FIXED BINARY, DECIMAL etc. for REPR should be provided. As far as comparability is concerned, two values are comparable only if they come from columns for which the parameter DOMAIN is equal. "Bulk goods" can only be compared with "bulk goods", and "money" only with "money". Comparability plays a major role in particular when a join is to be formed from two relations.

To illustrate the term "view" with regard to security constraints, an example in the language SEQUEL follows. From the base relation LAGER_GUT the view VERKAUFS_GUT (sales item) is to be developed. A sales item differs from a stock item in that the bulk good is packed into sacks and counted by pieces. The view VERKAUFS_GUT contains the stock item in a packed state ready for sale. One piece, i.e. one sack, is to weigh 1/100 tonne.

    DEFINE VERKAUFS_GUT TABLE AS:
        LIKE LAGER_GUT EXCEPT
            (MENGE.UNITS=STUECK, MENGE.DOMAIN=SAECKE,
             PREIS.UNITS=DM PRO STUECK)

The following declaration tells the system how to convert tonnes into pieces.

    DEFINE CONVERT (TONNE TO STUECK): 1/100 TONNE

CONVERT can be regarded as a conversion routine. The user of the view VERKAUFS_GUT does not need to concern himself with the conversion itself.

To complete the derived view VERKAUFS_GUT into a security constraint, it must be clarified what the user of a view is permitted to do. We imagine that we are the owner of the base relation LAGER_GUT and have full power of disposal over this relation. The term "owner" is defined in this sense. Chamberlin et al. (5) propose the following rights of disposal:

1) GRANT: This specifies that the user of the derived view may show this view to any other user. In other words: passing on the read permission is granted.
2) REVOKE: The disposal over the view is revoked.
3) DESTROY: This gives permission to destroy the view.
4) INSERT: Permission is given to insert tuples into the virtual relation (view).
5) DELETE: Tuples may be deleted.
6) UPDATE: Attributes may be changed.

A view together with rights of disposal constitutes a security constraint. The user with number X is to be given the following rights:

    GRANT VERKAUFS_GUT TO BENUTZER.BNR = 'X'
        (GRANT='NO', REVOKE='NO', DESTROY='NO',
         INSERT='NO', DELETE='NO', PREIS.UPDATE='YES')

We want to emphasize the following features of views and security constraints:

1. Different views can be built up hierarchically like "generations of a family". We distinguish "father views" from "son views". At the root of the tree stands the "primal father", the base relation, which belongs to the owner. All changes to a "father view" are reflected in the various "son views". The principle has already been stated that changes should in general be made only at the "primal father".
2. The dispositions or authorizations to which a "son view" is entitled must always be smaller than or equal in extent to those of the "father view". It must be possible to produce the "son view" from the "father view". (Nemo plus juris transferre potest, quam ipse habet.)
3. When a view is revoked, the associated "son views" are destroyed.

In a further example, a view is to be developed from the view VERKAUFS_GUT for the user Y so that he can read only the fields NR and BEZ.
We write in the language SEQUEL:

    DEFINE LISTE_FOR_Y TABLE AS:
        SELECT NR, BEZ
        FROM VERKAUFS_GUT

This is followed by:

    GRANT LISTE_FOR_Y TO BENUTZER.BNR = 'Y'
        (GRANT='NO', REVOKE='NO', DESTROY='NO',
         INSERT='NO', DELETE='NO', UPDATE='NO')

3.2 Procedural Models

Before we turn to a brief presentation of procedural models, which are described in more detail in the DBTG report (6) and by Hoffman (16), we refer here to the security features of the system IMS (see (26)). All accesses to an IMS data base go through an assigned PCB (Program Communication Block), which describes a subschema. The architecture thereby provides a special security level. First, a program can access only data that have been made sensitive by a PCB. Second, the program may execute only those operations that have been defined in the field PROC-OPTIONS of the PCB. The smallest unit of protection is a segment. An authorization matrix can be implemented with IMS by means of PROC-OPTIONS. IMS offers further protection facilities when the data communication part is installed. It can then be specified, for example, that programs may be invoked only when passwords are supplied, and that a password is also required in order to use certain commands at terminals.

For data base systems, the system DBTG (6) and the "Formulary Model" (16) developed by Hoffman are two important representatives of security models in which the security constraints must be converted directly into security procedures (called "formularies" by Hoffman) by a central authority. The programmer has a host language, such as PL/1 or COBOL, at his disposal. In our presentation we restrict ourselves to the most important features of the system DBTG.

In the language of the DBTG, the smallest data unit that cannot be decomposed further is a "data-item". A name and a value make up a "data-item". Several "data-items" with a common name are a "data-aggregate". Several "data-items" or "aggregates" form a "data-record", which, like all units, must be named. The essential structure in the data model of the DBTG is the "set". In a set there are records called "owner" (father record) and records called "member" (son record). An occurrence of a "set" (set occurrence) consists of exactly one "owner record" with zero, one, or several "member records". A "member record" cannot exist without an "owner record" (integrity constraint). In a "set", a "record" type is either "owner" or "member" (but not both). A "member record" can be a "member record" in several "sets". Hierarchical and network structures can thus be represented. A "set" must be strictly distinguished from the mathematical notion of a set.

A further organizational unit is an "area". The entire storage space of a DBTG data base is divided into a number of named "areas". A "schema" contains the entire formalized description of a data base. A "sub-schema" is, roughly speaking, a subset of a "schema" selected by the user. A program can access only those data that are described in a "sub-schema". As in IMS, where the "sub-schema" is defined by a PCB, the "sub-schema" in the DBTG also provides a certain degree of security at a first level. The "sub-schema" can be regarded as a view concept. The second level is described in the following.

The system DBTG supports all the organizational units mentioned, from the "data-item" up to the "schema", with regard to security procedures. To present only the essentials here, we restrict ourselves to the "record" level.
The DBTG provides two clauses: on the one hand a PRIVACY LOCK in the "schema", and on the other hand a PRIVACY KEY in the program of the user. With the data description language (DDL) for the "schema", a "lock" (PRIVACY LOCK) can be "hung" in front of the data; it can be opened with a "key" (PRIVACY KEY) formulated in the data manipulation language (DML). In the simplest case the key is a password; in more complicated cases a procedure is invoked. We first present the case in which the key is a password. In the following, the program of the user is assumed to be written with COBOL as host language. Let the security constraint be: only someone who has the password 'KAISERBALL' may delete an arbitrary record with the name PERSONAL:

    Schema:
        RECORD NAME IS PERSONAL
          . . .
        PRIVACY LOCK FOR DELETE IS 'KAISERBALL'

    Program:
        IDENTIFICATION DIVISION
          . . .
        PRIVACY KEY FOR DELETE OF PERSONAL RECORD IS 'KAISERBALL'
        PROCEDURE DIVISION
          . . .
        DELETE PERSONAL

The DBTG ensures that the character string 'KAISERBALL' in the clause PRIVACY LOCK is compared with the character string 'KAISERBALL' in the clause PRIVACY KEY. If the strings are equal, the operation DELETE PERSONAL is permitted for an arbitrary record PERSONAL. If the strings are not equal, the execution of the operation is suppressed and an error status indicator is set.

We now come to the more complicated case, which arises when the execution of an operation depends not on a password but on conditions. We start from the following example: a personnel manager may delete records with the name PERSONAL if

1. the content of the field salary is less than 10,000, or if
2. the content of the field department number is equal to 15.

Let the first condition be realized in the procedure named GAMMA and the second in the procedure named DELTA. In the following statements it is assumed that the personnel manager requests access only on the basis of the first condition.

    Schema:
        RECORD NAME IS PERSONAL
          . . .
        PRIVACY LOCK FOR DELETE IS PROCEDURE GAMMA OR PROCEDURE DELTA

    Program:
        IDENTIFICATION DIVISION
          . . .
        PRIVACY KEY FOR DELETE OF PERSONAL RECORD IS PROCEDURE GAMMA
        PROCEDURE DIVISION
          . . .
        DELETE PERSONAL

Several PRIVACY LOCKS can be defined together in one statement by means of OR. The access authorization is checked in a procedure. The procedure passes the parameter 'Yes' or 'No' to the DBTG. The procedures themselves are not specified further in the DBTG. Hoffman (16), however, gives the structure of some security procedures. One can assume that for every view with its associated actions in the sense of the previous section a procedure must be provided. In a complicated security system with many intricate security constraints, a considerable amount of work is imposed on the installation if the procedural solution is chosen.

References

1) Bayer, R. and Metzger, J. U.: On the Encipherment of Search Trees and Random Process Files, Institut für Informatik, TU München, March 1975.
2) Boyce, R. F. and Chamberlin, D. D.: Using a structured English query language as a data definition facility, IBM Research Report, RJ 1318, San Jose, Dec. 10, 1973.
3) Boyce, R. F. et al.: Specifying Queries as a Relational Expression: SQUARE, in: Proc. ACM SIGPLAN-SIGIR Interface Meeting, Gaithersburg, Maryland, Nov. 4-6, 1973.
4) Chamberlin, D. D. et al.: A Structured English Query Language, in: Proc. ACM SIGFIDET Workshop on Data Description, Access and Control, Ann Arbor, Mich., May 1-3, 1974.
5) Chamberlin, D. D., Gray, J. M. and Traiger, I. L.: Views, Authorization and Locking in a Relational Data Base System, IBM Research Report, RJ 1486, San Jose, Dec. 19, 1974.
6) CODASYL DATA BASE TASK GROUP (DBTG) REPORT, April 1971, available from IFIP Administrative Data Processing Group, 40 Paulus Potterstraat, Amsterdam.
7) Codd, E. F.: A data base sublanguage founded on the relational calculus, in: 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, San Diego, Nov. 11, 1971, pp. 35-68.
8) Conway, R. W., Maxwell, W. L. and Morgan, H. L.: On the Implementation of Security Measures in Information Systems, in: Com. ACM, Vol. 15 (1972), No. 4, pp. 211-220.
9) Date, C. J.: An Introduction to Data Base System, Addison Wesley, Reading (Mass.), 1975.
10) Evans, A. and Kantrowitz, W.: A User Authentication Scheme not requiring Secrecy in the Computer, in: Com. ACM, Vol. 17 (1974), No. 8, pp. 437-442.
11) Feistel, H., Notz, W. A. and Smith, J. L.: Cryptographic techniques for machine to machine data communication, IBM Research Report, RC 3663, Yorktown Heights, Dec. 27, 1971.
12) Feistel, H.: Cryptographic coding for data-bank privacy, IBM Research Report, RC 2827, Yorktown Heights, 1970.
13) Feistel, H.: Chiffriermethoden und Datenschutz, in: IBM Nachrichten, Part 1, Vol. 24 (1974), No. 219, pp. 21-26; Part 2, Vol. 24 (1974), No. 220, pp. 99-102. Translation from the English: Feistel, H., Cryptography and computer privacy, in: Scientific American, Vol. 228 (1973), No. 5, pp. 15-23.
14) Friedmann, T. D.: The authorization problem in shared files, in: IBM Systems Journal, Vol. 9 (1970), No. 4, pp. 258-280.
15) Hentschel, B., Gliss, H., Bayer, R. and Dierstein, B.: Datenschutzfibel, Verlag J. P. Bachem, Köln 1974.
16) Hoffman, L. J.: Computer and Privacy, A Survey, in: Computing Surveys, Vol. 1 (1969), No. 2, pp. 85-103.
17) IBM brochure: The Consideration of Physical Security in a Computer Environment, October 1972, Fr. Nr. 6520-2700-0.
18) Kahn, D.: The Codebreakers, Macmillan, New York, 1967.
19) Martin, J.: Security, Accuracy and Privacy, Prentice Hall, Englewood Cliffs, 1973.
20) Petersen, H. E. and Turn, R.: System Implication of Information Privacy, in: AFIPS Conf. Proc., Vol. 30 (1967), SJCC, Thompson Book, New York, pp. 291-300.
21) Purdy, G. B.: A High Security Log-in Procedure, in: Com. ACM, Vol. 17 (1974), No. 8, pp. 442-445.
22) Stonebraker, M. and Wong, E.: Access Control in a Relational Data Base Management System by Query Modification, University of California (Berkeley) Research Report ERL-M438, 14 May, 1974.
23) Tuckerman, B.: A Study of the Vigenère-Vernam Single and Multiple Loop Enciphering Systems, IBM Research Report, RC 2879, Yorktown Heights, May 14, 1970.
24) Turn, R.: Privacy Transformation for Databank Systems, Rand Corporation, research report for the National Science Foundation, AD-761563, March 1973, also published in: AFIPS Conf. Proc., Vol. 42 (1973), pp. 589-601.
25) Turn, R.: Privacy and Security in Personal Information Databank Systems (prepared for the National Science Foundation), Rand Corporation, R-1044-NSF, March 1974.
26) Wedekind, H. and Härder, Th.: Datenbanksysteme II, Bibliographisches Institut, Mannheim, 1975 (not yet published).
27) Wilkes, M. V.: Time-Sharing Betriebe bei digitalen Rechenanlagen (translation from the English), Carl-Hanser Verlag, München, 1970.
28) Scherf, J. A.: Computer and Data Security, A Comprehensive Annotated Bibliography, MIT Project MAC, January 1974.

On the Integrity of Data Bases and Resource Locking

Rudolf Bayer, Technische Universität München

Abstract

The problem of providing operational integrity of data bases as opposed to operating systems is discussed. Techniques of resource locking, mainly individual object locking and predicate locking, are surveyed, improved, and unified. An efficient on-line transitive closure algorithm for deadlock discovery is presented and analyzed. Several strategies for preventing indefinite delay of transactions are proposed. Phantoms and the need for predicate locking are surveyed and reconsidered.
Several strategies for handling phantoms are proposed: one without predicate locking and two in which predicate locking is needed for writing transactions only, and in which individual object locking suffices for pure readers.

I. INTRODUCTION

Providing data base integrity means to guarantee the correctness of the data (more precisely their accuracy, consistency, and timeliness) through

1) the proper operation of the hardware,
2) the proper operation of the software, as well as
3) the proper use of the system.

This paper only covers part of the software aspect of integrity. The problem of guarding data bases against hardware failures has been covered extensively by M. V. Wilkes [Wil 72]. Proper use of the system is mainly concerned with quality control in data acquisition and with prevention of accidental or mischievous misuse, i.e. with the security of computer systems.

As opposed to many other computing environments, data bases give rise to especially high integrity requirements for at least the following reasons:

1) Longevity: Even rare errors will in the long run lead to a certain contamination and degradation of the quality of a data base. Completely purging erroneous data and all their consequences from a data base is difficult.

2) Limited repeatability: Even if data or processing errors are discovered, it may be impossible or useless to rectify the situation due to time constraints, unavailability of the correct source data, or unavailability of a correct system state preceding the fault.

3) The need for immediate and permanent availability: This prevents a practice often used elsewhere, namely running a program and then checking by careful inspection and analysis whether the result is or at least "looks" right, correcting and rerunning the program otherwise.

4) Multiaccess: Data bases are manipulated by many users with probably quite different quality standards.
It is infeasible to completely entrust the quality control to these users and difficult to track the source and the proliferation of errors.

II. SEMANTIC AND OPERATIONAL INTEGRITY

We wish to distinguish between semantic and operational integrity of data bases:

Semantic Integrity: By semantic integrity we mean the compliance of the data base contents with constraints derived from our knowledge about the meaning of the data. Semantic integrity might be enforced by allowing on certain data only a limited set of precisely specified meaningful operations, by adopting a set of programming and interaction conventions, by dynamically checking the results of updates, or by proving for each program manipulating the data base that the semantic integrity constraints are satisfied.

Little is known about how to describe, to enforce, and to implement such semantic integrity constraints. Still we believe that semantic integrity is of a much more basic nature than operational integrity, and that a better understanding of semantic integrity would greatly help the solution of other integrity problems as well. An approach has been described in [Bay 74] to obtain semantic integrity via the definition of "aggregates" which limit the processing of data to the use of a set of carefully designed operations directly associated with the data.

Operational Integrity: For the purpose of this discussion let a "transaction" [EGLT 74] be the unit of processing for scheduling purposes for external data base manipulation. A transaction is a sequence of more primitive "actions". Most work to date concerned with integrity has been limited to those integrity problems arising from the activity and operating of the system:

1) the effort to schedule transactions to be processed in parallel as far as possible [KiC 73], [Eve 74], [CBT 74], [EGLT 74], in particular
2) the need to acquire resources, individual data objects (also called "records" in [CBT 74] and "entities" in [EGLT 74]) or sets of data objects, for exclusive or shared use by a transaction and to lock those resources accordingly,
3) the induced problems of deadlock among locking transactions, of deadlock discovery, of deadlock prevention, and of preemption of resources from transactions to resolve deadlocks.

III. OPERATING SYSTEMS AND OPERATIONAL INTEGRITY

As opposed to semantic integrity there is at least a brute force, straightforward solution for operational integrity, namely to avoid parallelism between transactions completely and to sequence the execution of transactions. This is unsatisfactory for many reasons, and better solutions have been developed for use in operating systems. We will survey these solutions briefly and indicate why they are not satisfactory for data base applications. As usual in this field we use "process" as the analogon for "transaction". The list of techniques is adopted from G. C. Everest [Eve 74]:

Presequence Processes: Processes potentially competing for resources must be presequenced in time and must execute one after the other. For data base transactions it is often not known a priori which data resources will be needed. This means that any two transactions will be potentially competing and must be sequenced. As a consequence, no parallelism is possible and we have the unsatisfactory brute force method mentioned before. Still, presequencing transactions, e.g. through time-stamping, may be useful for other purposes, like preventing indefinite delay of transactions by introducing an aging mechanism to increase the priorities of transactions.

Preempt Processes: This technique relies on discovering deadlocks after they have occurred. It then terminates one of the processes involved in the deadlock (or backs it up to an earlier state); the resources locked by that process are freed.
As we shall see, this technique plays an important role in data base locking, too, but there its application is much more difficult due to the large number of transactions and resources involved. This makes deadlock discovery and preemption quite complicated and expensive.

Preorder all System Resources: The processes are then required to claim their resources according to such a total order. It has been shown that orders more general than linear orders, e.g. hierarchical orders, are sufficient to support a deadlock-free locking strategy [Ram 7~]. In data bases the resources are data objects, which often do not have such a natural order. Furthermore a process might not be able to claim resources according to such an order, since its needed resources might be data dependent [EGLT 74], [CBT 74].

Preclaim needed Resources: Before starting to execute, a process has to claim all the resources it will ever need. Typically they are specified on the control cards preceding a job or job-step, and the process is not started until the operating system has granted to it all the requested resources. This is probably the most common technique for assigning non-sharable resources. In a data base environment this technique requires considerable modifications to become feasible: claiming resources may itself be a complicated and lengthy task requiring searching through large areas of a data base. These searches should run concurrently if possible.

Deadlock Prevention Algorithms: They often rely on too special properties of resources - like Habermann's banker's algorithm [Hab 69] - or on too special models of computation - like Schroff's algorithm [Sch 74] - to be generally applicable here.

IV. THE CHAMBERLIN, BOYCE, TRAIGER METHOD

In [CBT 74] a technique is proposed to provide operational integrity for data bases. The technique can be considered as a modification and combination of several methods described in section III.
Integrity of the data base must be guaranteed at the beginning and again at the end of a transaction; it may be - and generally must be - violated by the single actions. Due to the potential interference of two or more transactions executing in parallel, transactions must lock certain parts of the data base for exclusive or shared use.

The scheme proposed in [CBT 74] therefore requires each transaction to lock all its resources (parts of a data base, e.g. individual records or fields of records) during a so-called "seize phase" before starting the "execution phase". During the seize phase the data base must not be modified by the seizing transaction and therefore

1) preemption of locked resources from a transaction still in its seize phase is feasible, and
2) backing a transaction in its seize phase up to wait for the preempted resource is rather easy.

Once a transaction has started its execution phase, it is not allowed to claim more resources, thus no backup will be necessary. At the end of an execution phase a transaction must free all its resources before starting a new seize phase.

The seize phase may be a rather complicated task, thus seize phases of transactions should be run in parallel. This raises the deadlock problem again as usual: Let $t_1$, $t_2$ be two transactions. If $t_2$ tries to seize a resource $r_1$ already locked by $t_1$, it must wait until $r_1$ is freed by $t_1$. But since resources are not locked in any particular order, $t_1$ may wish to lock first $r_1$, then $r_2$, while $t_2$ wishes to lock first $r_2$, then $r_1$. If $t_1$ successfully seizes $r_1$ and $t_2$ successfully seizes $r_2$, then a deadlock has occurred. Such deadlocks must be discovered and a resource must be preempted from a transaction involved in the deadlock, say $r_2$ from $t_2$, causing $t_2$ to wait for $t_1$ on $r_2$.

In [CBT 74] an aging mechanism is attached to transactions to avoid deadlock due to indefinite delay of transactions. It is then shown in [CBT 74] that the scheme described is deadlock-free in the sense that each transaction will eventually be processed.
This requires, of course, the proper algorithms for discovery of deadlocks between transactions in their seize phases, for preemption of resources, and for backing up transactions to certain points within their seize phases.

It is now clear that the scheme proposed in [CBT 74] is a shrewd modification and combination of the following:

1) Try to preclaim needed resources.
2) If 1) would lead to deadlock, preempt resources.
3) Superimpose a presequencing scheme for transactions - e.g. through timestamping - to enforce an aging mechanism and to avoid deadlock due to indefinite delay of transactions.

V. SOME MODIFICATIONS AND AN ON-LINE TRANSITIVE CLOSURE ALGORITHM

The deadlock discovery algorithm mentioned as useful in [CBT 74] is not really applicable, since it requires that a transaction $t_1$ may wait for at most one other transaction to release resources. In the CBT-scheme, however, $t_1$ may be waiting for resources to be released by arbitrarily many transactions $t_{w1}, t_{w2}, \ldots, t_{wk}$ as the result of arbitrarily many preemptions of resources from $t_1$:

Fig. 1: Transaction $t_1$ waiting for other transactions.

The resource state of a transaction $t_i$ is determined by the set

    $A_i = \{r_{i1}, r_{i2}, \ldots, r_{iq_i}\}$

of resources which it has so far acquired, and the set of request pairs

    $B_i = \{(r'_{i1}, t_{i1}), \ldots, (r'_{ip_i}, t_{ip_i})\}$

where $(r'_{ij}, t_{ij})$ indicates that resource $r'_{ij}$ is desired from transaction $t_{ij}$. Any transaction $t_i$ for which $B_i$ is non-empty is in a wait state.

We may then define the wait relation $w \subseteq T \times T$, where $T$ is the set of transactions, such that

    $(t_i, t_j) \in w \iff \exists r : (r, t_j) \in B_i$.

We say that $t_i$ is waiting for $t_j$ (to release $r$). $t_i$ may be waiting for several transactions as noted above, and for several resources from the same transaction. The wait graph $G_w$ is the directed graph $G_w = (T, w)$.
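The definitions of $A_i$, $B_i$, and the wait relation $w$ given above can be sketched as follows. This is a minimal illustration under assumptions, not code from the paper: the dictionary layout and the function names `wait_relation` and `waiting` are hypothetical, and transactions and resources are represented simply by strings.

```python
def wait_relation(B):
    """Derive w from the request-pair sets.

    B maps each transaction t_i to its set B_i of pairs (r, t_j),
    meaning 't_i desires resource r from t_j'.
    Returns w as a set of arcs (t_i, t_j): t_i is waiting for t_j.
    """
    return {(ti, tj) for ti, pairs in B.items() for (_r, tj) in pairs}


def waiting(B):
    """Transactions in a wait state: those with non-empty B_i."""
    return {t for t, pairs in B.items() if pairs}


if __name__ == "__main__":
    # A_i: resources acquired so far (illustrative data only).
    A = {"t1": {"r1", "r3"}, "t2": {"r2"}, "t3": set()}
    # B_i: t3 waits for two resources held by t1 and one held by t2.
    B = {
        "t1": {("r2", "t2")},
        "t2": set(),
        "t3": {("r1", "t1"), ("r3", "t1"), ("r2", "t2")},
    }
    print(sorted(wait_relation(B)))
    print(sorted(waiting(B)))
```

Note that several requests to the same transaction (here $t_3$ wanting both $r_1$ and $r_3$ from $t_1$) collapse into a single arc of $w$, which matches the existential quantifier in the definition.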
Deadlock discovery amounts to finding cycles in $G_w$ or, equivalently, to finding pairs $(t, t)$ in the transitive (but not reflexive) closure $w^+$ of $w$. Thus deadlock exists iff $\exists t \in T : (t, t) \in w^+$.

Maintaining $w$ is trivial, since something like the $B_i$'s will have to be maintained in any case. Calculating $w^+$ from $w$ is, on the other hand, quite expensive, the best known algorithms requiring $O(n^3)$ [War 62] or $O(n \cdot m)$ [Bay 74] steps, where $n$ is the number of nodes in $G_w$ and $m$ the number of arcs.

It would be sufficient, however, to have a good "on-line" transitive closure algorithm, since $w^+$ need only be partly modified as arcs are added to and deleted from $w$. More precisely, "on-line" transitive closure algorithm means an algorithm solving the following problem:

Given $w$, $w^+$, calculate $w'$, $w'^+$, where $w' = w \cup \{(t_i, t_j)\}$ or $w' = w \setminus \{(t_i, t_j)\}$.

Although it is quite simple to add an arbitrary arc and calculate $w'^+$ from $w^+$, it seems in the general case notoriously difficult to delete an arbitrary arc and calculate $w'^+$ from $w^+$. No better alternative seems to be known than calculating $w'^+$ from scratch, i.e. starting with $w'$ and ignoring the fact that we already have $w^+$.

For our purpose, we need a highly simplified version of the on-line algorithm for the transitive closure only. By closer inspection one observes that we need to delete sinks of $G_w$ and the arcs leading into sinks of $G_w$ only. This is the decisive property which makes the difficult general problem tractable in our special case. To get $w'^+$ from $w^+$ now simply amounts to deleting or zeroing out a column from the Boolean matrix describing $w^+$.

We will now develop such an on-line transitive closure algorithm in more detail. We assume that transactions will wait in queue q(r) for an already locked resource r.
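The two on-line updates discussed above, adding an arc and deleting a sink, can be sketched on a Boolean matrix representation of $w^+$. This is an illustrative sketch under assumptions, not the paper's own procedure: transactions are numbered 0..n-1, the function names are hypothetical, and `warshall` is included only as the from-scratch $O(n^3)$ baseline to check the incremental updates against.

```python
def warshall(n, arcs):
    """From-scratch transitive (not reflexive) closure, O(n^3)."""
    K = [[False] * n for _ in range(n)]
    for i, j in arcs:
        K[i][j] = True
    for k in range(n):
        for i in range(n):
            if K[i][k]:
                for j in range(n):
                    if K[k][j]:
                        K[i][j] = True
    return K


def add_arc(K, i, j):
    """Update the closure K in place after inserting arc (i, j) into w.

    Every node that reaches i (or is i) now reaches every node
    reachable from j (or j itself).
    """
    n = len(K)
    preds = [a for a in range(n) if K[a][i]] + [i]
    succs = [b for b in range(n) if K[j][b]] + [j]
    for a in preds:
        for b in succs:
            K[a][b] = True


def delete_sink(K, s):
    """Remove all arcs leading into sink s: zero out column s, O(n).

    Valid only if s has no outgoing arcs, so no path runs through s.
    """
    for row in K:
        row[s] = False


def has_deadlock(K):
    """Deadlock exists iff some (t, t) is in the closure."""
    return any(K[t][t] for t in range(len(K)))


if __name__ == "__main__":
    n = 4
    K = [[False] * n for _ in range(n)]
    for arc in [(0, 1), (1, 2), (3, 1)]:
        add_arc(K, *arc)
    print(has_deadlock(K))      # no cycle so far
    delete_sink(K, 2)           # transaction 2 frees its resources
    add_arc(K, 1, 0)            # 1 now waits for 0, closing a cycle
    print(has_deadlock(K))
```

The sink restriction is what makes deletion cheap here: since a sink has no outgoing arcs, no pair other than those ending in the sink can depend on the removed arcs.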
The first transaction on a queue has successfully locked (or seized) the resource; it may be in its seize or execution phase. All other transactions on the queue are waiting (or blocked). We indicate this as in Fig. 2.

Fig. 2: Transactions waiting for resource $r$.

$t_1$ has locked $r$; $t_{i+1}$ is waiting for $t_i$ to release (or free) $r$, $i = 1, 2, \ldots, k-1$. When $t_i$ eventually releases $r$ (and no preemptions have occurred in the meantime), then $t_{i+1}$ will seize $r$.

Let us first consider the state transition diagram of a transaction (Fig. 3) and the operations relevant to that diagram, which a transaction may perform:

Fig. 3: The state transition diagram of a transaction.

A transaction $t_i$ can perform the following operations involving resource $r$ and another transaction $t_k$:

Seize $r$:
    $A_i := A_i \cup \{r\}$; update $q(r)$;

Free $r$, $\forall r \in A_i$:
    $\forall r \in A_i$ do
        if $t_k$ is next in queue for $r$ then
        begin
            $(r, t_i)$ must be in $B_k$;
            $B_k := B_k \setminus \{(r, t_i)\}$;
            $A_k := A_k \cup \{r\}$;
            update $q(r)$;
            if $B_k = \emptyset$ then make $t_k$ continue to seize
        end;
    $A_i := \emptyset$;
    update $w$ and $w^+$;

Seize unsuccessfully: $t_i$ is still in the seize state; let $t_k$ be last in queue for $r$:

Case 1: No deadlock arises if $t_i$ is queued behind $t_k$ in $q(r)$:
    $B_i := \{(r, t_k)\}$;
    put $t_i$ into wait state;
    update $w$ and $w^+$;

Case 2: A deadlock would arise if $t_i$ were queued as in Case 1. This deadlock is discovered by tentatively, but not definitely, queueing $t_i$ as in Case 1, updating $w^+$ and checking whether $w^+$ contains cycles. In this case $t_i$ might have to preempt $r$ from $t_k$. $t_k$ must be in wait state, since we have a cycle. In this situation $t_i$ should move forward in $q(r)$ until it can be inserted and no deadlock arises; update $q(r)$ accordingly. Let $t_\ell$ be the first transaction in $q(r)$ (starting from $t_k$) such that inserting $t_i$ between $t_\ell$ and $t_{\ell-1}$ causes no deadlock; then we have Case 2a. If there is no such $t_\ell$, then we have Case 2b.
Note: In [CBT 74] $t_i$ is always inserted as close to the head of the queue as possible. This strategy favors the younger transactions and must rely heavily on an aging mechanism to prevent indefinite delay. The processing costs of this aging mechanism are not analyzed.

Case 2a:
    $B_\ell := (B_\ell \setminus \{(r, t_{\ell-1})\}) \cup \{(r, t_i)\}$;
    $B_i := \{(r, t_{\ell-1})\}$;
    update $q(r)$;
    $t_i$ goes into wait state;
    update $w$ and $w^+$;

Case 2b: $t_1$ cannot be executing, otherwise $t_i$ would queue behind $t_1$ according to Case 2a. Therefore $t_1$ is seizing or waiting. We make $t_i$ preempt $r$ from $t_1$, i.e. we queue $t_i$ in front of $t_1$:
    if $B_1 = \emptyset$ then make $t_1$ wait;
    $B_1 := B_1 \cup \{(r, t_i)\}$;
    $A_1 := A_1 \setminus \{r\}$;
    $A_i := A_i \cup \{r\}$;
    update $w$ and $w^+$;

Necessary Changes to $w$ and $w^+$ and Analysis of their Complexity

For the following analysis we assume that $w^+$ is represented in an $n \times n$ Boolean matrix $K$ with the meaning $K[i,j] \Leftrightarrow (t_i, t_j) \in w^+$. For each operation we give a description of the operation and, in square brackets, the complexity of the change to $w^+$.

Seize $r$:
    No change to $w$ or $w^+$. [0]

Free $r$, $\forall r \in A_i$:
    Since $t_i$ frees all its resources at the end of its execution phase, we can remove all arcs $(t_k, t_i)$ from $w$, and delete or zero out column $i$ of $K$. [$O(n)$]

For the analysis of the following operations we need two auxiliary procedures first. Let $t_i$ be in its seize state. To insert an arc $(t_i, t_k)$ into $w$ and to update $w^+$ accordingly we need the procedure INSERT1. To insert $(t_\ell, t_i)$ we need the procedure INSERT2.

INSERT1 $(t_i, t_k)$: Comment $t_i$ is in seize state;
    $w := w \cup \{(t_i, t_k)\}$; [constant]
    $\forall t_j : (t_j, t_i) \in w^+$, $\forall t_\ell : (t_k, t_\ell) \in w^+$ : $w^+ := w^+ \cup \{(t_j, t_\ell)\}$; [$O(n^2)$ at worst, $O(m)$ average, see later analysis]
    $\forall t_\ell : (t_k, t_\ell) \in w^+$ : $w^+ := w^+ \cup \{(t_i, t_\ell)\}$; [$O(n)$]
    $\forall t_j : (t_j, t_i) \in w^+$ : $w^+ := w^+ \cup \{(t_j, t_k)\}$; [$O(n)$]
    $w^+ := w^+ \cup \{(t_i, t_k)\}$; [constant]

INSERT2 $(t_\ell, t_i)$: Comment $t_i$ is in seize state;
    $w := w \cup \{(t_\ell, t_i)\}$; [constant]
    $\forall t_j : (t_j, t_\ell) \in w^+$ : $w^+ := w^+ \cup \{(t_j, t_i)\}$; [$O(n)$]
    $w^+ := w^+ \cup \{(t_\ell, t_i)\}$; [constant]

Note: Since $t_i$ is in the seize state, there is no $t$ such that $(t_i, t) \in w^+$. Consequently no cycle in $w^+$, and therefore no deadlock, can arise due to the operation INSERT2 $(t_\ell, t_i)$.

Seize $r$ unsuccessfully: As before, let $t_k$ be last in queue for $r$:
    for $j := k$ step $-1$ until $1$ do
    begin
        tentatively INSERT1 $(t_i, t_j)$; [for each deadlock $O(n^2)$ or $O(n+m)$ resp.]
        if no deadlock then
        begin $\ell := j+1$; exit to perform Case 2a end
    end;
    perform Case 2b;

Case 2a:
    make last INSERT1 operation definite; [at worst $O(n^2)$ or $O(n+m)$]
    if $\ell \neq k+1$ then
    begin
        INSERT2 $(t_\ell, t_i)$; [$O(n)$]
        if there is no $r' \neq r$ with $(r', t_{\ell-1}) \in B_\ell$ then $w := w \setminus \{(t_\ell, t_{\ell-1})\}$ [search of $B_\ell$]
    end;

Case 2b:
    INSERT2 $(t_1, t_i)$; [$O(n)$]

Analysis of INSERT1: Adding a single arc to $w$ according to INSERT1, say $(t_i, t_k)$, requires oring row $k$ of the Boolean matrix $K$ to all rows $j$ with $(t_j, t_i) \in w^+$. At worst this part of INSERT1 requires $O(n^2)$ operations. If, however, there are $m$ arcs in $w^+$, then each node on the average will have $m/n$ arcs into it and $m/n$ arcs out of it. Accordingly the average number of operations will be $O(n \cdot (m/n)) = O(m)$.

VI. FOUR STRATEGIES FOR PREVENTING INDEFINITE DELAY

With the locking and preemption schemes proposed it is still conceivable that a transaction is delayed indefinitely from its execution phase. To deal with this problem, we propose four increasingly effective, but also increasingly costly strategies. It seems quite reasonable to employ several strategies within one system successively in order to force transactions which have passed a certain age threshold into their execution phase and out of the system.

Strategy 1: Let $t_e$ be the eldest transaction. Schedule all transactions $t$ such that $t_e\, w^+\, t$ with highest priority. This clearly has a tendency to speed up the processing of $t_e$. It is easy to find those $t$ from the $t_e$-row of the Boolean matrix describing $w^+$.

Strategy 2: Stop all transactions in seize phases from further seizing except those $t$ for which $t_e\, w^+\, t$.

Strategy 3: For all $r$ such that $t_e$ is waiting in $q(r)$ let $t_r$ be the transaction that has locked $r$. If $t_r$ is seizing or waiting, preempt $r$ from $t_r$ and give $r$ to $t_e$. If $t_r$ is executing, insert $t_e$ in $q(r)$ directly behind $t_r$. No new deadlocks can arise if we assume that all these preemptions are performed together in one step. Then recalculate the new $w^+$.

Strategy 4: Stop all transactions which are not executing from seizing further. Then apply strategy 3 for $t_e$ until $t_e$ has reached its execution phase. Then let the other transactions proceed.

Some Observations on Strategies 1, 2, 3, 4: It is clear that all strategies will tend to bring $t_e$ closer to its execution phase. Strategy 1 can be generalized to establish a partition of the transactions into a linearly ordered set of priority classes, which can serve as the basis for a general scheduling strategy. Strategies 1 and 3 might still allow indefinite delay. It is easy to construct a plausibility argument that strategies 2 and 4 will prevent indefinite delay of transactions.

VII. AN ALTERNATIVE APPROACH: PREEMPTION AND PARTIAL BACKUP

Although it seems feasible to maintain the basic locking and preempting mechanism proposed in [CBT 74] using the special algorithms described in the preceding sections, there is another argument supporting a more radical preemption than that proposed in the CBT-scheme: Let us assume that $r_1$ is preempted from $t_1$ by $t_2$, which probably updates $r_1$. Depending on the value of $r_1$, $t_1$ might have locked other resources $r_1', r_1'', \ldots$ already. But since the value of $r_1$ changes, the decision of $t_1$ to lock $r_1', r_1'', \ldots$ should be reconsidered.
In other words, $t_1$ should be backed up within its seize phase to precisely the state it was in just before seizing $r_1$; it should then be waiting for $t_2$ on $r_1$, and the resources $r_1', r_1'', \ldots$ locked by $t_1$ should be freed again.

In such a preemption scheme a transaction $t_1$ will generally be waiting for at most one other transaction $t_2$ on precisely one resource $r_1$. The wait relation $t_1\, w\, t_2$ shall now mean that $t_1$ waits for the holder $t_2$ of $r_1$ and not for the predecessor in $q(r_1)$, since we do not need to maintain such queues. The resulting $G_w$ is obviously a forest of oriented trees, the arcs pointing towards the roots. Only roots are processing in the execution or seize phases. All other transactions are waiting.

Since a transaction $t_1$ is waiting for $t_2$ on precisely one resource $r_1$, we may label that arc with $r_1$. The following simple algorithms then describe the necessary operations.

Seize unsuccessfully:

Case 1, no deadlock arises: $t_1$ trying to lock $r$ already locked by $t_2$ means that the tree with root $t_1$, i.e. $T(t_1)$, is appended as a subtree to $t_2$, the new arc being labelled with $r$.

Case 2, deadlock arises: Deadlock discovery is quite simple: Each seizing or executing transaction is the root node of one tree. Deadlock arises precisely when $t_1$ is also the root node of the tree in which $t_2$ is. To find this out, just follow the arcs from $t_2$ to the root. In this case a cycle would be generated by inserting an arc $(t_1, t_2)$. The deadlock is resolved by preempting the resource $r$ from $t_2$. Preemption works as follows: $t_2$ must free $r$ and all resources it locked after $r$. In the process - see the Free operation - corresponding subtrees of $t_2$ will be detached - allowing their roots to continue seizing - and the arc $(t_2, t_3)$ from $t_2$ to its father $t_3$ in $T(t_1)$ will be deleted. The tree $T'(t_2)$ remaining after pruning $T(t_2)$ will be attached as a subtree of $t_1$ by introducing the new arc $(t_2, t_1)$ with label $r$.

Free $r'$: If $t_2$ frees a resource $r'$ either due to being backed up in its seize phase or due to finishing an execution phase, and there is an arc $(t_4, t_2)$ labelled $r'$, then this arc can be deleted; thereby $t_4$ becomes a root and can proceed in its seize phase. To free such arcs one must represent these trees by data structures in which it is possible to follow arcs in both directions.

VIII. PREVENTING INDEFINITE DELAY

It is possible that for individual transactions $t$ a situation similar to a deadlock might again arise due to $t$ being preempted and backed up in its seize phase again and again. Strategies 1 and 2 of section VI are easily adapted to work for the preemption and backup technique of section VII.

The analogon to strategy 3 of section VI is much easier to implement now: Let $t_e$ be the eldest transaction in the system again. Assume that $t_e$ is waiting for $t$ on $r$ and $t$ is not executing. (If $t$ is executing, nothing can be done except scheduling $t$ with highest priority until $t$ has finished executing.) Then $t_e$ will preempt $r$ from $t$ and $t$ will be backed up in its seize phase to a state just before seizing $r$. $t_e$ becomes a root and continues seizing. A new arc $(t, t_e)$ labelled $r$ is introduced. The preemption process works precisely as described in section VII. The main difficulty of strategy 3 of section VI has disappeared, since we do not explicitly store $w^+$.
Instead, cycles are discovered by just following the path from an arbitrary node of a tree to its root, a simple and fast operation.

To prevent pathological cases of data bases changing faster than $t_e$ being able to catch up in its own seize phase, we can apply an analogon to strategy 4 of section VI again. Instead, however, it suffices to prevent that transactions enter from their seize phases into their execution phases. Let this be strategy 5. Since only finitely many transactions are in the system at any one time, and since each executing transaction will run only a finite time, $t_e$ will eventually be able to finish both its seize and execution phase, and indefinite delay of $t_e$ cannot occur.

IX. PHANTOMS AND PREDICATE LOCKS

In [EGLT 74] a technique ("predicate locking") is described to use so-called predicate locks for locking logical, i.e. existing as well as potential, subsets of a data base instead of locking individual data objects ("individual object locking"). This technique also solves the "phantom problem". To explain briefly what phantoms are, let us assume that there is a universe of data objects (called "entities" in [EGLT 74] and "records" in [CBT 74]) which are the potential data objects in the data base $B$. Thus $B$ is a subset of this universe. Two transactions $t_1$, $t_2$ may have successfully locked all their needed resources, and they may be executing. $t_1$ may add a new object $r_1$ from the universe to $B$ and $t_2$ may add a new object $r_2$ from the universe to $B$, such that $t_1$ would have locked $r_2$ and $t_2$ would have locked $r_1$, if $t_1$ or $t_2$ had seen $r_2$ or $r_1$ resp. during their seize phases. $r_1$ and $r_2$ are called "phantoms", since they might, but not necessarily will, appear in $B$ (materialize) while $t_1$ or $t_2$ are in their execution phases.

The appearance of just a single phantom, say $r_1$, does not cause any difficulty, since this has the same effect as running the transactions $t_1$, $t_2$ serially, namely in the order $t_2$ followed by $t_1$.
In this case also $t_2$ would not see the object $r_1$ created by $t_1$ and therefore $t_2$ would not be able to lock $r_1$. It is the goal of predicate locking to schedule transactions in parallel as far as possible under the restriction that the parallel schedule is equivalent to - i.e. has exactly the same total effect on the data base as - a serial schedule. One also says that such a schedule is a "consistent schedule", or that each transaction sees a "consistent view" of the data base.

To enforce consistent schedules each transaction $t$ is required to lock (for read or write access) all data objects $E(t)$ of the universe - irrespective of whether they are in $B$ or are just phantoms - which might in any way influence or be influenced by the effect of $t$ on $B$. $E(t)$ shall be locked by specifying a predicate $P$ defined on the universe (or on a part of it, e.g. on a relation [Cod 70]) such that $E(t) \subseteq S(P)$, where $S(P)$ is the subset of elements of the universe satisfying $P$.

Two transactions $t_1$, $t_2$ are then said to be in conflict if for their predicates $P_1$, $P_2$ it is true that $\exists r \in S(P_1) \cap S(P_2)$ and $t_1$ or $t_2$ performs a write action on $r$. Thus conflict can arise even if $r$ is a phantom. In this case $t_1$, $t_2$ cannot run in parallel, but must be run serially. The order in which they are run is irrelevant for consistency. This order might be important for other reasons which are not of interest here.

The main difficulties in using such a locking and scheduling method seem to be the following:

1) Find a suitable predicate $P_t$ for $t$. Ideally $E(t) = S(P_t)$ should hold, but then $P_t$ might be too complicated. If $P_t$ is chosen in a very simple way, then $S(P_t)$ might be intolerably large, increasing the danger of phantoms which are really artificial phantoms.

2) The problem "$S(P_1) \cap S(P_2) = \emptyset$" may be very hard. In general this problem is even undecidable. Thus for practical applications it is necessary to find a suitable class of locking predicates, for which the problem "$S(P_1) \cap S(P_2) = \emptyset$" is not only decidable, but for which a very efficient decision procedure is known. For more details and a candidate class for suitable locking predicates see [EGLT 74].

3) Phantoms might turn out to be a very serious but mostly artificial obstacle to parallel processing in the following sense: phantoms in $S(P_1) \cap S(P_2)$ prohibit $t_1$ and $t_2$ from being run in parallel. But if these phantoms do not materialize, and if furthermore $S(P_1) \cap S(P_2) \cap B = \emptyset$, then, of course, $t_1$ and $t_2$ could have been run in parallel. How much of an artificial obstacle phantoms are to parallel processing seems to be unknown and can probably be answered only for concrete instances of data bases.

X. A UNIFICATION OF INDIVIDUAL OBJECT LOCKING AND PREDICATE LOCKING

Let us start with the crucial observation for this section: "Transactions which are pure readers do not need to lock phantoms". A transaction is a "pure reader" if it is composed of read actions only. Obviously for many data base applications the pure readers are a very important class of transactions.

To understand our observation, consider two pure readers $t_1$, $t_2$ first. Since there are no write actions at all, there is no possibility for phantoms to materialize, thus they need not be locked. Phantom locking is only necessary to control the interaction with a transaction, say $t_3$, which also performs write operations. We call $t_3$ a "writer". Consider the interaction between $t_1$ and $t_3$. Let us assume that there is a phantom $r \in S(P_1) \cap S(P_3)$ such that $t_3$ might perform a write on $r$. Then $t_1$ and $t_3$ could not run concurrently if $t_1$ would use predicate locking. If, however, $t_1$ uses individual object locking and successfully terminates its seize phase, then $t_1$ can run in parallel with $t_3$ provided that $S'(P_1) \cap S'(P_3) = \emptyset$, where $S'(P_1) = S(P_1) \cap B$, i.e.
the set of real data objects (without phantoms) in $B$ which $t_1$ needs to lock in order to see a consistent view of $B$. But now $S'(P_1)$ can be locked by $t_1$ using conventional "individual object locking" as e.g. described in [CBT 74] instead of predicate locking. If $t_3$ should materialize phantoms, then running $t_1$ and $t_3$ in parallel still is consistent and has the same effect as the serial schedule $t_1$ followed by $t_3$.

The following observation should also be clear now: To control the interaction between the writer $t_3$ and the pure reader $t_1$ it suffices that $t_3$ use individual object locking according to [CBT 74]. $t_3$ need not lock its phantoms, since $t_1$ is not interested in phantoms anyway. We can conclude that the problem of phantoms - and therefore predicate locking - arises only between writers.

The preceding observations suggest several alternative approaches for handling the phantom problem:

Strategy 1 - Serialize Writers: Since, as we just observed, phantoms cause difficulties only between writers, the simplest solution is not to schedule any writers to run concurrently. Concurrency is possible between arbitrarily many pure readers and at most one writer. Consistency is guaranteed by individual object locking and by handling deadlocks and preemptions as described in the earlier part of this paper. The problem of phantoms does not arise.

As mentioned before, in many applications most transactions are pure readers. Serializing writers in those cases should not cause a significant loss of concurrency and has the advantage that predicate locking with its associated difficulties is not needed.

Strategy 2 - Predicate Locks between Writers: Use predicate locks as described in [EGLT 74] only to determine whether two writers $t_3$, $t_4$ can run in parallel. After a writer is allowed to proceed on account of his predicate locks, he then starts individual object locking to compete for further processing with other transactions, which are pure readers, exactly as in strategy 1. For more details on the individual object locking phase, in particular the types of locks, see strategy 3.

Using predicate locking and individual object locking at this point allows a more general notion of conflict than that used in [EGLT 74]. Let $U_1$ or $R_1$ be the set of objects including phantoms which are updated or only read respectively by a transaction $t_1$. Obviously $U_1 \cap R_1 = \emptyset$. Define $U_2$ and $R_2$ for $t_2$ analogously, and $U_2 \cap R_2 = \emptyset$. Then define

$B_1 := U_1 \cap U_2$
$B_2 := U_1 \cap R_2$
$B_3 := R_1 \cap U_2$
$B_4 := R_1 \cap R_2$    (X.1)

Diagrammatically this can be shown as in Fig. 4.

Fig. 4: Possible intersections of update and read-only sets.

For $t_1$ and $t_2$ to proceed in parallel with individual object locking the following conditions must hold:

$B_1 = \emptyset \wedge (B_2 = \emptyset \vee B_3 = \emptyset)$    (X.2)

Without individual object locking the stronger condition $B_2 = \emptyset \wedge B_3 = \emptyset$ is required in [EGLT 74].

To see that our weakened condition suffices let us assume without loss of generality that $B_2 = \emptyset$ and $B_3 \neq \emptyset$. $B_3$ is read only by $t_1$, but is updated by $t_2$. Also $B_3$ may contain phantoms which are materialized by $t_2$. Let us assume that both $t_1$ and $t_2$ are successful in their seize phases, i.e. while locking individual objects excluding phantoms, and then continue to run in parallel. We claim that this is equivalent to the serial schedule $t_1$ followed by $t_2$. Since $B_1$ and $B_2$ are both empty, the effect of $t_1$ on $B$ cannot in any way influence $t_2$, thus $t_2$ has the same effect on $B$ if it is run after $t_1$ or parallel to $t_1$. $B_3$ is not empty, but $t_1$ successfully locked all the resources it needed to see a consistent view of the data base. $t_1$ may have missed phantoms materialized by $t_2$, thus the effect of $t_1$ will be the same as in the serial schedule $t_1 t_2$. Consequently running $t_1$ and $t_2$ in parallel is equivalent to the serial schedule $t_1 t_2$, and is therefore consistent.
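The four intersections of update and read-only sets, and the two parallelism conditions built on them, can be checked mechanically. The sketch below assumes, for illustration only, that the update and read-only sets are finite Python sets of object names (in the text they may contain phantoms and would be described by predicates). It takes condition (X.2) to mean "the update sets do not intersect, and at least one of the two update/read intersections is empty", and takes the stronger condition to require all three intersections to be empty; both readings, and all function names, are assumptions of this sketch.

```python
# Illustrative sketch: update/read sets are finite Python sets, which is an
# assumption; the function names are hypothetical, not from the paper.

def conflict_sets(u1, r1, u2, r2):
    """Return (B1, B2, B3, B4): U1&U2, U1&R2, R1&U2, R1&R2."""
    return (u1 & u2, u1 & r2, r1 & u2, r1 & r2)


def parallel_with_object_locking(u1, r1, u2, r2):
    """Weakened condition: B1 empty, and B2 or B3 empty."""
    b1, b2, b3, _b4 = conflict_sets(u1, r1, u2, r2)
    return not b1 and (not b2 or not b3)


def parallel_without_object_locking(u1, r1, u2, r2):
    """Stronger condition, read here as: B1, B2 and B3 all empty."""
    b1, b2, b3, _b4 = conflict_sets(u1, r1, u2, r2)
    return not b1 and not b2 and not b3


# t1 updates {a} and reads {b, c}; t2 updates {c, d} and reads {e}.
u1, r1 = {"a"}, {"b", "c"}
u2, r2 = {"c", "d"}, {"e"}
weak = parallel_with_object_locking(u1, r1, u2, r2)
strong = parallel_without_object_locking(u1, r1, u2, r2)
```

In this example only the intersection of the read-only set of `t1` with the update set of `t2` is non-empty, so the weakened condition admits the pair while the stronger one rejects it, which is the case treated in the argument above.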
The conditions (X.2) for $t_1$ and $t_2$ to proceed in parallel can be generalized for $t_1, t_2, \ldots, t_n$ to proceed concurrently. This is left to the reader.

Strategy 3: This strategy sacrifices some concurrency, but is much simpler to implement than strategy 2. There a writer $t_i$ was required to perform individual object locking both in the sets $U_i$ and $R_i$. It turns out that with the conflict condition of [EGLT 74] between writers, writers need perform object locking only within $U_i$, but they need not set any read locks.

Assume that a writer $t_i$ first locks the sets $U_i$ and $R_i$ by specifying the predicates $P_U^i$ and $P_R^i$. The condition for two writers $t_i$ and $t_j$ to run concurrently then is:

$S(P_U^i) \cap S(P_U^j) = \emptyset$
$S(P_U^i) \cap S(P_R^j) = \emptyset$    (X.3)
$S(P_R^i) \cap S(P_U^j) = \emptyset$

After successfully locking $U_i$ and $R_i$ the writer then proceeds to perform individual object locking within $U_i$ by setting "u-locks" for exclusive use of data objects to be updated. These u-locks are necessary for preventing pure readers from reading those objects while they are being updated. Since the sets $S(P_U^i)$ are pairwise disjoint, there is never any possibility for conflict between u-locks of different writers.

We observe that writers need not set any individual read-locks, called "r-locks", since $S(P_R^i) \cap S(P_U^j) = \emptyset$, and conflict of u-locks of one writer and r-locks of another would not be possible anyway. Furthermore, several r-locks of readers and writers on the same data object would be allowed, since data objects are shareable as long as they are only read.

The only potential conflict still remaining is between read-access of pure readers and update-access of a writer to the same data object $s$, which is not a phantom. To control this we require pure readers to set r-locks on individual data objects $s$ to be read. This must happen during a seize phase. Several r-locks can be on $s$, but not both r-locks of pure readers and a u-lock of a writer. Thus if a reader (writer) sets an r-lock (u-lock) first, then a writer (reader) trying to set a u-lock (r-lock) on the same data object must wait for the reader (writer) to release $s$. This leads to the usual wait situations with the possibility for deadlock and the need for preemption and backup as described in the first part of this paper. If a deadlock is discovered then a reader or a writer is backed up within its seize phase for setting r-locks or u-locks resp. as described before. For simplicity we can assume that locking with the predicates $P_U$ and $P_R$ is one indivisible operation, thus deadlock between writers is not possible during this phase of predicate locking.

To summarize, a writer $t_i$ proceeds as follows:

1) Lock predicates $P_U^i$ and $P_R^i$. If conditions (X.3) are satisfied for all other writers $t_j$ which have successfully locked their predicates $P_U^j$ and $P_R^j$, then proceed with step 2); otherwise wait until $P_U^i$ and $P_R^i$ can be successfully locked, then proceed with step 2).

2) Start a seize phase setting u-locks on individual data objects to be updated within $S'(P_U^i)$. In case of conflict with r-locks wait or be backed up within this seize phase.

3) A pure reader performs a seize phase setting r-locks on data objects to be read. In case of conflict with u-locks the reader must wait or be backed up within this seize phase.

Summarizing the main advantages of strategy 3 we observe:

o Only writers use predicate locking to handle phantoms.
o Concurrency between writers is possible.
o Writers need an individual object locking phase for setting u-locks in their update areas only. In this phase phantoms are ignored.
o Pure readers do not use predicate locking, they set r-locks during an individual object locking phase only and ignore phantoms completely.
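The two layers of strategy 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: predicate extents are modelled as finite Python sets, so that the writer compatibility test becomes three disjointness checks, and the individual object locks are kept in a small in-memory table. The class `ObjectLocks` and all function and attribute names are hypothetical. Waiting, deadlock discovery and backup are left out; a failed lock attempt is simply reported to the caller.

```python
# Illustrative sketch; every identifier here is an assumption of the example.

def writers_compatible(pu_i, pr_i, pu_j, pr_j):
    """Two writers may run concurrently if their update extents are disjoint
    and neither update extent meets the other's read extent."""
    return (pu_i.isdisjoint(pu_j)
            and pu_i.isdisjoint(pr_j)
            and pr_i.isdisjoint(pu_j))


class ObjectLocks:
    """r-locks are shared among readers; a u-lock is exclusive."""

    def __init__(self):
        self.r_holders = {}  # object -> set of readers holding r-locks
        self.u_holder = {}   # object -> writer holding the u-lock

    def try_r_lock(self, obj, reader):
        if obj in self.u_holder:
            return False     # reader would have to wait for the writer
        self.r_holders.setdefault(obj, set()).add(reader)
        return True

    def try_u_lock(self, obj, writer):
        if self.r_holders.get(obj) or obj in self.u_holder:
            return False     # writer would have to wait
        self.u_holder[obj] = writer
        return True

    def release(self, obj, owner):
        self.r_holders.get(obj, set()).discard(owner)
        if self.u_holder.get(obj) == owner:
            del self.u_holder[obj]


locks = ObjectLocks()
```

A caller that receives `False` would, in the scheme of the text, either queue behind the current holder or be backed up within its seize phase.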
Note: Since predicate locking is now needed for writers only, it might be quite feasible to replace arbitrary predicates by a fixed partitioning of the data base or by a fixed family of subsets of the universe of data objects, whose intersection properties are known once and for all and recorded in a Boolean matrix (intersection between two subsets is empty or not). Instead of locking predicates the above subsets are then locked by writers.

Acknowledgement: I wish to thank Mr. John Metzger, with whom I had many useful discussions during the writing of this paper.

Bibliography

[Bay 74] Bayer, R., "AGGREGATES: A Software Design Method and its Application to a Family of Transitive Closure Algorithms". TUM-Math. Report No. 7432, Technische Universität München, Sept. 1974

[Bjo 73] Bjork, L.A., "Recovery Semantics for a DB/DC System". Proceedings ACM Nat'l. Conference 1973, 142-146

[CBT 74] Chamberlin, D.D., Boyce, R.F., Traiger, I.L., "A Deadlock-free Scheme for Resource Locking in a Data Base Environment". Information Processing 1974, 340-343

[Cod 70] Codd, E.F., "A Relational Model for Large Shared Data Banks". Comm. ACM 13, 6 (June 1970), 377-387

[CES 71] Coffman, E.G. Jr., Elphick, M.J., Shoshani, A., "System Deadlocks". Computing Surveys 3, 2 (June 1971), 67-78

[Dav 73] Davies, C.T., "Recovery Semantics for a DB/DC System". Proceedings ACM Nat'l. Conference 1973, 136-141

[EGLT 74] Eswaran, K.P., Gray, J.N., Lorie, R.A., Traiger, I.L., "On the Notions of Consistency and Predicate Locks in a Data Base System". IBM Research Report RJ 1487, Dec. 30, 1974

[Eve 74] Everest, G.C., "Concurrent Update Control and Data Base Integrity". In: Data Base Management (ed. Klimbie, J.W., and Koffeman, K.L.), North Holland 1974, 241-270

[Fos 74] Fossum, B.M., "Data Base Integrity as Provided for by a Particular Data Base Management System". In: Data Base Management (ed. Klimbie, J.W., and Koffeman, K.L.), North Holland 1974, 271-288

[Hab 69] Habermann, A.N., "Prevention of System Deadlocks".
Comm. ACM 12, 7 (July 1969), 373-377, 385

[KiC 73] King, P.F., Collmeyer, A.J., "Database Sharing - an Efficient Mechanism for Supporting Concurrent Processes". AFIPS Nat'l. Comp. Conf. Proceedings 1973, 271-275

[Oll 74] Olle, T.W., "Current and Future Trends in Data Base Management Systems". Information Processing 1974, 998-1006

[Ram 74] Ramsperger, N., "Verringerung von Prozeßbehinderungen in Rechensystemen". Dissertation, Technische Universität München, 1974

[Sch 74] Schroff, R., "Vermeidung von totalen Verklemmungen in bewerteten Petrinetzen". Dissertation, Technische Universität München, 1974

[War 62] Warshall, S., "A Theorem on Boolean Matrices". Journal ACM 9, 1 (January 1962), 11-12

[Wil 72] Wilkes, M.V., "On Preserving the Integrity of Data Bases". The Computer Journal, 15, 3 (August 1972), 191-194

DATA BASE STANDARDIZATION
A STATUS REPORT

Thomas B. Steel, Jr.
Equitable Life Assurance Society
New York, N.Y., USA

This paper is a report on the current (1975 September) status of the Study Group on Data Base Management Systems in the United States, together with some remarks on the ISO activity in the area. While the official purpose of this Study Group is an investigation of standardization potential in the area of data base management systems, an important by-product of the work of the Group has been the development of a set of requirements for effective data base management systems. As no existing or proposed implementation of a data base management system satisfies these requirements, it is appropriate to expose these ideas as widely as possible for evaluation.

Among the responsibilities of the Standards Planning and Requirements Committee (SPARC) of the American National Standards Committee on Computers and Information Processing (ANSI/X3) is the generation of recommendations for action by the parent Committee on appropriate areas for the initiation of standards development.
For some time it has been evident that data base management systems are in the process of becoming central elements of information processing systems, and that there is less than full agreement on appropriate design. In addition to the existence of a number of implementations of such systems (CODASYL 1969), there are several documents generated out of the collective wisdom of some segment of the information processing community which are either proposals for specific systems (CODASYL 1971) or statements of requirements (GUIDE-SHARE 1970), (CMSAG 1971). As is well known there is a debate in the community on whether existing and proposed implementations meet the indicated requirements or whether the requirements as drawn are all really necessary. Further, there are serious questions about the economics of meeting all the stated requirements.

In addition to the above considerations there is argument on the appropriate data model to use: relational, hierarchical, network. This particular debate has been referred to as the "theological" discussion of the data base management system theorists. There has been criticism of the use of this word; I can only respond to that criticism by quoting Hilaire Belloc: "All political questions are ultimately theological". Indeed, such it seems to be, from which it follows that the correct answer to the question of what data model to use is necessarily "all of the above". One of the outcomes of the work reported in this paper is a mechanism that permits this answer in a meaningful sense.

In the autumn of 1972, responding to the clearly perceived need to rationalize the growing confusion, SPARC, then under the Chairmanship of the author, took formal action to initiate investigation of the subject of data base management systems in the context of potential standardization.
Consistent with its normal practice when confronted with a complex subject, SPARC established an ad hoc Study Group on Data Base Management Systems, initially under the Chairmanship of D. M. Smith of the EXXON Corporation and now under the Chairmanship of the author. This Study Group was convened with a charge to investigate the subject of data base management systems with the objective of determining which, if any, aspects of such systems are at present suitable candidates for the development of American National Standards. The "if any" qualification is important because a negative response is just as meaningful as a positive response in a standards context. The "at present" qualification is equally significant, indicating the continuing need for review as the requirements, technologies and economics change over time.

The eventual result of the deliberations of this Study Group will be a series of reports in a specified format (SPARC 1974), identifying potentially standardizable elements of data base management systems and recommending whether or not there is a need, technological feasibility and economic justification for the initiation of a standards development project in the area. The first interface to be examined is 7 with respect to COBOL. The present target date for completion of this work is the beginning of 1976. As an Interim Report the Study Group has prepared a document (SPARC 1975) which has had wide circulation and is soon to be generally published.

It is appropriate at this juncture to provide a list of the members of the Study Group and their affiliations to indicate the breadth of representation. It is worth noting the extent to which the user community is participating in this effort, a rare event in data processing standardization on any continent.

Bachman, C.W.        Honeywell Information Systems
Cohn, L.             IBM Corporation
Florance, W.E.       Eastman Kodak Company
Kirshenbaum, F.      Equitable Life
Kunecke, H.          Boeing Computer Services
Lavin, M.            Sperry Univac
Mairet, C.E.         Deere and Company
Sibley, E.H.         University of Maryland
Steel, T.B., Jr.     Equitable Life
Turner, J.A.         Columbia University
Yormark, B.          The RAND Corporation

The initial tasks of the Study Group were the difficult ones of understanding and coming to respect the varying views of the different individuals - all theologies were (and still are) represented - and developing a vocabulary that was consistent and mutually comprehensible. It is not clear whether this last task has yet been fully accomplished, although considerable closure has been attained.

In the course of the early discussions it emerged that what any standardization should treat is interfaces. There is no merit and potential disaster in developing standards that specify how components are to work. What is potentially proper for standards specification is how the components are meshed together; in other words, the interfaces. With this notion in mind a generalized model of a data base management system has been developed that highlights the interfaces and the kind of information and data passing across them. Figure 1 is a simplified diagrammatic view of this model.

It should be noted that, except for the man-system interfaces, the technological nature of the interface is not determined; it could be hardware, software, firmware or some mixture. Indeed, some of the interfaces could be man-man, although pursuit of that notion is not germane to what follows. The important point is that the implementation of the system is not prescribed, only the requirements that must be satisfied. As was noted above, this is a simplified diagram, but in order to maintain consistency with the detailed picture, the numerical identifications of the exhibited interfaces have not been changed so there are some numbers missing. The hexagonal boxes depict people in specific roles.
The rectangular boxes represent processing functions, the arrow terminated lines represent flow of data, control information, programs and descriptions, and the dashed boxes represent program preparation and execution subsystems (including compilation and interpretation functions). Finally, the solid bars represent essential interfaces, the ultimate subject matter of the Study Group's deliberations. These interfaces are numbered rather than conventionally named for simplicity of discussion and to avoid confusion.

Among the processes and interfaces omitted on this cut down version of the diagram are the various ways that system programmers and machine operators can invade the system to make ad hoc repairs, certain bypasses of the system mechanism that are asserted to promote efficiency but are of debatable desirability in view of their impact on data independence, integrity and security, and the entire structure of physical mapping of data onto specific storage devices. All of the latter structure is to be found to the left of interface 21; much of it will be dictated by the laws of physics and, as such, is of little concern to the current investigation.

The principal elements of the Study Group's view of a data base management system are displayed and, in particular, the three schema approach, reflecting the new element introduced by this work, is illustrated. The lower right hand side of the diagram, the hexagon labelled "application programmer", the dashed rectangle labelled "application program subsystem" and the two interfaces labelled "7" and "12" comprise the entire non-data base activity of preparing and executing an application program. This structure may be viewed as replicated into a variety of subsystems, all interfacing with the data base management system through interface 12, differing in the nature of the language used by the programmer to communicate across the man-machine interface.
This language may be a conventional procedure language such as COBOL, ALGOL or PL/I, recognizable special languages like report generators, inquiry languages or update specifiers, or some potentially new type of procedure or problem language. The critical thing to note here is that all data description passes into the application program subsystem across interface 12 from the data base system itself. This, of course, is nothing new.

The lower left hand side of the diagram, the hexagon labelled "system programmer", the dashed rectangle labelled "system program subsystem" and the two interfaces labelled "16" and "18" comprise the entire normal interface available to the system programmer when it is necessary to bypass the ordinary mode of access to the system. Routine system maintenance and modification will occur through this subsystem. There are some exceptions, as noted above, but they do not concern the thrust of this paper. It should also be noted that there is clearly available the installation option of permitting application programmers to operate across this interface, potentially dangerous as that may be. Again, there is nothing new in this construction.

It is the upper portion of the diagram that is of concern in this paper. Current data base systems envision a two level structure; the data as seen by the machine and the data as seen by the programmer. A plethora of confusing terminology has been employed to distinguish between these views. The Study Group has chosen to employ the terms "internal" and "external" to make this distinction. In addition, the Study Group has taken note of the reality of a third level, which we chose to call the "conceptual", that has always been present but never before called out explicitly. It represents the enterprise's view of the structure it is attempting to model in the data base.
This view is that which is informally invoked when there is a dispute between the user and the programmer over exactly what was meant by program specifications. The Study Group contends that in the data base world it must be made explicit and, in fact, made known to the data base management system. The proposed mechanism for doing this is the conceptual schema. The other two views of data, internal and external, must necessarily be consistent with the view expressed by the conceptual schema. This required consistency can be maintained and verified in a reasonably fail safe manner only if the conceptual schema is machine processable.

The bulk of the remainder of this paper will discuss the nature of the conceptual schema and how it may be made explicit to the system. However, it is worth examining what its presence means to the dynamics of the data base management system operation in terms of the diagrammatic representation of Figure 1.

Ignoring the system programmers, who are extraneous to normal operation, there are four human roles identified: the enterprise administrator, the data base administrator, the application administrator(s) and the application programmer(s). Notice that these are roles as opposed to individuals. The same individual may function in different roles and one role may involve several individuals simultaneously. It is critical, however, that there is only one enterprise administrator and one data base administrator (viewed as roles) while there may be several application administrators and several application programmers. This leads to the notion that there can be several external schemas, each representing a different view of the data, provided each is consistent with and derivable from the single conceptual schema. By extension there can be several application programmers, not necessarily working on the same program, that use the same external schema.
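The requirement that each external schema be "consistent with and derivable from" the single conceptual schema can be illustrated with a deliberately crude sketch. It assumes, purely for illustration, that a schema is nothing more than a table of entity names and their declared properties, and that "derivable" means "declares nothing the conceptual schema does not". The class `Schema`, the function `is_derivable` and the insurance-flavoured names are all hypothetical; the Study Group's notion of derivability is considerably richer than a subset test.

```python
from dataclasses import dataclass


@dataclass
class Schema:
    """A schema reduced to entity names and the properties declared for each."""
    name: str
    entities: dict  # entity name -> frozenset of property names


def is_derivable(view: Schema, base: Schema) -> bool:
    """True if every entity and property declared in `view` also appears in `base`."""
    return all(
        entity in base.entities and props <= base.entities[entity]
        for entity, props in view.entities.items()
    )


# One conceptual schema (hypothetical content).
conceptual = Schema("conceptual", {
    "policy": frozenset({"number", "holder", "premium"}),
    "person": frozenset({"name", "age"}),
})

# Several external schemas, each a partial view of the same enterprise.
billing = Schema("billing", {"policy": frozenset({"number", "premium"})})
actuarial = Schema("actuarial", {"person": frozenset({"age"})})

# A view naming a property unknown to the conceptual schema is rejected.
inconsistent = Schema("inconsistent", {"policy": frozenset({"number", "colour"})})

for view in (billing, actuarial, inconsistent):
    print(view.name, is_derivable(view, conceptual))
```

Under this simplification the check is a pure subset test, so it is also transitive: a view derivable from `billing` is automatically derivable from `conceptual`.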
Each "administrator" is responsible for providing to the system a particular view of the necessary data and the relevant relationships among that data. The central view, as noted above, is that of the enterprise administrator who provides the conceptual schema. It must be emphasized, and apparently with repetition as this point seems to be the most frequently missed by those not on the Study Group who have examined its work, that the conceptual schema is a real, tangible item, made most explicit in machine readable form, couched in some well defined and potentially standardizable syntax. Much of the remainder of this paper is concerned with conceptual schemas and the author's view of the possibilities for the semantics of such schemas. In order to provide a context, however, a preliminary examination of the dynamics of the process envisioned is appropriate.

The enterprise administrator defines the conceptual schema and, to the extent possible and practicable, validates it. Some, but in general not all, of this schema can be checked for consistency by mechanical means. As the conceptual schema is a formal model of the interesting (for the data base management system) aspects of the enterprise, if the situation is at all complex then the problem of logical incompletability will be encountered (Gödel 1931). The conceptual schema will contain, among other things, definitions of all the entities to be comprehended--up to the isomorphism determined by identity of those properties defined in the schema as relevant. Relationships amongst these entities will also be explicated, as will the constraints on allowable values of "data". By defining those persons with some access to the data base management system as entities of interest, it is possible to directly model the rules of access and, thus, provide security control at the level of the conceptual schema. This is a key point.
It is well known that there are substantial problems with security control and the importance of a centralized point having a view of the entire system must not be overlooked.

The data base administrator (a definition of this role somewhat at variance with the conventional conception of the task) is responsible for defining the internal schema. This schema contains an abstract description of the storage strategy currently employed by the data base management system. Whether the data is actually stored flat, hierarchical, networked, inverted or otherwise, including any meaningful combination, is contained in the internal schema. The "internal syntax" of the data values will also be found in the internal schema; such items as the radix for numeric values, coding schemes used, units of measure, and the like. Access paths and the relational connectivity between data representations will be defined. All of this must be consistent with and derivable from the conceptual schema, which, therefore, must be available for display to the data base administrator. The internal schema processor (see Figure 1) provides a mechanical check on this consistency.

Within the limits imposed by this requirement of consistency with the conceptual schema, the data base administrator is free to alter the internal schema in any way appropriate to optimization of the data base management system operation. Indeed, by use of suitable interpreters it will be possible to reorganize the internal structure of the data base dynamically while normal operations continue. In view of the massive size of some data bases currently contemplated, this is an essential requirement, and it would seem that only the guarantee of separation of the users' view and the system's view of data provided by interposition of the conceptual schema permits this.
The third "administrator" role, the application administrators, provide the external schemas (analogues of the DBTG "sub-schemas") which define the application programmers' views of the data. These external schemas are a multiplicity in concept and will, in general, only encompass the portion of the data base relevant to a particular application. It is envisioned that each general application area will have its own application administrator who provides the appropriate schemas for that area. These are the only data descriptions (schemas) seen by an application program and provide the only avenue of data name resolution. It would carry this essay too far afield to discuss the complexities of name resolution and symbol binding; suffice it to say that all external name resolution, whether performed at compile time, program invocation time, or module execution time, is done across interfaces 7, 12 and 31 through the intermediation of the appropriate external schema across interface 5.

Exactly the same remarks about the consistency of the various external schemas with respect to the conceptual schema as were noted about the internal schema are to be understood, with the qualification that one external schema may be a true subset of another and, under the hypothesis that consistency in this sense is transitive, the external schema processor need only validate one external schema against a more comprehensive one known to be consistent with the conceptual schema.

After the appropriate schemas are defined, the system dynamics becomes quite straightforward and little different from current systems. The application programmer (report specifier, inquiry specifier, etc.) does his job in the usual way, using the external schema, provided both explicitly and implicitly, as his set of data declarations, providing procedural input across interface 7 and invoking compilation, generation or other relevant processes through the application program subsystem.
Upon entry to execution mode, requests for data are passed across interface 12 to the conceptual/external transformer which computes the mapping between the external data description and the conceptual data description. This description passes across interface 31 to the conceptual/internal transformer which in its turn computes the mapping between the conceptual data description and the internal data description. In general, the internal and conceptual schemas will be static, so, depending upon the mapping complexity and the nature of the implementation, it may well be possible to collapse the two transformers (into and out of the conceptual data description) by computing the composite mapping function. This should not obscure the fact that in order to maintain true data independence it must always remain possible to force this process to occur in two steps.

Finally, the data request as transformed is passed across interface 30 to the internal/storage transformer. The internal schema will recognize storage as something like a linear, multiorigined address space, and it will be necessary to remap this abstract model of storage onto hardware constructs such as tracks, cylinders and the like. This "dirty" description then is passed across interface 21 into the bowels of the machine (and may go through other transformations therein) until actual data is obtained and the process reversed. This brief description has been couched in terms of obtaining data but, of course, storage of data proceeds in the same way, mutatis mutandis.

Questions of locks, avoidance of "deadly embrace", security, integrity and other data base management system problems all have their place in this scheme of things, but it is beyond the scope of this paper to consider them.
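The remark about collapsing the two transformers can be made concrete with a small sketch. It assumes, hypothetically, that each transformer is a simple name-to-name lookup; the table contents, the storage locations and the function names are invented for illustration and say nothing about how a real system would represent its mappings. The point shown is only that a composite mapping can be precomputed while the schemas are static, and that the two-step path remains available.

```python
# Hypothetical mapping tables standing in for the two transformers.
# External names -> conceptual names (crossing interface 12 to 31).
EXTERNAL_TO_CONCEPTUAL = {
    "cust_name": "person.name",
    "cust_age": "person.age",
}
# Conceptual names -> internal (file, byte offset) locations (interface 31 to 30).
CONCEPTUAL_TO_INTERNAL = {
    "person.name": ("file_A", 0),
    "person.age": ("file_A", 32),
}


def resolve_two_step(external_name):
    """Resolve a request through the conceptual description, one step at a time."""
    conceptual_name = EXTERNAL_TO_CONCEPTUAL[external_name]
    return CONCEPTUAL_TO_INTERNAL[conceptual_name]


def collapse(ext_to_con, con_to_int):
    """Precompute the composite mapping; valid only while both tables are unchanged."""
    return {ext: con_to_int[con] for ext, con in ext_to_con.items()}


COMPOSITE = collapse(EXTERNAL_TO_CONCEPTUAL, CONCEPTUAL_TO_INTERNAL)

# Both routes give the same internal location for every external name.
for name in EXTERNAL_TO_CONCEPTUAL:
    print(name, resolve_two_step(name), COMPOSITE[name])
```

If the internal schema is reorganized, only `CONCEPTUAL_TO_INTERNAL` changes and the composite is recomputed; the external table, and hence the application programs, are untouched, which is the data independence argument in miniature.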
By and large, the problems of locking, security and integrity present no distinct aspects in this three level view from those found in conventional approaches, except that in some instances--security, for example--the solutions may be both easier and more assured.

Before turning to a discussion of the conceptual schema it is appropriate to insert a brief excursus on the status of data base management system standardization in ISO. At the Eighth Plenary Meeting of ISO/TC97, held 1974 May 14-17 in Geneva, Resolution 11, passed with 14 affirmative and two negative (Canada, France) votes, assigned responsibility for data base management to Subcommittee 5 (Programming Languages) and instructed SC5 to establish a study group on the subject (ISO 1974). Such a Study Group was established by SC5 and several countries submitted position papers. The USA position paper was the SPARC Interim Report. On 1975 June 24-26, the Study Group met in Washington, DC with delegations from France, Germany, Sweden and the USA. Written input was also available from Switzerland and the United Kingdom. The following six points are the conclusions of that meeting:

1. The Study Group concludes, in response to the Netherlands Proposal on Data Base Management (ISO/TC 97/598), that any standardization action in the area of data base management systems based on existing proposals is premature in the absence of criteria against which to measure such proposals.

2. The Interim Report of the ANSI/X3/SPARC Study Group on Data Base Management Systems (ISO/TC 97/SC 5 (USA-75) N359) is accepted by the ISO/TC 97/SC 5 Study Group on Data Base Management Systems as an initial basis for discussion on a gross architecture of data base management systems.

3. The Study Group acknowledges the need to identify all types of data base management systems users and to specify their requirements.

4. The Study Group proposes to review and augment the terminology used in N359 and the concepts therein.
As the initial effort, the Study Group will establish priorities in terms of the interfaces identified in N359 for further investigation. These priorities will be chosen to optimize the benefits derived from standardization.

5. As a parallel activity to those identified above, the current CODASYL data base specifications will be evaluated. The Study Group notes at this time that preliminary studies by various national and international bodies have indicated that the CODASYL specifications are not suitable for standardization as they stand.

6. The Study Group will recommend development work for those interfaces appropriate for standardization for which no adequate candidate exists.

The next meeting of this Study Group will be in Paris, 1976 January 12-15.

The underlying notion behind the conceptual schema as envisioned by the Study Group is the "entity-property-value" trinity made explicit in the GUIDE-SHARE requirements study (GUIDE-SHARE 1970). There is general agreement among the members of the Study Group on the overall nature and objectives of the conceptual schema, but in my judgment there is less real agreement on its exact place in the scheme of things than might seem the case from the Study Group reports. To a considerable extent this lack of agreement does not hamper progress, and may well not matter in the long run provided the distinct views are carefully articulated. What follows is the author's view of the conceptual schema notion and some indications on how it can be formalized.

Figure 2 is a schematic illustration of how one can proceed from "reality" to the data models actually used by application programs. It is derived from a metaphysics that may not be wholly congenial to everyone but should at the very least be familiar to those acquainted with the principles of scientific explanation (Braithwaite 1953). It is assumed that a "real world" exists in some meaningful sense.
Subordinate to this "true" reality can be found the "perceived" reality obtained through our sensory inputs as transformed by our brains. This immediate, primitive image of reality is, or at least can be, transformed into a rational mental model of reality by a process known as scientific abstraction. This process can be roughly described as: (1) observation (noting one's perceptions); (2) experimentation (stimulation of the perceived reality to generate new perceptions); (3) generalization (intuiting that similar stimulation will generate similar perceptions); (4) theorizing (identifying fundamental generalizations); (5) prediction (inferring that new and different stimulations will produce new, albeit expected, perceptions); and, finally, (6) verification (initiating these new stimuli and observing the results). Repeated iteration of this sequence leads to a gradually more refined mental model of the real world.

In order to communicate this model to someone--or something--else, it is necessary to use a language. As is well known, natural languages are unsatisfactory media for precise communication of the content of scientific models. At present the best available vehicle for such precise communication is that of formal languages (Tarski 1930). While there are complications in the reduction of scientific descriptions of reality to existing formalisms, most of these problems are to be found on the outer limits of the models.

Generally one does not really wish to describe a total model of all reality--the "best" model whose boundary is fuzzy and moves with the growth and modification of scientific knowledge. What is desired is to describe some limited model of a portion of reality, extracted from the "best" model by a process we can call "engineering abstraction". While it may be the case that the universe is "best" described by the interactions of 3·10^80 quarks, the typical engineer is more apt to build his bridge by combining girders, cross braces and rivets. The molecular biologist may view the human being as a complex structure of water, protein molecules, DNA and other, assorted chemicals, but to the insurance agent a human being is not much more than an age, sex and checkbook. For any application one abstracts those aspects of "reality" considered relevant and ignores the rest. Thus, formal descriptions need only deal with the appropriate level of abstraction.

This resultant formalism--the "symbolic" model--is derived from the limited, "engineering" model of the interesting subset of reality as embodied in the mind of the perceiver by a process we will call "symbolic abstraction", and is the linguistic expression in some conventional, predetermined syntax of a set of forms to which suitable semantic content is given by the adoption of rules of designation and rules of truth (Carnap 1942). It expresses the totality of what is known and interesting about the enterprise being modeled. It is the conceptual schema. The processes of mapping from this formal model to the data models we call the "internal schema" and "external schemas" may be complex and difficult in practice, but they are straightforward in principle, providing only that the conceptual schema has sufficient detail to permit all necessary expression.

In the author's view the proper choice of formalism--indeed, the only acceptable choice--is that of modern symbolic logic; the first order predicate calculus with identity (Hilbert & Ackermann 1938) together with a suitable axiomatic set theory (Bernays & Fraenkel 1958), augmented by appropriate modal logics (von Wright 1951), and, finally, supplemented by "individuals" (Quine 1961) and the associated non-logical predicates and the axioms for their behavior. The reasoning behind this position is quite simple. Use of the conventional formalisms of symbolic logic and set theory permits the invocation of all the analysis that has been devoted to this topic by three generations of logicians.
Both the pitfalls and possibilities are well understood and the limitations clearly defined. Further, it is in some sense the most general scheme available. If one accepts Church's Thesis (Kleene 1952), as do most contemporary logicians, it is the most general scheme that can be contemplated for use with digital machinery. From this it is possible to deduce that anything expressible to a machine with any precision at all is necessarily expressible in this fashion.

As an aside let me emphasize a point which should be obvious but is, perhaps, worth making explicit for clarity. Whenever in this paper I use the word "set" I intend it in the strictly logical sense as a synonym for "collection" or the German "Menge" or the French "ensemble", not in any way as that linguistic atrocity perpetrated by the DBTG Report wherein the nineteenth, fifth and twentieth letters of the Roman alphabet are used in that order as the name of a peculiar object. This may seem harsh, but the point at issue represents a prize example of the manner in which the information processing sciences generate confusion for themselves and others by casual misuse of words. Indeed, it reminds me of Orwell's Newspeak.

In a paper of this character it is not possible to probe the possibilities of the sketch above in any depth. However, certain examples may clarify the power of the approach. It is unequivocally precise in any modern version of set theory as to what is meant by a "relation". A relation is a set of ordered pairs (the ordered pair <x,y> being definable as {{x},{x,y}}) and one can say that x bears the relationship R to y provided that <x,y> ∈ R ("∈" being the predicate of set membership). Thus, the confusion between a "relation" and a "relationship", which is another example of terminological idiocy, is made quite precise.
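For readers who find a small executable model helpful, the set-theoretic definitions just given can be mimicked directly. The following is a minimal sketch using Python's immutable `frozenset`; the function names `pair` and `bears` and the relation `EMPLOYS` are hypothetical illustrations, not notation from the Study Group.

```python
def pair(x, y):
    """The ordered pair <x,y> encoded as the set {{x},{x,y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def bears(x, relation, y):
    """x bears the relationship R to y exactly when <x,y> is a member of R."""
    return pair(x, y) in relation


# A relation is simply a set of ordered pairs, here defined by enumeration.
EMPLOYS = frozenset({pair("Acme", "Jones"), pair("Acme", "Smith")})

print(bears("Acme", EMPLOYS, "Jones"))   # membership holds in this direction
print(bears("Jones", EMPLOYS, "Acme"))   # but not with the components swapped
print(pair(1, 2) == pair(2, 1))          # order matters although sets are unordered
```

The encoding carries the order of the components even though sets themselves are unordered, which is all that is required of an ordered pair. The relation is the set `EMPLOYS`; the relationship is the fact that a particular pair belongs to it.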
Relations of interest can be given names and defined either by enumeration of their members or by any property that must be possessed by a pair to enjoy membership, in exactly the same fashion that any other set is completely defined by its members.

The equally troublesome concept of "order" can be explicitly defined. A partial ordering is any relation having the properties of reflexivity, anti-symmetry and transitivity. A linear ordering is a partial ordering where any two elements in its field are comparable and a well-ordering is a nowhere dense linear ordering.

Structures of arbitrary complexity can be constructed. The concept of a general array (Steel 1964) developed out of some early data structure studies, and it can be shown that any nondense complex is expressible as a general array so defined. As digital computers cannot deal with dense structures except in finite approximation, this would seem to be sufficient.

The modal predicate of deontic logic, "O" (for "obliged to"), and its derived predicates "O-" ("obliged to not" ≡ "forbidden to") and "-O-" ("not forbidden to" ≡ "permitted to") provide the required paradigm for expressing either legal constraints in the model or defining the rules of access.

These examples could be multiplied to considerable length, but should be sufficient to illustrate the point. From a theoretical point of view there is no more suitable vehicle for expressing a conceptual schema. This is, of course, not the whole story. First, theoretical possibility and practical possibility are not identical. There is the danger that the necessary expressions get too large and cumbersome for effective use. In an age where we deal with million instruction operating systems, this is not a fully persuasive argument in any event. It is, however, moot. The number and character of the necessary expressions do not get excessive; unlike, say, the contrast between conventional procedure languages and Turing machines.
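The definitions of partial and linear ordering given above translate almost word for word into checks over a finite relation. The sketch below is illustrative only: it represents pairs as Python tuples for brevity, and the function names and the two example relations are assumptions introduced here. Well-orderings are left out, since on a finite field every linear ordering already has a least element in each nonempty subset.

```python
from itertools import product


def is_partial_order(field, rel):
    """Reflexive, anti-symmetric and transitive over the given finite field."""
    reflexive = all((x, x) in rel for x in field)
    antisymmetric = all(x == y for (x, y) in rel if (y, x) in rel)
    transitive = all(
        (x, w) in rel
        for (x, y), (z, w) in product(rel, repeat=2)
        if y == z
    )
    return reflexive and antisymmetric and transitive


def is_linear_order(field, rel):
    """A partial ordering in which any two elements of the field are comparable."""
    comparable = all(
        (x, y) in rel or (y, x) in rel for x, y in product(field, repeat=2)
    )
    return is_partial_order(field, rel) and comparable


# "a divides b" on {1, 2, 3, 6}: 2 and 3 are not comparable.
DIV_FIELD = {1, 2, 3, 6}
DIVIDES = {(a, b) for a, b in product(DIV_FIELD, repeat=2) if b % a == 0}

# "a <= b" on {1, 2, 3}: every two elements are comparable.
LEQ_FIELD = {1, 2, 3}
LEQ = {(a, b) for a, b in product(LEQ_FIELD, repeat=2) if a <= b}

print(is_partial_order(DIV_FIELD, DIVIDES), is_linear_order(DIV_FIELD, DIVIDES))
print(is_partial_order(LEQ_FIELD, LEQ), is_linear_order(LEQ_FIELD, LEQ))
```

Each relation here is defined by a property its pairs must possess rather than by enumeration, the second of the two ways of defining a relation mentioned above.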
Far from growing excessive, the necessary expressions benefit from nearly a century of search for compact notation, which has resulted in definitional sequences that provide more compact expression than one typically finds in programming language data descriptions (or sub-schemas) which perform less of the task. Some of this is due, of course, to the use of large character sets, but in any case economy of notation is not a problem.

A second potential difficulty is the actual use of the tools to construct the desired models, which is a task that is necessarily an art rather than a science. Clearly, if the process of constructing a model could be itself formalized one would already have the model in the input. To this point I can only say that I have personally been partially successful in constructing models of relatively complex insurance procedures, and in a matter of a few days, inventing notation as I went along. This effort was only partially successful in the sense that, while I was able to generate static models with no difficulty, the problem with time and the dynamic behavior of the model caused difficulties of two types. First, there was the philosophical problem of the potential as opposed to the actual. How does one treat the property "age at death" prior to the actual death of the individual? Formally, of course, this is trivial, but obtaining some assurance that the formalism does not hide an ambiguity or paradox is far from trivial. The second problem with time has to do with the inelegance of making the variable denoting time distinguished and, therefore, a special case. While there is nothing inherently wrong with mathematical inelegance per se, several thousand years of logical and mathematical history suggest intuitively that something is wrong. Some recent work (Thomason 1974) on the reduction of tense logic to modal logic hints at a solution to this problem.
I have gone far enough with this work to become convinced that the approach is sound and no fundamental invention is required; only some hard work to refine the ideas.

There remains, however, one further potential criticism of this approach with which it is necessary to deal. It is a criticism on which I would prefer to comment "a pox on those who raise it" and then ignore the matter. As a practical consideration, however, it will not go away. It is much the same argument that has been raised in the past against every programming language except COBOL; i.e., the language is too much like algebra, only the mathematicians can use it. The argument is irrefutable for if people believe they cannot understand something, they won't! However, there is one difference between this situation and the programming language situation. The only one who must construct models is the enterprise administrator and only the data base administrator and the applications administrators need to read such models. These individuals are presumably senior and well compensated. They can be required to have a little education. Furthermore, while I have no proof, it is my belief that once the barrier of belief in its esoteric character is overcome, it is no harder to teach reasonably intelligent people the relevant logic than it is to teach them COBOL and the DDL.

To summarize this personal view of the nature of a conceptual schema, any alternative is either equivalent and therefore equally complex while being less understood for lack of familiarity, or it is not equivalent and therefore can only model a subset of that reality otherwise amenable to modelling. The only real issue is whether some less powerful but more acceptable formalism exists that is adequate for modelling anticipated enterprises for a reasonable future.
In my view neither data structure diagrams (Bachman 1969) nor normalized relations (Codd 1970) nor the CODASYL DDL (CODASYL 1971) being discussed at this Working Conference are candidates for such an alternative. As overlaid structures for internal and external schemas they may be quite suitable; the criteria for acceptability being different.

In conclusion, let me reiterate that the latter portion of this paper is my personal view of the appropriate structure for a conceptual schema and does not necessarily represent the view of other members of the ANSI/SPARC Study Group on Data Base Management Systems. On the other hand, the general principle of the three level approach and the essential requirement for the conceptual schema is fundamental to the deliberations of the Study Group. It is reasonable to claim that this position will be maintained in the Final Report of the Study Group and will continue to characterize the official position taken by ANSI on behalf of the USA in any deliberations on data base management systems in the ISO.

REFERENCES

Bachman, C. W.: "Data Structure Diagrams", Data Base, 1:2 (1969).

Bernays, P. and Fraenkel, A. A.: "Axiomatic Set Theory", North-Holland (Amsterdam 1958).

Braithwaite, R. B.: "Scientific Explanation", Cambridge University Press (London 1953).

Carnap, R.: "Introduction to Semantics", Harvard University Press (Cambridge 1942).

CMSAG Joint Utilities Project: "Data Management Systems Requirements", CMSAG (Orlando, FL 1971).

CODASYL: "A Survey of Generalized Data Base Management Systems", available from NTIS (Washington, DC 1969).

CODASYL: "Data Base Task Group Report", ACM (New York 1971).

Codd, E. F.: "A Relational Model of Data for Large Shared Data Banks", CACM, 13:6 (1970), pp. 377-387.

Gödel, K.: "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I", Monatshefte für Mathematik und Physik, 38 (1931), pp. 173-198.

GUIDE/SHARE: "Data Base Management System Requirements", SHARE Inc. (New York, N. Y. 1970).

Hilbert, D. and Ackermann, W.: "Grundzüge der Theoretischen Logik", Julius Springer (Berlin 1938).

ISO: ISO/TC97 (Geneva-3) 669.

Kleene, S. C.: "Introduction to Metamathematics", van Nostrand (Princeton, N. J. 1952).

Quine, W. V. O.: "Mathematical Logic", rev. ed., Harvard University Press (Cambridge, MA 1961).

SPARC: "Outline for Preparation of Proposals for Standardization", document SPARC/90, CBEMA (Washington, DC 1974).

SPARC: "Interim Report: Study Committee on Data Base Management Systems", SIGMOD NEWSLETTER (forthcoming).

Steel, T. B., Jr.: "Beginnings of a Theory of Information Handling", CACM, 7:2 (1964), pp. 97-103.

Tarski, A.: "Fundamentale Begriffe der Methodologie der deduktiven Wissenschaften I", Monatshefte für Mathematik und Physik, 37 (1930), pp. 361-404.

Thomason, S. K.: "Reduction of tense logic to modal logic, I", J. Symbolic Logic, 39:3 (1974), pp. 549-551.

Von Wright, G. H.: "An Essay in Modal Logic", North-Holland (Amsterdam 1951).

Figure 1. [Diagram not reproduced. Labelled elements: Enterprise Administrator, Data Base Administrator, Application Administrator, Conceptual Schema Processor, External Schema Processor, Internal Schema Processor, Conceptual/Internal Transformer, Internal (System) Program Subsystem, External (Application) Program Subsystem, Application Programmer, System Programmer, and numbered interfaces.]

Figure 2. [Diagram not reproduced. Labelled elements: "Reality", Perceived Reality, scientific abstraction, Mental Model, scientific progress, engineering abstractions, Limited Models, symbolic abstraction, Conceptual Realm, Symbolic Model, Conceptual Schema, Internal Schema, External Schema(s).]
