Lecture Notes in Computer Science Edited by G. Goos and J. Hartmanis
39 Data Base Systems Proceedings, 5th Informatik Symposium, IBM Germany, Bad Homburg v.d.H., September 24-26, 1975
Edited by H. Hasselmeier and W. G. Spruth
Springer-Verlag Berlin • Heidelberg • New York 1976
Editorial Board P. Brinch Hansen • D. Gries • C. Moler • G. Seegmüller • J. Stoer • N. Wirth
Editors Helmut Hasselmeier, Dr.-Ing. Wilhelm G. Spruth, IBM Deutschland, EF Grundlagenentwicklung, Schönaicher Straße 220, 703 Böblingen/BRD
Library of Congress Cataloging in Publication Data
Informatik Symposium, 5th, Bad Homburg vor der Höhe, 1975. Data base systems. (Lecture notes in computer science ; 39) English or German. Sponsored by IBM Germany and the IBM World Trade Corporation. Bibliography: p. Includes index. 1. Data base management--Congresses. I. Hasselmeier, H. II. Spruth, W. G. III. IBM Deutschland. IV. IBM World Trade Corporation. V. Title. VI. Series. QA76.9.D3I52 001.6'442
AMS Subject Classifications (1970): 00A10, 68-02, 68-03, 68A05, 68A10, 68A20, 68A50 CR Subject Classifications (1974): 4.30, 4.33, 4.34, 4.0, 4.22, 4.6
ISBN 3-540-07612-3 Springer-Verlag Berlin • Heidelberg • New York ISBN 0-387-07612-3 Springer-Verlag New York • Heidelberg • Berlin This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law, where copies are made for other than private use, a fee is payable to the publisher, the amount of the fee to be determined by agreement with the publisher. © by Springer-Verlag Berlin • Heidelberg 1976 Printed in Germany Printing and binding: Offsetdruckerei Julius Beltz, Hemsbach/Bergstr.
Contents
Überlegungen zur Entwicklung von Datenbanksystemen
    H. Remus
On the Relationship between Information and Data
    G. Richter
Data Base Research: A Survey
    A. Blaser/H. Schmutz
Grundlegendes zur Speicherhierarchie
    C. Schünemann
System R - A Relational Data Base Management System
    M.M. Astrahan, D.D. Chamberlin, W.F. King, I.L. Traiger
Geographic Base Files: Applications in the Integration and Extraction of Data from Diverse Sources
    P.E. Mantey/E.D. Carlson
Data Base User Languages for the Non-Programmer
    P. Lockemann
Ein System zur interaktiven Bearbeitung umfangreicher Messdaten
    U. Schauer
Datenbankorganisation bei der Hoechst Aktiengesellschaft
    O. Saal
Nutzung von Datenbanken im nicht-wissenschaftlichen Bereich einer Hochschule
    E. Edelhoff
Einsatz eines Datenbanksystems beim Hessischen Landeskriminalamt
    R. Heitmüller
Data Base System Evaluation
    H.L. Hill
Datensicherheit in Datenbanksystemen
    H. Wedekind
Relational Data Dictionary Implementation
    I.A. Clark
On the Integrity of Data Bases and Resource Locking
    R. Bayer
Data Base Standardization - A Status Report
    T.B. Steel
PREFACE
The papers in these Proceedings were presented at the 5th Informatik-Symposium which was held in Bad Homburg, Germany, from September 24 - 26, 1975. The Symposium was organized by the Scientific Relations Department of IBM Germany and sponsored by IBM Germany and the IBM World Trade Corporation.
The aim of the Informatik-Symposium is to strengthen and improve communication between universities and industry by covering a subject in the field of computer science from both a university and an industry point of view.
During the last 5-10 years, Data Base Systems have developed from a highly speculative "Management Information System (MIS)" approach into a practical production tool. In the late 50's and early 60's, the application program was viewed as the nucleus of an application, with multiple data sets as accessories to the application program, and multiple, more or less unrelated application programs serving the needs of a larger enterprise or organization. The modern approach views the data base as the nucleus of a data processing operation, surrounded by multiple application programs operating on its data.
This switch has significantly increased the need for features and characteristics that permit quick adaptation to an ever-changing set of external requirements. In the old approach, external changes could usually be confined to one or a few application programs and their associated data sets. Because of the tight coupling between application programs and their data in a Data Base System, external changes are much more pervasive than they used to be. As a consequence, practical Data Base System implementations require a degree of universality and generality unknown in previous data processing installations.
In organizing this Symposium, we structured the subject matter into four topics. The topic of data structures covers the logical view the user has of internally stored data. This topic is closely related to the subject of data base languages. In doing this, we specifically tried to avoid repeating the popular arguments for and against the various data representation models, e.g. the hierarchical, network, and relational models.
The second topic deals with components and technology. Today the magnetic disk is the main technology for the storage of large amounts of data. Its peculiarities shape the structure of today's data base systems to a large extent. A major change in data base structures can be expected if and when we succeed in replacing magnetic disk storage with another, more amenable storage structure.
System aspects are the third topic. It includes problems of data security and data integrity. The evolution of data base systems has generated numerous ethical, social and moral questions. It is the responsibility of the data processing community to assure technically acceptable solutions for those issues.
User aspects are the fourth topic of the Symposium. Data Base Systems require a number of tools for their installation, maintenance, and evaluation. Refinement and enhancement of these tools may be one of the major prerequisites for the further development of Data Base Systems.
The editors would like to express their thanks to everybody who contributed to the Symposium by preparing a talk, providing advice for its content and organization, or assisting in its administration.
Boeblingen, October 24, 1975
H. Hasselmeier
W.G. Spruth
Considerations on the Development of Data Base Systems (Überlegungen zur Entwicklung von Datenbanksystemen)
Horst Remus, IBM Palo Alto, California
Summary
Two steps in the development toward integrated data processing are particularly noteworthy:
- The data base as the central element, with the application programs essentially handling the traffic with the data base (query or update).
- The teleprocessing network, which permits simultaneous access to a program or a data base from several user terminals.
Making the data base the center of the data processing system, in contrast to the file that serves as an access file for one particular program (and is OPENed and CLOSEd by that one program), requires specific considerations regarding its organization. A further step is the introduction of generalized data base systems built on the idea of data independence. Other considerations concern response time ("performance") and data protection and data security ("integrity" and "recovery"). To the user the system presents itself in two parts:
- The data model
- The language with which these data are manipulated ("user interface").
Problems still to be solved point toward data bases with simultaneous access from several systems and toward data bases distributed over several nodes of a network.
1. DEVELOPMENT TOWARD THE DATA BASE
We consider sets whose elements are data or information composed of alphanumeric characters. The following operations arise for these sets:
a) Query, i.e. extracting certain partial information from the total set.
b) Report generation, i.e. the (usually summarizing) condensation of the information set, or parts of it, according to certain characteristics that are not necessarily given automatically by the structure of the set.
c) Update of the information set, i.e. adding, deleting, or changing parts of it. (A special form of update is the format change, i.e. adding or dropping information relative to every existing item of information.)
Historically, the structure or form of organization of information sets has developed as follows (Figure 1 shows an attempt at a schematic representation):
The first step toward collecting information is the list, the simplest form being the sequential list. In its original form the data carriers were media on which one could write legibly. Query is manual: the list is searched for the entry in question, normally starting at the beginning of the list. Report generation is in most cases impossible, since individual queries are very time-consuming. Update is manual, by adding a new entry at the end or by striking out entries that have become superfluous. A change in list format (additional information per entry) normally causes no difficulty, since the additional information is available only for the newly added entries anyway.
The next step is the ordered list, with the same media as data carriers. An ordered list arises from a sequential list by sorting on an ordering key. A sequential list may also be ordered automatically, as with chronological lists such as parish registers. Query is considerably simplified, which in turn eases report generation. Update runs into problems with the insertion of entries. Any amount of space reserved for this purpose is eventually exhausted. This leads either to destruction of the order or to the need to create a new list. A partial way out is supplementary lists, with references to them in the base list (instead of entering the complete information). Such procedures quickly become unwieldy, however; works on chess opening theory, for example, are reissued again and again.
The next step would be breaking the list up into individual entries: the card file. It makes certain special demands on the media. The difficulties of the ordered list with respect to adding entries are removed.
The invention of the punched card, and the electromechanical handling of information that came with it, made it possible to automate individual manual processing steps. The semi-automatic single query, however, is normally too time-consuming. Report generation can be largely automatic, but the punched card file must be specially prepared for the program, i.e. the tabulating machine wiring (sorting, merging, and other special operations). Update is semi-automatic. Format change becomes a problem; it usually leads to the creation of a new card file.
The use of other media such as disk or tape permits fully automatic processing and leads to the file. Like the punched card file, it is normally organized relative to one particular application. The programmer "opens" (OPEN) and "closes" (CLOSE) the file, depending on whether the associated application is running. When the application is not running, the file may even be physically removed from the system; in any case it is normally not accessible to other applications. Query and report generation are likewise possible only for particular application programs. Simultaneous processing of several applications from one and the same terminal, or of one or more applications from different terminals, becomes problematic. Update and format change require the automatic creation of a new file.
A multitude of applications and users for one and the same set of data leads to the data base. Its special requirements are explained in more detail below.
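The contrast between the sequential list and the ordered list described above can be sketched in a few lines of Python. This is only an illustration, not part of the original paper: the function names and the sample names are hypothetical, and an in-memory list stands in for the written list. Query on the sequential list scans from the beginning, query on the ordered list can use the ordering key, and insertion into the ordered list has to find (and make room at) the right position.

```python
import bisect


def find_sequential(entries: list[str], wanted: str) -> tuple[bool, int]:
    """Scan from the start of the list; return (found, entries inspected)."""
    for inspected, entry in enumerate(entries, start=1):
        if entry == wanted:
            return True, inspected
    return False, len(entries)


def find_ordered(entries: list[str], wanted: str) -> bool:
    """Binary search on the ordering key; valid only if `entries` is sorted."""
    i = bisect.bisect_left(entries, wanted)
    return i < len(entries) and entries[i] == wanted


def add_sequential(entries: list[str], new: str) -> None:
    """Update of a sequential list: the new entry simply goes at the end."""
    entries.append(new)


def add_ordered(entries: list[str], new: str) -> None:
    """Update of an ordered list: the entry must be inserted in key order."""
    bisect.insort(entries, new)


# Hypothetical sample data.
running = ["Meier", "Adam", "Schulz", "Becker"]
ordered = sorted(running)

add_sequential(running, "Claus")
add_ordered(ordered, "Claus")
```

On paper the insertion step is the expensive one, which is the problem the card file removes by making every entry physically separable.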
2. DATA BASES AND DATA BASE SYSTEMS
Implicit in the definition of the data base are the concept of minimal redundancy and the need for a structure the user can understand, the data model. A data base is normally accessed by a number of users with different kinds of applications at the same time. This requires continuous supervision of the data base by a data base administrator. Besides preserving the integrity of the data base, these system programmers aim at optimizing performance factors such as response time and storage. They are therefore interested in the physical organization of the data base, including the way indexes and pointers work.
The application programmers or "end users" are interested in the logical data model and in ways of retrieving and updating data base elements.
To understand what both the data base administrator and the application programmer require of data base systems, the applications of data bases must be examined more closely.
First, recall the difference between batch processing and real-time processing (Figure 2). In batch processing, items are processed in groups with respect to some characteristic or key, either at fixed dates or after a certain quantity has accumulated for processing. In real-time processing, every step is carried out immediately on the entire set of data.
In addition, two parameters of the applications are of particular importance:
- predictability
- the frequency of accesses of the same kind (repetitiveness).
Both characteristics occur in a variety of mixtures. For example, one does not know in advance which part of the inventory a stock clerk will ask about. What he wants to know about it, however, is known precisely.
In general, data base operations can be divided into the following kinds (Figure 3):
1. Efficient execution of repetitive work (traditional batch processing).
2. Queries defined in advance ("How large is the stock of 2-inch nails?").
3. Random, poorly structured, and unforeseen queries ("How many engineers in Hamburg have a monthly income of more than DM 6000?").
A system that handles nos. 1 and 2 is called an "operational" or "supervisory" system; a system that handles no. 3 is called an "information" or "executive" system. Examples of the two groups:
"Operational" systems: a bank with terminals at every counter, airline reservation, air traffic control.
Information systems: a library with retrieval of information by keyword, market information for management, a data base of personnel data.
One and the same data base should normally permit both kinds of system to be used.
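The difference between a query defined in advance and an unforeseen query can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: the record layouts, the sample values, and the function names `stock_level` and `count_where` do not come from the paper. The predefined query fixes its form and lets only a parameter vary; the ad hoc query accepts an arbitrary selection condition that nobody programmed in advance.

```python
from typing import Callable

# Hypothetical inventory and personnel data, invented for this sketch.
STOCK = {"2-inch nails": 1200, "3-inch nails": 450}
PERSONNEL = [
    {"name": "A", "job": "engineer", "city": "Hamburg", "monthly_income": 6500},
    {"name": "B", "job": "engineer", "city": "Hamburg", "monthly_income": 5200},
    {"name": "C", "job": "clerk", "city": "Hamburg", "monthly_income": 7000},
    {"name": "D", "job": "engineer", "city": "Bonn", "monthly_income": 6800},
]


def stock_level(item: str) -> int:
    """Predefined query: the form is fixed, only the item varies."""
    return STOCK[item]


def count_where(records: list[dict], predicate: Callable[[dict], bool]) -> int:
    """Ad hoc query: the caller supplies any selection condition."""
    return sum(1 for record in records if predicate(record))


# Operational use: the question was anticipated when the program was written.
nails = stock_level("2-inch nails")

# Information-system use: the condition is composed at the time of asking.
well_paid_engineers = count_where(
    PERSONNEL,
    lambda r: r["job"] == "engineer"
    and r["city"] == "Hamburg"
    and r["monthly_income"] > 6000,
)
```

The predefined query can be tuned for response time because its access path is known; the ad hoc query trades that predictability for generality.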
3. SPECIAL REQUIREMENTS FOR DATA BASES
The requirement of minimal redundancy has already been mentioned. Most tape libraries contain a wealth of redundant data. Uncontrolled treatment of redundancy can (as with many office filing systems) make frequent rearrangement or reorganization necessary. A further issue, of course, is the consumption of storage space and the cost that goes with it. In addition, multiple copies of the same data may yield differing information, because they may be at different stages of update. The goal of a data base organization should therefore be to avoid redundancy wherever this appears economically justified. For reasons of data security, and to allow faulty data to be restored, some redundancy may nevertheless be necessary.
A further requirement is versatility in representing data relationships. Different programmers use different logical files, all of which, however, are based on the same data base.
The performance aspects of a data base system are very important. The decisive performance factors are the response time for the users of a terminal and the number of transactions per unit of time that a system can handle. There are systems with a low traffic volume, for which the number of transactions per unit of time (throughput) is of minor importance. Systems with a high traffic volume are, for example, airline reservation systems and large banks. There are already applications today that require 10 or more transactions per second. For such applications a further rapid increase is to be expected (addition of further bank branches, etc.). To keep the required growth in performance under control, further measures have to be considered, such as splitting the data base into several individual data bases (decentralization) or access to one data base from several computers. For traditional batch processing systems the response time is of no importance; their design criterion is the efficiency of batch processing. For certain applications a dialog with a response time of 2 seconds or less is required. Naturally, the performance of the processing unit influences the performance of the data base system.
Data and their interrelationships must not be destroyed by machine malfunction or other "accidents" (data security). Every system must therefore include the possibility of data security checks.
In many cases data must be protected against access by unauthorized persons ("security and privacy", i.e. data protection). This translates into the requirement that the system verify the authorization of a user and of his actions (e.g. by means of a password). The controls should be designed so that clever programmers cannot easily circumvent them. The actions should also be monitored and recorded, so that misuse can be detected after the fact. Likewise, it must be possible to check the data base itself continuously.
Finally, there is the requirement that application programs be written independently of the data organization and access technique (data independence). For example, IMS [3] offers a certain degree of data independence: new data segments can be added at certain points of the hierarchy without program changes, and the length of a record or the division of the data base into data groups can be changed.
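The two protection requirements named in this section, checking a user's authorization for each action and recording every action so that misuse can be found afterwards, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of IMS or any real system: the class `GuardedDataBase`, its fields, and the sample user are hypothetical, and passwords are held in plain text only to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class GuardedDataBase:
    passwords: dict[str, str]          # user -> password (plain text for brevity only)
    permissions: dict[str, set[str]]   # user -> set of allowed actions
    records: dict[str, str] = field(default_factory=dict)
    # Every attempt is recorded as (user, action, key, allowed).
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def _check(self, user: str, password: str, action: str, key: str) -> None:
        allowed = (
            self.passwords.get(user) == password
            and action in self.permissions.get(user, set())
        )
        # Log before acting, so refused attempts are recorded as well.
        self.audit_log.append((user, action, key, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not {action} {key}")

    def read(self, user: str, password: str, key: str) -> str:
        self._check(user, password, "read", key)
        return self.records[key]

    def write(self, user: str, password: str, key: str, value: str) -> None:
        self._check(user, password, "write", key)
        self.records[key] = value


# Hypothetical usage: a teller who may read but not write.
db = GuardedDataBase(
    passwords={"teller": "s3cret"},
    permissions={"teller": {"read"}},
    records={"acct-1": "100"},
)
balance = db.read("teller", "s3cret", "acct-1")
try:
    db.write("teller", "s3cret", "acct-1", "999")
    write_refused = False
except PermissionError:
    write_refused = True
```

Because all access goes through `_check`, the log shows refused attempts as well as successful ones, which is what makes later detection of misuse possible.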
4. DATA BASE STRUCTURES
The function of a data base is to store the data and the relationships between the data. The logical description of a data base is called the data base schema. A schema thus defines the data model for the user. A subschema is the breakdown of the data base for one particular application program. Figure 4 shows how the various parts of a data base system work together, and in particular the meaning of the terms schema and subschema.
Figure 5 shows the structure of a data base for job placement. The relationships between the individual files are clearly visible. The employer file gives the details for the field "employer number", and the talent file the details for the field "required talent" in the job list. This exhibits one main form of data base structure: the hierarchical arrangement. The files "employer number" and "talent group" are subdivisions of the file "job list" (parent-child relationship).
The possibility of expressing relationships between the individual data fields in the data base structure has led to three essential forms of data base organization:
1. The hierarchical data base structure (Figure 6). Here the highest level has one and only one node, the "root of the tree". Every node of any other level is assigned exactly one node in the next higher level. Knuth [4] accordingly defines a tree or hierarchical structure as "a finite set T of one or more nodes with a. one specially designated node, the root of the tree, and b. m ≥ 0 remaining disjoint (unconnected) subsets T1, ..., Tm, each of which is itself a tree. These subsets are called subtrees." IMS [3] uses the hierarchical data base structure.
2. If a node is to be traced back to more than one node of a higher level, the description can no longer be given by a tree. The resulting structure is called a network structure. Because the word "network" is used in so many ways in the data processing industry, the term "plex structures" is often used in English-speaking countries. Figure 7 shows some simple examples of network structures. A hierarchical or tree structure is, of course, only a special case of these. One example of a simple network structure is a family tree. More complex structures arise when multiple relationships that cannot be determined algorithmically exist between the elements of different levels. By introducing multiple indexes and redundancy, network structures can be reduced to tree structures. The work of the CODASYL group [1] leads to a network structure.
3. The requirement to manage without redundancy and to be able to represent the relationships between the data base elements as an algebraic calculus leads to the "relational data base" of Codd (see the detailed description in [2]). The basic operations for forming new records are union and intersection. From a mathematical standpoint the language appears very elegant, but for performance reasons implementations have so far found little acceptance. Its advantages center on the clarity of the data model and the simplicity of the language, with which relationships among records on the same level can be manipulated. Representations of files in "relational data base" form can be reduced to the hierarchical or network forms above by using multiple indexes and redundancy.
In connection with data base structures, lists and rings (chains or lists, rings) are often mentioned. These structures, however, refer to the way records within a file are linked to one another. They therefore describe techniques by which logical structures are obtained from physical ones, whereas the structures described under 1-3 are particular forms of logical structure. A decisive element of both the list and the ring structure is the pointer, which points from one record to the next. In ring structures two-way pointers are normally used.
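The distinction between the hierarchical and the network structure, and the set operations mentioned for the relational form, can be illustrated with a short Python sketch. It rests on assumptions that are not in the paper: a structure is given as a mapping from each node to the set of its immediate higher-level nodes, every node mentioned as a parent is itself a key of that mapping, and a relation is modelled as a set of tuples. The function name `is_hierarchical` and all sample data are hypothetical.

```python
def is_hierarchical(parents: dict[str, set[str]]) -> bool:
    """True if the structure is a tree: exactly one root, and every other
    node has exactly one parent and is connected to that root."""
    roots = [node for node, ups in parents.items() if not ups]
    if len(roots) != 1:
        return False
    if any(len(ups) != 1 for node, ups in parents.items() if node != roots[0]):
        return False
    # Walk upward from every node; a walk that never ends indicates a cycle.
    for node in parents:
        current, steps = node, 0
        while parents[current]:
            (current,) = parents[current]
            steps += 1
            if steps > len(parents):
                return False
    return True


# Job placement data base in the spirit of Figure 5: one root, two children.
job_placement = {
    "job list": set(),
    "employer file": {"job list"},
    "talent file": {"job list"},
}

# A family tree: a child is traced back to two nodes of the higher level.
family = {
    "grandmother": set(),
    "mother": {"grandmother"},
    "father": set(),
    "child": {"mother", "father"},
}

# Relations as sets of tuples (hypothetical data), with union and intersection.
engineers = {("Adam", "Hamburg"), ("Becker", "Bonn")}
hamburg_residents = {("Adam", "Hamburg"), ("Claus", "Hamburg")}
either = engineers | hamburg_residents
both = engineers & hamburg_residents
```

The family example fails the test because "child" has two parents, which is exactly the case the text describes as requiring a network structure.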
5. DATA DESCRIPTION LANGUAGES
A language that describes the logical data structure should meet the following requirements:
- The division into data aggregates such as files, records, segments, and data elements should be clearly describable.
- Each type of such an aggregate should be specifically named (e.g. two different record types should have different names).
- The subdivision of a given data aggregate into particular sub-aggregates should be clearly recognizable (which data elements a given data grouping contains, etc.).
- The sequence must be specified and repetitions should be shown.
- The language should express which data elements are used as indexes.
- Relationships between record types, segment types, etc., which form the basis of the data structure, must be specified and clearly named.
According to J. Martin [5], there are different levels of data description languages, depending on the user's point of view (Figure 8):
1. The language for the application programmer, which describes the data base subschema (e.g. the Data Division in COBOL or the PSBs in DL/I (PSB = program specification block)).
2. The general description of the schema of the data base, which is used by the data base administrator (e.g. DL/I logical data base description). The COBOL Data Division, for example, does not allow the relationships in a schema to be described. It therefore cannot be used here.
3. The physical data description (e.g. DL/I physical data base description). In contrast to the logical data description, which is completely detached from hardware and storage considerations, these considerations are of great interest here for performance optimization.
Apart from DL/I, CODASYL's data description language DDL is probably the best-known data base description language.
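The requirements listed above (named record types, their data elements, the elements used as indexes, and named relationships) and the step from schema to subschema can be sketched with Python data classes. This is a hypothetical miniature, not DL/I or the CODASYL DDL: the class names, the `validate` and `subschema` methods, and the field names of the job placement example are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordType:
    name: str
    fields: tuple[str, ...]
    index_fields: tuple[str, ...] = ()


@dataclass(frozen=True)
class Relationship:
    name: str
    parent: str
    child: str


@dataclass(frozen=True)
class Schema:
    record_types: tuple[RecordType, ...]
    relationships: tuple[Relationship, ...]

    def validate(self) -> list[str]:
        """Return a list of violations of the description requirements."""
        problems = []
        names = [rt.name for rt in self.record_types]
        for name in sorted({n for n in names if names.count(n) > 1}):
            problems.append(f"record type name used twice: {name}")
        for rt in self.record_types:
            for index in rt.index_fields:
                if index not in rt.fields:
                    problems.append(f"{rt.name}: index {index} is not a field")
        for rel in self.relationships:
            for end in (rel.parent, rel.child):
                if end not in names:
                    problems.append(f"{rel.name}: unknown record type {end}")
        return problems

    def subschema(self, wanted: set[str]) -> "Schema":
        """The part of the schema visible to one application program."""
        return Schema(
            tuple(rt for rt in self.record_types if rt.name in wanted),
            tuple(
                rel for rel in self.relationships
                if rel.parent in wanted and rel.child in wanted
            ),
        )


# Hypothetical schema loosely following the job placement example.
schema = Schema(
    record_types=(
        RecordType("job list", ("employer number", "required talent"),
                   ("employer number",)),
        RecordType("employer file", ("employer number", "name", "address"),
                   ("employer number",)),
        RecordType("talent file", ("talent group", "name", "address")),
    ),
    relationships=(
        Relationship("offered by", "job list", "employer file"),
        Relationship("requires", "job list", "talent file"),
    ),
)
employer_view = schema.subschema({"job list", "employer file"})
```

The subschema is derived from the schema rather than written separately, which mirrors the division of labor between the data base administrator and the application programmer.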
6. HARDWARE CONSIDERATIONS
Data bases of the order of more than 4 billion bytes are known. This corresponds to 40-50 IBM 3330 disk units. It is conceivable to back a disk unit with a larger storage unit of longer access time, similar to the virtual storage concept between main (core) storage and disk. The IBM 3850, announced about a year ago, provides for example 10^3 to 10^4 times more storage space with an access time longer by a factor of 10^2. The user sees the system as a single disk system; for performance considerations, however, the hardware parameters are of the greatest importance. For example, there are strong dependencies between response time, transaction rate, and the size of main storage, or storage availability at the lowest level of the storage hierarchy. Response time grows with the transaction rate and falls with greater availability of main storage (less paging). The transaction rate can be increased with more main storage.
Other hardware parameters are, of course, the speed of the computer and the structure and components of the communication network.
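The figures in this section can be checked with a little arithmetic, sketched here in Python. The first part uses only the numbers given in the text. The second part is an assumption added for illustration: it models the two-level arrangement with the usual weighted average of access times, where `hit_ratio` (the fraction of accesses satisfied from disk) is a hypothetical parameter and times are expressed relative to one disk access.

```python
DATA_BASE_BYTES = 4_000_000_000  # "more than 4 billion bytes"


def implied_unit_capacity(units: int) -> float:
    """Capacity per disk unit, in millions of bytes, implied by the text."""
    return DATA_BASE_BYTES / units / 1_000_000


capacity_at_40_units = implied_unit_capacity(40)
capacity_at_50_units = implied_unit_capacity(50)


def effective_access_time(hit_ratio: float, slowdown: float = 100.0) -> float:
    """Average access time relative to one disk access (disk = 1.0).

    Assumes a simple weighted average: a fraction `hit_ratio` of accesses
    is served from disk, the rest from a backing store whose access time
    is `slowdown` times longer (the text gives a factor of 10^2).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    return hit_ratio * 1.0 + (1.0 - hit_ratio) * slowdown


average_at_99_percent = effective_access_time(0.99)
```

Under this simple model even a 99 percent hit ratio roughly doubles the average access time, which suggests why the hardware parameters matter so much for performance even though the user sees a single disk system.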
7. OUTLOOK
The additional requirements for extending existing data base systems or developing future ones center on the following aspects:
a) Increased performance. Growth of the data base and of the number of data base users calls for higher transaction rates and shorter response times. The answer lies in more suitable data base organizations and in minimizing administrative functions. Certain aids from the manufacturers allow the data base to be "tuned"; in addition there are improvements that the user can influence. Some improvements can be achieved through more suitable use of hardware (multiprocessing or similar techniques).
b) Continuous operation. The demand for 24-hour access to the data base has certain consequences for the implementation. First, when operation is interrupted by a malfunction, the data base must be restored quickly and operations resumed at short notice. This requires keeping a "journal" that can be accessed quickly. In addition, work should continue on the best techniques for error prevention, detection, and correction. A further requirement is to reorganize the data base while routine operation continues. A dictionary [7] can be an essential aid in managing the data bases.
c) Ease of installation and use. The parameters that lead to an optimal organization of a data base are very complex. System manufacturers generally help with automatic organization aids or with guidance in the documentation. The question of installability is largely identical with the ability to understand the physical representation of the data base. Here again a dictionary [7] can be useful.
Ease of use depends essentially on the nature of the languages for data manipulation and data description and on the interface to the programming languages.
Further functions that simplify use have to do with the automatic control of the information flow. Essential here is the handling of control information (control blocks), as is done, for example, in the standard network architecture.
To simplify later use, data bases and their associated systems must be designed to allow later modification or extension.
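The role of the journal in quick recovery, mentioned under b) above, can be sketched in Python. This is a minimal illustration under assumptions of my own, not the mechanism of any particular system: the class `JournaledDataBase`, its checkpoint, and the entry format `(operation, key, value)` are hypothetical, and everything is held in memory, whereas a real journal would be on stable storage.

```python
def apply_entry(state: dict, entry: tuple) -> None:
    """Apply one journal entry (operation, key, value) to a state."""
    operation, key, value = entry
    if operation == "put":
        state[key] = value
    elif operation == "delete":
        state.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {operation}")


class JournaledDataBase:
    def __init__(self) -> None:
        self.state: dict = {}
        self.checkpoint: dict = {}   # last saved copy of the data base
        self.journal: list = []      # updates made since that copy

    def update(self, operation: str, key: str, value=None) -> None:
        entry = (operation, key, value)
        self.journal.append(entry)   # record the update before applying it
        apply_entry(self.state, entry)

    def take_checkpoint(self) -> None:
        self.checkpoint = dict(self.state)
        self.journal.clear()

    def recover(self) -> None:
        """Rebuild the state from the checkpoint plus the journal."""
        state = dict(self.checkpoint)
        for entry in self.journal:
            apply_entry(state, entry)
        self.state = state


# Hypothetical run: two updates after the checkpoint, then a simulated failure.
db = JournaledDataBase()
db.update("put", "a", 1)
db.take_checkpoint()
db.update("put", "b", 2)
db.update("delete", "a")
db.state = {}          # simulate loss of the working copy
db.recover()
```

The shorter the journal since the last checkpoint, the faster the recovery, which is one reason a quickly accessible journal matters for continuous operation.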
References
[1] CODASYL, "1974 Status Report on Data Base Activities".
[2] Date, C.J., "An Introduction to Database Systems". Addison-Wesley, Reading, Mass., 1975
[3] Information Management System, "System/Application Design Guide". IBM Form No. SH 20-9025
[4] Knuth, D.E., "The Art of Computer Programming, Vol. 1, Fundamental Algorithms". Addison-Wesley, Reading, Mass., 1968
[5] Martin, J., "Computer Data Base Organization". Prentice-Hall, Englewood Cliffs, N.J., 1975
[6] Senko, M.E., Altman, E.B., Astrahan, M.M. and Fehder, P.L., "Data Structures and Accessing in Data-Base Systems". IBM Systems Journal 12, 30-93 (1973)
[7] Uhrowczik, P.P., "Data Dictionary/Directories". IBM Systems Journal 12, 332-350 (1973)
Figure 1: DEVELOPMENT TOWARD THE DATA BASE

Forms of data representation, in historical order: sequential list, ordered list, card file, punched card file, file, data base

Data carrier
    Sequential list, ordered list: medium that permits human writing and reading
    Card file: medium separable per entry
    Punched card file: punched card
    File, data base: tape, disk

Query
    Sequential list: manual search (in general starting at the beginning)
    Ordered list, card file: manual, using the ordering key
    Punched card file: manual or semi-automatic (very time-consuming)
    File: automatic, restricted to the application belonging to this file
    Data base: automatic, unlimited

Report generation
    Sequential list: manual, dominated by time-consuming individual queries
    Ordered list, card file: manual
    Punched card file: semi-automatic; the card file is prepared for the program concerned
    File: automatic; the file is prepared for the program concerned
    Data base: automatic, unlimited as far as the information is available

Update
    Sequential list: manual, new entries added at the end
    Ordered list: frequent re-creation because the space for additions is exhausted
    Card file: manual, unlimited additions possible
    Punched card file: semi-automatic
    File: automatic, with frequent re-creation
    Data base: automatic, unlimited

Format change
    Sequential list, ordered list: no problem; the new format remains restricted to new entries
    Card file, punched card file: normally requires re-creation of the card file
    File: normally requires re-creation of the file
    Data base: automatic, unlimited
Figure 2. Batch processing (Stapelverarbeitung) and real-time processing (Echtzeitverarbeitung).
Figure 3. CHARACTERISTICS OF DATA BASE SYSTEMS (after James Martin)

Characteristic | Operational systems | Information systems
Access | planned or pre-programmed | spontaneous, not pre-programmed
Typical examples | bank counter, flight reservation | sales analysis, personnel information
Typical users | bank clerks, foremen, lower management | information staff, middle management, assistants to higher management
Normal purpose | support of routine operations | support of planning and of urgent information needs
Response time | seconds | minutes or hours
Implementer of the application | programmer | information specialist
Implementation time | weeks or months | hours
Typical languages | COBOL, FORTRAN, PL/I | IQF, GIS
Figure 4. MODE OF OPERATION OF A DATABASE SYSTEM (labels: database system, system buffer, work area of the program, application program A).

Figure 5. BREAKDOWN OF A DATABASE: JOB PLACEMENT (labels: talent group, talent file, employer number, employer file, job list; name, address, availability, experience, education, salary, working climate, social benefits, data).

Figure 6. HIERARCHICAL DATABASE STRUCTURE (labels: root, level 1, level 2, level 3, level 4).

Figure 7. DATABASE NETWORK STRUCTURES.

Figure 8. LEVELS OF DATA DESCRIPTION (labels: application programmer, subschema, schema, global or general database description (database administrator), assignment of subschemas, physical description, physical storage assignment, database description, automatic execution by the database system).
On the Relationship between Information and Data

Gernot Richter, Gesellschaft fuer Mathematik und Datenverarbeitung (GMD), St. Augustin
Summary

On the background of a general model of information systems a view is analyzed which explicitly distinguishes between information and its representation. Using a conceptual system (IMC) which has been designed to talk about information on the conceptual level, information structures and their representation are presented. In the light of these considerations the significant role of type declarations is shown. For the concepts of format and data a definition is outlined. For information manipulation, some topics concerning data base management systems are discussed. This gives motivation to conclude with a plea for conceptual differentiation between information and representation in the field of data base technology.

1. A model view of information systems

In a general consideration an information system is a system consisting of functional units (Funktionseinheiten) in the sense of [DIN]. The functional units are identified only by their function within the system, i.e. by their roles, rather than as work stations or by their technical realization. Years ago this kind of functional units has already been introduced and applied in [ABN], following some ideas of C. A. Petri. There the term office (Instanz) has been chosen for the communicating functional units. Recently such a view has been proven to be very useful in [ANSI].

These functional units or offices influence each other by communicating messages. So the need has been recognized to introduce a complementary kind of functional unit which allows for the exchange of messages between offices. To this kind of functional units the term channel (Kanal) was given in [ABN]. The concept of interfaces as used in [ANSI] has a direct relation to the concept of channel: An interface is a system of rules which govern the communication via a considered channel. Also a channel is characterized only by its function within the system, serving as a facility where messages can be posted and taken by the communicating offices.
This yields a model view of information systems which provides for the decomposition into two distinct classes of functional units:

- offices, characterized by the processes they can perform
- channels, characterized by the states they can assume.

This model view applied to data base management has recently gained some publicity, since the publication of [ANSI], and is under discussion both in the world of scientific research (IFIP/TC-2 and IAG) and in the area of standardization (ISO/TC 97/SC 5).

With the above model in mind we want to do a close look to the communication of two offices via one channel. This seems to be an adequate minimum configuration to examine the interrelation between information and data. To illustrate this configuration we use the graphic notation of [PET], where offices are depicted by boxes and channels by circles (in the cited paper only elementary offices and channels are considered). This yields fig. 1.

In the adopted model communication is done by exchanging messages between both offices via the linking channel. The arrows in the above figure only indicate the possibility of access and are no functional units.

A further aspect is depicted in fig. 1: The exchange of messages makes only sense if both communicating offices have a common background of understanding, which allows them to interpret the messages found in the channel. The assumption of such a "universe of discourse" is a very useful auxiliary model for the understanding of communication also between technical functional units.
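The configuration of two offices and one channel can be sketched in a few lines of Python. This is only an illustrative sketch: the class names, the queue used as channel state, and the dictionary standing in for the common background of understanding are assumptions, not part of [ABN] or [PET].

```python
from collections import deque


class Channel:
    """A channel is characterized by the states it can assume (here: a queue of messages)."""

    def __init__(self):
        self._messages = deque()

    def post(self, message):
        self._messages.append(message)

    def take(self):
        # Returns None when no message is waiting in the channel.
        return self._messages.popleft() if self._messages else None


class Office:
    """An office is characterized by the processes it can perform (here: send and receive)."""

    def __init__(self, name, background):
        self.name = name
        self.background = background  # hypothetical mapping: message -> meaning

    def send(self, channel, message):
        channel.post(message)

    def receive(self, channel):
        message = channel.take()
        if message is None:
            return None
        # Interpretation succeeds only if the message is covered by the background.
        return self.background.get(message, "<not understood>")


common_background = {"7": "number seven", "VII": "number seven"}
a = Office("A", common_background)
b = Office("B", common_background)
c = Office("C", {})  # hypothetical office that lacks the common background
channel = Channel()

a.send(channel, "VII")
understood = b.receive(channel)
a.send(channel, "VII")
not_understood = c.receive(channel)
print(understood, "|", not_understood)
```

Office C finds the same message in the channel as office B, but without the common background it cannot interpret it.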
2. Model information and abstraction

So far no reference has been made to a distinction between information and data. But words as "represent" and "interpret" indicate a kind of mapping between two things. It is the goal of this section to show that there are two mappings to be considered. Both have the nature of an abstraction, i.e. omission of features not to be considered - but they start at different points.

One kind of abstraction starts with the so-called initial information (Ausgangsinformation), which is to be understood as the whole knowledge or ideas a person has about something (of the real world or anything else). For a certain intended purpose, i.e. pursuing a pragmatic context, it might be that not the whole information is needed but only the "relevant" part of it. The information about a person e.g. is different for administrative purposes and for medical purposes; information about a technical process for teaching purposes will be different from what is needed for engineering purposes. So it is the intended purpose which controls the abstraction process. In [DURI] the result of the (respective) abstraction process has been called model information (Modellinformation). In similar considerations in [STEEL] the above abstraction is called the "engineering abstraction", which yields an "engineering model". The term model information indicates, that we are still on the information level.

In the present context we do not adopt any definition of information; the concept is used in the sense of knowledge or idea (about something). Thus information is viewed as being of mental nature. It is obvious, that depending on the respective intended purpose various abstractions can be performed on the same initial information. It is not of interest in this presentation, whether the model information "exists" or not - whatever that means. However we found the approach very useful which assumes a level of model information (as did also other authors).

Model information cannot be communicated directly because of its mental nature. There must be a representation of it (on a medium) which can be handed out to the addressee (or which can be stored for later use). Such a representation is what usually is called "data". The distinction between information and its representation is the background on which all the following ideas have been developed.
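As a rough illustration of the first kind of abstraction, the following Python sketch derives two different pieces of model information from the same initial information, controlled by the intended purpose, and then produces a representation (data) of one of them. The attribute names, the two purposes, and the use of JSON as a representation are assumptions made for this example only; a dictionary is merely a stand-in for information, which the text regards as being of mental nature.

```python
import json

# Hypothetical "initial information" about a person.
initial_information = {
    "name": "A. Miller",
    "address": "12 Main Street",
    "salary": 3200,
    "blood_group": "0+",
    "allergies": ["penicillin"],
}

# Hypothetical purposes, each naming the features considered relevant.
RELEVANT_FEATURES = {
    "administrative": ("name", "address", "salary"),
    "medical": ("name", "blood_group", "allergies"),
}


def abstract(information, purpose):
    """Omit every feature that is not relevant for the intended purpose."""
    return {key: information[key] for key in RELEVANT_FEATURES[purpose]}


def represent(model_information):
    """Produce data: a character representation that can be handed out or stored."""
    return json.dumps(model_information, sort_keys=True)


administrative_model = abstract(initial_information, "administrative")
medical_model = abstract(initial_information, "medical")
data = represent(medical_model)
print(data)
```

The same initial information yields different model information for different purposes; only the string held in `data` could be put into a channel.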
Now it is possible to show the other abstraction mentioned above, which is of a quite different nature. Consider some messages (here in the sense of data) which by agreement between the communicating offices have the same meaning. What is "same meaning" in the present case? As already pointed out, exchange of messages is assumed to have the goal to exchange model information. Any message is considered to be a representation of model information. If several messages are mapped onto the same model information, they all have the "same meaning". There are rules for mapping the messages to the model information. Such rules usually are called "semantics", and the process of mapping is called "interpretation". So we have an abstraction from various representations to the pertinent model information by ignoring the respective representational peculiarities.

There is one problem which might have been apparent already in the above discussion. Considering the communication between an author and the audience, he has the need of representing the model information which he wants to write about. For this purpose a kind of reference language is beneficial, in which the information can be represented and the interpretation of which is agreed upon. Such a (graphical) language will be presented in the following and is called canonical representation. It can be used whenever emphasis is laid on the model information rather than on one of its possible representations.
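The second abstraction, from several representations to the pertinent model information, can be sketched as follows. The table of rules is a hypothetical agreement chosen for illustration; the strings on the right-hand side merely stand in for model information.

```python
from collections import defaultdict

# Hypothetical agreement ("semantics"): rules mapping messages onto model information.
SEMANTICS = {
    "7": "number seven",
    "VII": "number seven",
    "seven": "number seven",
    "sieben": "number seven",
    "8": "number eight",
    "VIII": "number eight",
}


def interpret(message):
    """Map a message onto the model information it represents."""
    return SEMANTICS[message]


def same_meaning(first, second):
    """Two messages have the same meaning if they map onto the same model information."""
    return interpret(first) == interpret(second)


def group_by_meaning(messages):
    """Ignore representational peculiarities: collect the messages per model information."""
    groups = defaultdict(list)
    for message in messages:
        groups[interpret(message)].append(message)
    return dict(groups)


groups = group_by_meaning(["7", "VIII", "seven", "VII", "8"])
print(groups)
```

Interpretation is many-to-one here: the representational differences between "7", "seven" and "VII" disappear once the messages are mapped onto model information.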
3. Outlines of a conceptual model of information

Before dealing with any problems of representation the properties of the model information itself have to be identified. What is an adequate view of model information with respect to applications? This question brings us into a very controversial area of argumentation (at least in the past) about the advantages and deficiencies of so-called "data models" (hierarchic, network, relational, ...). For general considerations we can avoid this topic by adopting a view which covers the various "data models". This view has been outlined in [DURI] and is reflected in a conceptual system called Information Management Concepts (IMC). These concepts have been developed as a means for talking about model information, in particular in the context of data base management systems. Simultaneously, rules for graphic representation of model information in terms of IMC were developed. Both the basic concepts of IMC and the related canonical representations will be outlined in this section to facilitate the treatment of the topic of "data" (in the
sense of representation) and its relationship to information.

In IMC any portion of model information which can be referred to within a considered communication is called a construct (Gebilde). A construct may be the information about a book in a library, a car in an administration, a family, a process in a factory. A construct is either an atom (Atom) or an aggregate (Aggregat). An atom is declared to "be", i.e. (in a given situation) it is considered as elementary construct, the composition of which is of no significance. An aggregate is a compound construct, the composition of which is relevant. A construct can be viewed as a part of another construct, i.e. as a component of an aggregate.

An aggregate is either a collection (Kollektion) or a nomination (Nomination). These two generic types of aggregates differ in the way of selection of immediate (first level) components (Komponente). A collection is an unordered finite set of constructs. A nomination is a (mathematical) function from a set of names (Name) to constructs; the domain of a nomination is a set of names. Names only serve as selectors for the immediate components (in the same manner as e.g. in the Vienna Definition Language, cf. [ZEM]). Beyond that a name has no meaning in itself within the framework of IMC.

To show examples of constructs, we first have to introduce the canonical representation techniques mentioned above. In IMC representation a construct is shown either by nested boxes (fig. 2) or by trees (fig. 3). In a box representation atoms, collections and nominations are always represented by boxes, and the composition of an aggregate is expressed by the presence of the component boxes within the box of the aggregate. The names of a nomination are written close to small circles attached to the corresponding component boxes. In a tree representation the composition is expressed by the tree structure, a construct being depicted at a vertex. A combination of both representation techniques is possible. A detailed example of the IMC representation of a "relation" and of a "set network" construct is given in [DKR].

If we look at fig. 2 or 3 we notice various locations where (the representation of) the same construct appears. In a canonical representation we can point to such a location, whereas on the level of model information we cannot. The same construct may appear in different contexts. Therefore on the conceptual level a concept is needed which
allows to distinguish between different appearances of one construct (within a considered embracing construct). In IMC the concept of spot (Stelle) has been introduced. A spot can be defined as a sequence of pairs (name, construct). In case of a collection the empty name is inserted at the name position in the pair. The first pair of a spot defining sequence always consists of the empty name and the reference construct, in (= relative to) which the spot is considered. So with the symbols of fig. 4 the construct in question appears at the spots

(-,c1)(home address,c2)(city,c3)
(-,c1)(place of birth,c3)
(-,c1)(branches,c5)(-,c3)

which are spots in c1. (The lower case c's stand for the respective construct.) The same construct also appears at the spot

(-,c2)(city,c3)

in c2 and

(-,c5)(-,c3)

in c5. Another example is c7 which appears in c1 at the following two spots:

(-,c1)(home address,c2)(street,c6)(number,c7)
(-,c1)(date of birth,c4)(year,c7)

It turns out, that the concept of spot is essential for the discussion and understanding of some sophisticated aspects in data base management systems, not least those concerning the interrelationship between information (constructs) and data (representations).

Fig. 2 and 3 show, by the way, that in canonical graphic representations always constructs at spots are depicted. As by definition any spot structure is hierarchic, one might be tempted to label IMC a hierarchic information system. But it is obvious, that in all existing information models (in hierarchic, network, relations, etc.) the spots form hierarchic trees.

So far only individual constructs have been considered. Nothing about types or declarations has been said nor used tacitly. A type in general is a set. But not any set is a type. First of all, it has to be determined what are the elements of such a set. In the present context we focus on sets of constructs, thus the elements are constructs. In the world of data base management systems instead of "element" the terms "occurrence" or "instance" of a type have been adopted.

But not even any set of constructs is a construct type (Gebildetyp). A type of constructs has to be declared for a considered communication, saying that only constructs which belong to the specified type(s) are admitted for exchange. More precisely: As only representations of constructs can be exchanged via channels of an information system, a type declaration specifies what constructs will be represented and can be "understood" by interpretation. A language, in which type declarations are made, should be called a "type definition/declaration language", but unfortunately is often called a "data definition language". This is one example for the sloppy terminology which is so characteristic for the field of data processing. Not even "type declaration language" would be sufficiently precise. As will be shown below, also other types have to be declared (on the representational level). Therefore, strictly speaking, such a language is a "construct type declaration language" (CTDL).

As far as only the composition of constructs by others is specified in a type declaration, a graphic type definition language can be applied in analogy to the canonical construct representation. An example of a recursive type definition is shown in fig. 5, an occurrence of that type is represented in fig. 6, where in both figures the small box in the upper righthand corner provides a place for inserting the name of the type or type designation (Typenbezeichnung), as we prefer to say. This "type plate" is also used if in a construct representation emphasis is put on the fact that the construct is an occurrence of a particular type (cf. fig. 6 and 10).

It would be beyond the scope of this paper to discuss all aspects involved in the concept of type in general and of construct types in particular. The one or the other will be addressed in the following paragraphs.

After this very short outline, concepts to talk about model information and a canonical representation technique are available. The concept of type has been emphasized because of its great importance for the questions of representation to be discussed in the next section.
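The concepts of atom, collection, nomination and spot outlined in this section can be made concrete with a small Python sketch. It is not part of IMC: the class names, the string "-" for the empty name, and the sample construct (modelled loosely on the home address, place of birth and branches example) are assumptions for illustration.

```python
from dataclasses import dataclass

EMPTY_NAME = "-"


@dataclass(frozen=True)
class Atom:
    value: str


@dataclass(frozen=True)
class Collection:
    components: frozenset  # unordered finite set of constructs


@dataclass(frozen=True)
class Nomination:
    components: tuple  # pairs (name, construct); the names form the domain


def immediate_components(construct):
    """Yield (name, component) pairs; components of a collection get the empty name."""
    if isinstance(construct, Nomination):
        yield from construct.components
    elif isinstance(construct, Collection):
        for component in construct.components:
            yield EMPTY_NAME, component


def spots(target, reference):
    """Return every spot of `target` in `reference` as a sequence of (name, construct) pairs."""
    found = []

    def walk(construct, path):
        if construct == target:
            found.append(path)
        for name, component in immediate_components(construct):
            walk(component, path + ((name, component),))

    # The first pair always consists of the empty name and the reference construct.
    walk(reference, ((EMPTY_NAME, reference),))
    return found


c3 = Atom("Bonn")                                  # a city
c2 = Nomination((("city", c3),))                   # home address
c5 = Collection(frozenset({c3, Atom("Koeln")}))    # branches
c1 = Nomination((("home address", c2), ("place of birth", c3), ("branches", c5)))

names_in_c1 = [tuple(name for name, _ in spot) for spot in spots(c3, c1)]
names_in_c2 = [tuple(name for name, _ in spot) for spot in spots(c3, c2)]
print(names_in_c1)
print(names_in_c2)
```

The same construct c3 is found at three spots in c1 and at one spot in c2. Whatever the construct looks like, the spots themselves always form a tree rooted at the reference construct.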
4. Data as representations

For convenience the term "data" is used in the following instead of "digital data", indicating that only representations are considered which consist of characters (cf. [DIN]). Other representations (pictures, sounds, etc.) are not investigated with regard to their relationship to information.

Referring to the configuration of two offices with a channel between (fig. 1), let the piece of paper on fig. 7 be a realization of a communication channel. The question is, whether the addressee interprets the five representations there as representations of five, four, three, two or one construct. The example suggests the answer, that the interpretation of the various representations is the subject of agreements between the communicating offices. So according to one agreement all representations there might be interpreted as "number seven", whereas according to another agreement the representation 4 + 3 might be taken for an arithmetic expression and not be interpreted as "number seven", or there might be a difference on the construct level even between a "bar seven" (7 with a bar) and a "plain seven" (7).

A multitude of such agreements are taken for granted in everyday communication. So the shape of the characters is irrelevant in usual text, but in mathematical texts it is not. On the contrary, you have to carefully distinguish between different fonts in mathematical literature, because they have a different meaning which usually is agreed upon at the beginning of the paper. Or: In many programming languages the interspersion of blanks in some places is of no relevance, in other places it is. These two examples may show that the relationship between information and representation (data) has to be established in advance in order to make possible a mutual understanding in communication via a channel.
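How the number of constructs depends on the agreement can be sketched as follows. The five strings are hypothetical stand-ins for the representations on the piece of paper (the figure itself is not reproduced here), and both agreements are assumptions chosen to mirror the discussion above.

```python
# Hypothetical stand-ins for five representations found in the channel.
REPRESENTATIONS = ["7", "7 (barred)", "VII", "seven", "4 + 3"]

# Agreement 1: every representation is interpreted as the number seven.
AGREEMENT_ONE = {r: "number seven" for r in REPRESENTATIONS}

# Agreement 2: "4 + 3" is taken for an arithmetic expression, and the barred
# seven is distinguished from the plain seven on the construct level.
AGREEMENT_TWO = dict(AGREEMENT_ONE)
AGREEMENT_TWO["4 + 3"] = "sum of four and three"
AGREEMENT_TWO["7 (barred)"] = "barred seven"


def constructs_represented(representations, agreement):
    """Return the set of distinct constructs the representations are mapped onto."""
    return {agreement[r] for r in representations}


one = constructs_represented(REPRESENTATIONS, AGREEMENT_ONE)
two = constructs_represented(REPRESENTATIONS, AGREEMENT_TWO)
print(len(one), len(two))
```

The same five representations stand for one construct under the first agreement and for three under the second; nothing in the representations themselves decides between the two.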
What are the provisions to be made? For a communication to be possible there must be a prior common background of understanding, i.e. a predefined mapping of representations onto constructs. In the course of communication further agreements may be used to extend this common background: One office passes the declarations to the other, the latter one accepts or rejects them. The declarations comprise
- construct type declaration
- representation type declaration.

Construct type declarations were discussed in the preceding section. The construct type declaration determines the constructs which can be communicated via the considered channel. The construct type declaration language is a part of the above mentioned common background.

The representation type declaration refers to a construct type. It determines, what are the admissible representations of occurrences of the given construct type in the regarded channel. Considering the set of all representations of constructs of this type which can be exchanged, we arrive at the concept of representation type (Darstellungstyp). The representation type declaration language (RTDL) is a further part of the above mentioned common background. Representation type declaration languages are not discussed here.

An example may illustrate the relationship between construct type declaration and representation type declaration. (The used ad-hoc languages should be understood intuitively.) Although it is a very simple example, many figures have been necessary to depict the ideas presented so far, which gives an indication about the magnitude of usually implied declarations.

Fig. 8 shows a declaration of the four construct types CALENDAR-DATE, MONTH-NAME, YEAR and DAY-NUMBER. The latter three are types of atoms, the first one is an aggregate type. Additionally the type composition is shown in IMC representation.

Fig. 9 shows a declaration of four representation types: MONTH REPR, YEAR REPR, and DAY REPR are the representation types for the construct types MONTH-NAME, YEAR, and DAY-NUMBER, respectively. DATE REPR is the representation type for the construct type CALENDAR-DATE. In spite of the extensive declarations still many implicit assumptions remain: The character sets to be used, the arrangement of characters on the medium (paper e.g.) and other details. They all have of course to be counted to the pre-existing common background of the communicating offices.

Fig. 10 shows two occurrences of the construct type CALENDAR-DATE (and the occurrences of the component construct types) and some occurrences of the representation type DATE REPR.
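The separation of construct type declaration and representation type declaration can be imitated in Python. The sketch below is not the ad-hoc notation of the figures: the component names day, month and year, the value ranges, and the character layout chosen for the representation types are all assumptions for illustration.

```python
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")


# Construct type declarations (which constructs are admitted).
def is_month_name(construct):
    return construct in MONTH_NAMES


def is_year(construct):
    return isinstance(construct, int) and 0 <= construct <= 9999


def is_day_number(construct):
    return isinstance(construct, int) and 1 <= construct <= 31


CALENDAR_DATE = {"day": is_day_number, "month": is_month_name, "year": is_year}


def is_calendar_date(construct):
    return (construct.keys() == CALENDAR_DATE.keys()
            and all(check(construct[name]) for name, check in CALENDAR_DATE.items()))


# Representation type declarations (how admitted constructs are written down).
def day_repr(day):
    return f"{day:02d}"


def month_repr(month):
    return month[:3].upper()


def year_repr(year):
    return f"{year:04d}"


def date_repr(date):
    if not is_calendar_date(date):
        raise ValueError("not an occurrence of CALENDAR-DATE")
    return f"{day_repr(date['day'])} {month_repr(date['month'])} {year_repr(date['year'])}"


def interpret_date_repr(data):
    """Map an occurrence of the assumed DATE REPR layout back onto the construct."""
    day, month, year = data.split(" ")
    month_name = next(name for name in MONTH_NAMES if name[:3].upper() == month)
    return {"day": int(day), "month": month_name, "year": int(year)}


date = {"day": 24, "month": "September", "year": 1975}
data = date_repr(date)
print(data)
```

The type checks say nothing about characters, and the representation functions say nothing about which dates exist. Even so, the blank as separator and the three-letter month code remain implicit assumptions of exactly the kind the text counts to the common background.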
This example suggests that the concept of format belongs to the concept of representation type. Up to here the assumption has been maintained, that only one representation type can be declared for each construct type. This restriction should be dropped now. If multiple declaration of representation types for one construct type is provided, each of the declared representation types could be called a format (Format) in close relation to the common use of this term. Referring to the above example of fig. 9, instead of the one representation type DATE REPR we could declare three representation types (= formats) for the representation of constructs of type CALENDAR-DATE (two "positional" formats, one "key-word" format).

It can be observed that the separation of construct type declaration and representation type declaration is not explicit in existing systems. The declaration of the layout of the working area (= format declaration) is often simultaneously the specification of the construct type and of the input format. This might be a reasonable economical approach. But to understand the relationship between information and data one should be aware of the double function of such a "data definition".

Applying the view which has been presented so far of the relationship between information (constructs and construct types) and data (representations and representation types) we outline a flow of information between two offices via one channel: An office B may be requested by an office A to retrieve a construct with given properties (e.g. from a data base). Office B finds the specified construct (i.e. a representation of it), identifies the type of it, chooses one of the pertaining representation type declarations and puts a representation of the construct in question into the channel. As this representation conforms to the representation type declaration established for the regarded channel, office A is able to interpret the data (knowing the representation type and construct type).

Some reader might have noticed, that in the CALENDAR-DATE example an argumentation is missing, why the representations do not show all details of the represented constructs (cf. fig. 10). Actually, this is not necessarily so, it only corresponds to the practice in data processing, because it is the representation which occupies storage, and not the construct. More extensive representations could be provided in a representation type declaration for various reasons (security, less extensive declarations, etc.). Of course, that would require more capacity of the involved channels (storage).

In any case the question arises, whether such a "representation" is really a representation of a construct. Strictly speaking, it is not. Only together with its specifications, which allow the interpretation of the construct, i.e. all the type declarations, a full representation is there. Therefore a representation in the above sense shows only the individual part (Individualteil) of the represented construct, because the representational part common to all occurrences of that type is in the type declarations. This leads to the idea, that data (e.g. in "input data") usually means individual part of the full representation rather than the full representation itself. With this in mind, the use of the word "data" in the criticized term "data definition" can partly be justified: The "data definition" defines the admissible data, i.e. the admissible individual parts of construct representations. However, it should be entirely clear by now, that the omission of the word "type" in the representation type declaration is misleading.
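The flow between offices A and B, the idea of several formats for one construct type, and the notion of the individual part can be sketched together. The three formats below (two positional, one key-word) are hypothetical layouts, not those of the figures, and the months are written as numbers to keep the example short.

```python
# Hypothetical formats (representation types) for the construct type CALENDAR-DATE.
def encode_ymd(date):  # positional format YYYYMMDD
    return f"{date['year']:04d}{date['month']:02d}{date['day']:02d}"


def decode_ymd(data):
    return {"year": int(data[0:4]), "month": int(data[4:6]), "day": int(data[6:8])}


def encode_dmy(date):  # positional format DDMMYYYY
    return f"{date['day']:02d}{date['month']:02d}{date['year']:04d}"


def decode_dmy(data):
    return {"day": int(data[0:2]), "month": int(data[2:4]), "year": int(data[4:8])}


def encode_keyword(date):  # key-word format DAY=..;MONTH=..;YEAR=..
    return ";".join(f"{name.upper()}={date[name]}" for name in ("day", "month", "year"))


def decode_keyword(data):
    pairs = (part.split("=") for part in data.split(";"))
    return {name.lower(): int(value) for name, value in pairs}


FORMATS = {
    "positional-ymd": (encode_ymd, decode_ymd),
    "positional-dmy": (encode_dmy, decode_dmy),
    "key-word": (encode_keyword, decode_keyword),
}


def office_b_retrieve(data_base, properties, channel_format):
    """Find the construct with the given properties and return its representation."""
    encode, _ = FORMATS[channel_format]
    for construct in data_base:
        if all(construct[name] == value for name, value in properties.items()):
            return encode(construct)
    return None


def office_a_interpret(data, channel_format):
    """Interpret the data using the format established for the channel."""
    _, decode = FORMATS[channel_format]
    return decode(data)


data_base = [
    {"day": 1, "month": 5, "year": 1974},
    {"day": 24, "month": 9, "year": 1975},
]
data = office_b_retrieve(data_base, {"year": 1975}, "positional-ymd")
construct = office_a_interpret(data, "positional-ymd")
misread = office_a_interpret(data, "positional-dmy")
print(data, construct, misread)
```

The eight characters in `data` are only the individual part: interpreted with the declared format they yield the requested construct, interpreted with another format they yield a different one.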
5. Practice oriented remarks

In this concluding section some applications of the ideas about information and data as discussed above shall be tried.

First a preliminary remark: There might be the impression, that the model system of IMC has been offered as a new proposal of a data model to compete with other, well known data models. That would be a misunderstanding. IMC is aiming to be a conceptual tool for speaking about information, on this level comprising the various data models. Nevertheless it is a specific conceptual model and as such offers a view on model information which allows to form a variety of information structures, but has its own limitations, too.
It is not the task of this paper to outline the features of hierarchic, network, relational or other data models. But it might be of interest in this context, to what these attributes refer. They refer to the so-called "data structures" which can be established in a system of the respective model and which are supported by the system's functions. With the terminology introduced above we would of course say "information structure" instead of "data structure" as meant here. Data structure in our understanding as representation structure normally is left to the implementor, in order to achieve efficiency, security, or any goal else of this nature. For communication purposes the possible structures of constructs and the related questions concerning the manipulation of model information are of main interest: what levels of aggregation are available, are there nominations or collections, what are the restrictions for the nesting of constructs, are there special generic types adjusted to the application in question (e.g. "relations", which in terms of IMC are collections of equally domained nominations, called collectives (Kollektiv)), what is the support for orientation in extensive constructs, what properties can be used to address constructs (independently of their representation), and many other questions. The answers to these questions together with the pertaining operations on the constructs render a model hierarchic, network or relational (or something else).

It is a matter of course, that also efficiency and other aspects influenced by representation techniques are of relevance. The problem of "redundancy" is one of them. It is not intended here to consider the benefits and disadvantages of redundancy. But it has to be clarified, that redundancy does not refer to the level of constructs, but to the level of their representation. It has been shown, that a construct may appear at several spots as a component of an embracing construct. Spots, at which the same construct (necessarily or by chance) appears are called parallel spots (Parallelstelle). If the appearance of a construct at several spots is required, that has to be specified in the construct type declaration by so-called "consistency constraints" (cf. the SOURCE clause of [DDLC]). Once a specification of this kind has been established, the system (as one of the communicating offices) is free to decide, whether it will store the representation of the construct each time it appears (at a parallel spot) or less often (usually once). The more often the same representation is stored, the higher the degree of redundancy is said to be.

It is conceivable that this technique could be applied (and actually is done sometimes) also for other than consistency-conditioned parallel spots. Such a situation is also given with the RESULT feature of [DDLC].
On the model information type level the RESULT clause specifies that the atom at the specified spot is the result of the execution of a specified procedure, which uses constructs at other spots as input. In both the SOURCE and the RESULT clause additionally is specified, whether a representation of the depending atom is maintained permanently (ACTUAL) by the system, or is made up only when required for passing it via the communication channel to the requesting office (VIRTUAL). In the strict sense, the ACTUAL feature causes redundancy. However also another, less restrictive interpretation of the ACTUAL and VIRTUAL feature is conceivable, where
33
the
system still remains
assumed above)
free to follow the s p e c i f i c a t i o n
Doing a closer look to the d i s c u s s i o n of redundancy one encounters
a
(the "system")
is a
unit with a storage as a private channel fig.
11
is
configuration
often
preferred
containing
(input channel,
two
stated.
representations) RESULT
rather
than
are
the is
a
With
this
what is the object channel
which
the
As a matter of fact this is seldom clearly
input format declaration
(e.g.
sequence of atom
(e.g.
SOURCE feature,
made up to one complex declaration package,
declaration into the same package.
well known under
1.
a diagram
we have also three places to
complexity of which is still more increased by
"optimization"
fig.
and data base format declaration
feature)
functional
If we consider a representation type declaration,
is applied to?
In particular,
To show explicitly
computerized
channels or still better three channels
the question has to be answered, declaration
configuration
(the "data base"),
data base, output channel)
represent constructs.
type
(in the context of
system
is a slight modification of that used so far.
that one of the offices
like
(as
or to understand it only as an efficiency constraint
data base management systems) which
verbatim
label
"schema',.
minimization
of
packing
the
construct
Such declaration packages The
consequence
of
the
are
such
an
the number of characters to be written by the programmer at the expense of quality of software, in particular of clarity.
Finally some remarks on the relationship between information and data on the one hand and their manipulation on the other hand might be appropriate. It would be an obvious question to ask whether constructs or their representations are manipulated. Strictly speaking, only representations can be handled, as was stated previously. But so-called data manipulation languages do not refer to the representational level only. Primarily they are designed for the manipulation of constructs.
This will be illustrated by an example of the retrieval of a construct: The properties which are specified as parameters of a request refer to a construct rather than to a representation of it. The delivery of the found construct is done by putting it into the respective channel in an agreed representation, i.e. meeting the output format. Another example is "navigation". This term refers to moving from one spot to the other in an extensive construct. Also here no reference to the representation of this construct is involved. Only upon request the navigator gets some representation of the construct (at the spot) where he has arrived. In case of a data base management system,
he does not receive the
representation
on which the retrieval has been performed,
representation.
A counter-example,
representation in the data base in the output channel Although
a
information,
this
implementor,
does
the
user
has
reguirements. time
exert
language refers to the level of model
not
imply
representations
accessed in order to execute several
representation
and
interests access.
to
way
application
of
adequacy
and
resources
will decrease. from
manipulation given
to
computer
computer
and
level.
a
efficiency.
concepts,
security
However,
of
update /
compromise
between
in overall
computing
information
differentiation
hand
computing
(traffic density, balanced
facilities to system interfaces,
view of information
to
A good choice of
More and more it becomes evident,
includes to support conceptual presented
cost,
functions as well as a forecast
the involved people and the intended
to this goal.
On the other
the influence of storage and biased
access
it is up to the
which refer to storage and
should yield
considerations
actual
also the policies of
time,
acting in the future
etc.)
to
move
influence
has
some influence to the information
user's
no
But again,
in what way he has provided to be
He
These requirements
retrieval ratio, efficiency
that
manipulation commands.
construct types and of manipulation the
where the
is the same as
(librarian's counter).
takes place in the system.
which
is a library,
(room with book-shelves)
~'data manipulation"
representations
however,
but an output
time
that we have
structures
and
where more preference is application. wherever
This goal
useful.
The
and data is intended to be a contribution
References
[DIN]
DIN/Fachnormenausschuss 44300 "Information Institute
[ANSI]
ANSI/X3/Sparc/DBMS
Study
GMD/Arbeitsgruppe the description (German).
[ PZT ]
Prozesse".
[DURI]
R. Durchholz
and
[DKR]
Beschreibung
Verlag,
"Concepts
T. B. Steel
"Data
Jr.,
IFIP-TC-2
"Abstract 10/5,
(German).
Datenbanksysteme,
E. Falkenberg
base
J. W. Klimbie
Conference
(German).
a "A
status
technical
1975 Elektronische
G. Richter, und
"Design of a data programs
(DAGS)"
Systementwuerfe
und W. Klutentreter,
fuer (Hrsg.),
1974
Description
CODASYL
data
197~
basic system for application Datenmodelle
Report".
for
Namur, January
Objects"
In:
CODASYL/Data
1967
1968
W. Klutentreter,
GMD, St. Augustin,
diskreter Haendler,
standardization
Special Working
Rechenanlagen R. Durchholz,
base
of the DDL",
H. Zemanek,
for
systems"
Basel,
Data Base Management,
(eds.), North-Holland,
base management
[ DDLC ]
zur
Birkhaeuser
In:
"Terminology computer
ueber Aufomatentheorie,
G. Richter,
systems".
American
1971
and K. L. Koffeman,
report".
Report.
fuer Betriebssystemnormung,
(Hrsg.),
DIN
German
1975
of models of job processing
in-depth evaluation [ZEM]
Interim
February
"Grundsaetzliches
Unger,
management
[STEEL]
Group,
In: 3. Colloguium
(~NI),
(German).
March 1972
Institute,
GMD, St. Augustin,
C. A. Petri, Peschl,
vocabulary"
for Standardization,
National Standards
lABS]
Informationsverarbeitung
processing;
Language Committee
DDL Journal of Development,
(DDLC), June 1973
"June 73
Figure 1: Configuration of communicating functional units
Figure 11: Extended configuration of communicating functional units
[Diagram labels: office, office B, "user", "system"]
Figure 2: Constructs in IMC box representation
[Box labels: name, family name, first name, home address, city, street, street name, number, place of birth, date of birth, year, month, day, branches. Values shown include JACKSON, HOUSTON, WASHINGTON, LOS ANGELES, ANN ARBOR.]
Figure 3: Constructs in IMC tree representation
[Tree labels: name, family name, first name, home address, city, street, street name, number, place of birth, date of birth, year, month, day, branches.]
Figure 4: Construct representation of fig. 2 with additional lettering for reference purposes
[Reference letters shown include C3, C5, C6, C7.]
Figure 5: Graphic construct type definition
[Labels: EMPLOYEE, PERSON, SKILLS, SKILLCODE, NUMBER.]
Figure 6: Occurrence of construct type defined in fig. 5
Figure 7: see next page
construct type MONTH-NAME
    atom: JANUARY, FEBRUARY, ... DECEMBER

construct type YEAR
    atom: 1900 ≤ INTEGER ≤ 1999

construct type DAY-NUMBER
    atom: 1 ≤ INTEGER ≤ 31

construct type CALENDAR-DATE
    nomination:
        MONTH --> construct type MONTH-NAME
        YEAR  --> construct type YEAR
        DAY   --> construct type DAY-NUMBER
    non-occurrences:
        MONTH      DAY
        FEBRUARY   30
        FEBRUARY   31
        APRIL      31
        etc.

Figure 8: Construct type declarations
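The declarations of Figure 8 can be read as a validity check on the level of constructs, independent of any representation. The following is only a minimal sketch under stated assumptions: the class name `CalendarDate`, the helper `is_occurrence`, and the entries of `NON_OCCURRENCES` beyond the three listed in the figure (which ends with "etc.") are illustrative choices, not part of the original declarations.

```python
from dataclasses import dataclass

# Atoms of construct type MONTH-NAME.
MONTH_NAMES = (
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
    "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER",
)

# (MONTH, DAY) combinations excluded from CALENDAR-DATE.
# The figure lists the first three; the others are assumed here.
NON_OCCURRENCES = {
    ("FEBRUARY", 30), ("FEBRUARY", 31), ("APRIL", 31),
    ("JUNE", 31), ("SEPTEMBER", 31), ("NOVEMBER", 31),
}


@dataclass(frozen=True)
class CalendarDate:
    """A construct of type CALENDAR-DATE with nominations MONTH, YEAR, DAY."""
    month: str
    year: int
    day: int

    def __post_init__(self):
        if self.month not in MONTH_NAMES:
            raise ValueError(f"not a MONTH-NAME atom: {self.month!r}")
        if not 1900 <= self.year <= 1999:
            raise ValueError(f"not a YEAR atom: {self.year!r}")
        if not 1 <= self.day <= 31:
            raise ValueError(f"not a DAY-NUMBER atom: {self.day!r}")
        if (self.month, self.day) in NON_OCCURRENCES:
            raise ValueError(f"non-occurrence: {self.month} {self.day}")


def is_occurrence(month, year, day):
    """Return True if the three atoms form a CALENDAR-DATE occurrence."""
    try:
        CalendarDate(month, year, day)
    except ValueError:
        return False
    return True
```

Note that nothing in this sketch says how a date is written down; that is left to the representation type declarations.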
representation type MONTH REPR
    represented construct type MONTH-NAME
    string: 1 or JAN  --> atom JANUARY
            ...
            12 or DEC --> atom DECEMBER

representation type DAY REPR
    represented construct type DAY-NUMBER
    string: DECIMAL representation

representation type YEAR REPR
    represented construct type YEAR
    string: DECIMAL representation

representation type DATE REPR
    represented construct type CALENDAR-DATE
    string: (DAY REPR "-" MONTH REPR "-" YEAR REPR)
         or (YEAR REPR "-" MONTH REPR "-" DAY REPR)
         or ("D:" DAY REPR /// "M:" MONTH REPR /// "Y:" YEAR REPR ; delimiter ",")

Figure 9: Representation type declarations
Figure 7: Five construct representations on paper (among them: 4+3, SEVEN, seven)
Figure 10: Construct type occurrences and representation type occurrences of fig. 8 and 9

    CALENDAR-DATE (DAY 4, MONTH OCTOBER, YEAR 1967)
        4-OCT-1967
        D:4,Y:1967,M:OCT
        1967-10-4

    CALENDAR-DATE (DAY 14, MONTH MAY, YEAR 1973)
        M:MAY,Y:1973,D:14
        D:14,M:5,Y:1973
        14-5-1973
        1973-MAY-14

Figure 11: see first page (fig. 1)
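The point of Figures 9 and 10, that several representation occurrences stand for one and the same construct, can be illustrated by a small parser. This is a hedged sketch, not the notation's definition: the function names `month_atom` and `parse_date` are hypothetical, the three-letter abbreviations other than JAN and DEC are assumed, and telling the two dash-separated forms apart by the four-digit year is an assumption of this sketch.

```python
MONTH_ATOMS = [
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
    "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER",
]


def month_atom(text):
    """Map a MONTH REPR string (number or abbreviation) to its atom."""
    if text.isdigit() and 1 <= int(text) <= 12:
        return MONTH_ATOMS[int(text) - 1]
    for atom in MONTH_ATOMS:
        if atom[:3] == text:  # assumed: first three letters abbreviate the atom
            return atom
    raise ValueError(f"not a MONTH REPR: {text!r}")


def parse_date(text):
    """Return the represented construct as (day, month atom, year)."""
    if ":" in text:
        # Labelled form: D:, M:, Y: fields in any order, delimiter ",".
        fields = dict(part.split(":") for part in text.split(","))
        return int(fields["D"]), month_atom(fields["M"]), int(fields["Y"])
    first, month, last = text.split("-")
    if len(first) == 4:  # assumed: a four-digit first field is the year
        return int(last), month_atom(month), int(first)
    return int(first), month_atom(month), int(last)
```

With this sketch, `parse_date("14-5-1973")` and `parse_date("1973-MAY-14")` both yield `(14, "MAY", 1973)`: two representations, one construct.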
Data Base Research: A Survey

A. Blaser, H. Schmutz
IBM Wissenschaftliches Zentrum, Heidelberg, Tiergartenstr. 15
Abstract The
research
Most
of
models
activities
the of
issues
information~
implementation industry of
ac%ivl%ies
respect
area
of
tial
future
%0
da%~
OF
and
between
with
%rends
Introduction Models
3.
Data
Manlpulation
4.
System
data
modelling user
and and
and
data
data
systems
are
institutes
reviewed.
center
around
manip~lation~
system
and
Comparison
analysis.
requirements
development.
potentially
architecture
base
base
research
shows
emerging
are
principles
with
with
differences
Conclusions
and
aspects~
respect
drawn in
to
the
poten-
research°
Languages
Problems
~.
Storage
6.
Modelling
7.
Summary
8®
Bibliography
Structures and and
objective
and
Search
Algorithms
Analysis
Conclusions
INTRODUCTION
and
in
of
CONTENTS
Data
past
by
documented
research
des~n
I®
The
area
and
established
base
2.
1.
%he
interactive
±echniques
emphasis
TABLE
in
considered
of
present
192/
/49,
this
paper
research
is
primarily
activities
in
to
the
provide
data
an
base
overview
area.
This
over
paper
does
not
er~
information
information
survey
retrieval
systems
of
such
an
introductlon
to
available
Ll~htfoot:
Jardlne
and
of
T
data
still
help
is a
or
have
been
such
a
The
the
scheme
an
first
our
shown
programs
is
seen
by
base
the
We
will
~
is
is
which
sical
or
internal
is
actually
we
can
The
use
are
the
of
between selec±
the
conceptual
conceptual
a~e
specified
in
the
in
conceptual the
never
in
subpart
a
and of
the
definition
external information
the
It
with
the
It
is
a
standard-shown
designer
IMS
in
through the
views
exist
of
a
serves
in
as
the
the
double
phy-
form~
help
of
All
of
The
these and
mappln~s
purpose:
sufficient
a
C[!),
administrator
language.
a
central the
Informatlon
[ mapping
mappings.
and
a
conceptual
with
the and
langua@e.
examplel
way
base
as a
at
syntax
base
of
to
installatlonT may
For
way
aspects
referred
"correct"
data
serve
the
or
mapping
neces~ry
informa--
legal
form
of
of
system
system
{fig.
is
information,
pepresents
the
the
data
mapping
and
of
usually
It
physical
internal
information
Is
reflects
Given
responsibility data
been
the
retrieval
information.
corresponding to
the
unconscioesly,
information
directly.
of
information
memory.
of
what
defining
used
vlews
of
type
grammar
is other
view
stored
construct
mapplng~
mappings
for
to
has
major
views
for is
~iven
a
describesv
view
(D.Ao
schemes
or
point.
group
For
specifies
point
a
to
during
responsible This
A
conceptu~l
Experlence
which
knowledge~
persons~
flow
level.
reference
by
similar
central
schema
The
J.A.
S[stems
accepted.
consciously
shows
administrator".
a
and
definition
widely
employed
very
data
group
to
in
commer-
addition
in
our
make
authors
the
information.
similar
Users
question,
is
scheme
conceptual
therefore
a
experience
conceptual is
the
Barnett
-- A
of
danger
interested
of
~{anagemen[ 1974}
already
I and
schema
{A,J.
the
ago.
and
view
IMS
Base
implemented~
fig.
as
of
iS
some
[IMS)
D~ta
the
decade
in
who
depth
Vurth--
aspects
aware
reader, In
Is
and To
who a
integrated
"data
debates.
book.
well
Amsterdam,
This
in
are
System
architecture
concept~l
information
the
of
nearly
is
The
stored,
system?
software,
survey.
(ANSI/X3/SPARC}
mappln~s~
tion.
this
~ which of
such
[n
base
non-compute~orlented
the
systems~
~olland~ in
WedekindVs
scheme
data
base
Management
subject
scheme
group
Date's
to study
No~th
base
simplification ization
recommend field,
to
data
We
the
referenced
a
and
addressed.
Development.
editor)
Is
systems not
Information
litemature
available
and
data
Evolutionary
What
are
limitations
cially
with
commercially
for
{a} a
to
specific

Fig. 1: Structure of a data base system
    Users: PU = parametric user, IU = interactive user, APR = application programmer, DBA = data base administrator
    Views: E = external, C = conceptual, I = internal
use
subview {a)
of
%0
the
a
view~
represents
protection, tems
is
primary
scheduling
query
language
typified
two
by
data
is
{b}
"more
possibly~ natural"
purpose
v user
and
to for
is
isolation
the
a
of
transform %he
i.e.
%he
specific
importance
etc.~
slmilar fairly
via
high
the
some
is
incorporated
at
a
the
selected
use.
for
Polnt
reasons
essentially
of
a
a
of
for
higher
sys-
needln~
than
written
into
the
which We
a
the
query
a
user
and
manl--
experts.
well
This
defined
ac--
interacts
language
language
-- s u b l a n - -
manipulation
programmers
wlth
programmers
manipulation
host
data
application
by
the
vlew
data
application
data of
talk
help
This by
users
performs
performs
have solver
language Ke
who
structure.
general~
level
view.
we
problem
query
without
sublanguage. In
The
programs
First
interactive
user"y
simple
language as
consider.
conceptual
level
application
rela%ionship.
%fen
to
the
"parametric
~rogramming
in
called
to
parameters
system
gu~ge
also
groups
"non-DP-professional".
for
with
end-user
user~
very
different
%ions
major
the
at
pulation
is
and
aspects.
ape
is
base
which
%he
There
of
data
language
data
manlpula-
language.
In
practlce~
ly
to
data
be
large
used
base
by
only
problems
the in
preven%ion~
of
one
management
to
access
amounts
person.
system
s±ored
stored
information It
that
it
wlth
reeovery~
therefore
allows
information.
connection
is
are
protect[on~
unlike-
and
in
a
concurrent
a
number
schedullngT
efficiency
to
requirement of
creates
Integrity~
and
a
sharing
Concurrency
system
extremely
of
deadlock
solving
all
these
problems.
While
commercially
plications pert
of
large
number
used
the
for
language
the
rage the
problem activities
conceptual
of
structures
trends
Sections
research
modellin~
contain
research
in
some
the
research and
search
~easuremen%
conclusions and
tO
major
with
and
2
9
and
describe
to
of
IMS)
a
are
devoted
system
and respect
problems
to
deserving
6
will
efforts primary
and results
research.
level
query
5
data
we
wlll
describes as to
section T
and 4
such refer
a
model
models
Section
find
data
section
techniques
Section
to
SUp--
we
high
data In
aspec±s.
analysis
ap-
on
the
user
research.
implementation
algorithms.
towards
single
this
support
concentrates
Correspondingly
oriented to
p~imarily
research
solver.
view
area
(like
users~
interactive
languages
contributions
systems
parametmic
of
system.
manipt*la%ion discuss
employed
Involving
a
sto-
some 7
of
will
recognizable
2.
DATA
The
conceptual
ence as
MODELS
in
a
data
close
for
a
as
view
has
base
management
possible
conceptual
model
between
a
of
~s
system.
in%ultive known
in
notions as
The
information
the
a
of
vlew
how
world.
conceptual
A
refer-
should
information.
models.
of
point
such
of
data
Peal
~nd
central
Cle~rly~
possihili%ies
exists
world
introduced
ape
set
which
Peal
%o
view
provides
information
been
Proposals
conceptual
to
encode
Of
course~
be
data
conceptually The
information
is
mapping not
for-
is
that
malized.
Closely of
a
connected
extremely sors~
S
sets er~
has we
visor
a
number
are and
exactly shows
one the
biasing
data
theory.
CODASYL /124/~
One
and
we
~. I.
CRM
gree.
model
information A
n--ary
D1
where such
the as
relation
the
Di
D2
are
x
a
a
to
be
a
of
close
name.
Furth-which as
ad-
taught
students.
to
these
and
ks
an
pPofes-in
professor
course
n~mber
of
P
advises
one
a
consider
objects and
he
and
~*~
a
by
Fig.
reality
2
without
Of
less is
based
the
s±imulatln~ andv
by
and and
are
Sibley as
Codd
the
Information
ideas
Ash
E.F.
on
in
a a
notions
of
Algebr~
of
due /3/.
stimulus series
of
to
Mealy
The
most
for
data
papers
to
subsection.
{CR~)
set
finite
[39--41~
of
43/
named
subset
of
rel~tions a
of
caPtesian
assorted
de-
produc~
Dn
potentially
ks
or
acceptance
finite is
values
set
of
next
numerical a
by
more
/74/,
Model
is
set~
exactly
has
attempts
developed
relation
x
all
and
terms
~elational
the
us
sets
the
students
courses
attempt
earlier
Rovner
devote
will
of
have of
model
let
model.
are
been
within
attended an
We Each
which student
models
An
has
Codd~s
data
%he
Feldman
research
which
of Other
unique
A
and
models
/34/.
successful base
any
courses.
number
in
purposes
situation®
of
data
conceptual
world
him.
professor
a
C is
a
Information
Conceptual set
by
attend
towards
of
illustration
professor
every
Tough± may
and
~ which
for
notion
Fo~
real
students~
know
the
schema.
simplified of
courses
In
with
conceptual
or
infinite string
n--tuples®
To
sets
values, any
of
"scalar"
in
other
wordst
the
elements
Pelationv
data
values a
n-sPy of
%he
Fig. 2: Example situation and schema
    professors: (P#:1, PN:A), (P#:2, PN:B)
    courses:    (C#:1, CN:M), (C#:2, CN:C), (C#:3, CN:0)
    students:   (S#:1, SN:L), (S#:2, SN:L), (S#:3, SN:M)
    schema:     professors teaches courses (one : many)
                professors advises students (one : many)
                students, courses (many : many)
tuples
~re
tion the
n~med
with
is
homogeneous#
s~me
attribute of
example
information
A
P#
in
contains
The ed all
of P
only
such
relatlons in
fig.
are
elements
al
in
any
be
modeled.
of
other
in
tuple
are
(a)
build
the
can
advised
by
one
is
of
3~
reference.
the
same
This
the
A
rela--
relatlon
allows
a
have
tabular
representation
used
P~
of
in
set
professor
to
some
the
reference
appear
to
the
first
form.
normal
{and
not
sets in
between ways
to
thls
(or
set
examwhich
indicat-
means t or
that
structur-
information
and
can
p~ofessers.
information:
their
wlth
that
For
schema
lists
%hls
ele-
P.
This
way
dis--
key
domain
in
students
store
students
store
or
the
Two
value
relation.
corresponding
all
and
a
their
P~
two
of
is
consequences
least
refer
another
in
as
~elationshi9
at
S
references.
values %o
or
which
scalar has
and/or
keys
same
but
~ and
a~e
the
there
be
the
socalled
This
Consider
fig.
values~
fig. in
way).
Principally we
in
domain~
all
a
of
them.
different
actually
integer
shown
4
with
may have
tuple
key
ease
elements
in
relation
domain
a
shown
relation
another is
two
for
CRM.
a
a
PefePnce
element
play
in
any
names
associated
as
in
wl±hin
tuples
ments. is
relation
domalas
tinct
i.e. names
listing
Some
a
attribute
unique
numbers}
the
professor
his
unique
tuple
o~ {h)
we
which
store
form".
The
every
the
case
relationships are
these
tlon.
to we
However~
therefore relation "thlrd cies.
SC.
The
Codd
professor
[a)
in
professors
%he
the has
relation
of
to
number)~
"firs±
or
normal
%o
/41/~
to
many
normallzato
many
and
relation, the
remove
Date
and
students].
satisfy
Is
additional
an
serve
Codd
pro~essors ~
normalizationsT
essentially
referred
in
advises
relationship
further
be
students
professor
converse
defined
[or not
form.
and
one
int~oductlon
which
Is
would
thls
student/course
ferm"v
Peade~
is
(i.e.
s¢o~e
the
requires
normal
(b)
many
the
Case
between
one
cases
student
student.
For%unately~
courses In
with
advises
the
"second"
some
/49/,
and
redundan-
or
Wedeklnd
/192/.
The
advantages
those7 ers
who
with
ics. tlon~
Since
a
o@ are
background a
relative
relations
used
agree
relation
CR~ to
are
its
in ls
the
set~
complementatlon In
domain
in
tables~
elementary
a
simplicity
apparent
"think"
set etc.
names.
in notions
and
of
operations can
More
its
particular
like
immediately Importantiy~
for
discrete unlonT be
appeal
%o
researchmathematln±ersec-
applied
pro3ection
if may
the be
P
S P#
PN
2
I
A
L
I
2
B
M
2
S~
SN
P#
I
L
2 3
SC
C
c~
P~
1
M
2
2
C
I
1
3
3
0
I
2
3
CI~-
Fig. 3: Normalized CRM relations
S  (S# int, SN char, P# int)    key (S#)        ref (P# to P.P#)
P  (P# int, PN char)            key (P#)
C  (C# int, CN char, P# int)    key (C#)        ref (P# to P.P#)
SC (S# int, C# char)            key (S#, C#)    ref (S# to S.S#, C# to C.C#)

Fig. 4: CRM schema
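The key and ref clauses of the schema in Fig. 4 amount to two checks on stored tuples: key values must be unique within a relation, and every referencing value must occur in the referenced relation. The sketch below is one possible reading, not the paper's notation; the names `SCHEMA`, `EXAMPLE_DB` and `violations` are hypothetical, and the advisor, teacher and attendance assignments in `EXAMPLE_DB` are assumed sample data.

```python
# Relation name -> key attributes and references (attribute -> target relation, attribute).
SCHEMA = {
    "P": {"key": ("P#",), "refs": {}},
    "S": {"key": ("S#",), "refs": {"P#": ("P", "P#")}},
    "C": {"key": ("C#",), "refs": {"P#": ("P", "P#")}},
    "SC": {"key": ("S#", "C#"), "refs": {"S#": ("S", "S#"), "C#": ("C", "C#")}},
}

# Assumed sample tuples; names follow the example situation.
EXAMPLE_DB = {
    "P": [{"P#": 1, "PN": "A"}, {"P#": 2, "PN": "B"}],
    "S": [
        {"S#": 1, "SN": "L", "P#": 1},
        {"S#": 2, "SN": "L", "P#": 2},
        {"S#": 3, "SN": "M", "P#": 2},
    ],
    "C": [
        {"C#": 1, "CN": "M", "P#": 2},
        {"C#": 2, "CN": "C", "P#": 1},
        {"C#": 3, "CN": "O", "P#": 1},
    ],
    "SC": [{"S#": 1, "C#": 1}, {"S#": 2, "C#": 3}, {"S#": 3, "C#": 3}],
}


def violations(db):
    """Return a list of messages for violated key and ref declarations."""
    problems = []
    for name, decl in SCHEMA.items():
        seen = set()
        for row in db[name]:
            key = tuple(row[attr] for attr in decl["key"])
            if key in seen:
                problems.append(f"{name}: duplicate key {key}")
            seen.add(key)
            for attr, (target, target_attr) in decl["refs"].items():
                if not any(other[target_attr] == row[attr] for other in db[target]):
                    problems.append(
                        f"{name}: {attr}={row[attr]} not found in {target}.{target_attr}"
                    )
    return problems
```

Because the advisor is stored as the attribute P# in S, with S# as the only key, each student can have just one advisor; this is how the one : many constraint is carried by the schema.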
applled bined
to
relations
using Or
duct
well
Pierce
minology. viewed
es,
the
shown
The
stoned
%hey
to
example,
no
COBOL?
PL/I~ to
ponds
has
used the
a
ALGOL68y most
to
that
±o
of
be
pPedlcate
in
ter-
Codd~s may
calculus
investigated
compro--
relation
"relational
subject
obvious
is
Join
every
may
cartesian
both
be
may
be
approach--
calculus"
and
has
/41/.
are
or
has
the
as
called
o~der
Codd
we
and
structure such
approach
first
and
been
is
which
PASCAL
example
and
equivalen±
intuition
portant
generalization
~elations.
It
different
composition
different
equivalent
43/.
of
of
algebra"
model
/20,
structures
for
new
are
relational
a
predicate
define
that
oN
slightly
"relational
erations of
a
to
a
relations methods
product~
In
as
applied
and
known
It
a
within
a
of
not
structure
the
which
the
consid-
full
science.
record is
used.
violating
critical
offer
computer
hierarchic
frequently
structure
number
does
This
"first
Pang,
It
has,
organization
of
simpler
corres-
is
one
only
normal
form"
imcon--
dition°
The
example
shown
how
one:many to
a
of the
%he
rela%!onshlp
schema
constrain%
many:many
is of
between
~ffected
the
by
professor
constr~in%~
a
students
and
constraints. -
student
completely
professors
Per
example~
relationship
new
relation
the
fact,
is
has
to
has
if
the
relaxed
be
intro-
duced.
Another tion
consequence has
%o
be
of
no
Interest
often
example, name
in
"M",
sor
number
we
allow
a
user
the
key
at
every
two
problems.
user
has
privacy well
to
For
but
for
FirstT know
that
In
the
projection~
in
the
projection
no
a
in
the
two
of
to
if
persons
with
thus invalid.
making One
can
get
%0
a
man's
thls
same
imagine
ways
student
has
the
name.
with
profesThis
may
which
not
does
may
with
is
not There
he
is
on
to
employee's
allowed
managem's
if con-
allowed
the
are
is For
ariseT
salary
statistles
the
his
mana~e~,
number
which
si%uatlons
user
salaries.
man
of
it he
informa-
information.
advisor
domalns~
privacy
the
any
to
compare
find
the
adviso~ this
basic
informatlon9 basic
particula~
of
that
Critical
the
number
Second,
the
of
and
with
order
the
way
a
salary
man
other
more
subset
reasons
of of
that
other
example~
associated
constraint.
happen
only
employee's
numbers
of
learn
howeverT
see
values.
name
%0 is
help
requester
%he
has
there
to
salary, man
know
is
the
the
tolerable,
rain
the
to
since
look
manager
with to
"system"
2~
considered
normalization
encoded
order
the
be
at
of
to
look
immediately salary~
the
contradicting projected
out
it
may
appea~
as
salary
distribution
around
the
one
the
first
%uple
problem
though
lem
is
by
in
SQUARE} are
for
McGee
wlth
common
the
implicit
computer Of
in
tions
for
mercially
Within less
in
model
[like
implementa-
model
P.
and
lime of
as
in
a
it
is
are
those
developed
model
/3S~
/1/
im--
clean, joins s
in
many
employed
of
Senko
is /15S/.
find
other recently
with
of
of
the
as
data
the
a
and
of
graph
model
by
relational
known
/20~
or
designed
binary
also
decom-
more
graph
graph
generally
model
to
model.
up
system a
as
synonym
all
shows
well
papers
PL/I. struc-
models
graph
discusses
DIAM
IBM
adapta--
a
Practically
model
sort
the
execution
are
now
graph
graph
of
1963)0
restrictions
on
~ Other
Science
data
ALGOL68
56/.
models %he
in
The
appears
activities
some
group
in
{as
during
based
research
{as
description
essentially
the
a
and
the
model
9 Amsterdam,
{1969)}.
McGee
model
model
We
formal
Some
papers.
the
Mathematical
of
3
is or
syntax
states
/I,
the
CODASYL
38/.
a
~eneral
are
of
section entities
objects
Holland
"Schema"
origin
this
abstract
PASCAL~
systems
of
an
the
syntax".
set and
and
that
their
over
abstract
6,
activ[tles
graph
37 s
graph
the
s llke
origins
activities
PL/Iv
this
base
The
North
On
In
graphs
interpreter
schema.
number
DBTG
opera%loon
•
model s in
a
entity
a
some
Towards
Walk:
to
Fehder
of
claimed find
discussed
Programming
Wabstmact
data
The
Swenson
K.
this
research
semantics"
the
COBOL,
in
or
case be
model
to
19627 the
model
Automatic
therefore
and
CongP.
influenced to
the
Schmid
file
special cannot
labeled
(McCarthysJ.: IFIP
dels
"data
of
conform
Altman, and
systems These
file
models
McCarthyWs
u
base
data notion
explicitely
/121/.
Astrahan,
prob-
Models
%o
specified
consciously
1968
second
structures.
to
available
data
though
data
back
extensions
refer
sets". flat
a
flat
have
of
"declaration
signers
The
experimental
It
goes
languages
be
of
turn
which
(Lucas
and may
in
entities
has
Reviews
tures
authors.
in
section.
CaM,
the
model).
work
programs
the
between
Laboratories
Annual
what
Data
Proc.
McCarthy's Vienna
in
relations
the
is
explicit
science
to
the
a homogeneous
this
kinds
behind
information)
schema
of
homogeneous
both
or
Computation.
of
the
idea
binary
to
"duplicates
model in
Oriented
The
named
This later
of
Graph
some
allowing
foundation
sense
2,2.
least
known
implementing
/122/o
theoretical make
at
are
examples
described
plementatlons
ways
by
actually
described~
model
elegant
solved
INGRES~ tions
no
mo-
known In
$6~
as
Abrlalls
63~
model
I$~/o to
establish a connection between relations in CRM and the
the
DIAM
real
world
/isi/.
Among
the
serves
earlier
special
ceptance
of
model,
the
data
base
attention
a
data
/155/0
base
It
system
set
model,
activities.
The
DIAM data
same
mathematical
rigor
than in
stressed
pure
The
subsequently
matical
The
notion
nodes
as
terminology
thing
"that
thought, Some 5,
7
eeg.
IABC'. the
relations
graph.
To a
the
is
related
the
model
We
will
model
and
to
fig.
I®
on
the
dlscuss
the
as
su~zh
a
the
that
its
world
more
DIAM
model
straightforward find
acdata
wlth
fact, real
a
the
standardiza-
the
in
de-
Its
defined
to to
not
mapped
the
entry
and it
possible
of
way
clean
to
mathe-
%o
can
stored
as
as
node entry
fact
events"
a
be
any--
or
/34,
finite
a
set
of
are
labeled
the
node
by
uniquely
entities
are
to
shoe
be
directed
drawing
nodes 6
only
(which
entry
fig.
may
in
in
155/.
identi-
entlties.
Since
between
other and
of
frequently
entity
being and
abstraction most
like
entities.
in
an the
An
of
between
is
of
with
concepts,
rel~tions
the
that
entity.
entitles
simplicity
sets
any
graph
labeling the
graph
over
C~H
we these model
fin--
nodes
in
edges
of
a
in
the
entitles)
we
information and
named
and
other
inter--
node
represent nodes
as
a
ed@es
with
the
representation
example.
of
the
via
in
graph
used.
notion
clear
an
represented
as
are
domains
explicit
be
unary
~
our
node
is
dlstlnction
model
relation
advantage
the
a
between
node
relations
consistent
be
denotations
graph
Fig.
for
model
To
relationships
For
name.
first
from
binary
schema
binary
its
been
be
and
types
of
node
relation.
clean
Other
given
unary
ween
impact
not
associations,
can
the
and
has
has
graph
such
unique
represent
prete
relatlon
call
relations
graph,
assume
~raph.
reality
help
In
blnary
we
have
or
the
a
objects,
with
Information
from
still
of
can
in
and
This
graph
of
in
has
entities
fied
A
described
essential
used
it
essentlally shown
model
formallsm. ~ince
as
effort
foundation.
objects
Its
here
contributed
closeness
mathematical
detail
more
had
CRM.
as
%he
activities
structure
tion
designers
entity
research
symbolic of
the
fop
TIpractical"
to
develop
model
This
removes
domain
entities~
reasons calculus
or
fact
that
to
names.
More
importantly,
is
sense of
the
need
which
mathematical
is
the
not
present
without
convenience.
algebra
oriented
the Like
only
distinguish
bet-
due
to is
in
CaM,
~t
need
%o
deviate
CRM~
It
for
languages
with
is the
Fig. 5: Information as a graph
Fig. 6: Schema to the graph model (with a legend of edge multiplicities)
Fig. 7: Subgraph of E, MGR and SAL (salary values shown: 2300, 2800, 1950)
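The graph model of Fig. 5 to 7, with entities as nodes and named binary relations as directed edges, can be sketched as follows. All names here are assumptions made for illustration: the class `EntityGraph`, its methods, the employee identifiers `e1` to `e3`, and the assignment of the three salary values of Fig. 7 to particular employees.

```python
from collections import defaultdict


class EntityGraph:
    """Entities as nodes; each named binary relation is a set of directed edges."""

    def __init__(self):
        self._edges = defaultdict(set)  # relation name -> {(source, target)}

    def relate(self, relation, source, target):
        """Add an edge labelled `relation` from source to target."""
        self._edges[relation].add((source, target))

    def image(self, relation, source):
        """All targets reached from `source` along `relation`."""
        return {t for s, t in self._edges[relation] if s == source}

    def inverse(self, relation, target):
        """All sources from which `target` is reached along `relation`."""
        return {s for s, t in self._edges[relation] if t == target}


def manager_salaries(graph, employee):
    """Follow MGR, then SAL: the salaries of an employee's managers."""
    return {
        salary
        for manager in graph.image("MGR", employee)
        for salary in graph.image("SAL", manager)
    }


g = EntityGraph()
g.relate("SAL", "e1", 2300)
g.relate("SAL", "e2", 1950)
g.relate("SAL", "e3", 2800)
g.relate("MGR", "e1", "e3")  # assumed: e3 manages e1 and e2
g.relate("MGR", "e2", "e3")
```

Restricting a user to a subgraph then amounts to exposing only selected relation names, for example MGR without SAL.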
same
rigor
/12~/.
a
• urnish
user
restricts fig.
W
with
to
for
the
an
the a
other
science
does
force
side
subvlew
of
can us
be
to
does
data
Since
not
base
may
which
mapped
exclude
it
the
relationships~
lilustratlon,
computer not
On
seen
practically
these
all to
structures
difficulty
a
{i.e.
be
conveniently
provide
subgraph)
by
the
known
some
form
from
our
to
which
user.
See
structures
in
of
grephs~
high
level
it data
modeling.
2.3.
The
It
is
not
in
the
Equivalence
at
sense
all
surprising
that
in
ple
straightforward
respondin~ ween
of
the
language.
In
question
The
of
of
Bobrow
/17/,
models
are
by
Neuhold
of
creates
3 . i.
Low
Level
As
we
can
see
an
application
%o
as
second
the to
DBTG most
a
we
in
The
models°
DIAM
one
on
how
This
[s~ of
equivalent
model is
a
model
to
a
of
be
simCOP--
choice
bet-
"convenient"
or
however~ the
therefore
can
there
question
question will
or
are
cases
schema
decided
also
of
data
Sihley
same
for
First
a
and
a
while
system.
This
a
has
models /167/
for
superimpose
need
/43/.
DATA
tially
a
models
not
data
come
only
manlpule--
back
to
the
McGee
{at
model
in
investigated /122/0
least
in
creates
a
A
model
on
nsuperimpositlon
results
been
a
new
theory"
thls
Rs
direction
by
Different
the
world
mapping By
a
it
was
are
of
prob-
problem9 stated
reported
by
/82/®
3.
onset
the
section
coexist
on
how the
Codd
EeP.
Frasson
in
/134/y to
~ even
namely
which
the in
other.
be
but
equivalence
likely
researches)
the
must
model next
eonve~%
in
becomes
in
Moreoverv
to
models
the
different
equivalence.
question
lem7
way
data
the
encoded
versa.
schema
processing
question
tion
vice
dlfferen%
"natural" a
and
equivalent
two
that
information
encoded and
CaM
o.f Data....__Mode~s
MANIPULATION
a
"low case
Versus
in
High
fig.
is
program
records in
LANGUAGES
are
of
data or
oe
access
are
LoLic
accessed
interactlvely
typically
programming level"
Level
retrieved
language.
as is
"one the
in at
one
This
a
by
type
record
at
higher
level
a
external
form
terminal, one of
time
and
In
logic"°
the
processed
p~ocessing
"multiple
either
is
first
sequenreferred
Typical records
vla
for at
a
the tlme
logic". level
Research logic.
program in
allocation
subset
needs
by
a the
Even
modest
may
very
of
access compared well
plication
be
its
%he
use
to
and
even
Is
still
Of
the
thelm
more
type common
specify to
sub space
in
a
to
todays
level
towards systems
their
processing
Pesul±s
through user
a
viewv
high
oriented implemented
in
oper-
resource
external
systems~
of
tO
and
the
higher
application
going
the
specified
available
for
is
prim~rily
though
the
scheduling
and
be ape
the
required
program
for selection
has
in
is
conceptual
commerclally
it
the
towards
also
logic"
data
the
relevance
as
that
tlme
projects
data to
of
programs
%he
ef~ect~
research to
a
oriented
informatlon
between
nature though
interactive are
In
realize
at
of
this
mappln~
primarily
to
records
purposes, Is
are
important
which
system
program
logic,
is
"multiple
on
The
which
It
case
advance
ate.
activities
ap--
Installa-
tions,
Subsequently searchers then
we
some wlll
is
In
which
they
of
data
models.
Some
Table
I lists CRMt
data
manipulation
referenced. wlth
are
We
languages
start
based
%o l a n g u a g e s
used.
Finally
lansuages
wlth on
which
are
will
come
we
some
of for
the
IS/I
IBM MIT
experimental
some
location
MacAims
data
models,
characterized back
to
it
would
.........
be
systemst more
which
correct
to
remark
reference
IS/1       IBM UK          algebra         Todd
MacAims    MIT             algebra         Goldstein
RDMS       MIT/MULTICS     algebra         Steuert
MORIS      Milano          calculus        Bracchi
SQUARE     IBM Research    mapping         Boyce
SEQUEL     IBM Research    mapping         Chamberlin
INGRES     Berkeley        calculus        Held
ZETA       Toronto         definitional    Mylopoulos
DAMAS      MIT             calculus        Rothnie
Table
I.
Some
by
re-
CRM Implementations
%he
other
developed
the
A by
special the
way
equivalence
Implementations
though
System
the
devoted
CRM
3.2.
ment
be
continue
subsection
of
relational
systems
claim
to
imple-
claim
that
they
implement four
homogeneous
represent
concept
of
tlenal XRM~
a
data
and
files
graph
SEQUEL
is
for
system)
a
snduser ing
derived
and
dy
an
Te
give
us
consider
is
an
is
IS/l)
This
tion
In
{P
~
%he
query
the a
data
the
relations
which
on
language stands CUPID
top its
of
as
currently
tool
berela--
to
low
a
resembles
INGRES
is
of
RAM,
supporting
definitional
by
top
homoge-
oriented
management
used
and
the
rela~
of
query
keywords.
system
data
on
on
top
The
first
between
better)
en
graphics
language
of
let
level
implementer
the
primto
stu-
access.
different
styles
of
query
languages,
let
query:
name
of
the
algebra
{S;
C2
is
a
=
~M"
advisor
of
approach)
));
sequence
{operator
=
)%s).
calcuius
Ci
=
%o
query
RANGE OF PROF IS P
RANGE OF STUD IS S
RETRIEVE INTO R(PROF.PN) WHERE PROF.P# = STUD.P# AND STUD.SN = 'M'
=
a
the
we
student~
whose
name
C5)
%
obtain:
C2
selection
v*') I a
refers
oriented
Cl
of
RANGE
=
second
the
selection
value
language
WHERE
(operator and
in
the
to
INGRES)
PROF.P~
=
=
iIth
';'), a
a
projec-
domain.
we
STUD.P~
obtain:
AND
STUD.SN
~M ~
Here
The
answer
P~OF
and
STUD
existential
a
aspect
the
product
{operator
QUEL)
is
has
specifically
relational
expression
cartesian
ZETA
or)
syntax.
English
and
system
of
binary
based
In
implemented
implemented
directed"
level
is
compact
more
language
%he
"M"P
the
(
It
following
is
a
a
shown)
somewhere
relations
is
stered
QUEL
~WsynTax
impression the
n-ary
has
119/.
high
a
optimization
What
In
a
his
DAMAS
It
with
Toronto,
provides
implement
i%Ives.
it
/S3)
~%
calculus,
turn
systems
is is
supports
offers
nine
SQUARE
in
SQUARE
from
which
interfaces
tions
XRM
the
which
supporting
/110/.
developed
user
relational
which
model
Of
approach
/111/.
management)
the
file.
experiments, an
management
flat
data
early
"mapping"
algebra
neous
flat
is are
in
the
variables
quantifications
result in are
relation the
~)
predicate
applied
by
a
unary
calculus default.
relation. sense
Clearly
ever
which
In
SEQUEL,
All
of
the
"mapping"
SELECT PN FROM P WHERE P.P# IN SELECT P# FROM S WHERE S.SN = 'M';
nine
systems
tion
research
data
solution
of
pointed
out
for
the
three
ing
research
Some
above~
may
model.
In a
there
First
data
effort
this
graph
is
most
DIAM
that
their
already
ZETA
first
genera-
contribution
to
significant
as
development,
we
know
system
to
manipulation called
DIAM
that
At
least
such
SEQUEL
is
on@o-called
model
oriented
entities.
formulated
P{PN}
FERAL
the
where
recently on
model
/82/.
graph
system,
model,
A very
I~IS a n d describes
which
interesting a
query
nition
This
generated
SN
allo~ on
hierarchical
DIAM
as
languages
of
work
query of
a
the
their which
Language}
continues
language graph
{or
in
binary of
preceding
as an
/157-159/.
composition
the
=
rela--
relations
subsection
can
oP
very
and to
QUEL
a
data
In
query
data
to
another form
Nice~
the
system
given
computer~ can
DBTG
then
in to
be
a
and
students
one
comparison.
Implements
the
with
similar
language
to
model
/123/.
is
definition
data
least
language
developed
map
at
manipulation
similar
a
professors need
developed
offers a
VM1 ;
between
IS/I
research
language
language
possibly
is
some
CRM
Independent
its
for
query
connection
described of
language
PS
where
top
using
follows:
identifier
McGee
The
as
property
example
for
the
/72/.
FERAL
query
as
discuss
(Representation
interesting
FERAL
establishes
single
with
The
in
RIL
will
are
model,
languase II
activities
we
data
with
tional}
form,
called
follow--on and
follow-on
subsection
between
fers
be
though
by
research
oriented
has
es
might
means
INGREST The
;
Systems
FERAL
a
This
Increased
planned.
Senko's
be
be
what
problems,
SEQUEL,
mentioned,
using
its
systems. base
systems
Non-CRM
already
are
base data
is
represent
'M';
R,
3.3.
data
obtain:
PN
the
As
we
SELECT
the
System
approach~
a
SIMS
/194/
language.
The
their
internal
hierarchical
accessed
by
the
with A
graph
advantagconceptual
which data
ofdefi-
form
and
conceptual query
language
without
actually
tures
objectives,
which
though
SIMS
report
generation,
reports
is
computer level
3.4.
User
this
A
a
by
design
SIMS
most
of
problem Dana
data,
meets
other
with
these
experimental
fea-
systems~
implementations.
i~you%s
which
and
specifically
to
Presser
of
with
report
designed
computer
solve
for
about
this
generated
the an
task
help
of
a
interesting
/46/.
Aspects
we
apply
missed
earlier
the
natural.
interface
of
Into
access
will
discuss
specific
some
data
technique
manipulation
with
have
as
CRM
efforts
has
general
purpose
programming
a
powerful
their
75/.
build
an
question, of
%rac% the
of
might
groups.
respect
is
system
management imental
languages
to
the
whose
interface
to
Further
approach
The
feasibility
language"
is
Thompson,
found
of
the
subject
in
at
a
as
Is
and
systems
offered fo
for
traditional
best
by at
way
a
It
Kraegeloh
in
natural
report
and
lan--
some
is
ZETA
user
R~ND~Z-a
as
natural its
about
the
data
exper-
/184,
Implemented language
at-
called
TORUS
uses
%o
believe
least
system~
already
of
proposes system
the
the
/42/.
natural
base
researchers
language
are
protection inclusion
/149/.
whether
Some
data
efforts
Schauer
data
APL
is
Toronto.
147,
lin~uis-
/156/.
~tcommunicatinn
with
TO r a t h e r
sceptical
data
manipulation
languages
many
applicable~
since
"universe
of
%he
in of
implemented
which %o
be
top
/59/.
attractive
Petrick
systems,
the
syntax
natural
being
study
language
more
developed
references can
a
and for
lan-
the
these
develop
language
query
combine of
proposal
computer.
in
to Two
the
%o
open~
formal the
freedom
such
being
/131/.
a
embed
a
language on
still
to
to
language
like
data
computer
currently
language
tLc
a %he
proposes
which
language
that
goal
query
currently
target
facilities.
ALGOL
CRM
groups
make
Codd
an
rigorously
end--user
possibility
guage,
102/.
is
defining
Its
describes
measurement
which
all
VOUS~
into
Interactive
evaluation
way
research
Earley
structures
as
computational
specific
/44~
data
the
research
with
mechanisms
of
i~e=
the
use~®
guage
A
the
non--trivial
section
series
to
are
of
language
designers the
a
one
seems
high
In
Is
converting
of
the
computer
considerations. these
discourse"
considerations is
essentially
in In
natural the are
case not
restrict-
ed
to
the
simply
A
objects
completely
IS
to
119,
graphical
form.
Into
spaces
free
formulated
easily used
a
menu
extended method.
and has
It
by
Is
to
wlth
geographic
can
point
user
obtain
The
questlon~
cessful Tigations chology of
the
(or
slight
3.5.
As
pointed
ferent
out
data
languages. one
(CRM)
guages
are
attribute name
the
form
the
of
McDonald
display
device,
CRM
by
is
a
a
can
be
McDonald's
query.
such
the
help
Schauer's of
Zloof's is
system
asso--
in
displayed
approach
abstract
which map
while
/143,
that
questions
the
contents)
to
questions
skill
more
to
suc-
Inves-
experimental
question
indicate
Is
reasoning.
Of
methods
posed
opposed
users
In
(with
information
within
the
to
The
some
"examples"
modification 19
in
entities.
the
as
of
and
which
queries
error.
oriented
employ
fills
picture
device
answered
to
user
of
/2S/
relations
Simple
and
graphic
unskilled the
Zloof,
like
subareas
seems
the
are
easily this
CRM
very names
followed
be
which
and
illustrated
{or
"can
schemata.
To
a
of
psy-
183/. of
One
syntax
are
semantics
o~
a
are
/143/.
Equivalence
can
for
in
described
a
the
semantics
user-interface
answers
earlier,
corresponding
diagram
GADS
to
for
models
equivalence
and
by
stored
probability
display
or
independent
Model
a
the
extension
locations
of
taken
Example
the an
use
experiments
significance
pate
is
way,
generally
significant
flew
unbiased
reported
more
low
cannot
under
find
a a
one
another
are to
wlth
related
whether
than
base
description.
locations.
%o
information
By
draw
to
a
data
requires
of
Query
expresses
natural
is
relation
example
clated
the
method
description
the
) which
query
Their
Zloof's of
CUPID,
the
the
In
in
approach
149/.
display
stored
dictionary,
different
/198,
used
verbs
data
structured
Schauer
of
and
by
similar and a
we
the to
introduce
SEQUEL.
followed
In A
will
the
for CRM
we
variable by
an
the
briefly
two graph
deal is
that
dif-
respect
indicate of query
with
name.
query
languages, Both
relation by
to that
the
model,
denoted
attribute
know with
equivalence
informally
(GRAPH)
variables.
period
to
we
equivalent
we
extended
other
examplesv
made")
Subsequently be
end
and
by
be
a
lan-
names, relation
Example
S
relation
SN
S.SN
In
GRAPH
we
relations), a
period
name
attribute variable
deal
with
A
denotation
set
followed
by
relation
name
a
relation
denotation.
with
obvious
the
the
name
or
sets
(unary is
relation
a
relation A
meaning
set
name
that
A
followed
denotation
the
and or
denotation.
name
set
relations)
a
variable
name
relation
by
may
relations
set
a
also
be
'W~unsU
is
a
followed
by
a
used over
by
denotation
period
a
(binary
followed
as all
a
variable
elements
of
set.
Example S~
S.SC~
PS~
It
should
while
The
CRM
be
noted
is
bound
period
is
composition
A
query
in
from
is
of
the
CRM
names, with
In
and the
GRAPH
set
is the
both
arise.
to
to
in
the
definition
of
sets
levels,
FROM
list
a
The
recursive
used
as
the
operator
for
functional
right.
of
predicate
is and help
languages
variables)
form:
list1
the
is
languages
list1
a
(or
relations
GRAPH
two
both
in
with
ambiguity.
that
relations
denotations
built
In
list1
sets
PS.SC.CN
left
SELECT
In
S. S C , C N
FSoSC~
list2
WHERE
attribute is
over
predicate:
names,
variables
list2
is
a
which
can
list be
of
relation
built
starting
list2.
list the of
the
of
relation
predicate relations
use
subsequent
of
is
denotations, over
starting
subscripts
examples
are
set with
may such
list2
is
denotations the
sets
a
list
of
which
can
he
in
list2.
be
necessary
to
that
ambiguity
does
avoid not
Query 1: Name of the professor who advises student 'M'.
CRM:   SELECT PN FROM P,S WHERE P.P# = S.P# and S.SN = 'M';
GRAPH: SELECT PN FROM P WHERE P.PS.SN = 'M';
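The difference between the two formulations of Query 1 can be sketched in Python. This is only an illustration under stated assumptions: the sample rows, the dictionary layout, and the helper names (query1_crm, build_ps_links, query1_graph) are hypothetical and do not come from the paper. The CRM form matches P and S on the shared P# value, while the GRAPH form follows a PS link from each professor to the advised students.

```python
# Hypothetical sample data: professors P(P#, PN) and students S(S#, SN, P#).
P = [{"P#": 1, "PN": "Adams"}, {"P#": 2, "PN": "Baker"}]
S = [{"S#": 10, "SN": "M", "P#": 2}, {"S#": 11, "SN": "N", "P#": 1}]


def query1_crm(professors, students, student_name):
    """CRM style: join P and S on P#, then restrict on S.SN."""
    return [
        p["PN"]
        for p in professors
        for s in students
        if p["P#"] == s["P#"] and s["SN"] == student_name
    ]


def build_ps_links(professors, students):
    """GRAPH style setup: materialize the PS relationship as direct links."""
    links = {p["P#"]: [] for p in professors}
    for s in students:
        links[s["P#"]].append(s)
    return links


def query1_graph(professors, ps_links, student_name):
    """GRAPH style: evaluate P.PS.SN = name by following the PS links."""
    return [
        p["PN"]
        for p in professors
        if any(s["SN"] == student_name for s in ps_links[p["P#"]])
    ]


crm_answer = query1_crm(P, S, "M")
graph_answer = query1_graph(P, build_ps_links(P, S), "M")
```

Both functions return the same professor names for this data; the difference lies only in whether the relationship is expressed by a value comparison or by a named link.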
This
simple
the
two
query
data
lationshlp of
may
used
be
GRAPH
illustrates
models.
between
pertles
these
will
Query
2
CRM
uses
Names
are
of
the
while
in
graph
the
CRM
apparent
courses
essential
with
the
by
attended
do
a
between
logical
re-
unique
por--
of
these
make
we
some
help
model
as
in
difference
that
the
to
has
composition
more
VMt
requires
encoded
Therefore,
functional even
=
normalization
entities
become
P.PS.SN
already
entities
directly.
simply
This
WHERE
relationships
comparison natural
in
where
language.
query.
next
students
which
are
advised
by
'B'.
CRM:   SELECT CN FROM P, S, SC, C WHERE P.PN = 'B' and P.P# = S.P# and S.S# = SC.S# and SC.C# = C.C#;
GRAPH: SELECT PS.SC.CN FROM P WHERE P.PN = 'B';
CRM
form
The
brevity
should~ the
and
graph
model with
between can
then
ies
in
a
implement user
a
be
over
CRM.
macro in
accept
GRAPH
simple the
P
terms
their and
on
advantages
Is
top of
of the
as
these
a
%0
the
superiority extend
definitions
encodings.
algorithm.
to
essential
possible
accepts
CRM
WBt ;
compared
an
convert
forward
language
it
which
=
form
conclude
fact,
queries
the
P.PN
GRAPH
to
In
of
WHERE
the
used
processor
straight
GRAPH all
of
not
entities
has
FROM
elegance
however,
language
the
PS.SC.CN
This queries
With
other
macro
into
model.
the
The
CRM
relations processor
CRM
words,
CRM implementation
graph
of
querwe
such
can that
differences
of
the
is
primarily
away
languages
with
the
and
of the
other
are
of
of
sections
underlying
syntactical
help
implementation. quent
a
their
nature
syntax
little
other
deserve
and
they
Issues
of
relevance
questions are
appear
since
macros.
practical
Many
models
in
like
the
on can
one
level
be
data
given those
process
a
the
transformed model
versus
right
sort
discussed
of
which
in
receiving
of
subse-
more
atten-
tion.
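The conversion of GRAPH queries into CRM form by a macro processor, mentioned above, might be sketched as follows. This is a minimal illustration, not the processor referred to in the text: the LINKS catalog, its layout, and the function name graph_to_crm are assumptions. The join predicates are those that appear in the CRM form of Query 2, and only a path in the SELECT clause is handled.

```python
# Hypothetical catalog: for each GRAPH link, the relations it brings in
# and the join predicates that express the link in CRM form.
LINKS = {
    "PS": {"adds": ["S"], "joins": ["P.P# = S.P#"]},
    "SC": {"adds": ["SC", "C"], "joins": ["S.S# = SC.S#", "SC.C# = C.C#"]},
}


def graph_to_crm(start, path, restriction):
    """Expand a GRAPH path such as 'PS.SC.CN' into a CRM-style query string.

    start       -- relation named in the GRAPH FROM clause, e.g. 'P'
    path        -- link names followed by the selected attribute
    restriction -- predicate from the GRAPH WHERE clause, copied unchanged
    """
    *links, attribute = path.split(".")
    relations = [start]
    predicates = [restriction]
    for link in links:
        entry = LINKS[link]
        relations.extend(entry["adds"])    # each link adds relations to FROM
        predicates.extend(entry["joins"])  # and join conditions to WHERE
    return "SELECT {} FROM {} WHERE {};".format(
        attribute, ", ".join(relations), " and ".join(predicates)
    )


crm_text = graph_to_crm("P", "PS.SC.CN", "P.PN = 'B'")
```

Applied to the GRAPH form of Query 2, the sketch yields the CRM form of the same query; a real processor would also have to handle paths inside predicates and detect ambiguous link names.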
4. SYSTEM PROBLEMS
4.1. Introduction
The
major
problems
concurrent
access
gram
management
with
locking
and
last,
enough
in to
a data
and and
hut
data
shared
not
least, times
system
by
scheduling,
like
many
with error
with
high
enough
the
whole
to
make
IMS
users~
system
or
recovery
response
base
are
with
data
with
pro-
integrity,
independence
data
transaction system
with
application
enforced
isolation,
connected
rates
short
and
attractive
for
the
user.
The
implementation
may
turn
natural full
out that
data
in
with
fact
outside
the
area
nection
to
provide
does
and
high
reference
of
data
independence
attached
at
a
purposes, is
Among
and to
of
the was
all
least
storage
though of
the
systems at
supported
SEQUEL,
nearly
therefore
portion
conception
activity in
It
large
functions,
independence
solutions
so
systems
models
to
data
experimental
manpower,
original
follow--on
will
with
Its
for
and aim
system in
to
the
we
references the
respect
even time
the
being problem
above.
base
data
sections
in
projects
DIAM
researchers
that
data
system~
management 3~
R7
plans
a
costly
research
base
mentioned
tional
few
System
experimental,
The
such
quite
section
ambitious
structures.
areas
of be
only
of
set
mentioned very
to
multi--user research bibliography,
a
far
have
not
not
mean
that
level
and
systems. in
query
considerable
the
area
data In
developed they
languages. number
integrity
addition, of
full
have
security
of
In
and
subsequent
recovery mender
opera--
problems
relevant
and the
size
ignored
papers in will
authorization
in confind in
4.2. Data Independence
A
data
it
/175/
base
allows
without
system
transformations
also
dence
{a
affecting
correctly
is
in
model
it
for
is
with
widely
organization
and
to
changes
form
fact
/182/.
that
The of
its
links
or
inverted
5).
Every
such
direct
mix
of
application
there
will
be
a
The
need
new
types
schema.
of In
affected,
ers
for
base.
or
example,
There
since may
data
the
old
may
main.
Certainly,
ments
into
many
are
be
be
a
designed
a
many
least
s may
suchs
or
indepento
Is
data
not
the data
of
a
affect
they
rely
to
which
the of
the
In
on
CRM
of
which
conceptual a
binary
programs Consid-
only new
data
read
and
new
model, relation
programs,
if
many
one
to
the
insert
the
data,
model.
Information,
otherwise
of
conceptual
conceptual
the
update
programs
old
a
time,
additions
the
unaffected.
which
the
the
ad-
for
with
to In
domains
only
see stoP-
base
application
a
of read
In
data
due
changed
since
structures;
changes
remain
{projection}
(for
organization.
programs,
programs
some
mix
the data
depends
organization
existing may
internal
program
changes
domain
alter
constraint
even
the
data, of
implemented
The
arises
of
the
redundancy
internal the
of
best
storage
data
The
application
consequence
activities,
some
to
form
a
means
the
part
of
directly
other
generally,
changes
one
that
data
application
are
Since
affected,
Other
an
independence
between
are
to
that
independence,
absolutely
more
subview for
Is
internal
large
need
dependency
entered.
constraint
a
no Is
at
the
least
different
relations
the
files
addition
he
model
be
there
cannot
the
should
already
relaxing
at
still
no
paths
programs.
conceptual
while
internal
optimize
information,
general,
are
the
update
to
adapt
for
independence
implementation
to
need
data
i.e.
of
access
additional
attempt
given
data
conceptual
conceptual
of
performance
via
will
of
sense
respect
independence
is
section
requires
T
with
certain
there
example
mlnls±rator
~ence
a
which
transformations,
Of
invariant,
on
i.e,
many
the
data
stays
heavily
In~s
how
that
forms
in %he
in
claimed,
internal
respect
before
programs
selection
extent
conceptual
transformations. of
the
programs
correctly
clear
the
to
or
existing
Independence,
conceptual
recognized
run
the
of
data
internal
of}
makes
between
the
the
independence
independence
sometimes
internal
programs while
the
This
We distinguish need
after
consequence
as
of
which
effect
transformations. automatic
data
maximum
programs~
non-affected run
supports
This since
new new
do-ele-
Information for to
these
example, a
many
to
programs
constraint.
68
Support
of
capable
of
is
conceptual determining
affected
which
is
data
or not
trict
the
solvable.
data
each
This
solvable
mains
extensively
for
not.
independence
in
mapping This
applied
of
its
involves general,
languages requires
requires application
a
very
It
is
such
a
programs, decision
therefore
necessary
that
appear,
the
complex
of
type
/exceptions
that
the
decision
theory, in
Is
whether
It
problem, to
res-
problem
which
other
system
has
re--
not
contexts,
been
53
in
and
65/.
Support
of
Internal
1.
data
A data
the
definition
Internal
schema. al
any
nal
schema~
process
cess
to
all
of
in
section
following
be
there exploits
inversions
A more
of
the
lor~
and
such
languages
with
a a
called
a
given
by
the
conceptu-
supported
system paths
the
for
reduction
.
the
and
program,
external This
words,
in
ac-
optimizbut
what
results
capainter-
needs~
independent
other
which
be the
the
the
without
data
In
in
of
of
system
must
program's
execution
"logically"
purpose
Is
prac-neces-
performance
user.
query
data
a
user
(in
a
his
he
inversions
or
offered on
is
not.
the
Implementations to
a
"data
may
query
advantages burden
language
independence
relation
However,
any
the
n
A
independence
new
limited
base
specify
During
user
except
attributes
independently
execution
inversions
in
administration which
formulated
by
des-
degree
of
and
a
query
maintains
unavoidable
of The the
storage
/175/.
comprehensive
development
no
When
exist
overhead
specifies
allowed
access
during
access,
be
support
a
without
time
"optimally
paths
expe~ImenTal 3
inverted.
system
may
the
the
meet
been
data
way:
whether
and
a
introduces
should
to
mappings
program
predefined
internal
serves
for
cribed the
has
the
is
Almost
which
form
independence
different
the
access
reduction
gains
data Of
following:
language,
conceptual
application
which
these
This
sary
of set
recognizing
exploit
role")
given
of
tically
mapping
every
the
schema,
ble
ing
the
requires
and
to
degree
is
conceptual
To
form
The
schema
2,
independence
approach
to data
flexible
data
slightly
modified
for
a
data
model
and
mapping
motivation, very
starts
independence
definition
close
Smith to
the
with
language. have
DBTG
the Tay--
developed model
/1797
169/.
As
enables
it
whole of
pointed to
/194/.
a
out
operate The
by
and
the
form
programs
operating
The
evaluation data
probably of
is
Data
processed
cesslng
in
practical mapping
another
Ramirez
in
et
al.
from
tioned
less
have
data
data
of
tion
has
tions
language nition
fop
Sraph~
a
whichT In iS
for a of
Desautels
oriented
towards
the
a
describe this
of
to
a
been
area
created pro-
combines
the
small
has
right,
allows
projects
of
full
the
own
which
remain
a fact
in
its
have
these
in
power
of
enough
to
This
be
plan
to
translation,
approach IS3/.
such
to
as and
data
is the Lam~
data
ori-
transla-a
negative descrlp--
developed
for
hlerarchleal
of
•
the
pro-
DBTG
data
have
implement
model
at
continuing
with
structure) between
is
men-
Smith
running
as
Shu
P,
during
/133/
and
D.
pro--
the
are
data
grammars Lum
their
Su
of
Merten
hierarchical
importance.
work
projects
form
of
activity
currently
both
(mapping
by
major
the
contextfree
conversion use
developed
126/.
and
and
makes
another
Housel,
of
/177~
generates work
in
computers~
an use
which
which
Fry)
Again
purpose
particular
illos-
in
This
in
evaluation
language
definition
of
data,
internal
used
to
a
application
languages
organization
This
Navathe
/95, 165/.
network
such
justification
models
CONVERT
translation
with
into
/108/.
{mapping
is
the
task,
built
as
by
have
all
language
still
used
Merten,
level
language
prototype
rarchles. version
Heller
record
DEFINE
a
is
investigated
and
the
and
tures} in
been
mapping
CRY
means
convert
complex
data
/142/.
data
of
as
by
group,
being
underlying
usefulness
Liu at
/
functions
The
result,
and
Michigan
increased
The
and
whole to
mapping
compiler,
language
a
very
a
projects
a
as
which
orientation
large
built
definition
University
totypes.
a
data
which
data
data
and
experimental
the
these
access
rewriting
of a
The
descriptions
Taylor's
ented,
than
a
also
to
with while
Similarly,
with
system.
languages,
is
conversion
system,
orientation
conducted
grams
the
one
language,
/166/.
experiments has
to
the
definition
to
a
converting
possibility
than
system
T which
such
collections,
date
data
is in
data
these
impetus
translation
translation
and
es
a
also
without
the
expensive
on
has
converting
of
management
given
data
data
of
without
more
of
base
SIMS
given
existence
standard
size
on
importance
description
trated
earlier,
these is
data
a
defl-struc-
languagthat
o~
a
decomposed
Into
ble--
ARPA
net~
data
con--
and
also
Schneider
translation
specifically
4.3.
Data
Though
Integrity and
the
recovery
problems,
are
increased
in
multi-user adequate system
by
solution,
The
notion
assertions
example,
state
require
that
different the which
a
person
a
A
allows
a
enforced
A straightforward
I~
be
data
the
a
the
a
own
supports
specify
with
the
as
stay
a
then
data
or, the
during
rules.
complex
more
its
Such
that
integrity
{for
mannummay
rule the
budget
sum
of
allocated
to
rules,
of of
complete
A
consistency
the for
notion
invariant
person,
exceed
be
collection
sense
ancestor,
not
may
system
consistency
a
without
responsibility
certain
known},
may
fact,
multi-user
viewed
which
about
are Its
base to
the
which
extent
are
sub--
system.
approach
Provide
2.
specifying
to
the or
the
whether
This
approach,
proposed
/66/
has
considered
to be
language
undecidable
tent.
Second,
it
base
checked,
is
has
modifications
transformed plex baser
into
cons[s±ency which lar~er
to a
the
for
such
rules
could
consistent
rules
may
can
range
from
data
bases
in
has
consist
of
been
user
base
/1~78/
and
First,
state.
hours processing
defined
for
a
small tame.
in
when
in
general
consistent
Third, access
it
are
must
before
require
modified, hold,
assertions
checking to
a
data
like
predicate
assertions.
still
in
carefully a
language
specify
to
caution.
the
be
general
base
example
since data
a
assertions
with
to
a
with
language
data
whether
a
data
user query
a
Whenever
checks,
for
known
In
called
in
In
and
enormously
following.
calculus
of
is
department
user
by
also is
birthdate
cannot
in
department.
sequently
the
and
expenses
it
are
are
responsibility
of
be
integrity
they
connected
may
contents
Information
something
address
some
closely
base
assertions that
whenever
name,
is
data
users. used
problems, over
to
systems,
under
schema
Systems
respect
concurrent
take
A
User
user
means
these to
data
with
many
schema.
the
These
may
with
integrity
the
about
processing.
to
data
and
MuI~I
single
traditional with
in
exist in
necessity
of
consistency
in
system
dealing
for
the
ber~
a
which
present
support,
has
rules
also
Recovery
large bases
the
system
recently is
also
for
a
general
themselves the
consls-
consistency
perform
a
state of
a
portion to
in
Is set of
several
of
number again
Of
com-
a
data weeks
The
first
problem
consistency tency
rules
of
the
certainly
The tlon
of
the
to
checking
The
of
end
of
can
take
third
of
a
the
to
for
the
be
enforced
like
the
assurance
of
analysis
of
the
an
analysis
The
problem
than do
one
not
end
isms
The
of
The
interfere system to
for
a
situation
have
with
part
is
the
without
be
is
at
increased
has,
each
the
and
in
other a
data
access
analyzed
during
is
cycle
free.
execution
time
in
a
technical
of
purpose ef~i-
situations
time
needed
for
necessary
compiled
query
addi-
and
most
time
A
to
approach
modification,
as
comparative courser
such
level.
concurrent to
access
ensure
by
that
are
a
time. well
users
To
user
this
exclusive
Basic Known
more
the
operations. gives
limited
or
same
easily
Of
user
in
birthdate
the
the
update
a
the
the
that
participating
contains
constraints.
which
for
an
is
allows
with
with
father
It
addition,
to
a
by
variation
state
In
since
source
locking
of
in
the
facility base
names
help
a
costly
may
change.
Of
Given
Illustrated
that
assurance.
sesup-
Be~in--
how
The
objects
if
a
consistency
one.
be.
serves
sense,
constraints may
first
however,
function
to
exclusive
under
linear
rule sony
base only
state. Now
rule
of
can,
integrity
provide
of
granting
systems
a
he
with
the
data
is are
information
rule
makes
with
system
must
IMS,
introduc-
which
determining,
that
the
the
transaction
will
number
of
/176/,
integrity
user.
the
access
ating
not
the stored
This
by
queries
does
is
integrity
Stonebraker
consis-
like
user,
the of
may
labeled
birthdate,
without
by
the
the
complete,
consistency
algorithm
the
every
bound
is
to
rule
to
consistent
capable
a
n
rule.
proposed
which
that
control.
transformations
birthdate
for
processing
the
he
an
integrity" is
connected
If
the
a
user
way
edges
where
A one
transaction
consistency
only
person
enforcement
perform
so
in
systems,
lead
by
into
under a
has
66/.
base
must
requires
verified
/65,
relation,
cycle
"system
enough
and
state
base
a
n**3,
previous
clently
the
some
father
precedes
language
Practical
data
are
system
relation.
every
the
whenever
data
rule
father
father
as
transaction of
containing
this
proportional
tion
a
in
the
the
simple
recognized
consistent
checking
subgraph
Checking
been
place
during
is
if
decidable.
transaction
is
Given
the
a
a
rule,
costs
example:
in
of
problem
checking
the
has
notion
only,
criterion.
transformations
consistency Its
solved
expressed,
this
transform
and
be
remains
problem
of
posed nlng
are
rules
satisfy
second
quence
can
from
mechanoper-
semaphores.
report
by
Eswaran
et
al
/65/. in
A
/65/,
being
complication
of
locking
is
to
lock
the
created
potentially
/30/. be
finite).
locks
such
the
formulation
Locking
has
systems
there
of
such
that
i.e.
taking
the
other
back
to
the
the
by
The
of
preempted did
record
internal
/83/.
This
method
is
noted
that
these
files
and
state
discussed are
hold
by
a
then
to
pre-
preemptions resources
be
is
i.e.
possi-
data
its
/29/.
to
positioned
This
during
systems
The
With
is
files,
al
operating
transactions
the
resources.
et
most
in
deadlocks.
user's
has
process
in
on
/65/.
As
give
Chamberlin
required
restrictions
Solution
checkpoint
of
for
with
the
infinite
decided
system
to
the
an
be
may
always
is
preclaiming.
second process
process
not
journals
which
one
Is
schedule
The
from
the
deal
/67/
can
can
from
which
with
it
deadlocks.
to
explained
exist,
objects
imposes
by
of
yet
objects
predicates
This
ways
of
that
also
not
created
handled
Everest
system
it
by
danger
appears.
which
help the
be
systems, may
number of
dictate
two
resources
in
set
overlap.
to
example
deadlock
state
with
ble
for
process. a
they
essentially
resources no
infinite the
the
base
which
described
consequence
away
an
predicates
are
proposed
claiming
be
data
requirements whether
of
as
are
may
predicates
in
objects,
(though
Performance
two
first,
There
created
Such
extension.
need
sets
execution
It
should
for
recovery
a
transaction
be pur-
pose.
Recovery
is
terminate
necessary
normally.
error
in
check
failure,
the
transaction livered
isolation data
be the
failures
second a
If
possible.
much
such
solutions placing
expected beginning
no
Bjork
of
as
this
MULTICS
to an
from
the
and
work
is
be
a
failure,
first an
to
deadlock,
error
the
all
This
described Sayani
deIs
propagate
in
which
that
is
been
by
base) recovery
such
162/ and
to
a
by
they large
Genton
/148/.
recovery
a
of
transactions
cause
laid
logical
consistency
data
not
does
to
failure
Of
the
have
a
a
the
appeared.
Edelberg been
or
(via
being
had
or
objective
restart
without
has
in
The
operating
/81/.
integrity
and burden
application the
that
for
exception
indirectly The
algorithms
unnecessary
an
or
failure
/iS/s
may
hardware
such
failure
Recovery
/50/,
thls
objective
by
for
avoid
directly
impossible
zerodivide
transaction.
as
basis
All
v a
affected
Davies
is
for
example)
execution
/83/,
systems
a
A
cause (a
the
of
it
program
has
to
base.
been
extent
for
which
the
continue
The
user's
input
the have
whenever
end
of
recovery on
the
programmer transactions.
problems user.
is
The to
The
must most
inform
of
course
that
should
the
interactive
system
of
problem
solver as
should
far
were
as
the
As
with
at
the
for
not
the
only
data
be
required
query
language
user
to be
improvement
(a)
reduce
STRUCTURES
from
from
the
the
subsections
5.1.
Storage
of
the
search
with
Inverted
B-Trees
answers describe rithmic
searches list
of
indices
In
of
I151.
higher
there
of
allow
to
to
These
topics,
two
section
in
are room
particular,
derives
techniques otherwise
utilize
these
are to
records.
techniques the
cases
the
Its
which and
structures.
~ data
we
(b)
storage The
next
If
hierarchical known
as
Haerder
Index
and
quicker
and
Bentley
supporting describes
reduce
the
In-
allow
Finkel
trees
a
the
IB--Treel.
update
which
/112/.
to quad
of
file.
retrieval,
Indexes
llsts t } to
Iogamethods
storage
costs
/90/,
time
Lum
number
is
with have
d l v l s l o n I is
of
methods
hashing.
Hashing
its application recently
shown
in general
additional,
organization
These
an and
for
/77/.
acceleration
inverted
obtain
time
parameters (Ibit
the
an
McCreight
trees
search
and
is
of
complexity
in connection
by
help
and
binary
two
Ghosh
lhashlng
applied
which
search
reducing
studied
splitting
this
section
required
multi-attribute
of
with
extensively
niques
if h e
significant
organization
time
inverted
compressions
meat
course~
of
certain
method
assumptions,
preceding
storage
Bayer
introduced
address
Of
by
extension
/113,
in
solutions,
programs
with
a logarithmic
of
Another
should,
as
performances
the
search
to
repeatedly
queries
an
of
employed
parameter
is
Lum
to
He act
ALGORITHMS
in
the
devoted
existing
SEARCH
algorithms
binding
described
allow
/9/.
AND
frequently
file
to
Structure§
one
organization
able
is certainly
improved
enormous
are
transactions, be
discussed
There
and
discussed
of
without
about
problems
with
existence
existence
two
sert
the
sometimes
structures
the
understood, proposed
as
know
is concerned~
system.
functions
Independence
value
One
over
The
STORAGE
Data
the
independence,
beginning
In providing
5.
of
to
may
to data that
best
essentially
such
as
be
combined
links in
has
been
manase--
under
Their
basic
tech-
/8~/.
between
or
various
modifications.
The
Storage
and
well
are
120,
135,
tures
to
I.~°
described
160/.
The
programs
to
which
structures
offer
the
without
the
responsibility
Attempts
to
solve
Reduction
Reduction
the
problem
the
relationship
tations
are I)o
during
given
is a
by
Reduction
too
of
to
Internal
in
99,
specific
storage
the
next
100,
118,
the
Struc-
case it
structures
in
past
structures)
In
organizationl
discussed
the
of
richness
independence.
of
reducing
the
a
of
in
is
the
optimally. subsection.
like
or
evoke
and
is
loaded
the
objectives
an
external
to
a
to
application
with
program.
problems
form
the
secondary
expectations;
similar
internal represen--
conceptual
"optimization"
accesses
unrealistic with
and
forms
an of
to
accesses
internal
these
number
query
not
external
between
something
reduce
should
complex
/54,
this
programs
the
are
mappings
is
to
execution
"optimization
the
Surveys offer
data
studied
Problem
is
objective
the
utilize
problem
where
(fig.
to
know
access)
mary
and
with
to
this
extensively
remains
binding
not
does
been
textbooks
structures
program
The
in
problem
system)s
5.2.
have
as
pri-
storage The
the
term
problem
optimization
In
compiler.
Variations
in
handling
Of
intermediate
example,
the
expression
opl
(A
where
A 9 By
C)
the
relational
two
intermediate
evaluate amounts liary
AB of
of
in
A v B)
data
execution.
this data sets
AB
the
=
A
far
of
{i.e.
opl
opl
indices)
By
and
also
D*
op3
CD
with
the
for
Consider,
are
an the
On of
be
D
and
to
enormous
amount occupied
other
hand)
queries can
oriented
can
op5
storage
some
and
C
addition
be
of
the
then
auxiby
the
there
are
during
built.
towards
consequently built
if
in
construct
enormous
the
size
operators
might =
in
inversions
primarily
modest
to
requires
evaluation
is
connected sets.
evaluation
exceed
and
temporary
area
base
and
accesses
C)
are data
D}
algorithm
by
some
over
straightforward
an
may
in
optimization
relations
A
storage
least
a
op3
large
Such
which
at
research use
(C
relations CD.
improvements
auxiliary query
op2
relations
evaluation
tive
are
secondary
underlying
the
D
op2
algebra.
storage,
drastic
B)
of
expressions
their Most
interac-
assumes
temporarily
of
for
that one
One
of
the
due
to
Palermo
queries by
earliest
no
lus
of
system
is
and
by the
consists
assume ies.
that
mentary
queries
has
not time
Into As
than
by
Greenfeld
implemented
version
and
Chamberlin
advantage
of
calculus of
/6/.
a
Their but
inversions,
intermediary
of
seDIAM
lists
attention,
search
earlier,
To
strategies
the
researchers
reduce are
problem
of
under
CPU
time,
access
module has
however, also
eleput--
the
assump--
less
dynamic
can per
be
taken
other
in
com-
appllca--
been
for
Taylor
assump-
This
which
approach
primarily, Conway,
the
system.
required,
or
compiler
Pernandez9
the
quer-
to
/19S/.
perhaps in
module
the
form
They
elementary
according
essentially
standard
valid.
over
organized
bottleneck
always
to W o n g / C h i a n g .
expression be
some
is due
or
reasons
125,
75,
44
180/.
should
respect be
research CPU
6.
Research
be
clear
that
described
as
long
are
the In
as
there which
Is
AND
in
area
no
of
are
a not
constraints
of
generally respected
deserve
"minimizing"
is very
number
generally
potentially
to
problem
to make which
Recognition
addition
MODELLING
the
reduction has
structure,
Questionst
time
the
above
to system
soy
tectuPee
and
84/
in
construction
can
much
search
IMehl,
/5,
the the
quantification
Astrahan
becomes
the
efficient
several
efficiency
algorithm
to
a CPU
Senko
of
achieved in c a l c u -
to
problem
base
into
is n o t not
variables for
and
type
is
applicable
a boolean
reduction
mentioned
proposed
is
data
interpretive
tlon.
reduction
received
hoverers
piled
It
and
of
This
is
InversiOnSo
expression
CPU
As~
the
the
investigated
once.
principle"
taking
also
problem
growth
Another
by
this
algorithm
handling
547/.
into
the
domains
Ghosh of
primarily
the
that
less
/89,
of
the
"least
problem
query
usage
tion
and
to
case
a boolean
tlon
and
of
each
a
is described
merging
this
for than
reduction
involves
related
In
tlng
by
that more
Astrahan,
efficiently
algorithm
claims
restricting
A
at
look
investigations
accessed
applying
algorithm
{indexes}
CPU
and
described
expressions
paper
and
operations.
reduction
their
to be
indices
Rothnie
problem
Palermo
has
expressions
quence
A
/140/.
tuple
building
comprehensive
more in
secondary
complex.
~very
assumptions valid° data
attention storage storage
with
This
has
base
archl-
in
future
requirements accesses.
ANALYSIS
of
modeling
and
analysis
has
as
Its
objective
to
learn
about
velop
simple
management changes
existing
probabilistic
system,
in a
management
Such
system
system
primary
have
data
management
of II
with
report
structed Their cesses plex
in
possible system
way
Data
base
itles
though
may
be
Tools.
the A
base
event
model
this
performed
the tool
tools
/91/.
organizations,
which
they
of
/132/, pro-
so
com-
critical.
comprehensive Is
con-
the
are
become
but
Nakamura have
package
processes
direction
to
a compara-
these
simulation,
may
of
lead
simulator
These
systems
follow-on
simulation
performance
in
of base
should
base
and a
using
system
development
step
data
and
driven
system.
simulation
and
storage
conventional
a
management
of
proposed also
they
of
tool by
mention
A
data
base
by
Rel-
proposed
a
is
they
server
of
complex
analysis
FOREM
in
~22/,
Yao
analytical
to
restrict
the
the
be
IS an
example
storage
in
/196/
modeling
analytically
themselves
of
analytically
IMS for
level,
treatable
therefore
to
well
of a deterministic,
structures.
and
activ-
The in
Wedekind
methods
/193/
are
tractable
developed "rather
For
organization of
It
queueing
Lavenberg
general"
example,
Extensions
by
model
and
distributions, not
does
and
total
I/O
the
model
are,
their
explicitly is
of
Shedler
model
represent
represented however,
by
likely
the
/103/.
a
Is the
single to
make
necessary,
Perhaps
the
indices
to a
this
objects
too
studies
system,
of
storage
simulation
clearly
the
also
queue.
been
for
allow
gross
physical
also
deterministic.
component
Though
are
Cardenas
essentially
DL/I
have
Analytical
parts
analytical
at
of
detailed,
base
data
a
administrator
recently
simulation
base
systems
a whole,
defined
To
has
techniques
to evaluate
base
data
Influence of
colleagues
FOREM
data
help
simulating
his
to overall
fairly
out
system
a data
a
data
called
useful
the
questions
and
tool ~aerder
designer
base
a
activities.
Senko
indexing
of
de-
to
/~4/®
ier
as
about
is
a
that
II are
respect
with
model
138/.
current
PHASE
et
by
analysis
/154~
of
limited
early
an
the
and
the
predict
data
of
analysis
al.
the
behaviour
components to
Thus
modeling
recognized
and
the
research
been
FOREM
deslgno motet
their
help
these
has
PHASE
even
for
may
in
for
tive
models
models
system
and
need
called
or
analyzing
interest
The
development
by
systems
problem
most
frequently
flat under
file,
investigated Authors,
varying
who
assumptions
question have are
Is
the
contributed Lum
and
selection to
Ling
research
/114/,
of on
Palermo
/139/, Yue
Stonebraker and
tigated
Data
Wong
/197/
the
question
may
be
tempt
to
have
position
in
data
the
Chen
and
have
given
]l16y
Lum
21].
an
and
Chen's
al.
model
Into
response
The
second
%Istlcs~ Easton
finds
the
takes
in
sets
has and
heuristic
been
a
60/.
approaches
are
to
queueing
arm
of
given
Is and
no
and
of
the
to
an
at
target
and
by
data
and
Buzen
the
hierarchy
to
minimize
distri-
usage
recently
The
sets
allocation
their
for
storage
suitably
algorithmic
bounds
a
cost.
is
given Wong
best
some
Buzen
usage
effects
drives
Chandra
by There
ac-
as
minimal
to
Their
ARPA
within
a
have
the
data
contention
disk
well
at-
improve
(like to
function
an
given
network
allocation
disk
data
network
the
constraints.
T
in
a
as
inves-
categories:
the
cost
storage
considered
/31~
Wong
on
total
algorithm's
number
second
et al.
Their
minimizing
over
Lum
of
hierarchy
contention
of In
has
/164/o
variety
allocating
addition
given
a
time r
information
which
Shneiderman
storage
case
of
a
problem~
data
a
nodes
costs.
specify
under
over
/15G/~
Schkolnick
levels
minimize
in
to
problem
consideration.
time
buting
line the
/71/.
access
to
%hlrd v
statistical et
and
devices
reduce
algorithm
levels
costs
/23/,
different
within
assigned
considered
hierarchy,
and
be
at
distributed
allocated
hierarchy, to
Cardenas
Stewart
size
or
to be
and
index
physical
have
cessibility
Farley
between
to
t98/,
Kins
of
to
balance
assigned
net}~
and
allocated
first 7 data
be
/174/,
sta-
also
by
solution,
but
optimality
are
derived.
Casey
and
within 32/. al
Chang
a
have
simplified
Chang
has
function.
considered
network
extended
Both
the
of
Casey's
specify
third
computers linear
problem to
cost
algorithms~
of
reduce
allocating
line
costs
data
/26,
functions
to
a
attempt
to
minimize
which
more
27,
generline
costs.
With
the
open: how
analysis
what is
are
the
Nakamura
etal. of
the
(tO ences
and
data
can
be
reported
of
a
data
far~
base
their
simulation
model.
Answers
describe over
system}
collected
in
a
to
a
at
least
one
input
data?
system
statistically
raise %o
operational
Hildebrand messages~ base
so
characteristic
observing
user's
the
In
their
actually
Rodriguez of
The
workload
validity by
work
such
systems how
the
Oft
and
data
of
the
trace
of
physical
systems
other
words~
question
can
only
ranging
disk
of be
the
found
statistics.
application
/145/.
remains
characterized,
collecting
trace
operational
with
further
questions
relevant
question
from
program address
Lewis
and
a
log
calls refer-Shedler
derive
from
such
transactions process
In
a
(i.eo
a
be
the
model
Poisson
flt
mine
the
used
To
%0
a
model
%o by
Ghosh
/86~
model
blocks
on
Ghosh
with
for
also
and
Easton~ to
Tuel
to
sequence
of
behaviour
determake
model
an
deter-
are
also
extension
references
09
of
and
again
has
a
large
data
of
the
certain
and
which
use
storage
the
/I07/.
model
measurements~
secondary
between Poisson
rate)
relationships
proposed
the
with
and
linear
has
times
non-stationary
dependent
Tuelv
by
Easton
a
established
61/®
comparison
comparison
interarrival by
time
and
system
model.
reference
programs
this
a
theoretically
base by
the
with
data
data
coefficients validate
pllcatio,
a
empirical in
independent
dated
of
the
modeled
process
approach~
parameters
interactions
that
satisfactorily
semi-empirical
mine
the
observations
can
ap-
valibase
system.
It
is
clear
validated met. art
The of
next
that
reasons
data
analysis
a
7.
data
SUMMARY
least
we
AND
two
major
systems
our
opinion
by
integrity
system
has
ventional
to
The only but
goal
in
also
that
have
in
over
to
their
that
described continuation
models
been
be
state
summarized
this is
and
convincingly
current
research in
research
on
of
the
in
the
modelling
section
and
has
extremely
be
than
a
a
made
Important
the
base
of
of
part
system
tREes
the
complexity
of
Consider
past,
user, of
may
in
this be
that on
adwanta@e
the of
the
at
data
on
integei--
In
a
the con-
independent
due
different
devices.
the
user's
same
(or
current
in
data
system
responsibility. device
of
the
base
is
language
goals
systems, a
structures
requires
that
programming
alone
conventional
storage
storage
a
complexity
the
program
different
system
In of
of
considered:
systems.
large
user's
activities
objectives
equivalent
data
the
clear
the will
its
independence.
take
of a
yet
with which
as
the
responsibility
independent
not
view.
operating
system~
Implementation
also
greater
data
the
is
that
of
ape
or
and
remains
connected
systems
representative
has
general~
summarize
far
Implementation,
obtain
in
and
factors
base
to
CONCLUSIONS
to
Data
fly
base
point
try
are
it
progress,
practical
Before
this
research
However,
of
significant
objective
characterizations
for
base
section.
from
the
workload
program another)
structures
is
%0
not
device~ during
access base
where
administrator
The
area
ed
restructuring
only
of a
amount
base
years
time;
of
expensive
are
assessment, in
This
researchers
that
question~
of
not
a
should
cussing
of
one
A
of
promising
design
Into are
the
system
Data
With
respect
than
can
Research be
be
base the
model
data
be
have
under
driven
two
interface
branch
was
against
the data
[mpor--
are
but
the
research,
level
that
d{s-It
more
is the
other.
will
for
of
lan-
now
attitude:
and
by
research
programming
of
started
Imple-
reduced
around
models
top
requires
certainly
changed
on
man--machine
a
is
large
continue,
the
interac-
Investigations prototype
efforts
way.
by
%0
storage
data
description
and
increased
power
with
structures
intelligently
handled
%0
a
represented
reached
of
researchers
with
been
be
held
which
base
a
large
war"
takenT
between
have
to
amount
questionT
start-
takes
such
"religious
data
another
investigated
put
more
there by
is
data
emphasis
languages,
of
languages.
already
base on
mapping
how
the more
available
management
systems.
these
structures
can
utilized,
systems
in
administrator research
into
a is
which
combined
his
solution
sometimes
is
models
a
has
fair is
a
selecting
the
In
has
in
the
activities
now
probably
as
can
of
Before
a
model
However,
be
efficiently
Modeling
data
of
aspects
translation, to
in
activities
problems
failure
It
engaged
evaluate
and
of
that
different
solver.
justified
continues
is
and
problem
user
Major
viability
risk
nature
question
how
tive
the
real
efforts.
exampleT
which
the
of
which
are
supported.
question
number
the
For
they
problem
much
of
new.
the
the
Justifies
similar
be
the
so
t
clarification.
and
is
implementation
models. %ant
research
Understanding
performed
spent
The
control
demonstrating
prior
guage
systems
ago.
prototype
mentations
under
role.
data
few
Is
has a
way in
that
its
already
useful
set
significant
beginning. been of
help
It
conducted
tools
results
will
for
take and
the
for
some
has
system
to
the
timer be
data
before
continued
designer
or
ad--
ministrator.
Comparing
first signed
the
a
fop
primarily rent
obtained
difference
state
and
employed
designed of
of
art~
for it
results
emphasis, by the is
with
parametric interactive likely
industry
Systems
that
~ctivities
like
users
while
problem research
we
Iris a r e research solver. changes
may
primarily systems With
the
priority
see
de-are cursome-
what
in
favour
described ningt
in
of
the
section
productive
parametric 6
is
user.
already
The
now
modeling
primarily
and
analysis
work
oriented
towards
run-
systems.
Conclusions
With what any
the
wealth
are
among
trends
tion?
Major
~iI
are
these
heartedly
research these
existing~
results
recognizable
What
answer
of
with
currently
we
the
to
major
are
becomes
major
respect
the
questions,
it
to
achievements?
a
change
problems?
well
meaningful
aware
of
While
that
Are
research we
the
are
ask: there
direc-
trying
reader
may
to
whole--
disagree.
results
I.
Model
One
Development
of
the
agreement shown
primary
on
in
deal
a
~®
at
internRtv
of
problem ~dmlnistrating)
data
base
administrator
2.
to
Multiple
Due
to
lem
solver,
been
the
%ures
ture
in
many
to
the
we
user
control his
to
roles
the
has
is
have
programming,
of
the
{conceptual~
different
that
and
in
over
his
storage
installation.
Logic to
the
record
power
and
commercially of
that
finally
research
in
is which
application
multiple
notions
solving
Storage
Time
of level
importance
views
interactive
at
a
time
use
of
has
the
~ea--
systems.
predicate
parametric
logic
flexibility available
and
prob-
locks data
In are
of
bases
as
to
more
gener-
use®
Structures structures
"what
textbooks
a
means
assume
performance
exceeding
the
problem
Storage ally
at
offered
similar
3,
Records
research structure~
information
users
and
orientation
particular
the
the
high
of
function
tune
developed
system this
levels
solving,
base
pest
of
base
that
data
structures
data
three
external}
{parametrlcs
~ndependence
particulart
In
least
Data
achievements
type
fig.
with
for
can
like be
represent
research
found
the in
important
activities.
B--trees Knuth
or
VOlo results
to 3t
say chapter
and
are
it
6" basic
or
other to
fu-
Recognizable
I,
Trends
Data
After that the
find
models
area
of
the
the
data
base
this
area
has
respect
contain that
solutions men%
3.
Data
current
management
system
system.
ent
types
of
the
management
in
one
likely
to
and
to
functions.
It
and
data
in
can
of
solved experi-
which
need
in
recovery T be
Increased
systemsv
the
problems
cannot
system°
sharing
of
arises
a large
is
more
data
into
number
a
consistency a
much
be
ex-
integrated
base
mann@e--
of
and
data
places
where
programs to combine
central among
simpler
descriptive
system
and
recognizable
offering the
operating
about
trend
ensuring
is
the
in
differ--
dictionary,
descriptive to
interface
des--
stored
these
data
base
the
data user
for
information.
~
Performance
constitutes felt
that
tn
constitute research
the
sense
currently current
performance~
performing
and
many
makes
merge
descriptions
generally
Performance
ble
A of
maintaining
l0
and
within
apparent,
information
the
problem
problem
functions,
system
time
more
models
the OS
lead
research
even
different
that
management already
realized
Dictionary/Directory
the
thereby
even
increasingly
justification of
resource system
to operating,
criptive
and
Into
apparent
operating
further
systems
With
DBMS
operating
ence
pected
future.
made
outside in
the
scheduling~
classic
their
is
superimposition
in
has
of
have
it
coexistence
the
attention
research
i.e.
The
is called
Integration
the
controversies,
system.
more
Past
Major
Coexistence of
different same
system
2.
Model
years
of the
systems
do
though
this
alternatives= a bottleneck is
throughput major
necessary
In has in
not can
and
transaction
problem. offer
the
only
be
It
level
this
been area.
recognized
rate generally
of
proved
partlculavltha% not
is
achlewa-by
CPU In
better
time
may
the
past
2.
Integrity)
It
is
Data
necessary
system
can
phasis
be
handled
here
on
is
these
functions
users
installation
niques
which
desirable
3.
Concurrency
in
in
&
%ribu%ed
on
network
Design
Tools
todays
systems)
the
to
he
is
these is
Data
In
a
given
data
from
logical
time
order~
to ~
process
extreme-
schedul-not
have
bases)
so
in
been
increase which
oP
how the
the
for
in
are
dls-
systems~
like:
to
how
select
current
of
helps
reported
development
to
the model
hardware
state
which
research
the
future
decisions
Information) of
for the
to
time=
such
deletion
and
and
the
in in
of
inevitable
clock) (The
which
The
degrades
and
ant)
making
section
6
tools,
~eload Is
range
A
solution avoids
order)
significant
large
interruption
time
a
is to
not data may
from the
interruption.
reorganize
type
of
does
performance
therefore
For
is which
and it
parts
general,
of
the
during
too to
utl--
in
which
become
affect
storage
available
hours
physical
not
is)
bases)
of
update
physically
reason
physical
use. the
to
fragmentation)
which
normal
addition)
is
but
dump
Reorganization). necessary
with it
reestablish
wholel
tolerable.
ks
tech-
are
problems
data
more of
With
storage
To
necessary
around
Similarly
the
computers.
even
much
system
like
llzation.
as
provide that
prevention)
These
with
number
Some
information
base
em-
Reorganization
disorder the
of
relevant
dynamic
stored the
a
decisions.
S®
so
failures
efficiency)
way.
and
Information)
certainly
benefit.
The
(to
trivial)
(deadlock with
representation. not
representations,
efficiency
from
specifying
efficiency.
is
has
of
data
with and
recovery
and
make
conceptual
physical
whole
systems
In
has
a
Rapid
concurrency
4,
user
and
efficiency
satisfactory
a
Rules system
functions
connection
multiprocessing
possibilities
lacking,
of
again
solved
as
Recovery
more
the
mode
and
problems
by
ignoring
allow
ly
ing)
provide
integrity
enforceable
which
The
Independence, to
reorganization
this
are long
weeks
data
used to
be
for
a
problem
Acknowledgement
The
are
authors
Scientific Heights
grateful
Center, and
San
Jose
Pope
and
North
they
are
grateful
of
preversion
a
8.
to
and
America to of
their
members to
Eo
F.
colleagues the
IBM
many
Codd
helpful
and
M.
E.
at
the
Research
representatives
for
this
of
from
IBM
Heidelberg
staff
at
Yorktown
Universities
discussions. Senko
for
the
basis
lh
~u-
Specifically a
critical
review
for
status
report.
BIBLIOGRAPHY
The
subsequent
report.
It
research
reader
the
list
is
hoped
results.
critical
in
entries
in
References
Definition
also
be
lists
II~
They
37
82
169
179
194
142
152
166
12S
152
175
65
66
78
17
18
Independence
47
48
55
82
180
181
182
194
4S
Data
Integrity
1
29
30
129
163
176
Data
1
Manipulation
3
6
Languages
13
16
a
should
is
a
partially
author.
this
reference
to
are
intended
not
he
in
recent to
help
considered
as
Subsection
references
first
95
as
elsewhere,
cross
which to
value
annotations
found
of
according
of
the
Languages
35
Data
can
subsection
8.Io
ig
literature.
which
ordered
it
contains
present,
selecting
ordered
Data
references
that
references,
Cross
o~
Where
reviews,
alphabetically to
to
numbers annotated
I contains referring
list
of
19
20
2S
28
3S
36
40 72
42
46
59
68
69
70
73
74
79
87
93
I01
102
lOS
106
I09
II0
119
123
!28
131
136
141
143
147
149
15S
158
173
183
I~4
185
194
198
Data
Model
17
Data
Equivalence
82
122
134
167
Models
1
2
4
7
8
14
20
34
35
38
39
41
43
52
56
57
58
63
68
69
70
79
!21
124
133
151
15~
157
178
190
7S
94
171
142
153
165
177
Data
Security
30
Data
44
Translation
95
108
Modelling
and
126
Analysis
-- General
-
24
61
8S
~6
91
103
107
I13
115
117
127
132
137
144
161
188
i~3
196
22
138
14S
154
170
31
32
Tools
12
-- Optimization
2t
23
Algorithms
26
2~
33
60
71
88
97
98
114
139
1SO
162
164
174
197
94
171
187
S0
62
81
83
148
116
Privacy
76
Recovery
IS
Resource
29
Search
S
Storage
Allocation
30
65
and
Scheduling
67
Algorithms
6
84
92
140
147
1~
Structures
9
I0
II
S1
77
80
90
96
III
112
130
146
ISS
182
186
189
Surveys
and
Textbooks
8
49
$4
64
99
I00
104
118
120
13S
IS6
160
172
191
192
8.2.
References
1.
Abrial, Work.
J.~°
sterdam
y
paper
ing
the
is
W.
cessoP !44
L~t
and
156
Deductive (1968)
is
the
terms
of
father
00.2200
6.
to
~educe
M°
l%hm
for
the
i@74
ACM
Astrahan,
M~
scope
exceed-
advocates
a
on
Data
Base
1975.
binary
Associative
ACM
Natl.
Pro-
Conference,
relational
model
definitions
the tO
of
grandfather
deductive
Manipulation. T
as
in
a
relations a
function
capabilities.
The
Division
and
M.
a
Connection
Matrix
Poughkeepsie
v
TwRo
English
algorithm
employed
Co
of
W.
S.
is
P.
by
The
ACM,
in
RIL
Chamberlin, Language. the
data
attributes, is
matrix
true
the
rows
and
with
techniques
it
A
SEQUEL to
make
accessing
Programmer
Search
Accessing New
described
query
Query
essentially
where
a
1
in
respect
to
have
be
To
requirements.
Ghosh,
and
matrix,
represent
Sparse
Workshop~
M.~
binary
attribute
Independent
given
"minimization"
the
false.
algorithm to
a
columns
that
Data
Describes
Bachmann,
a
(e.g.
as
storage
SIGFIDET
heuristic
Structured
7.
Am-
and
Interpretive
accepts
leads
Data
the
otherwise
Astrahan,
path
of
represented
indicates
entity,
applies
a
Committee
1968
Development
entities,
position
cess
IFIP
1971 is
represent
An
It
which
of
System
June
Study
relations
Concepts
Information
A
of
Holland,
entities.
Newsletter~
TRAMP:
with
describes
Report:
Capabilities.
mother)
IBM
it between
SIGMOD
system.
other
and
R.
area.
implementation
in
Method.
5.
North
•
of
Ashany,
ACM
Sibley.
answering
the
Proc.
Management~ 1974.
philosophical
relations
interim
question
a
Base
April
and
management
binary
Systems.
with
-
TRAMP
4.
base
with
ANSI/X3/SPARC.
Ash,
Data
Corsica
mathematical
data
model
Management
3.
Semantics.
Cargese,
1974.
The
data
2.
Data
Conf.
York, which
Path
Selection
Model
{DIAM)
Alger•
Proc.
!974. constructs
a
DIAM
ac-
(Fehder).
D. CACM
D.
Implementation
1By
5@0
Interpreter use
of
-- 5 8 8
and
a
{1975}.
the
secondary
of
reduction
indexes
for
operations.
as
Navigator.
CACM
16,
653
-
658
(1973). C.
8.
Wo
Bachmannt Proc.
C.
vol. ape:
evolution
The
The
of
Rot
Large
and
Ordered
used
by
data
Lecture.
Management.
data
AFIPS
description
NCC
1~75
(conceptualt
ANSI/X3/SPARCo
structured %0
the
introduction
of
Bayert
Re
model
[graph~
understanding
new
hardware
Symmetric
Binary
structure
described
Bayer~
Storage
network)
of
to
the
vs
nature
support
data
%ual
Data 1,
are
a
and
290
189
search
%he
has and
be-
ef~i--
method.
Structure -
of
{1972}.
{B-tree)
Logarithmic
B-treesv
Bayer
-
173
of
Maintenance
306
and
Mainte-
{1972},
modlflcatLon
of
the
storage
McCvelgh%o
Characteristics
and
Processing
Methods
74,
440
for
--
444t
Searching North
and
Hollandt
19740
paper
access~
by
I~
characteristics
B--trees
and
organization
Informatica
Information
Amsterdam~
Informatlca index
Binary Acta
Organization
structure.
are
Symmetric
Addressing.
Acta
storage delete
Algorithms,
R.
Eo
hierarchical
Insertl
nance
12.
as
Indexes.
a standard
clent
The
tripartite
McCreight,
described
come
11.
Base
Award
{1975).
contributes
The
Turlng
algorithms.
Bayert
The
3.
a
debate
model
data.
base
of
ACM
in D a t a
-- 5 7 6
external)
current
relational
1973
Trends $69
I.
20
10.
W.
44,
Trends
internal~
9.
famous
Bachmannls
contains
pseudo
a
discussion
random
of
access
hashing
{l.e.
B-trees
and
indexed
In
sequential)
random
and
vlr--
memories.
Bennet
t
Systems.
~e
To
Traditional
and
Kruskal,
appear
in
stack
large
average
large
number
gorithm
Tot
to
J°
Joof
handle
thls
Processing
and
Dev.
algorithms
distances
distinct
Stack
Res°
processing
stack of
IBM
Vo
as
pages,
situation
they The
Data
Base
(IS75).
are
inefficient
appear authors
wlth
for
in
the
describe
drastically
for
case a
o~ new
improved
a
al-
effl--
clency.
13.
Bergenv
Mot
Environment and
Its
Erbet for
R.t the
Application
Pistor~
P-t
Interactive in Computer
Schauer~ Evaluation Aided
U., of
Design.
and
Walch~
Go
Scientific Proc.
Workshop
An
Data on
data
fo~
bases
interao%ive
dams s editors), ble
14.
from
Blller~
ACM.
~®s
and
15.
North
BjoPk,
L~
National This
16.
17.
paper
Eo
Formal
is
the
in
papers.
a data
Gamma--Zero
n-ary
of
and
J®
C.
Decker~ Data
a
View
on
74,
Proe,
DB/DC
papers
See
IBM
o~
and
[5--16~
J.
G®
1979v
Lln-
availa-
Schema-Subschema of
IF[P
System.
Con--
1973
ACM
T.
K.
describing Davies
L.,
Base
Interface: Report
level
for
query
recovery
the
Tralger,
Research
low
a
first
I.
L.
of
The
Speclfiaca%ions RJ
1200~
language
1978.
accessln@
a
base.
An
(R.
two
system~
F**
Cleemput
(1973}.
of
Relational
data
R.
for a
Operations.
descmiptlon
Sytems
Scenario 142--146
Eo
relational
Base
1974.
base
A de±ailed
Bobrow~
Processing
second
Codd~
D.~
Objects
Jo
Amsterdam,
PPOC.,
the
BJo~ne~
Neuhold~
Recovery
concept two
/149/.
~ollandv
A.
M.
September
Schauer
[nformation
Conf.
(W.
Canada~
also
See
Correspondence, gress~
design
Waterloo,
Experimental
Rustin
Data
editoP)~
Management
System.
Prentice--Hallv
In
Englewood
Data
Cliffs,
1972. The
paper
describes
It
contains
(hierarchy
18.
Boyee~
a or
R.
as
Management~
Proc.
of
1974,
North
Holland,
SQUARE
iS
a
Bracch!~
D.
IFIP
G.~
the fop
Fedeli~
System.
ettrotecnica~
vs*
D.~
implemented
dlsc~sslon Codd*s
King,
Work.
W.
Conf.
of
relational
F.,
Expressions:
AmstePdam~
on
system
excellent
Relational
syntactically
based
Management
but
approach
Chamberlin,
Chamberlin/Boyce
19.
brief
Queries
language
experimental
network)
F.~
Specifying
an
and
LISP.
the
EDMS
approach.
Eammer~
SQUARE.
Car~ese,
In
MQ
Data
Corslca,
M.
Base April
1974®
te~se,
so--called
set
oriented,
"concept
high
of
level
mapping"°
query
See
also
Date
Base
"SEQUEL".
A.~
and
Laboratorio
Poli%echnica
di
Paolini~ di
P.
A
Relational
Calcolatorl,
Milano,
Internal
Instituto Report
di
No.
EI-
72--5,
1972. ~|ORIS is
a Codd
pulatlon
language.
hierarchical
relational The
structures
system
with
a
users
view
{i.e.
unnormalized
calculus
(external
oriented
schema}
data}.
may
manl-include
20.
Bracchl,
G.~
Model
for
Prec.
of
Fedeliv
Data
Base
IFIP
Holland,
A,~
the
ceptual
Conf.~
schema
P.
Systems.
Cargese,
A Multilevel In
Data
Corsica,
Relational
Base
Management~
April
1974,
North
1974.
binary
{hierarchical,
Paolinit
Management
Work.
Amsterdam,
Advocates
and
relational
and Codd
many
(graph
model
models
relational,
for
etc.}
the
as
fOm
model)
the
external
well
as
con-
schema
Internal
sche-
ma,
21.
Buzen~ ry
queuing Is
offems
a
Model program
and
A.
costs
also
and
Fo
CACM
E.
The
play land~ GADS
F. t
1,
System.
is
an
North
Hol-
sets of
in a
data
memory
sets.
hier-
The
paper
be
tlme
of
File
Organization
--
1973,
used
to
given
estimate
the
data
total
sto-
organization
Performance
of
Inverted
Data
Base
197S. SchkolnIek
and
Yue/Wong
for
re-
J.
P.
Doubly
Modeling
Chained
and
Tree
Analysis
of
Structure.
In--
1975.
J.
L.t
Evaluation
Giddlngs~
of
an
Go
M.~
Interactive
Processing
74,
and
Manteyt
Analysis
10SS
-
and
1061v
North
P. DisHol-
1974o
and
provides a
graphics intended a
data
in
ence
gained
with
GADS a n d
this
kind.
of
-$48,
263,
The
--67t
stored
system
271-27So
in M e m o -
subject.
data
a
Balancing
results.
may
Sagamangt
interactive
It
data
Selection
540
and -
Information
locations
grammers.
and 16y
Bennet,
and
Amsterdamt
graphic
and
57
P-T
Design
Chen#s
access
thls
Organization:
E.
to
Farley/Stewartt of
Data
Carlsont
74,
allocation
which
2S3
18,
A.
Systems
Load
specifications.
Cardenas~
Base
Processing
the
Analysis
CACM
Klng,
Optimal
access
of
average
related
treatments
form.
2S°
the
descrlbed~
A.
P.-S.
Evaluation
System.
Structures.
24.
F.
is
Cardenast
See
for
to a n a l y z e
generalization
device
cent
P.
1974.
model
and
a
A
Chen~
Information
used
Cardenast
rage
23.
and
Amsterdam,
archy
22.
P.t
Hierarchies°
land, A
J.
variety
of the
system as
a
for tool
extraction files.
The
requirements,
data to
related
be
used
technique paper
by for
to
accessin~
discusses
which
must
geo-
non--pro-
experibe
met
by
26.
Casey~
R®
Network.
27.
The
author
lem
of
The
costs
28.
1973
SJCC
gives
an
storing
G.
Design
Free.
D.
Query
of
Copies
1972
Prec.,
exact
and
data
of
Chamberlln~ lish
Allocations
allocating
R.
Casey~ NCC
G. AFIPS
sets
at
of
and
Tree
a File
40,
heuristic a
in
617
an
-- 2 2 5 ,
of
to
Networks
Distributed
-- 2 5 7 ,
D.~
and
Boyce~
for
the
prob-
computers,
between
251
Information 1972.
solution
network
%ransmlssion
42,
ACH
a
within
vol.
Language~
of
vol.
~Iven
nodes.
Data.
AFIPS
1973®
R.
F.
SIGFIDET
SEQUEL
-
Workshop
a
STructured
1974,
ACM~
Eng-
New
York,
1974. SEQUEL
Is
SQUARE, Boyce/
29.
a
however,
D.
Free
Scheme
for
tion
Processing authors
processes ite
in
delays
zatlon NCC h
and
view
The
cussed.
North
case
to
Of
those
English.
See
deadlocks.
of
-
use
Traiger,
a
Data
In
Base
~olland,
A
Deadlock
System.
Informa-
Amsterdam,
deadlock--detection Their
L®
and
algorithm
1974.
backout
of
avoids
indefin-
Viswst
Authorl--
process.
D.~
Gray~ in
44,
a
virtual
J.
a
425
can
the
in
343.
%o
language
restrict
similar
%o n a t u r a l
and
Locking
propose
a
closer
F®,
340
Locking
Views
R®
Resource
vol. is
query
syntax
very
SQUARE.
Boyce,
D.~
D®
Proco
a
for
semantics
74,
of
Chamber!Int
with
with
Chamberlin
Chamberlin~
The
30.
language
N.t
Tralger~
Relatlon~l -
430,
Data
I. Base
relation
derived
form
The
problem
of
be
fop
authorization.
access
%o
a
SysTem.
1975
AFIPS
1975.
SEQUEL. used
L.
view
for
the
other
updating
relations
via
is
dis--
views
Locks
exclusive
temporarily use
of
one
user.
31.
Chandrav ment
to
disk one
32.
System.
S.K®
to
related
specJ
drives dlsk
algorithm
Chang7
Wong~
K.
C.
Worst
Storage
Case
Analysis
Allocation.
To
of appear
Place-
a
in
SIA~
Computing.
authors
of the
on
and
Ko~
algorithm
Journal The
A.
fy
such drive is
a
heuristic the
that
is
ACM S t G M O D
probability
minimized.
analyzed.
Data
algorithm
Base 1975
See
The also
Conf.
of
worst
allocate
data
simultaneous case
sets access
performance
of
EasTon/Wongo
Decomposition InT.
to
in on
a
Hgmt.
Hierarchic of
Data~
Compute~ San
Josev
91
1975. The
author
cost
33.
Chenv
P.
tem.
1973
A
34.
S.
Optimal
AFIPS
Caseyls
results
the
hierarchy
CODASYL
Development
and
definition
section
sets
can
CODASYL
CACM
by
allowing
&
non-llnear
of
an
n--tuples
CODASYL
also
2821
problem
Language
many
or on
the
taking
Structure
Sys-
queu|n~
Group.
An
In--
1962.
ideas. idea
which
Storage
1973.
BuzenfCheno
1 9 0 -- 2 0 4 y
entity
Multilevel -
allocation See
fop
of
in 277
Contains~
that
then
fop
files
jolns~
may
example T be
union
and
interInter--
performed°
from
original
St
source
Programming
Available The
be
42~
Committee.
Algebra.
as
vol.
considerations°
"oldtlmer"
the
Allocation
Proco
into
preted
Flle
NCC
of
treatment
formation
36.
extended
effects
An
3S.
has
function.
Language
Committee.
1971.
DBTG-Report.
ACM. DBTG
proposal.
Programming
Language
Committee.
DBLTG proposal,
Febru-
1973.
ary
Contains nltlon
the
COBOL
language°
data
The
manipulation are
languages
and
subschema
essentially
data
those
of
defi-
ref.
3S.
37.
CODASYL
Data
Language. Essentially
38.
39.
CODASYL
the
Systems
Base
Management
from
ACM.
of
same
data
data
model.
F.
Relational
Codd~
The of
E.
CACM
paper
in
Feature
Systems.
compares
A 13~
377
which
Technical
commercially
Model
-- 3 8 7 y Codd
Committee. June
definition
Committee.
a
network
Language
Development~
Primarily
Banks.
40.
description
Journal
language
Analysis Report,
available
of
Data
Description
1973o
Data
as
of
in
Generalized
May
1971.
systems~
for
35.
Available
contains
Large
Data
Shared
also
Data
1970.
introduced
%he
{Codd)
relational
model
data.
Codd~
E.
F.
A Data
Base
Sublanguage
Founded
on
the
Relational
Calculus.
41.
E.
Codd~ Model~ Data
1971.
Fo
and
Base
CllffsT
42.
Codd, Data
Further
Systems
E.
F.
Base
of
Information
F.
Amsterdam,
Recent
Base
Relational
Sublanguages.
Prentice--Hall~
R.
mentation 211
--
The
main
of
W®,
220,
In
Englewood
multiple
In
User.
Cargesev
Corsl--
1974.
are:
natural simple
data
dlalogue~
choice
lan@uage model,
query
Pes-and
Interrogation
a
[n Relational
74~
1017
-
Data
1021,
Base
North
Sys--
Holland 9
Codd's
relational
data
topics
sublanguage
including
types.
superimposition
needing
Maxwell,
model
W.
L.~
The
and
discussion
author
storage
lists access
investigation.
and
Measures
a
Morgan,
in
H.
L.
Information
On
The
Systems.
[mp!elSv
CACM
1972.
in
at
as
a
R.
File
W.~
The
Maxwell~
W0
by
file
which accesses
ve[llance
progPam~
automatic
functions.
which
contains
ls
also
a
security
conscious
of
discussion
of
1972.
L.~
and
Morgany
H.
Processing
L,
A Technique
74,
988
-
992.
1974. by
has
are %o
of
checking
approach
Information
Ams%erdamv
Each
To p e r f o r m
an
paper
implemented
implemented
declarations~
is
%ime"~
Surveillance.
technique
All
paper
resource.
~olland~
described.
this
compile
systems
Conw~y~
Casual
proposed
steps
Investigations
Security
idea
"once CPU
the
Conf.
clarification
queDy~
and
the
of
security
Work.
to a
The
logic,
with
Amsterdam,
steps
performance,
among
Conway,
IFIP
of
Processing
normalization
gram.
Data Base
1971.
1974.
survey
concurrency,
A
the
York,
capability.
E.
for
New
Data
Rendezvous
system.
declarative
tems~
North
seven
Internal
Codd,
A brief
45.
North
answering
%atemen%~
to
Holland,
level
theory
Steps
Proc.
description
the
of
editor).
ID747
The
only
Rustin
ACM,
of
Completeness
Management,
definition
44.
Normalization
(R.
Seven
April
of
Workshop,
1971.
question
43.
SIGFIDET
Relational
ca~
high
ACM
the authors
associated compiled
the
which
file
can
in
with
into
a
have
To
then
be
their £t
a set
file pass
used
system
to
of
ASAP
is
function
surveillance
pro--
through
suP--
perform
the
certain
46.
Dana,
C.,
and
and
Device
~o10
41t
The
paper
Date,
-
of
J.,
An
Information
Report
1116,
Structure
Generation.
AFIPS
for
FJCC
Data
Base
1972
Proc.
1972. high
describes
Co
Data
L.
Independent
1111
manipulation
47.
Presser,
level
elements
for
The
generation
and
reports.
and
Hopewell,
Independence.
P.
1971 ACM
Storage
SIGFIDET
Structure
and
Workshop,
ACM,
Physical New York,
1971.
48.
Date,
J.,
C.
and
Independence.
49.
Dater ley~
Co
J.
An
Reading,
Similar
Hopewell,
1971 ACM
Introduction
File
to
book,
one
to
introduction
Definition
and
Workshop,
ACM,
Data
Systems.
Base
New
Logical York,
Data
1971.
Addison--Wes--
1975.
Massachusetts,
to Wedekind's
prehensive
P.
SIGFIDET
of
data
the
first
base
attempts
systems.
of
Many
a com-
annotated
references.
50.
Davies, Natl.
C.
Together to
51.
52.
a
T.
Conf.
Recovery
Proc.,
with
Dearnley,
P.
System.
others the
Delobel, The
Theory
17,
374
Deals
-
as
Comp.
for
a DB/DC
System.
1973
ACM
1973. an
easy
To
of
a
Model
Self
20,
-- 210,
system
Journal observes
accordingly.
and
Casey,
of
Boolean
386,
1973.
the
R0
E.e.
original
G.
of
Into
17,
understand
patterns
Simulation
introduction
set
of
without
flat To
are
a
IBM J.
decomposition a
of
a
files
derive
allowing
Data
1974.
usage
of
Functions.
allowing
file
of
Organizing
results
Decomposition
Switching
problem
property, The
paper
Opera%fen
redundancy
cover
Tion
The
with
{enormous) mal
A.
data
C.,
141,
concept.
Management
tures
-
BJorkls
recovery
Among
Semantics
136
and
Data
Base
Res.
Develop.
flat
the
same
further
and
with
file
having The
restruc-
reported.
mlnl-
Informa-
decompo~i-
tlon°
53.
DI
Paola,
Classes Santa The
of
R.
Monica,
paper
A.
The
Proper
Calif.
deals
with
Solvability
Formulas Technical the
and
of
the
Related
Report
solvability
Decision Results.
Problem
Rand
R-803-PR, August of
The
decision
for
Corp.,
1971. problem
of
class File.
54.
of
See
Storage.
55.
M.
Dl%fmann~
der
E.
Annual
Press,
den
to be
Data
Structures
Review
L~
and
in Automatic
Rends
Relational
%help
Data
Representation
Programming,
Klassifizierung
System-Entwurf. Infomm~%ik-
A~
Grundstruktur
eines
notes
von
Technische
vol.
5,
in
PeP@a-
in
Des
Konzept
Darmstadt.
Berichte
DV75--[
des
Objektbeschreibungsbaumes
graphenorientierten
computer
fuer
Datenunabhaengigkeit
Hochschule
Forschungsgruppen
Doerrscheidt,
ture
by
1969.
E®
Berlin,
processed
Levien/Maron.
D'Imperio,
mon
questions
science
26,
als
Datenbankmodells.
532
-
541,
Springer
LecVerlagv
1975.
Describes
a
Typically
graph
oriented
data
model
based
on
LISP
ideas.
57.
Durchholz,
R.,
Systems.
Data
Corsica,
April
Influenced the
58.
to
J.
guages~
Acta
M.
s%Paints
on
the
Work.
Conf.
Feature
model
of
Management Cargese,
1974.
"CODASYL
data
Base
and
Data
Analysis"
schema.
Structures.
CACM
related %henry
Level
the
C~
2,
to a of
Theory
formal
Data 293
and
data
Structures -
incorporation
like
of
string
structures
s[ml--
309, of
languages.
for
Programming
Lan-
1973.
relational
level
data
struc-
languages.
Wong,
the Minimal
C.
Cost
K.
The
of
a
Effect
of
Partition.
Capacity
JACM,
22,
Con-441
-
1975. algorithm
proposed,
Easton, IBM
of
Data
Amsterdam,
Understanding
Informatica
ALGOL
Easton,
449,
ideas
for
into
A new
61.
an
model
for IFIP
Proc,
Holland,
hierarchical
Relational
proposal
tures
Concepts
1971.
available
Earley,
A
data
Towards
some the
North
a
J.
Go
Management
discuss
-- 628,
Sketches
60.
the
14,
617
Richter,
1974.
Earley,
lap
59.
Base
by
authors
and
which
M.
C~
Research
to
the
accepts
Model Report
for PC
problem capacity
considered
Chandra/Wong
is
constraints.
Interactive 5050,
by
Sept.
Data 1974.
Base
Reference
String.
Describes
a
of
modification
which
describes
model
Is
measured
the
independent
behaviour
its analytical
well,
tractability
references
An
under
model,
advantage working
of
set
The
assump-
tions.
62.
Edelberg,
M.
SIGFIDET The
of
described,
Ehrich,
H.
D,
Informatica graph
The
and
which
201
-- 211, data
for
(i.e.
log)
data
blocks.
einer
Recovery.
1974
ACM
1974.
restores
Grundlagen
4,
York,
transfers
oriented
model
New
and
algorithm, which
processes
is also
A
an
data
Into
Contamination
ACM,
describes
set
pagatlon
63.
Base
Workshop,
paper
given
Data
a
given
determines
blocks
Theorle
A and
der
error
The
error
recovery reruns
and
a
pPo--
algorithm
processes.
Datenstrukturen.
Acta
1975.
model
are
investigated
W.
A
and
graph
from
a
Data
Base
oriented
more
schemata
mathematical
within
point
of
view.
64.
Engles, view
R.
in
Tutorial
on
Programming
Automatic
vol.
Organization.
part
7
It
Annual
Pergamon
Re-
Press,
1972.
65.
Eswaran, The
K.
P.,
Notions
System.
IBM
paper
The
of
Research
defines
concurrency~ guage is
Gray,
and
presented
N.,
Lorie, and
Report
The
RJ
1487,
locks
determines
A.,
and
Traiger,
Locks
December
and
Their
is
I.
On
Base
within
consequences.
Two
L.
Data
consistency
proposed,
whether
in a
1974.
transaction,
specification
which
R.
Predicate
of
notion
predicate
predicate
for
J.
Consistency
and
such
an
A
lan-
algorithm
predicates
over-
lap.
66.
Eswaran, of
1601,
P.,
and
Chamberlin,
for
Data
a
rules
interpreted
are
data
Everest~ rity.
Base
D.
D.
Specifications
Functional
Integrity.
IBM
Report
Research
RJ
1975.
Contains
the
67.
K.
a Subsystem
as
of
consistency
routines
to be
rules.
invoked
Consistency
after
changes
Of
base.
G.
Data
Cargese,
classification
C. Base
Concurrent
Corsica,
Preclaiming
of
Update
Management, April resources
241
1974. to
--
Control 270,
North prevent
and
Data
Base
Proc.
IFIP
Work.
Holland,
Amsterdam,
deadlocks
is
Integ-Cent.
1974.
advocated
by
the
68.
author.
rende
I,
Informal and
a
of
Springer
high
Falkenberg
the
from
T1
of
der
E.
language
The
of
Farley,
72.
also
Fehder, search The
73.
computer
a
data
model,
Management
Systems.
Informatik,
Internal
A
employee
of
B
manipulation
dimension.
und
Darstellung
Datenhankbenutze~
a
is
a data
data
closely
of
(and
von
Information
und
Detenbank--Man--
Stuttgart,
1975.
model
and
a data
related
to
concepts
though
graphically
It
an
manipulation in for
allows
are)
natural n-ary
interpreted
as
relations.
Cardenas
papers
DIAM
be
and
Stewart,
Relational
L.
Base
extends
is graph oriented
of Toronto,
P.
in
it.
example:
and
time
of
can
G.,
S.
Data March
for
The
Reports
for
fuer
University
are
model
for
University
Data
(for
to
zwischen
both
H.
J.
Selection
See
the
Thesis,
binary
in
Institut
time
with
which
Notes
1973.
language
relations
description
relations
Resultatspezlflzie-
"Gegens%andsmodell"T
Strukturierung
where
language.
71.
of
Schnittstelle
A detailed
Heidelberg,
the
manipulation
stored
to cope
J.
Lecture
1974.
dimension
agement--System.
Joins
of
Stuttgart,
07/74,
Falkenberg,
Schneider,
Time-Handling
to T2)
language
and
Datensystemen.
Verlag,
level
T E®
University
Adds
B.,
von
discussion
CIS-Report
70.
~eyer,
Handhabung
science
69.
E.,
Falkenberg,
recent
A.
Query
Bases.
Technical
investigations
Independent RJ
describe
RIL,
the
Report
into
1121
(1972)
and
Index
CSRG-53,
1975.
Representation
RJ
Execution
and data
12Sl
this
subject.
Language.
IBM
~e--
to
the
i.
IBM
(1973).
manipulation
language
system.
Fehder, Research Describes
P.
L.
The
Report a
RJ
query
Hierarchic
Query
1307,
1973.
Nov.
language
to
Language
operate
on
(HQL)
IMS
part
like
hierarchic
data.
74.
Feldman, Language.
The
high
J.
A.,
CACM
level,
and
12,
439
ALGOL
Rovner, -- 449,
like
P,
P.
An
ALGOL
based
Associative
1969.
programming
language
LEAP
is based
on
binary
associations,
which
are
implemented
using
a
hash
coding
P.
An
Author--
technique.
75.
Fernandez~ Izatlon Conf.
E.
B., Summers,
Model on
for
M~mt,
of
Authorization data
76.
base
purer A
77.
governed
R.
and
vol.
The
by
und
and on
Coleman,
Base.
C.
ACM
SIGMOD
1975
Intl.
1975.
predicates
enforced
26,
and
Jose,
over
primarily
at
applications
compile
Lecture
Gesellschaft.
and
time.
Notes
in
Com--
1975
discussions
A.,
Retrieval
C., Data
San
Datenschutz
of
Finkel, for
is
Science,
survey
Shared
Data,
contents
H.
Fiedler,
a
R.
on
Bentley,
privacy.
J.
L.
Quad-trees:
Composite
Keys.
Acta
of
trees
for
a
Data
Informatica
Structure
4,
1
-
9,
1974. A
generalization
binary
the
search
on
composite
keys.
78.
Florentln, nal
17,
J. 52
-
Consistency data
J.
Consistency
$8,
of
Data
Bases.
Comp.
Jour-
1974.
rules
base
Auditing
are
contents,
predicate
Problems
calculus
of
their
expressions
over
implementation
the
are
dis--
cussed.
79.
Frank,
R.
L.,
University
Shows
detail
in
Frank,
R.
L.,
Access
Method.
Describes
81.
o9
the
and
steps,
and
AFIPS a
the
users
Fraser,
A.
G.
Integrity
Journal
12,
C.
archical
Structure.
(GI
1975),
A
Report:
-- working
have
to
the
DBTG
A Proc.
Illustrative
paper to
made
-- 7, get
a
COBOL
approach,
Method vol.
oriented
be
An
for
a
43,
45
language
Generalized
Data
-- 52, 1974, to
tailor
access
of
a
Mass
Storage
Filing
System.
Comp.
1969.
System
Springer
NCC
DBTG
ISDOS
specifications.
Recovery
Frasson,
The
Ko
keyword
to
Ss
in
Yamaguchi,
1974
I -
H.
which
running
ideas
the
E.
Michigan,
methods
Describes
82.
Sibley,
program
application
80.
and
Example.
in
to
MULTICS.
Increase
Lecture Verlag
Notes
Data in
Heidelberg
Independence Computer s
1975.
in
Sciences
an
Hier-
vol.
34
Describes their
83.
Genton,
in
the
Recovery
Comp.
Journ.
Ghosh,
P.,
S.
Base
work
is
S.
P.,
It
is
at
best.
Data
IBM
path
and
S.
P.,
and
System
-
accessed
direct
126,
b{. E.
J.
Independent
of
Res.
Dev.
of
queries
An
algorithm minimum
V.
Y.
System
shown
Tuel,
that
W.
G.
Commercial
journaling
Path
1ST
is
-
Procedures
422,
access
given,
"path
in
is
of
a net-
claimed
%o
Collision
by
division"
A
of
an
Design
when
Hashing
1975.
"hashing
[B~
1974.
paths
which
for
cardinality".
15 -- 22~
Performance.
Sys-
techniques.
Search
408
to
Analysis
I~
Access
1970. and
String
of
Lum,
Inform.
analytically
Base
123
for
checkpointing
reduction
access
Division,
Ghosh,
13,
considered.
an
Ghosh, by
the
be
hierarchy.
Senko,
Systems.
DIAM
yield
and
can
Procedures
elementary
Data
86.
A.
structures
Describes
Within
85.
IMS
position
tems.
84.
how
Research
is
in
Experiment Report
RJ
gener-
Model
to
1482,
Dec.
1974.
87.
88.
The
authors
ate
the
model
Goldstein,
1970
MacAims
is
C.,
and
early
{I.
e.
of
Strnad,
N.
1974
NCC
AFIPS
Galatll
is
transfer)
R.
J.
The
in
an
MacAims
ACM,
New
and
IMS
Data
York,
evalu-
system.
Management
1970.
system.
that a
Data
Base
Report
qC
of
Reorganization 5063,
clustering
way
Quantification
Proc.
vol.
optimization
LEAP
Feldman/Rovner).
ten.
A.
Go
in
Discusses
Haerder,
measurements
model
as
to
Oct.
records
minimize
for
a
1974. into
the
blocks
number
of
necessary.
Greenfeld,
(see
performance
Workshop,
IBM Research
considered
units
with
relational
and
Hierarchy.
problem
linearized
SIGFIDET
S.,
The
~
comparison
ACM
an
Gorenstein,
transfers
90.
by
R.
System.
Storage
89.
construct
T.
Die
Technische
Forschungsgruppen
43,
in 71
-
techniques
Implementierung Hochschule DV74-2.
von
a
Relational
75, for
a
relational
Zugriffspfaden
Darmstadt,
Data
System.
1974.
Berichte
system
durch der
like
Bitl[s-
InfoPmatik--
The
author
vestlgates of
91.
Haerder,
Hall,
T.
Zugriffszeitverhalten Datenbank,
of of
P.
Held,
G.
access
Ae
V.
D.,
Common
is a QUEL
IBM
UK
and
conventional
methods
der
Auswahl
In-
of
for
Saetzen Berichte
simulation.
Identification
M.
and
R.,
Includes
a com-
indexes.
UKSC0060,
1975
von
Darmstadt,
DV74-3.
help
Report
System.
relational as
of
rity
assurance
the
Hoffmannv Tems.
Its
NCC
in General
Nov.
1974.
E.
INGRES
Wong,
AFIPS
L.
via
J.
B.
C.,
Proc.
Easllyo
ACM,
New
forms
a
-- a ~ela--
vol.
et
44,
4CS
--
Shut
Description
access at
and Los
N.
Pacific,
DEFINE
to
control
calculus
interesting and
preprocessing
Privacy
In
Angeles,
C.,
and
Language
of ACM
Computer
Sys-
1973.
Lum,
for
Integ-
time.
Vo
Y.
Defining
DEFINE: Informa-
San
Francisco,
April
1975,
graph
structures
to a linear
1975.
then
map
referenced
specification,
J.
Journ.
17,
Discusses inverted
by
written
(and in
processed the
according
language
CONVERT.
to)
a
See
Inverted 59 how
-
Indexes
&3y
to
and
Multilist
Structures.
Comp.
1974.
use
multilist
structures
in
order
to
maintain
files.
R.
lutions
incorporate
Pov
with An
al.
Inglis,
Karp,
Data
D.
system
language.
modification
Company,
Proc.
is
which
to
Publishing
language
translation
query
Security
Smith,
York,
Describes
is
query
management
level
(editor).
A Nonprocedural tion
data
high
authors
Melville
Housel,
The
organization
to
Hochschule
Subexpression
Stonebraker, Base
plan
Shu
the
structures
Systems.
bei
Technische
with
storage
Data
INGRES
port
index
1975.
based
97.
an
superior
Informatik-Forschungsgruppen
416,
96.
as
are
der
tlonal
95.
lists
einer
Algebraic
94.
lists
bit
aus
p~rison
93.
bit
when
indexing.
Analysis
92.
proposes
M.~ to
a
RC 4740, problem
McKellar,
A.
C.,
2-dimensional also considered
and
Wong,
placement
to appear is
the
in
SIAM
placement
C.
K.
problem. Journal of
Near--optimal
so-
IBM Research
~e--
of
Computing.
records
in
a
2-dimensional
storage
consecutive
98.
King,
W.
search See
99.
100.
E.
D~
E~
~*T
102.
539. The
Center
North
is
Lavenberg~
iOS.
S.
Levlen~
relations
for
Ott,
vol.
N.,
Report
1968,
3:
C.
and
Computing
1973.
and
Zoeppritz,
IBM
Germany,
a
data
1975.
manlpula--
language.
Retrieval
Concepts
Sorting
75.08.007,
to
natural
in
a
set-theor-
Practical
Symposium
Consldera1973~
531
-
1973.
natural
a "set
language
theore±ic"
S.,
and
Shedler,
G.
IBM
Research
Report
analytically data
D.
Re
E.~ and
Introduces
LsvIt%~
into
Fundamen--
Massachusetts,
designed to
P,
I:
like
query
language,
Intermediate
language
base
File
S.
A
tractable
Queuing RJ
Model
1561,
of
the
DL/I
the
pro-
1975.
queuing
model
of
On-Line
Systems,
access.
Structures
for
Spartan
1969.
Execution
1060
a
vol.
Information,
IMS.
during
Books,
has
area.
interpretation,
of
Lefkovitz,
Base:
Amsterdam,
system
simplified,
cesses
close
Lockemann,
Data
Holland,
~oP
Introduced
International
translated
Composent
104.
of
this
H.,
Technical
and
Re=
IBM
Massachusetts,
Reading,
Lehmann,
very
a File.
Programming,
Heidelberg, is
for
Programming,
General
Structured
Proc.
suitable
A
P.,
K.
in
Reading,
Computer
D®)
is
two
between
1974.
Languages:
system
distance
Indices
Computer
of
At±
expected
research
Addison-Wesley,
described
which
of
Lattermann,
Kraegeloh,
tionsv
103.
The
language, which
etically
January Recent
Art
of
Addison-Wesley,
interactive
tlon
the
minimized.
Selection
for
Specialty
Scientific
that
is
13411
The
Searching,
User
An
RJ
Cardenas
D.
Kogon~
so
the
Algorithms~
Knuth,
M.
On
Report
Knuth,
and
lot.
F.
also
±al
array,
~eferences
the (see
G.~
and Data
Maron,
E.
Retrieval.
Relational also
Stewart,
Interactive
M.
Data
Di
A Computer
CACM
Data
10,
Filer
715 a
System
for
721,
1967.
-
system
based
Inference
on
binary
Paola).
D.
H.,
Analysis.
and
Yormark, 1974
AFIPS
B.
A Prototype
System
NCC
Proc.
43,
vol.
69
-- 69,
1974.
Describes
an
relying
on
graphics
107.
Lewis,
implemented
standard
and
P.
statistical
A.
Transaction po=%
108.
RJ
system
analytic
W.,
1629,
and
Shedler, In
August
The
cess
with
~
Liu,
S.,
and
Go
a Data
of
varying
Heller,
Translation
of
It
measurement
makes
data
heavy
use
of
S.
Statistical
Base
System.
transaction
stream
Analysis
IBM
of
Research
Re--
1975.
modeling
time
analysis
methods.
Processing
Describes
for
procedures.
J.
Model.
a
as
a
Poisson
pro-
Grammar
Driven
Data
rate.
A
1974
Record
ACM
Oriented,
SIGFIDET
Workshop,
ACM,
New
York,
1974. Grammars
may
grammars
mapping
as
109.
a
string
P.
men%
for
764,
1967.
111.
of
a
strings
to
equivalent
string
C.,
and Data
W.
D.
Acquisition
to
the be
to
a
tree.
Two
are
used
frees
specification.
Knutsen,
may
A.,
A
and
problem assembled
Data
and
Symonds,
A.
Multiprogramming Analysis.
of
Environ-
CACM
measurement
10,
data.
communicating
via
75,
Base.
Proc.
1970
RAM
-
relations
(in
some
Lorie,
R.
A.
Scientific January
Prefadata
sets
J.
A
ACM
Schema
for
SIGFIDET
Describing
a
Workshop~
Rela-
ACM~
New
a
data
XRM -
Center
base
sense
an
management
like
LEAP
Extended
Report
G 320
of
system
(n--ary) -
2096,
based
on
binary
Feldman/Rovner).
Relational Cambridge
Memory.
IBM
Massachusetts,
1974. Implements
homogeneous
flat
files
on
top
of
RAM
(see
Lorie/Symonds).
112.
113.
-
1970.
Describes
XRM
mapping
string
tars°
R.
tional York)
mappings
programs
parame
Lorie,
as
different
approach
earlier
and
Taken
Online
bricated
110.
to
Lockemann,
An
be
Lum,
V.
CACM
13,
Y.
Multi-attribute
660
-
Lum,
Y.
form
Techniques,
Y.,
665,
Yuen,
P, a
Retrieval
with
Combined
Indexes.
1970.
S.
Tat
Fundamental
and
Dodd,
M.
Performance
Key Study
to
Address on
Large
TransExist-
Ing
114.
Formatted a
plled
large
to
V.
Y.,
of
Secondary
356,
the
Cardenas
V.
Y.
1973.
Lum,
V.
ented
117.
H.
An
Optimization
Proc,
1971
Performance Using
for
and
an
E.,
Data
techniques
as
ap-
ACM
Problem
Natl.
on
Conf.,
the
vol.
Selec26,
349
into
the
problem
considered
by
Abstract
Wang, Set
of
File
C.
P.,
Key--To--Address Trans-
Concept.
and
Allocation
the
algorithm
CACM
Ling,
in
H,
Storage
16,
603
A Cost
-
Ori-
Hierarchies,
cost
for
of
data
storage, set
CPU,
allocation
channel is
etc.,
outlined,
cost.
Smith,
Memory
Analysis
197~.
this
and
Virtual
an
combining
minimizes
K.,
M.
322,
-
function
Maruyama,
Investigations
Senko,
318
defined
for
hashing
others.
Algorithm
A cost is
1971.
of
Keys.
General
Y.,
18,
which
Ling,
Methods
612,
CACM
4,
sets,
earlier
and
forms%ion
116.
and
vol.
evaluations
1971.
Of
Lum,
and
data
tion
One
115.
survey
Lum,
-
CACM 14,
Files.
Contains
S.
E,
Analysis
IBM
Indexes,
of
Research
Design
Report
RC
Alternatives 5087,
Oct.
1974, A
number
B-trees cally
118,
are
Surveys
McDonald,
ACM,
New
and
7,
N.,
alternatives
resulting
York,
into
for
indexes
formulas,
which
organized
as
are
numeri-
is
system.
McGee,
W.
5 -- 19,
a
See
W.
C.
Hash
Table
Methods.
ACM
Comput--
1975.
M.
Conference,
also
data
CUPID San
-- the
Friendly
Query
Francisco
t April
1975,
File
vol.
flow
Fi!e S~
687
Processing.
Pergamon
Structures
Processing
diagram-like
language
to
the
Held.
Generalized
Programming
Information
G.
Stonebraker,
Pacific
graphic,
C~
T.
1975.
INGRES
McGee,
Lewis,
and
ACM
CUPID
matic
121,
analyzed
D.,
Language.
120.
Implementation
evaluated.
Maurer~W. ing
119.
of
Press,
for 1233
Annual
Generalized
-- 1239,
Review
in
Auto--
t96~.
North
Data
Management.
Holland,
Amsterdam,
1968.
122.
Introduces
graphs
McGee,
Co
Data
W,
Base
April The
author of
McGee,
presents
W.
C,
ACM
SIGMOD
The
paper
125.
Go
H.
and
relations
Intl.
the
Mehl,
J.
earlier
of
information.
Data
Conf.
Equivalence,
Cargese, Corsica,
1974.
equivalent
organizations
organizations
at
on
Network
T Proc.~ and
on
Look
papers
between
W,s
and
in
and
of
the data
a
New
proposal
network
Data.
ACM,
Data
data
Proc.
Structures.
York,
for
1975.
a data
manl-
structures,
AFIPS
1967
FJCC
525
-
New
proposal
to
the
York,
A. to
C. in
P,
G.,
the
compiled
and
A
Study
IMS Data
of
information
Order
Bases,
data
1974
as
sets
Transformations ACM
independence
routines,
application
File
to view
SIGFIDET
of
Work--
1974.
increase of
between
Merten~ proach
held
York~
proposing
sets.
Wang,
ACM,
program
Fry, J.
P.
Translation.
which and
data
A Data
1974
ACM
supported
intercept
the
by
IMS
communi-
management.
Description
SIGFIDET
Language
Workshop,
ACM,
ApNew
1974,
Describes
the
idea
translation
Merten,
A.
G.,
New
Meyer,
York,
B.,
and
design
behind
the
of Michigan
University
Severance,
through
D.
G.
Modeling.
Performance Proc.
Evaluation
1972
ACM
Natl.
File Conf.,
1972
and
Technology.
and
project,
of Organizations
the
Work.
Operations
operating
shop~
cation
128.
Study
(CRM)
Conference
Another
stored
organizations.
requirements
Structures
ACM,
of
file
(DBTG)
Hierarchic
data
IFIP
for
1967. of
with
127.
1975
One
A
126.
flat
Level
outlines language
Mealey, 534,
File
the
Amsterdam,
a number
language
models
to
Proc,
Holland,
homogeneous
pualtion
124.
A Contribution
North
description
123.
conceptual
Management.
1974,
class
as
Schneider,
Course
H.
Notes,
J.
Predicate
University
of
Logic
Berlin,
and
Data
available
Base from
authors.
Reviews interface
predicate like
logic
in Codd's
and work
Its
use
and
in
as
a
model
natural
for
man-machine
language
question-answering
129.
sys%ems~
Minsky,
N.
Workshop, The He
On
Interaction
ACM,
author
New
discusses
proposes
a
"consistent
with
York,
concepts,
constructive
operators"
Data
Bases.
1974
ACM
SIGFIDET
1974. integrity approach
to
be
used
rules,
to
as
user
integrity
primitives
views
for
etc.
defining
by more
complex
operations.
130.
Mullin, Hashed
131.
J,
K,
An
Overflow.
Mylopoulos,
J.,
Relational
Improved CACM
Index
15,
301
Schuster,
System,
-
S.,
1975
Sequential 007,
and
AFIPS
Access
Method
using
1972,
Tsichritzis,
NCC
Proc.
D.
A
Multilevel
vol.
44,
403
the
prototype
-
408,
1975. The
mechanism
ZETA/TORUS system
with
language ZETA
132.
used are a
on
as
an
Nakamura, Base
vol.
44,
±np
of
-
base
lower
level
a
I.,
of
relational to
data
define
a
primitives.
natural
and
Performance 463,
is
capability
Yoshida,
System 459
development ZETA
"intelligent"
language
Kondo,
high
TORUS
system
management level
is
query
built
on
interface,
H.
A
Evaluation.
Simulation 1975
AFIPS
Model NCC
for Proc,
1975.
of experiments
Description data
the
definition
F.,
Data
in
described.
management
simulating
system
in
a
the
processes
conventional
within
simulation
a
package.
133.
Navathe,
cation 1975 The
S. of
paper
al
Merten,
Relational
E.
View,
when more
J.
that
powerful
February
The
paper
of
G.
Investigation
to
Data
into
Translation.
the
Appll-
ACM
SIGMOD
-- 138,
Codd's in
the
relational
model
context
data
of
".,®
poses
ser-
translation
as
a
restructuring".
Mapping:
University
10,
123
used
Data
A.
Model
Proc.,
concludes
for
Neuhold,
and
Conf.
problems
vehicle
134.
the
Intl,
ious
Bo~
A
Formal
Karlsruhe,
Hierarchical
and
Relation-
Forschungsberichte,
Bericht
1973. compares
formal
notation.
tional
model
is
hierarchical In
a
and
particular,
special
case
It of
the
relational makes
clear
hierarchical
data
models
that
the
model.
in
rela-
135.
136.
Binary
Nievergelt,
J.
Computing
Surveys
Notley, UK-SC
M.
G,
Search
6,
The
3,
Trees
and
File
Organization.
ACM
1974.
Peterlee
IS/I
System.
IBM
UK,
Peterlee,
Report
0018.
Describes
IS/It
one
of
the
earlier
Codd
relational
implementations.
137.
Olson,
C.
cessed
Records°
A.
Random
Access
Proc.
File
of
1969
Organization
ACM
Natl.
for
Conf.
Indirectly ACM,
New
AcYork~
1969.
138.
Owens,
P.
Phase
J.
Information
II
--
Processing
a
Data
71,
827
Base --
Management
832,
North
Modeling Holland,
System. Amsterdam,
1972. Phase
II
is
management
139.
Palermo,
P.
Indexes. the
of
modeling
IBM
earlier
designed
specifically
for
data
Approach
Research papers
Report
on
index
RJ
to
the
0730~
Selection
July
selection.
of
Sec-
Cardenas
for
1970.
See
results.
Palermo,
F.
RJ
July
1072, paper
P.
A Data
for
queries
Petrick,
S.
R.
Research
REQUEST
an
one
of
in
Search
RC
the
Problem.
earlier
predicate
Semantic
Report
is
Base
IBM
Research
Report
1972.
contains
gorithms
IBM
tool
A Quantitative
ondary
The
141.
F.
One
recent
140.
a
evaluation.
optimizing
calculus
Interpretation 4457,
July
experimental,
reduction
al-
form.
in
the
REQUEST
system,
1973.
natural
language
question
answering
system°
142.
Ramlrez~ tion
of
guage.
J.
1974
Describes to
D0
P*
Reisner, Evaluation
Rin,
N.
A.,
Conversion ACM
an
and
Prywes,
Programs
SIGFIDET
using
Workshop,
implementation
Smith),
translating
143.
A.,
Data
of
which
a
ACM, data
complies
N, a
S.
Automatic
Data
New
York,
Lan-
1974.
definition
data
Genera-
Description
language
definitions
(due
into
data
programs.
P.,
Boyce, of
R.
two Data
For
and
Base
Chamberlin
Query
t
Languages
Do
P, -
Human
SQUARE
Factors and
SEQUEL,
AFIPS
1975
NCC
A psychological
analyzed.
in
144.
Data of
data
the
performance
Models
models
64
-
447
show
the
of
Data
ternational
are
452,
1975.
subjects
is
a
but
slight
language,
which
Implemented of
sequences
described
and
statistically primarily
differ
at
Rothnie,
program the
J.
low
8.¢
a Paged
for
and
of
to
be
for
used
implementations.
A
ACM
Framework European
measurement of
levels
end,
and
a
data
for
Evalu-
Chapters
In-
evaluation
of
base
commands
involve
high
Lozano,
To
Environment. "multiple
for
allowing
the
levels the
1975.
1975.
different
at
objective
D.
of
Representation.
May
and
disk
system
issued
address
in
is the
reference
end.
Memory
A combina%ion nlque
at
the
Proc.
Symposium
different
1554,
different
Hildebrand,
Systems,
events
Storage
no.
with
of
framework
The
application traces
and
J.,
Base
Secondary Report
designed
Computing
presented.
for MRC
evaluation
Rodriguez-Rosell,
in
on
Wisconsin,
The
ation
146.
with
nonprogrammers
dependency
A~
Reiter,
An
44,
syntax,
University
145.
vol.
experiment
Only
significant
Proc.
a
Attribute
CACM
key
reduction
Based 63
17,
hashing" of
the
File
-- 69,
and
Organization
1974.
inverted
number
Of
page
file
tech--
faults
for
multi--key--retrieval.
147.
Rothnie,
J.
Relational vol.
148.
44,
employed
with
every
Sayani,
~.
burg.
and
To
attempts
Restart
U.
recovery emphasis
Ein
Messdaten.
Verlag,
Expressions
1975
utilize
the
AFIPS
in
NCC
a
Proc.
1975.
Processing
puts
Schauer, chef
423,
Retrieval
System.
to
for
the
and
Recovery
System.
purpose
1974
of
gained
optimization.
in a ACM
Information
Transaction
SIGFIDET
Oriented
Workshop,
ACM,
discussed.
The
1974.
York,
ReStart
-
Inter--Entry
Management
tuple-access
Information
149.
Base
strategy
H.
author
Evaluating
417
The
New
B.
Data
appear
policies on
System IBM
as
Heidelberg.
are
defined
and
performance.
zur
Germany, Lecture
Interaktiven Informatik Notes
in
Bearbeitung Symposium
Computer
umfan~rel--
1975,
Science,
Bad
~om--
Springer
Introduces
an
Interactive storage, "query
150.
a by
brary
or
Schkolnick,
See
151.
graphics
M.
Conf.
also
H.
tional
A.,
Data
Data,
San authors
ism
of
The
similar
ACM
See
to
data
language
(like
an
also
open
ended
ll-
/13/.
ACM
Optimization, Jose,
combining
relational
SIGMOD
1975
1975.
research.
J.
Re
SIGMOD
On
the
1975
Semantics
Intl.
Conf.
of
the
Rela-
on
Mgmt.
of
1975.
are
Codd's
access
Data, San
Swenson,
Model.
Jose,
The
world.
for
and
Index
of
Mgmt.
system
a
manipulation
with
subroutines.
Secondary
base
(APL),
data
Zloof)
FORTRAN
on
data
facilities
see
Cardenas
Schmid,
measurement
oriented
example",
PL/I
of
Intern.
interactive
computational
concerned
relational authors
with
the
model
and
the
a
kind
of
employ
gap
between modelled graph
the
pure
part
of
model
formalthe
to
real
fill
the
gap,
152.
Schmutzt
H.
Germany,
74.10.004, A
Oct.
special
schema
to
153.
of
context--free
hierarchical
mapping
Go
Language
31,
1975. authors
Senko,
as
M.
M.,
and
ARPA
E.,
a
Holland,
FOREM
is
evaluation
an
Senko, Data Journ.
Mt
data
E.,
Structures 12,
30
a
is Pair
for
base
IBM Report
describe
grammars
are
internal a
or
the
used
to
external
theoretical
J.
Creation
Information
for
to
treat-
systems.
E.
Networks.
used
and
model
data
language
V.
North
evaluate
model,
Deasautelst
Y.,
(FOREM).
1968.
to
Relations.
Technical
data
of
Systems
translation
in
a
File
I,
a
25
-
network
network.
Lum,
Model
is In
for
propose
the
and
Center,
conceptual
system problems
Schneider,
The
Languages
grammars
data
between
described
important
Evaluation
155.
of
Translation
such
154.
a
the The
ment
Regular Scientific
1974.
form
describe view.
Parenthesis Heidelberg
and
Amsterdam, and
and -- 93,
E.
B.,
Accessing 1973,
P.
J.
A File
Processing
Organization 687
514
-
519~
1969.
simulation
management
Altman,
Owens,
Information
tool
specifically
designed
systems.
Astrahan, in
Data
M. Base
M.,
and
Systems.
Fehder,
P,
L.
IBM Systems
This
paper
describes
tem T one
of
research
156.
157.
159.
Senko,
M.
tities
and
Senko,
M.
E.
and
ideas
behind
app?oaches
comprehensive
E.
Data
Report
RC
An
Senko,
Me
~®
Report
RC
5263,
Senko,
M.
E.
the
DIAM
sys-
to
data
base
3
I~
Description Oct.
-
Relations,
--
13,
Sets,
En-
1975.
in
the
DIAM
II
with
FORAL
for
Language
Description
5073,
Records,
Systems
Context
of
FORAL.
a
Mul--
IBM
Re-
1973.
Introduction
to
Users,
IBM
Research
1975.
Specification
Results on
Systems:
Inform.
Structured
Output
thoughts
Information
Things.
search
ence
the
earlier
systems,
tilevel
158.
the
In
Very
DIAM
Large
of
II
Stored
with
Data
Data
FORAL.
Bases,
Structures
Proc.
of
Boston,
the
1975,
and
Desired
Int.
Confer-
available
from
ACM. The
last
which
is
three
references
based
on
introduce
binary
DIAM
II,
a
and
has
FORAL
associations
proposed as
system, its
query
language.
160.
Severance,
161.
A
D.
scheme
164.
G.
is
A
[~
described, a
set
Shneiderman,
B.
362
-- 365,
Optimum
Shneiderman,
B., -
566
and
577,
paper
describes
certain
classes
of
B.
Model
The
3,
paper
93 is
-
and
Gen--
Alternative
File
StPuc-
1975. a
special
Base
Scheuermann,
Of
IJCIS
Survey
"two
dimensional
space
including
well-known
of
case,
Reorganization
Points.
CACM
A
103,
P.
Structured
Data
Structures.
1974.
The
Shneiderman,
of
A 1974.
1973.
CACM
17,
3,
organizations
as
Data
55, maps
data
organizations
6,
Model --
51
which of
Mechanism:
Surveys
Parametric
Systems
~v t o
Search
Computing
conventional
16,
163.
ACM
Inform.
parameters
162.
Identifier
Model.
Severance~ tures.
O.
D.
erallzed
an
approach
data
for
to
deal
with
integrity
in
case
structures.
Optimizing
Indexed
File
Structures.
1974.
concerned
with
the
selection
of
index
size
at
different
165.
Shu,
N.
C.,
Sibley,
paper
H.,
E.
CACM
discusses
paper
E.
two
"data
-
750
V.
for
On
the
structured" with
et
al.
R.
W.
759,
goals
Y.
CONVERT
Data
of
A
a
High
Conversion.
Level
CACM
18,
and
ACM,
Data
Definition
and
Mapping
1973. a
data
definition
mapping
Equivalence
Workshop,
independence
A
Taylor,
philosophical
Sibley,
Lum,
Language
definition
H.
translation
the
16,
data
ACM SIGFIDET
168.
and
The
The
and
to Housel
Language.
Sibley,
C.,
1975.
lustrates
167.
B.
Definition
-- 567,
A companion
166.
performance.
improve
Housel,
Translation 557
to
levels
New
of York,
or
"procedural" connection
and
il-
examples.
Data
Based
1974
Systems.
1974.
directions,
its
by
language
"relational" (DBTG)
are
to data
(Codd)
and
the
compared.
Also
data
restructuring
and
data
Dictionaries
for
is discussed.
E.
H.,
and
Information
discussion
Sayani,
Systems
of
the
H.
H.
Data
Interface.
need
for
and
Element
NBS-Report objectives
, 1974. of
a Data
Dictionary
capability.
169.
dissertation,
One
of
the
to
Data
Description
of
Pennsylvania,
University
earlier
data
definition
and
and
mapping
Conversion.
1971. languages.
See
Ramirez.
Smithy cal
P.
Approach
Do
D.
also
170.
An
Smithy PHo
S.
Data
E.,
and
Base
Mommens,
Structures.
J. ACM
H.
Automatic
SIGMOD
Generation
1975
Intl.
Conf.
of
Physi-
San
Jose,
from
des-
1975. A
criptive into
171.
172.
design aid
prototype
input
account
Stahl,
F.
A.
AFIPS
NCC
Steel,
To
SIGMOD
1975
IMS
and
A Homophonic vol.
Data Intl.
described
physical
constraints
Proc.
B~
is
42,
Base Conf.
data
for
- 568,
Standardization on
Mgmt.
generates
structure
objective
Cipher 565
which
of
definitions
taking
functions.
Computational
Cryptography.
1973.
--
Data,
A
San
Status
Jose,
Report, 1975.
ACM
173.
Steuertt tem:
J., and
Goldman,
A Perspective.
J.
1974
The
ACM
Relational
Data
SIGFIDET
Workshop,
RDMS,
system
Management ACM,
Sys-
New
York,
1974. An
introductory
and
174.
175.
based
on
description
Codd's
Stonebraker~
M.
The
Indices.
IJCIS
See
Cardenas
else
3,
of
-- 188,
for
~esearch
Stonebraker,
M.
SIGFIDET
Workshop
The
paper
first
Partial
on
ACM,
the
unfortunately of
used
and
Inversions
%hls
View
Proe.~
analyzes
being
at
MIT
Combined
1974.
A Functional
which
It d e s c P l b e s
Choice
a
model.
167
ACM
approach,
of
relational
%ople.
of New
Data
problem ks
%he
types
~®
Implementation
not
data
Independence.
YorkT with
kept
1874
1974. a
promising
through
independence
up
%o
to be
form~l The
end.
provided
in
INGRES.
176.
Stonebraker7 Views San
177.
by
Jose,
The
also
et
Su~
Held
S.
1974
Y.
The
Data
tation~
which
Taylo~
W,
Constraints
1975
is
R,
have
it T which
their
Sharing
in
ACM~
York,
a
in
New
of
Intl.
Conf.
and Prec.,
in
more
detail.
Data
Base
Translation
a Ne±work
See
the
a
corresponding
deals
%o
the
of
IFIP
Work.
a conceptual datalogical data
Storage.
Arbor,
Conf.
1974. data
model
approach
forms.
hlanagemen%
Physical
Approach
Amsterdam,
internal
Base
Ann
data
of
with
Data
Infological
~olland~
a kind
Environment.
[974,
Proco
North
MichiganT
for
used
Semiautomatic
is
)(appln G
of
proposal being
See
inte~rity
Management
Generalized
and
a
to
1974.
approach
University
Contains
Integrity
SIGMOD
Foundation
Base
April
It m a y
R.
STructures
180.
Data
with
A
Data
Conceptual
Base.
associated
H.
Workshop,
[nfolo~ical
ments.
Lam,
Corsica,
Taylor~
approach
Achieving
philosophy.
t79.
and
B.
Cargese,
of
ACM
al.
SIGFIDET
Sundgren, to
INGRES
W.T
for
ACM
qodificaTlon.
19U5.
Describes
System
178.
Query
System Ph.
D.
Data
diseer--
1971,
definition
and
~[ichigan data
mapping
translation
languase, experi-
MeDten/Fry.
W®
Data
Administration
and
±he
DBTG
Report.
1974
ACM
111
SIGFIDET Among
others~
taln
181.
Workshop
data
Taylors Base
the
Ro
Cargese~
Wo7
TeichroewT Proc.
D,
of
ACMy
essential
of
about
slstance
183.
J°
Thomas~ by
given ple
185o
F. A
So
186.
Tslchritzls~
of
Turn~
R.I
Research
Development of
IFIP
of Work.
Amsterdamv a
data
in
on
paper:
the
J.
PJ
File
Data Conf,
1974°
base
at
a
user
Organtzatlon.
Informations
there
is
as
data
Storage
no a
and
have
absolutely
Proco
vol°
44~
to he
to be
439
with
of
made
of
as-
Query
197S.
subJects~ into
know-
wlth
Study
- 44ST
3S
translated
best
function
A Psychological
an experiment
of how
Interactions.
Van
der
417~
who
query
by
were exam-
Pool~
of
the
Toronto7 CoddWs
experimental
deverill~
R.
1969
Nail,
ACM
No
IBM
system
Framework
(i.eQ
o~
A.
Overview.
Technical
AFIPS
B° v and System,
UKSC
Peterleev
1975o
relational
Shapiroy
Dos±erty
Language
Technical
007S~
A Network
J.
Coy
1969.
A
UKSC
- Measures
der
-
networks
and
P°
Extensihle
PRTV:
P°
physical
Systems
188~
P°
description
University
187.
399
Report
Discusses
to
o~
B° v Lockemann~
J.
Technical A new
of
Changes
English
Rapidly
Proc*T
Todd~
In
NCC
of
oh--
to
Zloo~)o
REL:
Conf.
use
AFIPS
the
Proco
programsQ
this
Gould~
results
questions
Thompsont S.
and
On
on
Information.
computer.
preprocessor
a
1971~
in
future
use
Holland~
Symposium
Yorkv
the
197S
the
(see
SIGIR
the
C°t
Example°
Reports
184.
of
North evolution
Approach
New
W°
the
Impact
message
representation
D°
1974.
time.
1974.
is
1971
to
Managementv
April
An
the
Retrieval~
ledge
Base
Its
Yorkl
precomptle
Stemple~
~ concern and
New
proposes at
and
Corslca~
authors
The
author
Data
installation
182.
ACM)
Independence
Editions.
The
Proc.t
Eeport model
linked
Z.
for
Optimum
Relation
Implementation.
CSRG-49~
February
can
be
1975.
Implemented
on
top
structures)o
Privacy
Ef~ectlvenessy 1972
IS/1.
FJCC~
Storage
and
Security
Costs
and
vol.
41y
435
Allocation
in
Data
Bank
Protection--Intru-
444.
for
a
File
in
112
Steady
State.
Files
with
overflow
189.
Vose~
M.
Wang~
R.,
C.
Data
and
a
set
of
specify
minimal
without of
set
are
to
the
19,
cover
to
-- 7 7 ,
Inverted
Index
set
of
cover
which
relations
is
again
Given
third
Logical
calculates with
a
the
normal
in
1975.
algorithm,
given
CoddVs
rate
state.
Synthesis
71
dependencies.
in
with
steady
Approach
Segment
Dev®
a
and
overflow
!972.
minimal
tPansltive
melatlons
1973. {hashing)
for
An
H.B,
minimal
Each
S,
May
Res.
covers
38v
utilization,
given
J.
16,
J°
a
dependencies.
floss
Storage
Wedekind~ IBM
-
27
transformations
analyzed.
Bull.
and
17,
Dlv.
Richardson~
Design.
authors
tive
Res.
factors
Comp.
P.,
Base
The
are
relevant
Maintenance.
190 •
J.
key--to--address
areas
other
and
IBM
transl-
set
of
minimum
form
can
velacovert
easily
be
constructed.
H.
191.
Wedeklnd,
1£2.
Wedekind~
B.
Mannhelm,
1974.
193 •
Wedekind~ System. esev
W.
paperlS
tion
of
Wellis~
the
Base April
efficient
M.
E.~
Katke,
1117-
SIMS
is
interesting
data
normal
form
and
tion
been
paid
%o
Based
File
Organizations.
Each
query
queries. to
the
is In
Olsont
assumed
elementary
a
IFIP
in
Work.
Data
a
Base
Conf.,
Amsterdam,
analysis
J,t
number
and
Carg-
IB74.
for
Yang,
the
S.
System.
in
case
of and
mapped
determina-
C,
SIMS
AFIPS
to
the
reasons.
T.
a
FJCC
be
the
queries
a
14,
of
593
boolean
data and
high
language.
Canonical
CACM to
offers
-
an
1972,
ba@e access
and
597,
be
blgh Data hier-
atten-
programs.
In
Attribute
1971.
expression can
level
PartlculaP
data
Structure -
a
language.
conceptual
query
C.
I%
manipulation
transferability
Chiang~
this
and
Information
be
used
and
of
Holland~
mapping
go,
Paths
Access
Instltut
1872, for
may
1972.
paths.
definition,
archical
Wong,
North
W®,
1131,
files
has
of
Berlin,
Bibliographlsches
Proc.
modeling
User-Oriented
41~
on
1974.
access
vol.
level
Selection
is
Gruytem~
I.
Management,
concern
Integrateds
195.
On
Data
de
Datenbanksys%eme
Corsicav
The
194.
Datenorganlsatlon.
over
elementary
organized
according
becomes
essentially
the
113
problem
of
pu±%ing
a
boolean
expression
In%o
some
s%~ndard
~orm,
196.
Yao~
S.
±hrou~h
B.
Michlgan~
197.
Y u e 7 P. ondary also For
198.
The
basic
user
Wongt
C.
Selec%lon,
recen%
M.
-
frame
Op%Imlz~%ion Pho
D°
of F i l e
dlsser%a%ion~
Organization of
Universlty
K.
S%orage
IBM
Cos%
Research
Consldera%ions
Repot%
RC
5070~
in %o
Sec-
appear
IJCIS,
431
%hat
and
Index
other
M.
437,
userWs
and
Modeling.
1974.
C.~
in
Zloof,
Evalua%ion
Analy%ie
resul%s
Query
in
this
By
Example.
of
query
area
of
rese&~ch
197S
AFIPS
NCC
see
Cardenas,
Proc.
vol.
44,
1975.
features
pe~cep%ion
of
of
manipula%ing
of
reference
fills
da%a
example
processing
%ables
consis%ing
informa%ion.
by
in of
in
are %his
illustrated. query
a graphically table
skele%onsv
language
The is
pre--estebllshed in%o
which
%he
Fundamentals of the Storage Hierarchy
Claus Schünemann, IBM Böblingen

1. INTRODUCTION
This paper deals with the concrete storage and addressing of data in a storage system that is organized hierarchically. Where database aspects are touched on, they are viewed from the standpoint of the hardware implementation and mainly in terms of performance.

Today's computer storage systems are already largely hierarchical. A distinction should be made between a hierarchy characterized merely by graded capacities and a strict hierarchy, in which random access is possible at every level and the data flow skips no level. The combination of main memory and buffer store is a strict hierarchy, and it was this combination that first brought the notion of a hierarchy to general awareness [1]. The buffer store (cache) is transparent to the machine architecture and matches the speed of main memory to the still higher speed of the processor. The sequence main memory - magnetic disk storage can likewise be regarded as a strict hierarchy, even though this view (with the exception of program paging in virtual storage) has not been prominent so far, and disk storage has been understood as an input/output device and treated as such by the machine architecture. Because of its long access time (including tape mounting), magnetic tape storage can no longer be counted as part of the hierarchy in the strict sense.
Approaches to integrating the large and inexpensive capacity of tape storage as a genuine top level of the data-flow hierarchy have become visible with the recent development of automatic tape transport systems, for example in the IBM 3850 cartridge store. The tape store could, for instance, be given the function of an archive and the disk store that of a large-capacity working store, with the contents of entire virtual disk packs being transferred automatically to the disk system on demand [2]. Fig. 1 sketches this hierarchy concept.

The weak point of the present storage hierarchy is the ratio of the access times of main memory and disk storage, which exceeds 1:10,000; this is the so-called access gap. Inserting drum stores or fixed-head disk stores in between does not change the situation substantially. As is well known, one therefore tries to bridge the mismatch by program switching under multiprogramming. As processor and main-memory speeds advance while the access time of mechanically operating mass storage stays the same, the degree of multiprogramming, the main-memory size, and the number of disk spindles must keep growing. This moves the system away from the cost optimum; in addition, the demands on the controlling operating system and its complexity rise while efficiency falls.

In what follows, an attempt is made to classify the storage parameters uniformly across the whole hierarchy spectrum and to use these parameters to discuss the performance of the hierarchy, with particular attention to the problem of the access gap. The requirements of database operation are addressed briefly.
2. TECHNOLOGY AND OPERATING PARAMETERS
Numerous technologies are known which, by exploiting a wide variety of physical effects, lead to very different storage properties. The most widespread today are semiconductor technology, used for the fast electronic matrix stores with random access, and magnetic-film technology, used for the slower and cheaper mass stores, mainly in the form of disk and tape storage. A further group, which has not yet reached broad product maturity, comprises optical stores and stores operating with an electron beam [3,4]. The various shift-register technologies such as CCD (Charge Coupled Device) [5,6] or magnetic bubbles [7] are likewise taking only tentative steps into commercial use for the time being.

The specific modes of operation of the individual storage families are not discussed here. Instead, the whole storage spectrum is described uniformly by a set of invariant technological and operating parameters (Table I). The two important operating parameters, mean access time and cost per bit, stand in a roughly reciprocal relation to each other. They determine where a technology lies within the overall spectrum. The diagram in Fig. 2 shows typical present-day values as a function of the most important technology parameter, the number of bits per read/write station [8].
The access time is made up of the access time in the narrower sense, a kind of dead time before the first bit is transferred, and the data transfer time. The transfer time depends on the data rate, given by the clock frequency and the internal bit width, and on the chosen length of the transferred block. Additional delays caused by the external transfer channel are included in the transfer time.

Modularity means the ability to subdivide a store or a hierarchy level into modules, each with its own parallel access. This raises the access rate. The capacity for modular subdivision generally decreases as the technology parameter "bits per read/write station" increases. Where reading/writing is mechanically decoupled from data transport, the access rate can be raised further by overlapping. In the IBM 3850 tape cartridge store, for example, the next cartridge is already being transported while the previous one is still in the read/write station. Further examples of asynchronous parallel operation are the configuration of several disk stores in one data processing installation and the division of main memory into modules that work independently and in parallel.

The cost per bit, too, is determined primarily by the number of bits per read/write station. Apart from factors specific to the technology and its construction, it depends on the general state of miniaturization. Fig. 3, for example, shows the historical development of bit density in magnetic disk storage. The figures in Fig. 2 must accordingly be understood as valid only for their time.
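The decomposition of access time into a dead time plus a transfer time can be sketched numerically. The following Python fragment is only an illustration: the function name and the device figures (30 ms latency, 1 MHz clock, 8-bit width, 4096-byte block) are assumptions chosen for the example and are not taken from the paper.

```python
def access_time(latency_s, block_bits, clock_hz, bit_width):
    """Total access time = dead time before the first bit + block transfer time."""
    data_rate = clock_hz * bit_width        # bits per second
    transfer_time = block_bits / data_rate  # seconds needed to move the block
    return latency_s + transfer_time


# Hypothetical disk-like device (all numbers are illustrative assumptions).
t = access_time(latency_s=30e-3, block_bits=4096 * 8, clock_hz=1e6, bit_width=8)
print(f"total access time: {t * 1e3:.3f} ms")
```

With these assumed values the transfer time is about 4 ms against a 30 ms dead time.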
The relative positions, on the other hand, should be largely invariant to the general state of the art, since advancing miniaturization benefits all technologies. In a balanced configuration, the storage capacity per hierarchy level follows as a kind of reciprocal function of the respective cost per bit.

A further operating parameter is the reliability of the store, i.e. the mean number of bits read per erroneous bit. This characteristic is a function of the natural freedom from defects of the medium, of the degree of sorting for good units, and of the effort spent on deliberate redundancy with subsequent error correction. The defect density of the medium naturally decreases with its homogeneity. Typical reliability values (after appropriate sorting) are, for example, 10^9 for a factory-new disk store and 10^12 after correction.

The physical nature of the storage determines the degree of volatility of the written information. In a working store, a certain volatility with periodic refreshing can be tolerated; an archive or journal store must of course provide permanent storage.

Somewhat related to volatility is the property of on-line or off-line writing, the latter also generally understood as ROM. In various applications, e.g. the storage of documents that change infrequently, ROM storage can be quite useful and, being correspondingly cheap, of interest. PROM and EAROM (Programmable and Electrically Alterable Read Only Memory) represent a transition between normal writable storage and ROM. ROM storage is not treated further here.

The last operating parameter is the addressable unit, which together with the access time proper determines the complexity of the access method and the efficiency of data searching. A distinction is made between location addressing and content addressing. Location addressing is the dominant mode of addressing at the main-memory level:
The physical location of each data element is defined by the program and is found directly via the address. At the higher storage levels this concept is no longer suitable for locating data records if the records are, for example, to be organized as a database, independent of programs and available to many users. Ultimately they must be identified by their content, given by one or more keys. Within a record the data are generally formatted again, i.e. their semantic meaning is determined by their relative position.

Today's search technique for content-addressed records uses index tables in which, for example, the primary keys are ordered numerically or alphabetically and the real storage address is assigned to each directly. If further (secondary) keys are present, these can be listed in tables of their own, each again with the storage addresses of all records that contain the key. As is well known, these inverted lists allow a search on multiple keys to be carried out quickly, i.e. without having to process all records sequentially. With the help of the index tables, the content address of a record is thus converted into a location address.
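The conversion of a content address into location addresses by means of an index table and inverted lists can be illustrated with a small Python sketch. The record layout, the field names, and the storage addresses are hypothetical and serve only to show the principle.

```python
from collections import defaultdict

# Hypothetical records: storage address -> formatted record.
records = {
    100: {"id": 7, "city": "Bonn", "dept": "sales"},
    200: {"id": 3, "city": "Ulm", "dept": "sales"},
    300: {"id": 9, "city": "Bonn", "dept": "lab"},
}

# Index table for the primary key: key value -> storage address.
primary_index = {rec["id"]: addr for addr, rec in records.items()}


def build_inverted_list(field):
    """Inverted list for a secondary key: key value -> addresses of all matching records."""
    inverted = defaultdict(set)
    for addr, rec in records.items():
        inverted[rec[field]].add(addr)
    return inverted


by_city = build_inverted_list("city")
by_dept = build_inverted_list("dept")


def find(city, dept):
    """Addresses of records matching both secondary keys, without scanning the records."""
    return sorted(by_city.get(city, set()) & by_dept.get(dept, set()))


print(find("Bonn", "sales"))
```

The multi-key search reduces to an intersection of address sets; the records themselves are touched only after their addresses are known.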
In stores with random access, this location address is then accessed quickly and directly. Searching the index tables for the desired key is itself, however, a process with a sequential series of steps. A further degree of parallelism would be obtained by storing the index tables in associative memories, with the following advantages:

- The numerical or alphabetical ordering of the keys is no longer needed. Updating therefore becomes simple: index entries are added or removed directly.
- The inverted lists are no longer needed, since association on several keys can take place simultaneously.
- Searching is direct and simultaneous rather than sequential.

The peculiarity of associative memory, that it requires the data to be formatted, would be no disadvantage in this case.

A special case of location addressing is addressing with pointers.
Pointer addressing also decouples the user program from the data address. Its disadvantage is that the pointer chain must be traversed sequentially.

The individual storage technologies differ in the size of the unit that is addressable by hardware. This unit is, for example, one byte in (semiconductor) matrix storage, about 10-20 KBytes in disk storage, and millions of bytes in conventional tape storage. If the addressable unit is equal to or smaller than the desired length of the block to be transferred, we speak of random access. Disk storage has only semi-random access, since its addressing unit (the track) is many times larger than a convenient logical record length or a block length that is optimal for this hierarchy level. The particular block must then be searched for sequentially on the track.

The so-called access methods, i.e. the practical procedures for locating data records, reflect the underlying technological addressing parameters. One example is the index-sequential access method for "direct random" access to disk storage:
In this method the primary keys of the records are arranged in an index table in ascending order. For each group of records, the table gives the corresponding track address on the disk. The records themselves are also stored in the same order, so that for sequential access the long access time for each individual record is eliminated. As the disk rotates, the record keys read out are compared with the search key until they match. On updating, e.g. when a further record is added to a record sequence that may be physically without gaps, a pointer leads to a new track address on an overflow track. The method thus combines the search elements index table, sequential search, and pointer technique into a procedure adapted to the specific conditions of disk storage (Fig. 4a).

In another store that also has a homogeneous medium, the electron-beam store, the addressing unit can be chosen freely between one byte and tens of thousands of bytes. The access method can be purely index-oriented and correspondingly simple: the sequential search is eliminated, and there is no overflow problem. Thanks to the short (electronic) access time proper, a sequential ordering of records can be dispensed with and a record can be stored at any position (Fig. 4b).

The larger addressing unit, i.e. the lower degree of "randomness", of the low-cost technologies is not in itself a fundamental disadvantage, since within a hierarchy block transfer is used anyway. There is a disadvantage of degree only when, as in disk storage, the optimal block length and the technological addressing unit do not coincide. This discrepancy then shows up in elaborate and time-consuming "access methods".
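The three search elements of the index-sequential method described above (index table, sequential search along a track, and pointer to an overflow area) can be sketched in Python. This is a simplified model under stated assumptions, not the actual access method of any product: the class name, the per-track overflow lists, and the use of the highest key per track as the index entry are illustrative choices, and the record list is assumed to be non-empty and sorted by key.

```python
from bisect import bisect_left


class IndexSequentialFile:
    """Toy model: sorted records on fixed-size tracks plus one overflow chain per track."""

    def __init__(self, sorted_records, track_size):
        # Records are (key, value) pairs in ascending key order, laid out track by track.
        self.tracks = [sorted_records[i:i + track_size]
                       for i in range(0, len(sorted_records), track_size)]
        # Index table: highest key stored on each track.
        self.index = [track[-1][0] for track in self.tracks]
        # Overflow area; on a real device each track would hold a pointer to it.
        self.overflow = [[] for _ in self.tracks]

    def _track_for(self, key):
        # Index lookup; keys above the last index entry map to the last track.
        return min(bisect_left(self.index, key), len(self.tracks) - 1)

    def insert(self, key, value):
        # New records go to the overflow chain of their track; the track stays gap-free.
        self.overflow[self._track_for(key)].append((key, value))

    def lookup(self, key):
        t = self._track_for(key)
        for k, v in self.tracks[t]:      # sequential search along the track
            if k == key:
                return v
        for k, v in self.overflow[t]:    # follow the pointer to the overflow chain
            if k == key:
                return v
        return None


f = IndexSequentialFile([(1, "a"), (4, "b"), (6, "c"), (9, "d"), (12, "e")], track_size=2)
f.insert(5, "x")
print(f.lookup(6), f.lookup(5), f.lookup(7))
```

The index lookup narrows the search to one track; everything after that is sequential, which is the cost the text attributes to the mismatch between track size and record length.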
3. STORAGE HIERARCHY
Besides storing data, the task of a storage system is to make the required data available to the processor in a sufficiently short time and in the requested quantity per unit of time. By analogy with the system performance parameters response time and throughput, storage performance can be defined by the parameters access time and access rate. If a store permits only one access at a time, the access rate can be set approximately equal to the reciprocal of the access time. With several simultaneous accesses, i.e. modularity greater than 1, the maximum access rate rises accordingly. How far the maximum access rate can be exploited depends on parameters such as system control, program profile, degree of multiprogramming, number of parallel processors, and so on.

In a hierarchy, a certain basic modularity of the individual levels is desirable if only to allow simultaneous data traffic upward and downward. In terms of control, this is achieved at the main-memory level, for example, by the independent operation of processor and channels. For effective multiprogramming, sufficient modularity of the disk storage level is an absolute prerequisite. The purpose of multiprogramming is to raise the resulting access rate, measured at the interface to the processor, and with it the system throughput.
As is well known, the bottleneck for the throughput of today's data processing systems nevertheless still lies in the access time and access rate of disk storage. Since further speed improvements can certainly be expected for processors and semiconductor storage, while the access time of disk storage can hardly be improved any more, this problem is becoming ever more pressing. A solution by further raising the degree of multiprogramming, i.e. the number of programs operating simultaneously, with a corresponding increase in main-memory size and disk storage modularity, appears impracticable for reasons of cost and complexity. Moreover, efficiency suffers when the degree of multiprogramming is too high: system administration grows relative to useful work, the chance of covering several accesses with one disk arm position diminishes, and so on.

Another solution to this problem is to extend the storage hierarchy concept further while limiting the degree of multiprogramming. The ideal store (which cannot be realized), i.e. a store with the access time of the buffer store and the cost of tape storage, can be approximated by a balanced hierarchy with sufficiently fine grading.

Fortunately, technological development promises storage products that, in performance and cost, fill exactly the region of the "gap" and thus fit well into the spectrum. Possible technologies for the "gap" are, for example, the CCD shift-register store, the shift-register store with movable magnetic bubbles, and the electron-beam storage tube (Fig. 5). These technologies are referred to below as electronic mass storage.
3.1 Hierarchy Mechanism
The storage hierarchy thus consists of storage levels connected in series, with access time and storage capacity increasing with the level number. On a storage access, the processor first tries to find the data at the lowest, fastest level. If this fails, the next level is accessed, and so on. When data are transferred to the next lower level, not only the requested word or byte but a whole block is transferred. At each lower level a part of the block is deposited.
With the block lengths chosen, the transfer time is usually small compared with the access time proper. The essence of the storage hierarchy is thus that, by accepting slightly more access time (namely including the transfer time), whole blocks of data or programs are transferred on the assumption that part of them will be requested for processing in the near future anyway. This is a prophylactic access (look-ahead) that exploits the transfer time being short relative to the access time proper. The mechanism is supported by the fact that data are often accessed several times within a short period, e.g. in program loops, but also when operating on frequently used working data such as index tables, catalogs, and so on.

The hit rate, i.e. the probability of finding the data at the level being accessed, increases with the storage capacity of that level and in general also with the block length. Independently of this, it naturally depends on the particular data and program profile.

Freeing storage space on a full hierarchy level is done, in the simplest case, in a self-regulating way using the customary algorithms such as FIFO or LRU (Least Recently Used). This mechanism can of course be supported by keeping certain frequently used data resident in the lower, fast levels, e.g. parts of the operating system in main memory. At the higher levels, where every access counts in the performance balance, the control is implemented in software and is correspondingly more "intelligent".

Conceptually, address control and the search for data at a level could be handled most simply through a single address space spanning the whole storage system. Each hierarchy level would then contain a table for the dynamic mapping of the virtual overall storage address to the local address of that level. For historical reasons there are usually several address spaces in a storage hierarchy: at the buffer-store level, the real main-memory address is mapped to a particular place in the buffer store. In main memory, the address, which today is usually virtual and thus already spans a larger address space, is mapped to the real address. At the higher hierarchy levels, the index tables mentioned earlier take over the localization of content-addressed data: logical searching and hierarchy-specific searching become identical.

The mapping tables are stored either at the same level or at lower (fast) levels. For the buffer store, the table is held in a separate store that works more or less associatively.

The whole data processing system can thus be pictured as the combination of an archive store that holds all data on-line, for example a magnetic tape store with automatic tape transport, and a processor system, which in turn consists of the processor proper and a hierarchy of working stores. The various technology and control parameters, some of which were discussed in the previous section, vary along the hierarchy axis as sketched in Fig. 6.
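The block-transfer mechanism with LRU replacement, and the resulting hit rate, can be illustrated by a minimal Python model of a single fast level. The class name, the block size, and the address trace are assumptions made for this sketch; the paper itself gives no algorithmic detail beyond naming FIFO and LRU.

```python
from collections import OrderedDict


class Level:
    """One fast hierarchy level that holds whole blocks and replaces them by LRU."""

    def __init__(self, capacity_blocks, block_size):
        self.capacity = capacity_blocks
        self.block_size = block_size
        self.blocks = OrderedDict()  # block number -> True, least recently used first
        self.hits = 0
        self.accesses = 0

    def access(self, address):
        """Return True on a hit; on a miss, load the whole block containing the address."""
        self.accesses += 1
        block = address // self.block_size
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            self.hits += 1
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = True            # block fetched from the next higher level
        return False

    def hit_rate(self):
        return self.hits / self.accesses if self.accesses else 0.0


level = Level(capacity_blocks=2, block_size=4)
for address in [0, 1, 2, 3, 8, 0, 16, 9]:    # hypothetical address trace
    level.access(address)
print(level.hits, level.hit_rate())
```

In this trace the first block is fetched once and then serves three further accesses, which is the look-ahead effect the text describes; a larger capacity or a more local trace would raise the hit rate.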
3.2 Performance Considerations
The most important criterion for a storage hierarchy is the total access time or total access rate, both in absolute terms and in relation to cost. These relationships are discussed below by means of a very simple model. The model is based on "typical" values for the various parameters and extrapolates where data are not known.

As the technology diagram in Fig. 2 already indicates, there appears to be a naturally simple relationship between the cost per bit and the spectrum variable, the access time. This relationship, together with the assignment of hit rate and storage capacity to access time, is plotted in the model parameter diagram, Fig. 7. The capacity distribution curve is assumed to be a straight line (on a logarithmic scale) with the buffer store and the archive store as end points. The archive capacity chosen is 10^12 b, the buffer capacity 200 Kb. The points for main memory and disk storage that lie on the straight line correspond approximately to real values. The capacity distribution curve can, of course, be chosen freely within the technologically available spectrum. As processor performance and data volume grow, it will be shifted upward.

For the hit rate in multiprogrammed batch operation, some empirical data as a function of capacity and block length are available for the range buffer store - main memory [9]. Typical values from these data were used as the basis for the model curve. Toward the upper hierarchy levels the curve was extrapolated.
The model does not take into account the mutual dependencies of block length, access time, hit rate, degree of multiprogramming, and so on, but rigidly assumes typical values. The total access time is

$$ t_{ges} = t_1 + (1-h_1)\,t_2 + (1-h_2)\,t_3 + \dots + (1-h_{n-1})\,t_n \qquad \text{(Eq. 1)} $$

where t_n = access time of the n-th level and h_n = hit rate of the n-th level.
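Eq. 1 can be evaluated directly. The Python sketch below implements the sum as written; the three-level example values (100 ns, 1 µs, 30 ms and hit rates 0.95 and 0.9999) are hypothetical and are not the model values of Table II.

```python
def total_access_time(t, h):
    """Eq. 1: t[0] + (1 - h[0]) * t[1] + ... + (1 - h[n-2]) * t[n-1].

    t: access times of levels 1..n; h: hit rates of levels 1..n-1.
    """
    if len(h) != len(t) - 1:
        raise ValueError("need one hit rate per level except the last")
    total = t[0]
    for hit_rate_below, time_of_level in zip(h, t[1:]):
        total += (1.0 - hit_rate_below) * time_of_level
    return total


# Hypothetical three-level hierarchy (illustrative values only).
t_levels = [100e-9, 1e-6, 30e-3]
h_levels = [0.95, 0.9999]
t_ges = total_access_time(t_levels, h_levels)
print(f"t_ges = {t_ges * 1e6:.2f} microseconds")
```

Even with a hit rate of 0.9999 in front of it, the slow third level contributes most of the total in this example, which mirrors the role of the access gap in the text.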
The maximum total access rate, i.e. the flow of accesses at the interface to the processor, is

$$ \max z_{ges} = \frac{1}{\dfrac{t_1}{p_1} + \dfrac{1-h_1}{p_2}\,t_2 + \dots + \dfrac{1-h_{n-1}}{p_n}\,t_n} \qquad \text{(Eq. 2)} $$

where p_n = access parallelism at the n-th level. The access parallelism corresponds roughly to the modularity. It is assumed that 50% of the access parallelism is actually converted into a higher access rate through multiprogramming, i.e. p_eff = 0.5 p; further, that below the disk storage level program switching is no longer worthwhile (p = 1); and finally, that a single processor is in operation. Eq. 2 is modified accordingly. Some model results based on real technologies are compiled in Table II.
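Eq. 2 can be sketched in the same way as Eq. 1. In the Python fragment below, the level values and the disk parallelism of 8 are hypothetical, and the helper `effective_parallelism` with its lower bound of 1 is an assumption of this sketch; the paper states only that p_eff = 0.5 p and that p = 1 below the disk level.

```python
def max_access_rate(t, h, p):
    """Eq. 2: reciprocal of t[0]/p[0] + (1-h[0])*t[1]/p[1] + ... (accesses per second)."""
    denominator = t[0] / p[0]
    for i in range(1, len(t)):
        denominator += (1.0 - h[i - 1]) * t[i] / p[i]
    return 1.0 / denominator


def effective_parallelism(p, factor=0.5):
    """p_eff = factor * p; the floor of 1 is an assumption of this sketch."""
    return max(1.0, factor * p)


# Hypothetical three-level hierarchy; only the slowest level has parallel access.
t_levels = [100e-9, 1e-6, 30e-3]
h_levels = [0.95, 0.9999]
p_levels = [1.0, 1.0, effective_parallelism(8)]

rate_parallel = max_access_rate(t_levels, h_levels, p_levels)
rate_serial = max_access_rate(t_levels, h_levels, [1.0, 1.0, 1.0])
print(f"{rate_serial:.3e} -> {rate_parallel:.3e} accesses per second")
```

With all p_n equal to 1 the result is the reciprocal of the total access time of Eq. 1, which is a convenient consistency check between the two formulas.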
Different storage access rates result in different degrees of processor utilization. A model processor with 2 MIPS (million instructions per second) and an average of 2 accesses per instruction was chosen. This processor can develop its full performance only if the storage system permits 4 million accesses per second. The poor utilization of this 2-MIPS processor in today's configuration without multiprogramming is not surprising. Even with multiprogramming the utilization is only moderate. Only the introduction of electronic mass storage brings an improvement to a reasonable order of magnitude.
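The relation between storage access rate and processor utilization follows from simple arithmetic, sketched below in Python. The function names are illustrative; the processor figures (2 MIPS, 2 accesses per instruction) are those of the model processor above, and the access rate of 0.85 million per second is the value the paper quotes in Section 4 for today's hierarchy with multiprogramming. Capping the utilization at 1 is an assumption of the sketch.

```python
def required_access_rate(mips, accesses_per_instruction):
    """Accesses per second the storage system must deliver for full processor speed."""
    return mips * 1e6 * accesses_per_instruction


def processor_utilization(access_rate, mips, accesses_per_instruction):
    """Fraction of full processor speed that the given storage access rate supports."""
    demand = required_access_rate(mips, accesses_per_instruction)
    return min(1.0, access_rate / demand)


demand = required_access_rate(mips=2, accesses_per_instruction=2)
utilization = processor_utilization(0.85e6, mips=2, accesses_per_instruction=2)
print(f"demand: {demand:.0f} accesses/s, utilization: {utilization:.1%}")
```
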
With multiprogramming, the bottleneck for the access rate now shifts from disk storage (with its high modularity) to tape storage. This bottleneck could be overcome by further increasing the number of hierarchy levels, specifically by inserting an intermediate level between disk and tape storage. Technologically such a level is within sight, namely through a modification of conventional disk storage into a set of flexible disks with a very high volumetric bit density [9]. The access rate of the hierarchy configuration then lies above 4 million per second.

The results in Table II raise the question of the optimal grading of the hierarchy when the end points are held fixed.
For this analysis, a uniform grading is assumed without reference to real technologies, and the number of levels is varied. Multiprogramming is not taken into account here. The results are plotted in Fig. 8: at about 16 levels the access rate (which in this simple case is the reciprocal of the mean access time) reaches a saturation value. This access rate is only about a factor of 2 lower than that of the buffer-store level alone.

Fig. 8 also shows the price/performance figure, namely the access rate per total bit cost. Here the optimum lies at about 8-10 levels. The improvement over a 4-level hierarchy is greater than a factor of 6. On the basis of the more realistic data in Table II, the gain in going from today's 4 levels to the 6 levels considered there is considerably higher still, since a uniform grading was not assumed in that case.

A further advantage of finer grading is the improvement in processor "efficiency": the number of accesses to disk and tape storage decreases. With it, the number of instructions processed (in the access routines) per access to the storage hierarchy also decreases, and processor "efficiency" increases. Finally, the operating system can be kept simpler.

This model does not include the reliability aspect, which becomes more critical as the number of levels grows. Nor are the costs of controllers, address tables, etc. taken into account. The extrapolation of the hit-rate curve is entirely hypothetical. All this notwithstanding, the model results may be taken as an indication that a finer grading of the hierarchy still holds considerable performance potential.
4. STORAGE ASPECTS OF DATA BASE OPERATION

Data base operation, too, can in principle be fitted into the model considered so far. The parameter that may change (toward less favorable values) is the hit ratio, particularly at the upper levels. Experience on this point has yet to be gained, so the model values are retained here, especially since a certain "neighborhood" relationship between queries can be expected for data bases as well.

In practical terms, one could picture a distribution of functions over the individual hierarchy levels as sketched in Table III. Groups of data with a high rate of professional access must be moved out of the archive level and kept resident on the disk storage level.

The specific data base performance parameter, besides the volume of data, is the permissible query rate. It should rise as data base capacity grows. The following rough calculation may serve as an illustration: according to Table II, the model access rate for today's hierarchy with multiprogramming is about 0.85 M/s. If we assume a program run of 100 K instructions on average per data base query, the system would allow 4.25 queries per second (at the 2 accesses per instruction assumed in Table II). This value is unlikely to suffice for a data base capacity of 10^12 b. After introduction of the electronic mass store the query rate rises to 14 per second. With an additional intermediate level between disk and tape storage it rises to about 30 per second, provided a corresponding processor performance of about 3 MIPS is available.

The question of ultimate interest, how many terminals can be attached to a data base of this size with satisfactory service, naturally depends on the mean query load per terminal. Assuming a mean load of one query per terminal per minute, the number of terminals comes to 30 x 60 = 1800. This attachment capacity per 10^12 b of data base capacity appears sufficient.
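
The rough calculation above can be retraced in a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative, the factor of 2 accesses per instruction is taken from the heading of Table II, and the model access rate is read as 0.85 million accesses per second, which is the value that reproduces the 4.25 queries per second stated in the text.

```python
# Back-of-envelope estimate of query rate and terminal count.
ACCESSES_PER_INSTRUCTION = 2        # assumption taken from the Table II heading
INSTRUCTIONS_PER_QUERY = 100_000    # "100 K instructions" per data base query


def queries_per_second(access_rate_per_s):
    """Queries per second sustained by a given hierarchy access rate."""
    instructions_per_s = access_rate_per_s / ACCESSES_PER_INSTRUCTION
    return instructions_per_s / INSTRUCTIONS_PER_QUERY


def terminals(query_rate_per_s, queries_per_terminal_per_min=1.0):
    """Number of terminals served at a given mean load per terminal."""
    return query_rate_per_s * 60 / queries_per_terminal_per_min


today = queries_per_second(0.85e6)   # today's hierarchy with multiprogramming
attached = terminals(30)             # about 30 queries/s, one query per minute
print(today, attached)
```

With these inputs the sketch yields 4.25 queries per second for today's hierarchy and 1800 terminals at 30 queries per second, matching the figures in the text.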
In conclusion from these considerations, it can be stated that the organization and technology of future storage systems have the potential to meet the performance requirements of widespread data base operation.
References

[1] C.W. Pugh, "Storage Hierarchies: Gaps, Cliffs and Trends", IEEE Transactions on Magnetics, Vol. MAG-7, No. 4, Dec. 1971
[2] C. Johnson, "IBM 3850-Mass Storage System", Nat. Comp. Conf. 1975, p. 509
[3] J. Kelly, "The Development of an Experimental Electron-Beam-Addressable Memory Module", Computer, February 1975
[4] W.C. Hughes et al., "BEAMOS, A New Electronic Digital Memory", Nat. Comp. Conf. 1975, p. 5-41
[5] G.F. Amelio, "Charge-Coupled Devices for Memory Application", Nat. Comp. Conf. 1975, p. 515
[6] W.S. Boyle et al., "Charge-Coupled Devices - A New Approach to MIS Device Structures", IEEE Spectrum, July 1971, p. 18
[7] A.H. Bobeck et al., "A New Approach to Memory and Logic: Cylindrical Domain Devices", Proc. AFIPS Conf., Vol. 55, 1969
[8] R.R. Martin et al., "Electronic Disks in the 1980's", Computer, February 1975, p. 24
[9] D.H. Gibson, "Considerations in Block-Oriented Systems Design", AFIPS Proc., Vol. 30, SJCC 1967, pp. 75-80
TABLE I. STORAGE PARAMETERS

Technology parameters:
- Storage medium (homogeneity, bit density)
- Number of bits per read/write station (matrix / sequential arrangement)
- Data transport

Operational parameters:
- Access time
- Transfer time = f(transfer width, block length, clock frequency)
- Modularity, access rate
- Bit cost, capacity
- Reliability
- Volatility
- Addressable unit (byte / block addressing)
TABLE II. MODEL HIERARCHY PERFORMANCE PARAMETERS
(processor 2 MIPS, 2 accesses/instruction)

Levels: P = buffer store, H = main store, E = electronic mass store, SP = rigid disk, FP = flexible disk, B = tape.

Columns: configuration; t [µs] / P_eff for each level P, H, E, SP, FP, B; t_ges [µs]; max. Z_ges [10^6/s]; processor utilization [%]; total cost.

Configuration                  max. Z_ges [10^6/s]   Processor utilization [%]
P+H+SP+B                       0.084                 2.1
P+H+SP+B, multiprogr.          0.85                  21
P+H+E+SP+B                     0.11                  2.8
P+H+E+SP+B, multiprogr.        2.82                  70
P+H+E+SP+F+B, multiprogr.      5.88                  100
TABLE III. DISTRIBUTION OF FUNCTIONS IN DATA BASE OPERATION

Level  Technology                              Typ. capacity    Function
1      BIP buffer store                        4-16K bytes      Fast working store for combining data with programs
2      FET main store                          10^5-10^7 B      Provision of programs and data for a foreseeable operating period
3      Shift-register or electron-beam store   10^7-10^9 B      Holding frequently used programs (e.g. operating system) and working data (e.g. index tables, descriptors, catalogs, pointer networks, etc.)
4      Disk store                              10^8-10^10 B     Files for professional use, data backup
5      Tape store (automatic tape transport)   10^10-10^13 B    Document data base, data backup, archiving
[Fig. 1. Storage hierarchy today: buffer store, main store, disk store, tape store with automatic loading, and tape store with manual loading, linked by control channels; the access times marked range from 50 ns to minutes.]
[Fig. 2. Operational parameters as a function of the technology parameters. Abscissa: bits per read/write station (10^2 to 10^12), from matrix to sequential organization; curves: mean access time, addressable unit, and bit cost (market prices, cts/bit); technologies marked: BIP, FET, bubbles, electron-beam tube, disk, automatic tape; also indicated: homogeneity of the medium and data transport (electronic / mechanical).]
[Fig. 3. Disk storage bit density versus year (1960-1980): areal bit density (bits/inch^2), linear bit density (bits/inch), and track density (tracks/inch); data points for IBM 2311, 2314, 3330-001, 3330-002, 3340, and CDC 9762.]
[Fig. 4. Addressing systems. A) Disk store: index table, data track with sequential search, overflow pointer to an overflow track. B) Electron-beam store: index entries giving the direct address of each record.]
[Fig. 5. Technology overview (without optical technologies): storage capacity (bits, 10^2 to 10^14) versus access time (s, 10^-8 to 10^2) for automatic magnetic tape, magnetic disk, and electron-beam storage, with the "gap" marked.]
[Fig. 6. Parameter trends across the hierarchy spectrum. The data store levels 1 to 5 (BIP, FET, shift register, disk, automatic tape) with their address tables are shown above the processor system and its control; arrows mark rising trends. Parameters listed: homogeneity of the medium, mechanical data transport, addressing unit, block length, access time, control effort (software), capacity, hit ratio; modularity, bit cost, data volatility, electronic data transport; data rate (hardware: clock frequency, bus width).]
[Fig. 7. Model parameters: bit cost (cts/bit), capacity (bits), and parallel accesses versus access time (s) for the levels P (buffer store), H (main store), E (electronic mass store), SP (rigid disk), FP (flexible disk), and B (tape).]
[Fig. 8. Model results for uniform gradation (on a logarithmic scale): access rate (10^6/s) and access rate per total bit cost versus number of levels (2 to 18).]
System R: A Relational Data Base Management System

Morton M. Astrahan, IBM Research Laboratory, San Jose, California
Donald D. Chamberlin, IBM Research Laboratory, San Jose, California
W. Frank King, IBM Research Laboratory, San Jose, California
Irving L. Traiger, IBM Research Laboratory, San Jose, California

INTRODUCTION

System R is a data base management system which provides a high-level, non-procedural relational data interface. The system provides a high level of data independence by isolating the end user as much as possible from underlying storage structures. The system permits definition of a variety of relational views on common underlying data. Data control features are also provided, including authorization, integrity assertions, triggered transactions, a logging and recovery subsystem, and facilities for maintaining data consistency in a shared-update environment.

The relational model of data was introduced by Codd [1] in 1970 as an approach toward providing solutions to the various outstanding problems of current data base management systems. In particular, Codd addressed the problems of providing a data model or view which is divorced from various implementation considerations (the data independence problem) and also the problem of providing the data base user with a very high-level, non-procedural data sublanguage for accessing data. It should be stressed here that the relational model is a framework or philosophy for finding compatible solutions to these and other problems in data base management; the relational approach is thought to make solutions more elegant and perhaps simpler but the approach by itself does not solve these problems.

With this caveat in mind, our first purpose is to briefly describe a related set of data base problems which we are attempting to solve in a coherent way following the relational approach. Our solutions are embodied in an experimental prototype data management system called System R which is currently being designed, implemented, and evaluated at the IBM San Jose Research Laboratory. We wish to emphasize that System R is a vehicle for research in data base architecture, and is not available as a product. Furthermore, the ideas discussed in this paper should not be considered as having product implications.
To a large extent, the acceptance and value of the relational approach hinges on the demonstration that a system can be built which is operationally complete (can actually be used in a real environment to solve real problems) and has performance at least comparable to today's existing systems. With the present state of systems performance prediction, the only credible demonstration is to actually construct such a system, and to evaluate it in a real environment. The point of this paper, then, is to describe the set of problems which are being studied in the System R framework, to discuss the objectives of the system (which amounts to a description or definition of the term operationally complete), and to describe the architecture of the system, including overall structure, interfaces, and functional design.

The System R project is not the first implementation of the relational approach; however, we know of no other relational system which is really aimed at an operationally complete capability. Other efforts have demonstrated feasibility in various related problem areas. For example, both the IS/1 system [2] and the Phase/0 SEQUEL prototype [3] were single-user systems. No concurrent sharing of data was permitted and hence data control, locking, and recovery issues were greatly simplified. The INGRES project [4] at U.C. Berkeley is also single-user oriented. In addition, each of these projects has an incomplete treatment of views, i.e., of providing various views of data to various users.

The next section describes the overall goals of System R and describes the list of capabilities which we believe to be necessary in an operational environment. The following section describes the architecture of the system, and describes in overview terms its major interfaces and the components which support these interfaces.

SYSTEM OBJECTIVES

System R is focused on five main goals:

1. To provide a high level, non-procedural relational data interface.
2. To provide the maximum possible data independence for the basic data objects (base relations).
3. To support derived relational views.
4. To provide facilities for data control consistent with the high level of the data interface.
5. To discover the performance trade-offs inherent in this type of data base capability.

First, each of these goals will be discussed and illustrated.

1. High Level Non-Procedural Relational Data Interface

The trend toward higher level languages has long been evident in the programming domain. Set-oriented data sublanguages were introduced in 1962 in the CODASYL Information Algebra [5]. Codd's ALPHA language [6] and Relational Algebra [7] raised the level of data sublanguages by letting the user specify the properties of the data required without describing the access path or detailed sequence of operations to be used to obtain the data.

This trend toward higher level non-procedural programming [8] is aimed at reducing the number of decisions the programmer must make in order to express his problem/solution, and at making the decisions more relevant to the solution (as opposed to being relevant to the programming of a specific computer). Halstead has examined two programs solving the same problem using his software physics techniques [9], one written in ALPHA and the other in DBTG-COBOL, and for this case found that the ALPHA solution required 30 times fewer mental discriminations than the lower level solution. This observation should be directly translatable into increased programmer productivity and ease of maintenance. Thus, human productivity is one strong reason for the goal of supporting a high-level, non-procedural data interface.

The other reason for moving in the direction of non-procedural interfaces is related to the optimization of the execution of the program. If the data base were dedicated to a single application, its structure could be optimized for that application only, and the application could be written in terms of that optimized structure. However, in an integrated data base environment, such local optimization is likely to be inefficient on the aggregate. Hence, the system must itself optimize the execution of each application on a data base whose structure is a compromise among the various applications. The non-procedural, high-level specification better reveals the application intent and hence is easier for the system to use as a basis for optimization.

The available relational languages (ALPHA, Relational Algebra) were very formal and required rather much mathematical sophistication on the part of the user. In particular, the ALPHA language is based on the first order predicate calculus. The relational algebra introduces a collection of operators (selection, projection, join, division, etc.) which have relational operands and produce relational results. The need to discover more user-oriented, non-mathematical relational languages became apparent and is currently being pursued by several research groups [11,12].

The principal external interface of System R is called the Relational Data Interface (RDI), and provides relationally complete [7] facilities for data manipulation, data definition, and data control. To support high-level, non-procedural, set-oriented applications, the RDI contains the SEQUEL data sublanguage in its entirety. SEQUEL is documented in [10].
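
To make the operator style of the relational algebra mentioned above more concrete, the following Python sketch models relations as lists of dictionaries and implements selection, projection, and an equality join. It is an illustration only: the relations EMP and DEPT, their attributes, and the function names are hypothetical, and the code is neither SEQUEL nor part of the RDI.

```python
# Relations are modeled as lists of dicts, one dict per tuple.

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Projection: keep the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def join(left, right, attribute):
    """Equality join of two relations on a common attribute."""
    return [{**l, **r} for l in left for r in right
            if l[attribute] == r[attribute]]


# Hypothetical sample relations.
EMP = [{"NAME": "Jones", "DNO": 50}, {"NAME": "Smith", "DNO": 48},
       {"NAME": "Lee", "DNO": 50}]
DEPT = [{"DNO": 48, "LOC": "San Jose"}, {"DNO": 50, "LOC": "Yorktown"}]

# Names of employees whose department is located in Yorktown.
names = project(select(join(EMP, DEPT, "DNO"),
                       lambda t: t["LOC"] == "Yorktown"), ["NAME"])
print(names)
```

Each operator takes relations as operands and returns a relation, so the operators compose freely; the caller states which tuples are wanted and never names an access path.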
Of course, not all requirements can best be met through a non-procedural approach, and for this reason the RDI contains single-tuple-oriented operators (FETCH, INSERT, DELETE, REPLACE, etc.) in addition to the set-oriented capabilities of SEQUEL. We have designed the RDI to be used in two modes:

(a) Directly by an application program (e.g., a COBOL program) which uses RDI operators to access the data base.

(b) As the target of a translator program (a special case of an application program) which is emulating some other type of user interface.

2. Data Independence

Date [13] has defined data independence as the immunity of applications to change in storage structure and access strategy. Often, however, the notion is associated with the ability of a data base system to provide various logical views of the data base; for example to make visible only selected records of a file, and selected attributes of each record. By view, informally we mean a relational window through which an application can access the data base. The term "window" is used to imply that the changes to the data base which affect the view are visible to the application. We wish to distinguish these two notions of data independence. In this subsection we address only the first notion of data independence; the second, which we call the support of derived views, is discussed in the next subsection.

Typically, data management systems permit two levels of data definition. The lower level, or "schema", describes the primitive data objects being managed by the system. In System R, these primitive objects are called base relations. The description of a base relation includes the relation name, attribute names, description of the units of each attribute, the domain of each attribute, the order of the attributes within a relation, the order (if any) of the tuples within a relation, etc. In particular, the definition of a base table does not include any information about physical storage or available physical access paths to the data. However, each base relation has a very direct physical representation, i.e., each tuple of the relation has a stored representation. Data independence implies that the base relation can be supported by a variety of physical structures and access strategies.

Clearly data independence is important if a system is to allow growth and meet the changing requirements of various applications. System R provides a rich set of access structures. Any of these can be used to support a given base relation.
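
The first notion of data independence can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a modern stand-in, not System R, and the EMP relation, its attributes, and the index name are hypothetical. The point is only that the application's query text does not change when an access path is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The definition of the base relation names attributes and domains only;
# it says nothing about access paths.
conn.execute("CREATE TABLE EMP (NAME TEXT, DNO INTEGER, SAL INTEGER)")
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)",
                 [("Jones", 50, 12000), ("Smith", 48, 9000), ("Lee", 50, 15000)])

# The application's request, written once.
QUERY = "SELECT NAME FROM EMP WHERE DNO = ? ORDER BY NAME"

before = conn.execute(QUERY, (50,)).fetchall()

# A new access path is added; the application is not touched.
conn.execute("CREATE INDEX EMP_DNO ON EMP (DNO)")

after = conn.execute(QUERY, (50,)).fetchall()
print(before == after, after)
```

The query returns the same tuples before and after the index is created; whether the index is actually used is left to the system.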
3. Support of Derived Views

The higher level of data independence consists of the ability to define alternative views in terms of the primitive data objects. This notion appears in most contemporary data management systems and the usefulness of such systems depends in large measure on the capability of the system to support derived views. The inability to support views which differ from the primitive views often leads to programs which are complex, because they are warped to use views which are not natural but can be supported, and which require extensive maintenance as the system changes over time.

As an example of the usefulness of derived views, consider a data base containing the following two types of records: CATALOG (PARTNO,DESC,PRICE) and SALES (SALENO,PARTNO,QSOLD). The CATALOG file is ordered by part number, and gives the description and price of each part. The SALES file is ordered by sale number, and gives the part number and quantity sold for each sale. Suppose we wish to print out all the SALES records for parts which have a price greater than $1000. We could write a program to scan through the CATALOG file, finding parts with PRICE > $1000; for each such part, a separate scan could be made through the SALES table to find all the corresponding records. This program would be highly procedural; it would require repeated scanning of the SALES table, and would give the system little opportunity to optimize the query by choosing among alternate access paths.

However, if our system permits the specification of derived views, the user might specify a view consisting of the join of the two files, as follows: SALES-CAT (SALENO,PARTNO,DESC,PRICE,QSOLD). The program could then consist of a single scan through the SALES-CAT view. Besides being easier to write, this program would give the system flexibility to take advantage of new access paths which may become available (such as a PARTNO index on the SALES file) without requiring changes in the program.

A major goal of the System R project is to develop and investigate the technology of derived views. This problem has three distinct aspects, each of which is being studied:

(a) Exactly what set of operations on derived views is supportable? As an example of this issue, imagine a request to delete a tuple from the SALES-CAT view described above. Since this view is a join of two underlying files, it is not obvious what actions should be taken on the files to support the deletion. (Should we delete the SALES record but retain the CATALOG record?) For some kinds of view modification requests, there may be several possible actions which would produce the desired result; for other kinds of requests, there may be no possible supporting action. Codd [18] has described some examples of the latter phenomenon.

(b) How should the view be bound to the available physical structures and access paths? This aspect of the binding problem concerns the optimization of the view and accesses on the view in terms of available access paths, e.g., indexes, sequential scan, etc.

(c) When should binding be performed? For dynamic view definition, the binding must also be dynamic. In System R, we are investigating various binding-time strategies; dynamic binding will occur for dynamically defined views but for certain often-used or very demanding views, the binding will be done statically with (hopefully) an increase in performance.
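
The CATALOG/SALES example from the discussion of derived views can be sketched with Python's built-in sqlite3 module, used here as a modern stand-in rather than as System R itself. Two names are adapted as assumptions: the view is called SALES_CAT because a hyphen is not a legal identifier character, and the DESC attribute is renamed DESCR because DESC is an SQL keyword. The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CATALOG (PARTNO INTEGER, DESCR TEXT, PRICE REAL);
    CREATE TABLE SALES   (SALENO INTEGER, PARTNO INTEGER, QSOLD INTEGER);
    INSERT INTO CATALOG VALUES (10, 'pump', 1500.0), (20, 'valve', 80.0);
    INSERT INTO SALES   VALUES (1, 10, 5), (2, 20, 1), (3, 10, 7);

    -- Derived view: the join of the two files.
    CREATE VIEW SALES_CAT AS
        SELECT S.SALENO, S.PARTNO, C.DESCR, C.PRICE, S.QSOLD
        FROM SALES S JOIN CATALOG C ON C.PARTNO = S.PARTNO;
""")


def expensive_sales_procedural(conn):
    """Procedural version: scan CATALOG, then rescan SALES for each part."""
    rows = []
    parts = conn.execute(
        "SELECT PARTNO FROM CATALOG WHERE PRICE > 1000").fetchall()
    for (partno,) in parts:
        rows += conn.execute(
            "SELECT SALENO, PARTNO, QSOLD FROM SALES "
            "WHERE PARTNO = ? ORDER BY SALENO", (partno,)).fetchall()
    return rows


def expensive_sales_view(conn):
    """View version: a single request against the derived view."""
    return conn.execute(
        "SELECT SALENO, PARTNO, QSOLD FROM SALES_CAT "
        "WHERE PRICE > 1000 ORDER BY SALENO").fetchall()


procedural = expensive_sales_procedural(conn)
via_view = expensive_sales_view(conn)
print(procedural == via_view, via_view)
```

Both versions return the same sales records, but only the second leaves the choice of access path to the system. A plain SQLite view such as this one is read-only, which sidesteps rather than answers question (a) about deletions from a join view.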
4. Data Control Facilities

Data Control includes those aspects of a data base system which control the access to and use of data. We distinguish four types of data control, each of which is being investigated in System R.

(a) Authorization. This form of control is the most common type, being present in almost all current systems. Authorization is the mechanism to permit or deny the creation and manipulation of data structures and views by various users. Any user of System R may potentially be authorized to create new tables and views, and to selectively grant authorizations for his objects to other users. The authorization mechanism of System R is described more fully in [14].

(b) Integrity. Integrity control provides a mechanism for enforcing that the data in the data base obeys certain rules or predicates which have been declared to the system. This form of control is typically not found in current data base systems but is left to protocols imbedded in various application programs. In System R, two main types of integrity control facilities are provided: assertions and triggers. Integrity assertions are expressed in the SEQUEL language as predicates about the data in the data base [15]. The system then guarantees the truth of these predicates. Exactly when the system checks an assertion is a function of both the type of assertion and the transaction boundary which caused the assertion to be checked.

Triggers are actions that are invoked when some triggering condition or action is detected. For example, suppose that the DEPT relation contains an attribute NEMPS which represents the number of employees in the department. To maintain the validity of this value, we can declare triggers to update this field whenever an employee is hired, fired, or transferred.
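
The NEMPS example can be sketched with the trigger and CHECK facilities of SQLite, reached through Python's built-in sqlite3 module. This is a stand-in and not the SEQUEL assertion and trigger syntax of System R: the EMP relation, the trigger names, and the CHECK constraint (playing the role of a simple assertion) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The CHECK clause stands in for a declared assertion: NEMPS is never negative.
    CREATE TABLE DEPT (DNO INTEGER PRIMARY KEY,
                       NEMPS INTEGER NOT NULL DEFAULT 0 CHECK (NEMPS >= 0));
    CREATE TABLE EMP  (NAME TEXT, DNO INTEGER);

    -- Triggers keep NEMPS valid when an employee is hired, fired, or transferred.
    CREATE TRIGGER EMP_HIRED AFTER INSERT ON EMP BEGIN
        UPDATE DEPT SET NEMPS = NEMPS + 1 WHERE DNO = NEW.DNO;
    END;
    CREATE TRIGGER EMP_FIRED AFTER DELETE ON EMP BEGIN
        UPDATE DEPT SET NEMPS = NEMPS - 1 WHERE DNO = OLD.DNO;
    END;
    CREATE TRIGGER EMP_MOVED AFTER UPDATE OF DNO ON EMP BEGIN
        UPDATE DEPT SET NEMPS = NEMPS - 1 WHERE DNO = OLD.DNO;
        UPDATE DEPT SET NEMPS = NEMPS + 1 WHERE DNO = NEW.DNO;
    END;

    INSERT INTO DEPT (DNO) VALUES (48), (50);
    INSERT INTO EMP VALUES ('Jones', 48), ('Smith', 48), ('Lee', 50);  -- hired
    UPDATE EMP SET DNO = 50 WHERE NAME = 'Jones';                      -- transferred
    DELETE FROM EMP WHERE NAME = 'Lee';                                -- fired
""")

counts = dict(conn.execute("SELECT DNO, NEMPS FROM DEPT"))

# An update that would violate the declared predicate is rejected.
try:
    conn.execute("UPDATE DEPT SET NEMPS = -1 WHERE DNO = 48")
    violated = False
except sqlite3.IntegrityError:
    violated = True

print(counts, violated)
```

After the three changes each department has one employee, and the application never updates NEMPS itself.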
(c) Consistency. Integrity implies the static correctness of the data base and consistency is concerned with the dynamic correctness. Suppose that one application program is transferring a set of employees from Dept. 48 to Dept. 50, while simultaneously another application program is giving raises to all employees in Dept. 50. The interaction of these programs may have the undesirable result that some but not all of the transferred employees receive the raise. Even worse, if the transferring program encounters a failure and backs out its updates, it may develop that a raise has been given to someone in Dept. 48.

In current systems the application would contain specific statements (e.g., "LOCK DEPT 50") to avoid these problems. A major goal of System R is to eliminate such defensive coding which is not a part of the problem being solved but is related only to the fact that the solution is running in a certain environment. Since the user cannot know in advance the exact environment in which his application will run (perhaps no other users are currently updating employee records; in this case the lock is not needed), the system must provide the control needed to enforce consistency. The approach being pursued is to require that the user define the boundaries of a transaction, which is a sequence of statements to be executed as an atomic unit. The system then requests whatever resources it needs in the run-time environment to guarantee the atomicity. Furthermore, this same atomic unit is used as the unit of integrity, i.e., integrity may be suspended within a transaction but it is guaranteed at the transaction endpoints. If a transaction violates integrity at its endpoint, then the transaction is backed out.
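
The idea of a transaction as an atomic unit that is backed out on failure can be sketched with Python's built-in sqlite3 module. This is a stand-in, not System R: the EMP relation, the transfer function, and the simulated failure are assumptions for illustration, and the sketch shows only backout in a single program, not locking between concurrent programs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (NAME TEXT PRIMARY KEY, DNO INTEGER)")
conn.executemany("INSERT INTO EMP VALUES (?, ?)",
                 [("Jones", 48), ("Smith", 48), ("Lee", 50)])
conn.commit()


def transfer(conn, names, new_dno, fail_after=None):
    """Move employees as one atomic unit; optionally simulate a failure midway."""
    with conn:  # commits on success, rolls back if an exception escapes
        for i, name in enumerate(names):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated failure")
            conn.execute("UPDATE EMP SET DNO = ? WHERE NAME = ?",
                         (new_dno, name))


# The transfer fails after the first employee has been moved ...
try:
    transfer(conn, ["Jones", "Smith"], 50, fail_after=1)
except RuntimeError:
    pass
after_failure = sorted(conn.execute("SELECT NAME, DNO FROM EMP"))

# ... and succeeds as a whole on the second attempt.
transfer(conn, ["Jones", "Smith"], 50)
after_success = sorted(conn.execute("SELECT NAME, DNO FROM EMP"))

print(after_failure)
print(after_success)
```

After the failed attempt no employee has moved, so no partial transfer is ever visible at a transaction endpoint; the application contains no explicit lock statements.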
(d) Recovery. The fourth aspect of data control is concerned with preserving the integrity of the data if the system experiences a malfunction or if an application backs up either voluntarily or involuntarily (e.g., as in the case of deadlock). The recovery capabilities of System R include the usual checkpoint/restart functions as well as the ability to back up an ongoing transaction to user-specified points. These capabilities are examples of functions which are required in order to have an operationally complete capability.

ARCHITECTURE AND SYSTEM STRUCTURE

We will describe the overall architecture of System R from two viewpoints. First, we will describe the system as seen by a single transaction, i.e., a monolithic description. Second, we will investigate its multi-user dimensions. Figure 1 gives a functional view of the system including its major interfaces and components. The RDI, as described previously, is the external interface which can be called directly from a programming language, or used to support various other interfaces. The Relational Storage Interface (RSI) is the access-method-like level which handles the access to single tuples of base relations. This interface and its supporting system (Relational Storage System - RSS) is actually a complete storage subsystem in that it manages devices, space allocation, storage buffers (one level store), transaction consistency and locking, deadlock, backout, transaction recovery and logging. Furthermore, it maintains indexes on selected attributes of base relations.
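
As a rough illustration of what an access-method-like, single-tuple level with an index on a selected attribute might look like, the following Python sketch defines a toy in-memory store. Everything in it is hypothetical: the class TupleStore and its methods are invented for this example and are not the operators of the RSI, and devices, buffers, locking, and logging are left out entirely.

```python
class TupleStore:
    """Toy single-tuple storage level with one indexed attribute (hypothetical)."""

    def __init__(self, indexed_attribute):
        self._tuples = {}            # tuple id -> tuple (dict)
        self._next_id = 0
        self._attr = indexed_attribute
        self._index = {}             # attribute value -> set of tuple ids

    def insert(self, tup):
        """Store one tuple, maintain the index, and return its tuple id."""
        tid = self._next_id
        self._next_id += 1
        self._tuples[tid] = dict(tup)
        self._index.setdefault(tup[self._attr], set()).add(tid)
        return tid

    def fetch(self, tid):
        """Return the tuple with the given id."""
        return self._tuples[tid]

    def delete(self, tid):
        """Remove one tuple and its index entry."""
        tup = self._tuples.pop(tid)
        self._index[tup[self._attr]].discard(tid)

    def scan(self):
        """Sequential access path: every tuple id, in insertion order."""
        return iter(sorted(self._tuples))

    def lookup(self, value):
        """Index access path: ids of tuples whose indexed attribute equals value."""
        return iter(sorted(self._index.get(value, ())))


store = TupleStore("DNO")
for name, dno in [("Jones", 50), ("Smith", 48), ("Lee", 50)]:
    store.insert({"NAME": name, "DNO": dno})

by_scan = [tid for tid in store.scan() if store.fetch(tid)["DNO"] == 50]
by_index = list(store.lookup(50))
print(by_scan == by_index, by_index)
```

Both access paths deliver the same tuples; a layer above such a store is free to choose between them without the application being aware of the choice.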
    Programs to support various interfaces:
    Stand-alone SEQUEL, Query By Example, etc.
    ------------------------------------------  <--- Relational Data Interface (RDI)
    Relational Data System (RDS)
    ------------------------------------------  <--- Relational Storage Interface (RSI)
    Relational Storage System (RSS)

Figure 1. Architecture of System R
With this brief description of the RSS facilities, we can return to the RDI and its supporting system (Relational Data System - RDS). The major functions performed by the RDS are authorization, integrity enforcement, and nonprimitive view support which includes all the binding issues discussed previously. In addition, the RDS maintains the catalogs of external names, since the RSS uses only system-generated internal names. The RDS contains a sophisticated optimizer which chooses the best access path for any given request from among the paths supported by the RSS.

The operating system environment for this system is VM/370 [16]. Several extensions to this virtual machine capability have been made [17] in order to support the multi-user environment of System R.

ACKNOWLEDGEMENT

The authors wish to acknowledge many helpful discussions with E. F. Codd, originator of the relational model of data, and with L. Y. Liu, manager of the Computer Science Department of the IBM Research Laboratory. We also wish to acknowledge the extensive contributions to System R of Paul L. Fehder, who has transferred to another location, and Raymond F. Boyce, who served as one of the project managers until his untimely death in June of 1974.
REFERENCES

[1] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM, June 1970.

[2] M. G. Notley. The Peterlee IS/1 System. IBM UK Scientific Center Report UKSC-0018, March 1972.

[3] M. M. Astrahan and D. D. Chamberlin. Implementation of a Structured English Query Language. Presented at ACM SIGMOD conference, San Jose, California, May 1975; to be published in Communications of the ACM, October 1975.

[4] G. D. Held, M. R. Stonebraker, and E. Wong. INGRES: A Relational Data Base System. Proc. AFIPS National Computer Conference, Anaheim, California, May 1975.

[5] CODASYL Development Committee. An Information Algebra. Communications of the ACM, April 1962.

[6] E. F. Codd. A Data Base Sublanguage Founded on the Relational Calculus. Proc. ACM SIGFIDET Workshop, San Diego, California, November 1971.

[7] E. F. Codd. Relational Completeness of Data Base Sublanguages. Courant Computer Science Symposia, Vol. 6: Data Base Systems. Prentice Hall, New York, 1971.

[8] B. M. Leavenworth. Nonprocedural Programming. IBM Research Report RC4968, IBM Research Center, Yorktown Heights, New York, August 1974.

[9] M. H. Halstead. Software Physics Comparison of a Sample Program in DSL Alpha and COBOL. IBM Research Report RJ1460, IBM Research Laboratory, San Jose, California, October 1974.

[10] D. D. Chamberlin and R. F. Boyce. SEQUEL: A Structured English Query Language. Proc. ACM SIGFIDET Workshop, Ann Arbor, Michigan, May 1974.

[11] N. McDonald and M. Stonebraker. CUPID: The Friendly Query Language. Proc. ACM Pacific Conf., San Francisco, California, April 1975. Available from Boole and Babbage, 850 Stewart Drive, Sunnyvale, California 94086.

[12] M. M. Zloof. Query By Example. Proc. AFIPS National Computer Conference, Anaheim, California, May 1975.

[13] C. J. Date. An Introduction to Data Base Systems. Addison Wesley, 1975.

[14] D. D. Chamberlin, J. N. Gray, and I. L. Traiger. Views, Authorization, and Locking in a Relational Data Base System. Proc. AFIPS National Computer Conference, Anaheim, California, May 1975.

[15] K. P. Eswaran and D. D. Chamberlin. Functional Specifications of a Subsystem for Data Base Integrity. IBM Research Report RJ1601, IBM Research Laboratory, San Jose, California, June 1975.

[16] Introduction to VM/370. IBM Publication No. GC20-1800. IBM, White Plains, New York.

[17] J. N. Gray and V. Watson. A Shared Segment and Inter-process Communication Facility for VM/370. IBM Research Report RJ1579, IBM Research Laboratory, San Jose, California, February 1975.

[18] E. F. Codd. Recent Investigations in Relational Data Base Systems. Proc. IFIP Congress, Stockholm, Sweden, August 1974.
GEOGRAPHIC BASE FILES: Applications in the Integration and Extraction of Data from Diverse Sources

Patrick E. Mantey, Eric D. Carlson, IBM Research Laboratory, San Jose, California

Abstract

This paper addresses the development of integrated municipal data bases, with consideration given to political realities and to the sources of data now available in municipalities. First, the potential users and potential uses of a municipal data base are discussed, and an information system which would serve these users and uses is considered. Next, the "current" status of data bases in municipalities is reviewed and it is concluded that there is a large quantity of data available in many municipalities, but that integrated data bases for supporting an information system are not usually a reality. The problem of building an integrated data base from the variety of data sources presented by local agencies is then addressed. A central ingredient for an integrated data base is the Geographic Base File (GBF), which permits the construction of extracted data files from these multiple sources to support information system applications. The concept of extraction, for the integration of diverse source files via geographic references, is developed (and a prototype implementation is described in the Appendix). Using the GBF, and source data from various municipal functions, extracted data bases can be rapidly built to serve a variety of applications of an information system in the decision-making situations of municipalities.

I. APPLICATIONS OF A MUNICIPAL DATA BASE
Municipal governments are, in essence, created to deliver services to a geographical area. There is an unusual variety, in comparison to private industry, of services offered. Local government is often structured (or fractured) along functional lines into special districts, as well as by geography. Such structuring has precluded concentration of power, but it has increased the complexity of planning, resource allocation, or management.

Many of the problems in municipal government require decisions which are not routine. Rather, they require the professional insight and judgment of human decision makers who consider the specific conditions of each problem. Ideally, this insight and judgment would be aided and guided by appropriate information derived from a comprehensive data base. This is the objective of a municipal information system: to facilitate effective analysis and solution of specific problems by supporting human decision makers with data resources and analysis functions in readily usable form.

Because a municipality provides services to a geographical area, much of the data relevant to decision making or problem solving in local agencies will have geographical attributes, and can be given spatial interpretation via maps. A key attribute of a municipal information system is the capability for displaying information in the form of maps.

Another requirement for such systems to be effective is that they support ready use by decision makers who know very little about computers. The system must help the decision makers develop their precise objectives or decision criteria for solving a problem. The solution process in such decision making requires exploratory analysis, selection of relevant data, and meaningful data presentation in an interactive environment. A system which provides such capabilities, called GADS (Geo-data Analysis and Display System), has been developed and evaluated in several applications, such as police manpower allocation and analysis of urban development policies [1-4]. The evaluations of GADS indicate that interactive analysis and display systems have a great potential in the operations, management, and planning of municipalities.

As an example, consider a municipality which maintains a computerized property information file (via the tax assessor function). Such a file would have data on each parcel, possibly including: address, owner, zoning, improvements, date constructed, type construction, size (area), current use, area centroid, assessed value.

If this data were accessed via an interactive information system, a decision maker could readily obtain, for example, the address and assessed value of all residences constructed between 1960 and 1962 and having floor area between 1600 and 1800 square feet, on lots with 6800 to 7200 square feet. If the user of the system were a real-estate appraiser, this information in tabular form might be of value in determining if a particular home is fairly appraised (Figure 1). A histogram giving the distribution of assessed value of these homes would provide additional insight (Figure 2) and a map relating the average value of such homes in the city's planning areas to the city-wide average (Figure 3) would provide the appraiser with information in a spatial framework.
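
The parcel query described above can be sketched as a simple filter over property records. This is only an illustration and not the GADS implementation: the `Parcel` fields, the sample records, and the helper name `comparable_homes` are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Parcel:
    """Hypothetical subset of the attributes kept on each parcel."""
    address: str
    use: str              # current use, e.g. "residence"
    year_built: int
    floor_area: float     # square feet
    lot_area: float       # square feet
    assessed_value: float


def comparable_homes(parcels, years=(1960, 1962),
                     floor=(1600, 1800), lot=(6800, 7200)):
    """Return (address, assessed value) for residences inside all three ranges."""
    return [
        (p.address, p.assessed_value)
        for p in parcels
        if p.use == "residence"
        and years[0] <= p.year_built <= years[1]
        and floor[0] <= p.floor_area <= floor[1]
        and lot[0] <= p.lot_area <= lot[1]
    ]


# Made-up sample records, for illustration only.
PARCELS = [
    Parcel("12 Oak St", "residence", 1961, 1700, 7000, 31000.0),
    Parcel("40 Elm Ave", "residence", 1961, 1650, 6900, 29000.0),
    Parcel("7 Pine Ct", "residence", 1958, 1700, 7000, 27000.0),   # built too early
    Parcel("3 Main St", "commercial", 1961, 1700, 7000, 55000.0),  # not a residence
]

matches = comparable_homes(PARCELS)
average_value = mean(value for _, value in matches)
```

The list `matches` corresponds to the tabular display, and `average_value` is the kind of summary that could be compared against a city-wide average on a map.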
If the user of the information system were an assessor concerned with determining the neighborhoods which could be considered equivalent for purposes of computer-aided appraisal, additional data would be required. If recent sales records are the basis for calibrating the assessment model, it may be found that the assessor's data alone cannot be used to model variations in selling price of houses fitting the description above. Showing on a map the mean and variance of the selling price of such houses by neighborhood will offer the assessor a visual means for examining the quality of the assessment model. The display may cause the assessor to consider other factors to explain the variations; e.g. crime rate, level of public facilities and services (such as the influence of an adjacent regional park [5]) or the influence of other near-by land uses.

In making decisions related to residential zonings, pertinent questions and information displays would relate to adjacent land use, the effect that the proposed development would have on the mix of housing stock available in the community, or to the effect these new residents would make on the per-capita park acreage in the area. The school officials (and in some areas the local zoning authority) need to evaluate the impact such a development will have on the existing school facilities. Each of these questions, and many others, are of interest to different decision makers involved in the area of community development.

An example of the use of a municipal data base in resource allocation could relate to finding the best approach to reduction of burglaries in a residential neighborhood. One group may advocate better street lighting. If queries relating to these burglaries show they are day-time crimes, another approach probably offers greater promise. Certainly the level of police patrols would be examined. If the area is found to have a high percentage of two wage-earner families, and the majority of burglaries are during the week, then some suspicion will fall on the children. If a large number of burglaries occur on school days, it may appear that the school-age children are not the perpetrators. However, if the data base contains school information so that it can be determined that a school adjacent to the neighborhood has flexible scheduling and that school children are free to roam about off the school grounds, the source of the trouble may have been identified.

As a last example, consider the application for a building permit for an automobile service station. With the recent wave of service station abandonments, many municipalities are reluctant to grant such permits. Similarly, financial institutions are probably wary about loaning money for such ventures. For all concerned, and for the public good, a careful analysis of such a proposal is required. Factors of interest would include location and number of existing stations, traffic access and traffic patterns at the proposed site, and an estimate of the automobile ownership and disposable income in the surrounding areas.

In these examples it has been assumed that the information system has access to a comprehensive municipal data base. However, such data bases are a rarity today. In the next section the current status of municipal data bases is discussed, and in Section III an approach toward the provision of integrated data is offered.

II. CURRENT STATUS OF MUNICIPAL DATA BASES

The importance of a comprehensive integrated data base to support decision-making in municipal government has been widely recognized. There have been several different
approaches taken to the development of comprehensive municipal data bases.

One approach, which was popular in the 1960's, was the development of a comprehensive "data bank". This data bank was usually generated as a special collection or census, and often was carried out and funded as part of a comprehensive transportation and land use planning study. As a part of such studies, detailed field surveys of land use (at the parcel level) were conducted, and survey data also was gathered on employment, income distribution, and commercial activity (Figure 4). The data acquisition consumed a major portion of the study resources, and no means were provided for updating and extending these data banks. Although accurate data was often gathered in the development of these data banks, their value was short-lived because they were at best a snap-shot of the state of a very dynamic system, and did not provide any information relating to many municipal services (e.g. public safety).

At about the same time, the computer was becoming a tool in the operations of local governments. The application of computers by municipalities beginning in the 1960's can be characterized as a function-by-function approach, with data processing introduced into tasks involving high-volume routine transactions. The computer is generally utilized in those functions which have previously been computerized in private industry: payroll, accounting, billing, budget status reporting, personnel records, etc. Also, the 1960's saw wide-spread use of computers in the processes associated with elections and in operations of law enforcement agencies. Usually these applications were isolated from each other, and no attempts were made to make this information available for use by other municipal functions.

The real property assessment function of local government in the late 1960's began to recognize the potential of computer applications. The court decisions in various locales requiring property to be appraised at current market value placed a significantly increased work load on assessment officials. In numerous areas, the assessor has turned to computer-aided appraisal to meet these demands. If a model is to be built and calibrated to reliably estimate fair-market value of residential properties, a comprehensive real property data base is a must. Some property data bases were constructed, often by computerizing the data contained on the assessor's file cards which were maintained on each parcel. These data bases, besides the usual references to assessor map, book and page and to situs address, also added geographic data such as a "centroid". Computerized real property appraisal systems required the field acquisition of very comprehensive data. A work sheet for field surveys by appraisers in a California county is shown in Figure 5. Note the detailed land and building attributes used in this system. There was a significant increase in the amount of data used and required by the introduction of this computerized system, but much of the data on this sheet would not change from year-to-year, and the appraiser in the field would only need to correct those data items which had changed.

An additional requirement for computer-aided appraisal is data relating to current selling price of residential properties. This data provides the calibration information for the regression models, and is available, depending upon the state or local laws, from the registrar of deeds, from the collector of transfer taxes, or from title companies. Assessors in some locales obtain this data by questionnaires which buyers are required by law to complete and return (Figure 6). These sources and/or others also can be used to obtain financial data (e.g. mortgage terms) for the property. Clearly this data base is an integral part of a municipal data base
for applications such as illustrated by the examples in Section I.

Another approach related to the development of municipal data bases is characterized by the USAC projects [6], particularly those of Charlotte, North Carolina, and Wichita Falls, Texas. These cities were funded by the Federal USAC project to build Integrated Municipal Information Systems (IMIS). The concepts of IMIS are [6]:

"(1) Integrated data processing systems should inter-relate municipal processes.
(2) A fundamental analysis of municipal operations and identification of related data processing components is a precondition to the effective use of computers.
(3) A systems approach is required throughout the development process.
(4) The automation of municipal operations must exploit the full range of computer technology.
(5) Automation of routine municipal processes is a fundamental condition to the realization of an IMIS.
(6) An IMIS views the municipality as a basic building block for intergovernmental information systems.
(7) Municipal information systems are by-products of computer-driven, operations-based systems.
(8) Adequately designed data processing systems can be transferred from one municipality to another.
(9) The integrated approach to municipal systems development must proceed on the basis of a plan within which incremental installation may be achieved in accordance with the priorities and resources of any particular city."

The USAC efforts involve city governments, and were the consequence of studies such as the IBM/New Haven project [7] and the USC/Burbank project [8]. These groups sought to develop a methodology, via a "systems approach", for the application of computers by municipalities [6] but did not result in system implementation of
integrated municipal information systems.

The USAC approach wisely focused on operational sources to provide the current data required for municipal decision-making. In the implementations, which are still in progress, the cities have concentrated on building up operational uses of computers, and on implementing these applications on a central computer under an integrated data-base management system. The value of computers to these operational functions has been confirmed [9] but the applications of IMIS in the areas of management decision making are still to be demonstrated. One of the difficulties that must be overcome in providing a comprehensive municipal data base (even in cities with a fully integrated and operational IMIS constructed according to the USAC philosophy) is that complete integration, where all municipal functions use the same computer and data base management system, is not a likely prospect with current local governmental structure and with the limited resources of local governments. For example the data pertinent to decision making in a city may be gathered by another agency, such as the tax assessor (and conversely), and may reside on different computers, under different data management schemes and in different file formats. In addition, problems of data security, compatibility of files, and high processing
costs may make complete integration unrealistic for many municipalities.

Special data collections, such as the U.S. Census, and data available from state sources, must also be readily incorporated into a municipal data base. With census data gathered according to blocks, block groups and census tracts, with assessor's property data coded according to assessor map, book and page, with public works data in state-plane coordinates, and school data gathered by school attendance area, the building and maintenance of a truly integrated municipal data base presents a formidable task.

III. APPROACHES PERMITTING DEVELOPMENT OF INTEGRATED FILES

A completely integrated data base would have all data relating to any functions of a municipality residing on the same computer system, under the same data management system and organized and indexed to facilitate correlation. This ideal is not attainable, given present organizational structures and computing capabilities in most municipalities. However, if the computerized files of various municipal functions are "properly structured", it will be possible to achieve the same benefits as if there existed a completely integrated municipal data base. In addition, such an approach will not require re-implementation of current applications, but rather leaves the application data base in the control of the
function responsible for its primary maintenance and use.

The approach taken is to make use of data, when data files are "properly structured", to develop effectively the results as if there existed a completely integrated data base without requiring that complete integration take place. This should not be construed as an argument against integration. If integration is politically and technically possible, provides required data security, and is economically attractive, it should be implemented. Even with an integrated data base, there will always be decisions which require different groupings of data than those supported by the integrated data base. (There will also remain, in practice, data sources which cannot be integrated.) So, the problem of providing a comprehensive data base from multiple data sources is unavoidable and is not completely solved by an "integrated" data base.

The "proper structuring" required to make data integration possible can be illustrated by example. If one is interested in information about burglaries, and wishes to relate this information to neighborhood conditions, data sources could include police dispatch files, criminal justice arrest files, census data and tax assessor files. Suppose the police dispatch data is used for burglary incidence, and that such data is available in terms of police beats, e.g. the number of burglaries in each beat for each day. If one wishes to use census data for socio-economic information, and if the census tracts and beats have few common boundaries, no small area information is obtainable relating these data sources. Alternatively, if the police dispatch data is captured by the street address of the call, and if a directory exists for the city which will permit identification of the census tract for each street address, then burglaries and socio-economic data can be related at the census tract level.

"Proper
structuring" of data files only has meaning with respect to potential uses of the data (i.e. data files are not an end in themselves). If the objective is to offer data to support decision making in a wide range of problem areas, then the data files must be as detailed as possible, within the constraints of economics, privacy and security. The detailed data can then make possible the development of the widest variety of data subsets and aggregations, and is more likely to permit development of the required set of integrated data for a particular decision-making context. An additional requirement is the existence of data elements in each file which will facilitate relating the data to that from different files. (In this paper, geographical references will be singled out as data elements serving this function in municipal files. Common references to account numbers, project numbers or personnel identifiers are other examples of data elements permitting the relating of data from different source files.) A set of files will be called "properly structured" if they contain information permitting the relating of data from different source files so that integrated subsets of data at the appropriate level of detail can be developed to support the requirements of problem solvers.

Because municipal government is a service delivery function, mutual references to geography can often be used to relate data from the diverse files available in municipalities. A powerful file in facilitating the relating of data, based on these common geographical references, is a Geographic Base File (GBF). Functionally, the GBF contains data to support the relating of data from other files to geographical location and also the display of this data on a map. The creation of a GBF for a municipality is a key requirement in the development of a municipal data base from source files. Several different approaches have been taken.
The simplest GBF is a file sometimes called a Property Location Index (PLI) which contains a list of the valid addresses in the municipality and an x,y coordinate for each. This approach is the one used in Lane County, Oregon, and by the Assessor in Santa Clara County, California. To make this more useful, a list of public places and street intersections and their x,y coordinates is appended. With such a GBF it is then possible to automatically convert addresses (in the police call file for example) to x,y coordinates. If the GBF also contains the police beat, census tract, and municipality for each address, then it is very simple computationally to count the number of calls in each beat. Evaluation of calls by census tract would also permit consideration of socio-economic data with the crime data (of course, police officers could also encode calls by beat and census tract, but this approach is liable to significant errors and seems to be a poor use of police manpower).
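
The address conversion and per-beat counting described above can be sketched as a dictionary lookup. This is a minimal illustration under stated assumptions: the index layout, the sample addresses, and the names `PLI`, `normalize`, `geocode`, and `count_calls` are hypothetical and do not describe the files used in Lane County or Santa Clara County.

```python
from collections import Counter

# Hypothetical Property Location Index:
# address -> (x, y, police beat, census tract).
PLI = {
    "100 FIRST ST": (1520.0, 880.0, "B1", "5001"),
    "200 FIRST ST": (1620.0, 880.0, "B1", "5002"),
    "15 PARK AVE": (900.0, 1310.0, "B2", "5002"),
}


def normalize(address):
    """Upper-case and collapse whitespace so minor formatting differences match."""
    return " ".join(address.upper().split())


def geocode(address):
    """Return the PLI entry for an address, or None if it is not a valid address."""
    return PLI.get(normalize(address))


def count_calls(call_addresses):
    """Count calls per beat and per census tract; collect unmatched addresses."""
    by_beat, by_tract, unmatched = Counter(), Counter(), []
    for address in call_addresses:
        entry = geocode(address)
        if entry is None:
            unmatched.append(address)
            continue
        _x, _y, beat, tract = entry
        by_beat[beat] += 1
        by_tract[tract] += 1
    return by_beat, by_tract, unmatched


# Made-up call file, for illustration only.
calls = ["100 First St", "200  first st", "15 Park Ave", "15 Park Ave", "9 Nowhere Rd"]
by_beat, by_tract, unmatched = count_calls(calls)
```

Keeping the unmatched addresses separate reflects the editing problem discussed later: an address missing from the index is a candidate correction to either the call file or the GBF.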
The most detailed GBF's contain digitized land parcel boundaries, easement locations, building outlines, utility placements, and even topographic information, along with street address information on all parcels and names of all public lands and buildings. This GBF is at the level of detail of surveyor's data, and is suitable for engineering applications and detailed map building. Ottawa, Canada's National Capital Commission [11] has pioneered in the development of this kind of GBF.

The most
common GBF at the present time is the result of work by the U.S. Census Bureau in conjunction with the 1970 Census. Using the Metropolitan Map Series, a massive feature labeling and digitization was performed for 200 major metropolitan areas in the United States. The resulting computerized maps were called the DIME (Dual Independent Map Encoding) files [12]. Each entry (record) in the DIME file represents a line segment (a portion of a street segment, railroad, creek, city limit, etc.). Figure 7 shows a sample record and the map data from which the record is derived. The segment has a "From" node and a "To" node, as well as a Left and Right side. Thus, each entry describes the segment, its two ends and its two sides. The nodes are labelled as falling on a particular map of the series, and given a sequential number within map and census tract. The description of the nodes includes x,y coordinates, produced by the Census Bureau's computerized map digitization. The description of a side describes the adjacent land. The census tract number, the block number, and the place (city) code are included for each side. The other data on each entry is feature identification for the segment and its sides. Each feature is identified by a prefix, name, suffix, type (e.g., North Army Southwest Street); actually, only name and type are required (e.g., Coyote Creek). High and low address ranges for the street segment sides are also given. The records are ordered by feature name and low address (features without addresses have no secondary ordering). Administrative overlays (e.g. beats, census tracts) can be readily defined in terms of segments of this file. Used in combination with "point-in-polygon routines", these overlays facilitate development of counts of events in areas of any specified overlay map.
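
The segment record and the overlay counting described above can be sketched as follows. This is a simplified illustration and not the Census Bureau's record layout or software: `DimeSegment` carries only a few of the fields named in the text, the overlay polygons are assumed to have been assembled from segments already, and the ray-casting test stands in for whatever "point-in-polygon routine" an installation actually uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DimeSegment:
    """Hypothetical, reduced DIME record: one line segment and its two sides."""
    feature_name: str
    feature_type: str
    from_node: tuple   # (x, y) of the "From" node
    to_node: tuple     # (x, y) of the "To" node
    left_tract: str
    right_tract: str
    left_block: str
    right_block: str


def point_in_polygon(x, y, polygon):
    """Ray casting: count edges crossed by a horizontal ray starting at (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_events(events, overlay):
    """Count events falling in each area of an overlay map (area -> polygon)."""
    counts = Counter()
    for x, y in events:
        for area, polygon in overlay.items():
            if point_in_polygon(x, y, polygon):
                counts[area] += 1
                break
    return counts


# Made-up data, for illustration only.
segment = DimeSegment("COYOTE", "CREEK", (10.0, 0.0), (10.0, 10.0),
                      "5001", "5002", "101", "201")
BEATS = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
EVENTS = [(5, 5), (15, 5), (12, 2), (25, 5)]
counts = count_events(EVENTS, BEATS)
```

In this sketch the sample segment is the shared edge of the two beats, which is how an overlay boundary would be expressed "in terms of segments" of the file. Points lying exactly on a boundary are not given special treatment here.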
As in the development of any large machine readable file, high startup costs, data errors, and poor standardization have hindered development of GBF's. But the key problem in the development and use of a GBF is editing (corrections and additions).* Because of the startup cost, accuracy, and standardization problems, editing is a key aspect of development. It is particularly important to verify the topological and coordinate accuracy of the file. Even if there were no developmental problems, "geographic" changes, such as new streets or changing area boundaries, make file editing essential to a useful GBF.

The Census Bureau and related efforts have produced programs for off-line creation and batch editing of a DIME file [12-13]. These programs require a digitizer for data entry and take large amounts of computer and clerical time for editing. Although the procedures were used to create 200 GBF's, there has been little editing, and hence little use, of these files. Some cities (e.g., [14]) have developed their own GBF's similar to DIME. These efforts are also characterized by the use of a digitizer and batch computer programs for file creation, and by cumbersome file editing procedures. There have been a few efforts to develop on-line digitization systems (e.g., [15]), and there are experimental systems which could support on-line digitization with visual feedback [16-17]. Yet, none of these systems provide all of the capabilities required for effective GBF creation and editing.

*The Census Bureau uses the word "editing" to mean topological verification, and uses "update" for what we define as editing in this paper.
Conclusions drawn from the IBM study [10] regarding the requirements for interactive GBF editing and maintenance were:

1. There must be a capability for projecting hard copy maps and/or photographs onto the display screen. It must be possible to select arbitrary (contiguous) sections of the maps, and to produce a range of scales.
2. The display system must be able to handle multiple, non-rectangular geographic coordinate systems.
3. The display system must be able to produce both text and lines, with at least three colors for lines (in order to be able to distinguish two maps).
4. The display system must enable selection of any addressable point on the screen, whether or not anything is displayed at that point.
5. The creation and editing functions must include: digitization of base and overlay maps; labeling of points, lines, and polygons in the maps; moving and deleting points and lines; display of any section of the maps, and of specific points, lines and polygons; and checking for topological accuracy.

IV. DATA EXTRACTION

A. Philosophy and Operation
In the previous section, the combination of a GBF and properly structured source files containing geographic references was identified as the basis for offering a problem solver an effectively integrated data base to support decision making.
Recent studies of interactive information systems applications in the solution of unstructured problems [4,18,19] have identified the need for reduced subsets of data for supporting the problem solving. Data reduction is required because:

a. the potentially useful data base will be much larger than the data actually used,
b. the user will want access to varying levels of detail in the data base,
c. the relevant subset of data will vary during the problem-solving process,
d. some data (e.g. census data and event data such as police calls) may not be compatible at the detail level of the data captured in the source files.

Extraction is a process by which an integrated subset of data is developed from the source files relevant to a particular problem-solving application. Extraction thus provides the user with a capability effectively indistinguishable from a fully integrated data base, without requiring the development of such an integrated data base at the detail level of the source files, i.e. it provides a "virtually" integrated data base.

The extraction approach builds a data base subset from the source files according to a priori specifications for a particular application. Total integration of the source files, and dynamic aggregation and subsetting of the data at the time the data items are required, is of course an alternative approach. This approach is not attractive in today's environment because:

a. for any application all the relevant source files would have to be on-line to support conversational interaction,
b. protection of the source files would be more difficult,
c. development of conversational information systems would require additional standardized data structures and codes for the dynamic aggregation and subsetting,
d. better conversational performance is possible when the problem solving accesses a smaller data base.
Clearly the development of a fully-integrated, on-line data base from the source files, solely for problem-solving applications, is not (currently) economical. Such an approach would also require special procedures for keeping the duplicate records current and consistent. With the extraction approach, the subset of data thought to be relevant to the particular problem is developed and made accessible to the problem-solving system in an extracted data base. The subset is an extract from the available source files at the level of detail desired by the decision maker for (that phase of) problem solving. This extracted data base may be thought of as a set of tables. Each table contains values for a set of variables extracted from the source files. For each variable there is one value in the table for each basic unit (e.g. zone, account, employee) used for the problem solving. New variables can be added directly to the extracted data base as an added column of the tables. An example of an extracted data base is shown in Figure 8, for use in crime analysis. The extracted data tables are formed from: source files containing 10 years' data on crimes, land use, and population; a special purpose map of police beat-building-blocks (basic zones); and an extraction specification for computing 20 crime categories and selecting population and number of houses by year. The result is 10 tables (one for each year) giving crime by category, population, and number of houses for each basic zone.

The extraction approach leaves control of the operational source files in the hands of the originating application. The extracted data bases are "snapshots" which are current at the time of their development. The problem solver can re-invoke the extraction process at any time to get a more current extracted data base. This process decouples the data base used in problem solving from the operational files, and assures the problem solver that the data base upon which he makes decisions is under his control. This user control of the extracted data base, and the potential performance advantages offered by access to the smaller extracted data set as compared to access to the total set of data, make the extraction approach attractive even in installations where an integrated data base exists (as with a complete IMIS as in the USAC approach described in Section II). Extraction is simplified with the existence of an integrated data base, because there are then no difficulties with file formats and data conversion.
Extraction System Architecture
The architecture of a municipal information system designed using the extraction philosophy would have three major sets of programs and data bases (e.g. Figure 9). The first set would be the source data files and related programs for data entry, update, and other routine processing. These files should be "properly structured" as defined earlier in this paper. The data base management for these files may be an integrated system, such as IBM's IMS, or a more traditional system such as those provided by IBM's DOS.

The second component includes accurate reference files (indices), such as the Geographic Base File, programs for maintaining these files, and programs for providing the data extraction functions of data matching, subsetting, and aggregation. This component is the key to integrating the data base of source files. The GADS experience indicates that it is possible to develop general purpose programs for the data extraction functions. Essentially, these functions provide integration through user-invoked processing, rather than through the complicated data structures and accompanying processing overhead often found in integrated data base systems.

The data extraction programs are the interface between the municipal data base and the third component of the architecture, the extracted data bases and associated decision support system. The GADS analysis and display functions are an example of a decision support system for non-programmer users. A data extraction interface can provide multiple extracted data bases for a single decision support system, or for multiple decision support systems. For example there might be decision support systems for budget preparation, urban planning, cash management, computer-assisted appraisal, crime analysis, etc., all supported by a common extraction interface.
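
The three extraction functions named above (data matching, subsetting, and aggregation) can be sketched as three small general purpose routines. This is only an illustration under stated assumptions: records are modelled as dictionaries, the reference file as a dictionary keyed by address, and all function names, field names, and sample records are hypothetical rather than taken from GADS.

```python
from collections import defaultdict


def match(records, reference, key):
    """Data matching: attach reference-file fields to each record sharing `key`."""
    matched = []
    for rec in records:
        ref = reference.get(rec[key])
        if ref is not None:            # records without a reference entry are dropped
            matched.append({**rec, **ref})
    return matched


def subset(records, predicate):
    """Subsetting: keep only the records that satisfy the user's condition."""
    return [rec for rec in records if predicate(rec)]


def aggregate(records, by, variable):
    """Aggregation: total one variable for each value of the grouping item."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[by]] += rec[variable]
    return dict(totals)


# Hypothetical source file (crime events) and reference file (address -> zone).
crimes = [
    {"address": "100 MAPLE", "category": "burglary", "count": 1},
    {"address": "102 MAPLE", "category": "theft", "count": 1},
    {"address": "7 ELM", "category": "burglary", "count": 1},
    {"address": "9 OAK", "category": "burglary", "count": 1},   # no reference entry
]
geo_base = {"100 MAPLE": {"zone": 1}, "102 MAPLE": {"zone": 1}, "7 ELM": {"zone": 2}}

burglaries = subset(match(crimes, geo_base, "address"),
                    lambda rec: rec["category"] == "burglary")
by_zone = aggregate(burglaries, by="zone", variable="count")
```

Because each routine takes the file, the key, or the condition as a parameter, the same three routines could serve several extracted data bases, which is the sense in which the text calls them general purpose.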
The data management techniques for the extracted data bases should be tailored for each decision support system. However, the data access techniques may be the same as those provided for the source data files. The details of the data extraction architecture and the implementation requirements are beyond the scope of this paper and there will be installation-specific comments. (An extraction implementation is briefly described in the Appendix).

There is, however, one general requirement for any data extraction system. This requirement pertains to the data aggregation functions of extraction and can be described by considering examples of the data sources encountered in municipal governments and the kinds of extracted data to be developed from these sources. Consideration here is limited to data which can be related to points or areas. Data related to networks, part numbers, budget items, etc., should be handled in an analogous fashion.

1. Compatible data
This is the easiest, and fortunately the most frequent situation, if data are captured as specified in Section III. The data in source files which can be identified with geographic points (x,y) can be directly related. If the extracted data base is to be relevant to a study of slum dwellings, for example, and if health cases, fire alarms and building code violations are all data sources which are available at the event level (i.e. by address), then an extracted data table showing the incidence of each of these events for specified addresses can be directly developed.
Another frequently used extracted data base is the tabulation of such event data by geographical area, in terms of a specified map. (The extracted data base in Figure 8 is an example of this). Extracted data bases in such cases are obtained by matching coordinates of events to the corresponding map areas (via point-in-polygon processing of the event coordinates against the map boundaries specification).
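
A minimal sketch of this tabulation follows. It uses the standard ray-casting test for point-in-polygon processing, which is one common technique and is not claimed to be the method used in GADS; the zone boundaries and event coordinates are invented for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray from (x, y) toward +x crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                        # edge spans the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tabulate_by_area(events, areas):
    """Count events per map area; events outside every area are not counted."""
    counts = {name: 0 for name in areas}
    for x, y in events:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts


# Hypothetical map of two adjacent square zones and four event locations.
areas = {
    "zone 1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "zone 2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
events = [(2, 3), (5, 5), (12, 4), (25, 5)]
counts = tabulate_by_area(events, areas)
```

With the sample data, two events fall in zone 1, one in zone 2, and one lies outside the map, so `counts` is `{"zone 1": 2, "zone 2": 1}`.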

Figure 10 illustrates the extraction and aggregation to relate property (assessment) data and census data to support inquiries at compatible levels (e.g., blocks or block groups), and further aggregation to support transit planning models.

2. Non-compatible area data
If data is available by areas in the source files, and these areas are not compatible (i.e. one map is not a subset of the other), then the extraction process is more complicated. For a chosen set of variables from the source files, there is a minimum level of aggregation at which an extracted data base is possible. For example, school attendance areas and police beats (and therefore the associated data) may only be compatible at the census tract level, i.e. they may both be (different) finer partitions of census tracts.
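
One way to compute such a minimum level of aggregation is sketched below, assuming both maps can be expressed as assignments of the same basic blocks to areas. The blocks of any two overlapping areas are merged until no area of either map is split; this uses a union-find structure, which is a choice made for this sketch and not a technique described in the paper. The block numbers and area names are hypothetical.

```python
def minimum_compatible_areas(partition_a, partition_b):
    """Return the finest grouping of blocks that splits no area of either map.

    Each partition maps a basic block to the area containing it; both
    partitions are assumed to cover the same set of blocks.
    """
    parent = {block: block for block in partition_a}

    def find(block):
        while parent[block] != block:
            parent[block] = parent[parent[block]]   # path halving
            block = parent[block]
        return block

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Blocks sharing an area in either map must end up in the same group.
    for partition in (partition_a, partition_b):
        first_block = {}
        for block, area in partition.items():
            if area in first_block:
                union(first_block[area], block)
            else:
                first_block[area] = block

    groups = {}
    for block in parent:
        groups.setdefault(find(block), set()).add(block)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical maps over six blocks: school attendance areas and police beats.
school = {1: "S1", 2: "S1", 3: "S2", 4: "S3", 5: "S3", 6: "S3"}
police = {1: "P1", 2: "P2", 3: "P2", 4: "P3", 5: "P3", 6: "P4"}
compatible = minimum_compatible_areas(school, police)
```

Here the two maps only agree on the groupings {1, 2, 3} and {4, 5, 6}, which play the role of the census tracts in the example above.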

The extraction process should alert the user to the non-compatibility and display for the user the minimum level of aggregation necessary for compatibility of the data sources of interest, in the form of a map, and permit the user to specify further aggregation from this map as desired. If the user desires an extracted data base at a detail level finer than is compatible with the data sources given, the user must supply additional information. For example, suppose the user is studying property values vs. age distribution of inhabitants, with the age data on citizens available from the census only at census tract levels of aggregation. Compatibility exists at the census tract level. Any more finely detailed extracted data base, at the city block level for example, could only be developed if the user is willing to make assumptions (such as homogeneity of the distribution of population ages in the census tract).

V. SUMMARY AND CONCLUSIONS
The development of information for decision-making in municipalities requires integration of data from the various operational files which are generated in local government. Even when an integrated municipal data base does not exist, it is possible to develop integrated data from properly structured source files in conjunction with a well-maintained Geographic Base File. The current sources of information developed in municipalities, in particular the property data of the assessor and tax function and the operating files of various service delivery functions, provide a rich source of information, augmented by special collections such as the U.S. Census.

Data Extraction is the process of developing integrated data subsets from diverse source files to support interactive problem solving. Extraction provides an interface to large data bases of source files and provides data description, subsetting and aggregation functions. Our experience with GADS has shown that data extraction is useful when the user or problem characteristics require access to varying amounts, detail, and selection of data, and conversational (rapid response) interaction with a decision support system. These characteristics are likely to be encountered when designing decision support systems for nonprogrammer, professional users working on unstructured problems. The data extraction interface matches the functional and response time requirements of interactive decision support, can be implemented on a variety of computer system configurations, and can reduce the operating costs of the decision support system. Because data extraction operations can produce multiple extracted data bases, with different structures, a single data extraction interface can support multiple decision support systems. In addition, existing decision support systems can be supported and enhanced by data extraction without major program revisions.
APPENDIX: An Extraction System Implementation

A project in the IBM Research Division has developed an interactive Geo-data Analysis and Display System (GADS) as a vehicle for studying nonprogrammer users solving unstructured problems [1-4]. GADS supports interactive problem solving where the relevant data can be related to a geographic location. Examples of the problems for which GADS has been used include: land use planning, police manpower allocation, school districting, and commercial site location. The need for data extraction was recognized during the first studies of the use of GADS. In particular, the need for data aggregated to a variety of geographic levels (e.g., block, police beat, census tract, neighborhood), and changing data needs expressed by users indicated the inadequacies of the static, special purpose data base and the one-level, integrated data base approaches. GADS data extraction is configured essentially as shown in Figure 9.
The extraction implemented in GADS is limited to compatible event data. There is a requirement that each record of each file in the large data base contain a geographic code (such as an address, x,y coordinates or block number) so that extracted data can be related to points, lines, and polygons on a map. A utility program is provided to transform geographic codes into x,y coordinates if necessary for data extraction or display. A data base developed by extraction is a table; an example is shown in Figure 8. Adding another crime type, acres of commercial land use, or re-aggregating by census tracts would take only a few minutes.

In the GADS implementation the large data base management system is a special purpose one designed to handle fixed format files with no hierarchies or repeating groups. Simultaneous access to multiple files, and shared access to single files are not supported. Multiple file extractions are handled by consecutive extractions from the individual files. Sequential and direct access are provided. The entire large data base is stored on disk, and there is a utility for loading files from tapes.

The data description facility allows different formats to be used for the same file or the same formats to be used for different files (Figure 11a). Seven data types are allowed. Character, binary, packed decimal, and floating point (binary) representations can be used. Figure 11 gives examples of the data description and subsetting capabilities. The subsetting language includes constructs for: subsetting based on any arithmetic or logical combination of the items in a file, creating new items, conditional subsetting or creation (IF, THEN, ELSE), and function calls (Figure 11b). Results from subsetting can be displayed as lists (Figure 11c) or as locations on a map (Figure 11d). Using the display capabilities, two dimensional subsetting is possible. That is, the user can draw a polygon on the screen, and select only those elements of a file whose location is within that polygon. This is much more user-oriented than algebraic specifications for subsetting, and other graphic subsetting operators would be useful (e.g., display all the crimes of the same type as the one being pointed at).

The aggregation operations in the implementation are restricted to forming the extracted data base for the GADS analysis and display functions. This data base is aggregated by areas of a map. It is stored on disk, and is accessed by column name.
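
The data description and subsetting facilities described in this Appendix can be illustrated with a small sketch. It assumes a fixed format record described by a list of (item name, type, length) entries, where "CH" is taken to mean a character item and "CI" a character-coded integer; that reading of the type codes, the record layout, the function names, and the sample records are all assumptions for illustration and are not the GADS formats or language.

```python
# Hypothetical data description: (item name, type, length).
EMPLOYEE_FORMAT = [
    ("street_no", "CI", 5),
    ("street_name", "CH", 16),
    ("zip", "CI", 5),
    ("x", "CI", 7),
    ("y", "CI", 6),
]


def format_record(no, name, zip_code, x, y):
    """Build one fixed format line matching EMPLOYEE_FORMAT (used for sample data)."""
    return f"{no:05d}{name:<16}{zip_code:05d}{x:07d}{y:06d}"


def parse_record(line, description):
    """Apply a data description to one fixed format record."""
    record, pos = {}, 0
    for name, kind, length in description:
        raw = line[pos:pos + length]
        record[name] = int(raw) if kind == "CI" else raw.strip()
        pos += length
    return record


def extract(lines, description, select, create=None):
    """Subsetting: create any new items first, then keep the records selected."""
    selected = []
    for line in lines:
        record = parse_record(line, description)
        for name, rule in (create or {}).items():
            record[name] = rule(record)
        if select(record):
            selected.append(record)
    return selected


# Invented sample records.
lines = [
    format_record(1930, "GLENUNA", 95125, 1596148, 293696),
    format_record(1674, "HUSTED", 95125, 1591335, 285412),
    format_record(12, "ELM", 95008, 1600000, 300000),
]

selected = extract(
    lines,
    EMPLOYEE_FORMAT,
    select=lambda r: r["zip"] == 95125,
    # Conditional creation of a new item, in the spirit of IF ... THEN ... ELSE.
    create={"side": lambda r: "EAST" if r["x"] >= 1595000 else "WEST"},
)
```

Because the description is data rather than code, the same file can be read with a different description, or the same description applied to another file, as the text notes for Figure 11a.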

GADS is implemented in FORTRAN, but the data extraction components were implemented in PL/I because of its larger set of data types, and better functional suitability for the extraction tasks. The combination system runs on the IBM S/360 or S/370 series under the Time Sharing Option (TSO). The combination requires 220K bytes of main storage. Separating extraction reduces the main storage requirement to about 120K. The user terminals may be IBM 2250s or storage tube display terminals.

The I/O time from the large data base, and the data rate to the display terminal are the limiting factors in extraction response times (i.e. selection and aggregation times are negligible compared to I/O times). Although five minutes may be required to list or display an entire file, the user can see the results unfolding (e.g. the selected items are listed as they are selected). Thus users seem willing to wait during extraction. After all, the batch mode equivalent capabilities have response times of days, and manual methods have response times of weeks or months.
REFERENCES

[1] P. E. Mantey, J. L. Bennett, E. D. Carlson, Information for Problem Solving: The Development of an Interactive Geographic Information System. IEEE Int. Conf. on Communication, Vol. II. Seattle, Wash. June 1973.
[2] E. J. Cristiani, R. J. Evey, R. E. Goldman, P. E. Mantey. "An Interactive System for Aiding Evaluation of Local Government Policies," IEEE Transactions on Systems, Man & Cybernetics, Vol. SMC-3, No. 2, March 1973, pp. 141-146.
[3] E. D. Carlson, J. L. Bennett, G. M. Giddings and P. E. Mantey. "The Design and Evaluation of an Interactive Geo-data Analysis and Display System," Proceedings of the IFIP Congress 74, International Federation for Information Processing, Stockholm, August 1974. North Holland Publishing Company, Amsterdam, 1974.
[4] E. D. Carlson and J. A. Sutton, A Case Study of Non-programmer Interactive Problem-Solving, IBM Research Report, RJ 1382, IBM Research Laboratory, San Jose, California, April 1974.
[5] T. R. Hammer, R. E. Coughlin, E. T. Horn IV, "The Effect of a Large Urban Park on Real Estate Values," Journal of the American Institute of Planners, Vol. 40, No. 4, July 1974, pp. 274-277.
[6] "City Hall's Approaching Revolution in Service Delivery," Nation's Cities, January 1972.
[7] "Concepts of an Urban Management Information System," a Report to the City of New Haven, Connecticut, by Advanced Systems Development Division, IBM Corporation, Yorktown, January 1967.
[8] A Municipal Information and Decision System. University of Southern California, School of Public Administration, 1968.
[9] R. L. Stickrod and L. C. Martin. Data Processing: Analysis of Costs, Benefits, and Resource Allocations. Lane County, Oregon, Management Report, February, 1973.
[10] G. M. Giddings and E. D. Carlson, An Interactive System for Creating, Editing and Displaying a Geographic Base File. IBM Research Report, IBM Research Laboratory, San Jose, California, 1973.
[11] D. C. Symons, A Parcel Geocoding System for Urban and Rural Information, Ottawa, Ontario, National Capital Commission, 1970.
[12] U.S. Bureau of the Census, Census Use Study, The DIME Geocoding System, Report No. 4, Washington D.C., 1970.
[13] U.S. Bureau of the Census, Census Use Study, The DIME Editing System, Washington D.C., 1970.
[14] R. Juli, Geo-modeling: A Local Approach, Eugene, Oregon, Lane Council of Governments, 1972.
[15] R. D. Hogan, Remote Graphic Terminal and Urban Geographic Information System Demonstration, Gaithersburg, Maryland, IBM Federal Systems Center, 1968.
[16] R. D. Merrill, "Representation of Contours and Regions for Efficient Computer Search," Communications of the ACM, Vol. 16, No. 2, February 1973, pp. 69-82.
[17] B. V. Saderholm, "Paper 'Keyboard' Runs Experimental IBM System," IBM Research Division Press Release, Yorktown Heights, N.Y., March 8, 1973.
[18] R. M. Cyert, H. A. Simon, and D. B. Trow. Observation of a Business Decision. Journal of Business, 29, (1956), 237-248.
[19] D. M. S. Peace and R. S. Easterby. The Evaluation of User Interaction with Computer-based Management Information Systems. Human Factors, 15, April 1973, pp. 163-177.

[Figure 1 display: one row per parcel, with parcel number, address, year of construction, floor area, and value.]
Figure 1. Tabular display of assessment data

Figure 2. Histogram display of housing values

Figure 3a. Map display of relative housing values

Figure 3b. Simplified display of housing values

AVE $ IN    AVERAGE INCOME PER HOUSEHOLD
ACCESS $    ACCESSABILITY TO DISPOSABLE INCOME
ACCESS      ACCESSABILITY TO EMPLOYMENT
AVLAND-S    ACRES AVAILABLE FOR SINGLE FAMILY DEVELOPMENT
AVLAND-M    ACRES AVAILABLE FOR MULTIPLE FAMILY DEVELOPMENT
AVLAND-C    ACRES AVAILABLE FOR COMMERCIAL DEVELOPMENT
AVLAND-I    ACRES AVAILABLE FOR INDUSTRIAL DEVELOPMENT
GROWINXS    GROWTH FACTOR IN SINGLE FAMILY
GROWINXM    GROWTH FACTOR IN MULTIPLE FAMILY
ODWU/A-S    NO SINGLE FAMILY DWELLING UNITS PER RESIDENTIAL
ODWU/A-M    NO MULTIPLE FAMILY DWELLING UNITS PER RESIDENTI
ODWU/A-T    TOTAL NO DWELLING UNITS PER ACRE OF RESIDENTIAL
EMP-MFG     NO. OF EMPLOYEES WORKING IN MANUFACTURING
EMP-WHOL    NO. OF EMPLOYEES WORKING IN WHOLESALE AND TRUCK
EMP-COMM    NO. OF EMPLOYEES WORKING IN COMMERCIAL(RETAIL)
EMP-TC&U    NO. OF EMPLOYEES WORKING IN TRANS, COMMUN, AND U
EMP-GOVT    NO. OF EMPLOYEES WORKING IN GOVERNMENT
EMP-TOTL    TOTAL NO OF EMPLOYEES
EMPDEN-C    NO. COMMERCIAL EMPLOYEES PER ACRE OF COMMERCIAL
HHOLD0-6    NO. OF HOUSEHOLDS WITH INCOME: 0-6000
HHLD6-10    NO. OF HOUSEHOLDS WITH INCOME: 6000-10000
HHLD1015    NO. OF HOUSEHOLDS WITH INCOME: 10000-15000
HHLD15+     NO. OF HOUSEHOLDS WITH INCOME: 15000+
ISHOPCTR    ZONES OF ANTICIPATED SHOPPING CENTERS
IBAY        ZONES TREATED AS BAYLANDS
OCDWUN-S    NO. OF EXISTING SINGLE FAMILY DWELLING UNITS
OCDWUN-M    NO. OF MULTIPLE FAMILY DWELLING UNITS
OCDWUN-T    TOTAL NO. OF SINGLE AND MULTIPLE FAMILY DWELLIN
OCLAND-S    ACRES OCCUPIED BY SINGLE FAMILY DWELLINGS
OCLAND-M    ACRES OCCUPIED BY MULTIPLE FAMILY DWELLINGS
OCLAND-C    ACRES OCCUPIED BY COMMERCIAL DEVELOPMENT
OCLAND-I    ACRES OCCUPIED BY INDUSTRIAL DEVELOPMENT
PRODEN-S    PROJECTED DENSITY FOR SINGLE FAMILY DEVELOPMENT
PRODEN-M    PROJECTED DENSITY FOR MULTIPLE-FAMILY DEVELOPME
POPUL-S     TOTAL POPULATION IN SINGLE FAMILY DWELLINGS
POPUL-M     TOTAL POPULATION IN MULTIPLE FAMILY DWELLINGS
POPUL-T     TOTAL POPULATION
POP/HH-S    POPULATION PER HOUSEHOLD FOR SINGLE FAMILY DWEL
POP/HH-M    POPULATION PER HOUSEHOLD FOR MULTIPLE FAMILY DW
POP/HH-T    POPULATION PER HOUSEHOLD FOR ALL DWELLINGS
RES-LAND    RESERVED LAND-NOT AVAILABLE FOR CURRENT DEVELOP
SLOPE       PERCENT OF LAND WITH LESS THAN 10% SLOPE
TSKTR       CHANGE FACTOR IN INTRAZONAL TRAVEL TIME
DISP $IN    DISPOSABLE INCOME PER HOUSEHOLD
SEWER       SEWER SERVICE DISTRICT
FLOOD       FLOOD CONTROL AREA

Figure 4. Land use/transportation planning data base [2]

[Figure 5 form: Ventura County Assessor "Residential Fact Sheet", a coded field work sheet with sections for record data (APN, situs, tract, district, neighborhood, lot, block), land attributes, building data, cost data, sales data, immediate area, total property, and topography. Most items are recorded as coded choices (e.g. N/Y, P/A/G, L/M/H), each tied to a field code such as AJ12 or AK05.]
Figure 5. Field work sheet for appraisal data

Office of the County Assessor, County of Santa Clara, California
201 County Administration Building, 70 West Hedding Street, San Jose, California 95114
299-3941, Area Code 408

Date Recorded / Recorder's Deed # / Property Description #

Our records indicate you purchased this property.

1. What was the full price? $
   a. Amount of cash down payment: $
   b. Please enter details concerning any balance:
      (1) 1st Deed of Trust $     at     % interest, Duration of Loan     years
      (2) 2nd Deed of Trust $     at     % interest, Duration of Loan     years
   c. Was a trade involved? Yes / No
   d. Outstanding Improvement Bonds against property $
   e. Did price include personal property? Yes / No
      If yes, please estimate value of such property $
   f. If this is income property, please enter the monthly gross income as if it were 100% occupied $
2. Remarks: Please enter on the reverse side or by attachment any information you feel may help us to make a fair appraisal of your property.
3. If you would like mail concerning this property sent to a different address from the one above, please indicate below.
4. If there are questions regarding this questionnaire please contact the Assessment Standards Division at 299-3941.
5. See Reverse Side.

Signature of Owner, Address, Telephone Number, City, State, Zip, Date
(Form 7047 REV. 11/71)

Figure 6. Assessor's questionnaire for financial data

[Figure 7 source map: Map 6, Census Tract 205, Place 42, showing Blocks 1 to 10 and numbered nodes along Maple Av., Elm Dr., Coyote Cr., 1st Street, and 2nd Street.]

Number of segments: Maple Av. 5, Elm Dr. 3, Coyote Cr. 6, 1st St. 4, 2nd St. 4 = 22 segments (records)

Legend (contents of each record):
  Name, type, suffix
  From node number, x, y, map number
  To node number, x, y, map number
  Left tract, right tract, left place, right place, left block, right block
  Left low address, high address
  Right low address, high address

Contents of a sample record:
  Maple Av. N.W. 2 1530000 31000 18 1530100 30100 205 205 42 42 5 100 198 101 199

Figure 7. 'DIME' geographic base file structure

[Figure 8 tables: one table per year (Year 1, Year 2, ..., Year 10); each table has one row per zone and columns for Crimes #1, Crimes #2, ..., Population, and Houses.]
Figure 8. Example of tables in extracted data base

[Figure 9 diagram, labelled components: Large Data Base; Large Data Base Management; Data Extraction Functions (Data Description, Subsetting, Aggregation); Extracted Data Base Management; Presentation.]
Figure 9. Interactive data extraction and problem solving system

[Figure 10 diagram, labelled components: Property Info. System; Added Data (Public Lands, etc.); Census Compatible Property Data; Dynamically Aggregated Property and Census Data (Compatible); Transportation Zones Data Base.]
Figure 10. Extraction and aggregation relating property and census data

[Figure 11a display: data description for an employee file with x,y coordinates, listing each item's name, type (CH or CI) and length: match key (CH 13), street number (CH 5), street name (CH 16), street type (CH 2), city name (CH 10), ZIP code (CI 5), x coordinate (CI 7), y coordinate (CI 6), with blank filler items between them.]
Figure 11a. Data description function

[Figure 11b display: selection statements on the employee file combining conditions on X, Y, and ZIP (e.g. ZIP = 95125) with WHERE, OR, and THEN.]
Figure 11b. Data selection function

[Figure 11c display: listing of the selected employee records, giving street number, street name, street type, city (San Jose), ZIP code, and x,y coordinates.]
Figure 11c. Listing of data selected

[Figure 11d display: map of the locations of the selected records.]
Figure 11d. Map showing location of selected data

DATA BASE USER LANGUAGES FOR THE NON-PROGRAMMER

Peter C. Lockemann
Fakultaet fuer Informatik, Universitaet Karlsruhe
D-75 Karlsruhe 1

Abstract

In light of the necessary investments, commercially available data base systems usually offer comparatively general-purpose interfaces. These are suitable only for the data base specialist. In order for a data base system to attract non-programmer users, interfaces must be provided that approximate the special user terminology and conceptualizations. If, in particular, these users form a heterogeneous group, a variety of interfaces will be required. Questions of interest are then the extent to which user interfaces should be standardized, the techniques which allow rapid implementation of new more specialized interfaces, or the procedure for selecting the most suitable interface for a given problem.

Based on the concept of hierarchy of abstract machines, the paper presents a possible approach to the solution of these questions. Three examples will be introduced to critically examine the concept and demonstrate some of its merits and shortcomings.

1 Introduction

The success or failure of a data base system, no matter how well-conceived it may appear to the author's mind, is ultimately decided by the users the system is supposed to serve. This aspect is often overlooked by planners who devote almost their entire time and effort towards analyzing the needs of the organization as such. From the analysis of an institution or organization, such as its informational structure, the current status of information flow within the organization, and its organizational characteristics and information needs, a number of requirements are derived as to the extent of the improvements expected of a data base system and the necessary adaptation of organizational structure, integration, provision of new resources and relinquishment of old ones. All too often, much less attention is being paid to the individuals who must use the system. They are simply expected to appreciate it and to adapt most willingly to the new environment.

Human nature, however, is conservative. Human individuals will cling to the same terminology and methodology and try to solve the same problems unless and until one can make a most convincing point for reorientation. In many cases data base systems are not even introduced to solve new kinds of problems. Rather they are supposed to improve the solution to existing and already well-understood problems, or at least use these as a point of departure. Under these circumstances there is no reason why users should be burdened with radical changes in style.

Unfortunately, this is just one side of a coin. For the manufacturer of a data base system the development and implementation of a general-purpose data base system represents a large investment which he can only justify by corresponding sales figures. This precludes him from attending to each of a large variety of individual user needs but compels him to offer general-purpose interfaces. On the other hand it is these interfaces that prove repugnant to many a potential user who has his own special terminology, conceptualizations and application problems.
and a p p l i c a t i o n problems. In order to resolve the dilemma, techniques must be d e v e l o p e d that permit the a d a p t a t i o n of a data base system to various user needs. In particular, questions. (i) How
the can
operational system?
s o l u t i o n s should address user and
language management
themselves to the following
interfaces
be
separated
characteristics
o~
from
the
the data base
185
(if)
Are
there
any
the
rapid
implementation of a user language according to given
techniques that allow,
in a systematic way,
for
specifications? (iii) To (iv)
which
extent
it e c o n o m i c a l l y
feasible to c o n s t r u c t
"off-the-shelf"
Given
set of language specifications,
can
a one
define
build a
answer
between
under which c o n d i t i o n s
on
user
and d e t e r m i n e s
languages
that
formalizes these
the amount of effort required?
these q u e s t i o n s we shall define a hierarchical
user
and
user languages?
upon an already existing user language? Can one
relation
conditions To
is
stockpile
languages.
The
nature
of
the
relationship
relationship
will be
discussed
in
explicate
the approach and to point out its merits as well as some of
its
some detail. A number of examples will be introduced to
present
shortcomings.
non-procedural
interactive
The
discussion
is intended b a s i c a l l y for
languages.
2 Hierarchies of user languages

2.1 Concepts

The hierarchy of language interfaces shall be defined as follows [Kr 75]:
- Each interface is defined in terms of a ("lower") interface, and may itself serve as the basis for definition of a ("higher") interface.
- There is exactly one interface which cannot be defined in terms of another interface and hence serves as the ultimate basis for all other interfaces.

Such a hierarchy of interfaces may be graphically represented in the form of a tree where each node corresponds to a particular interface.

figure 1: tree of interfaces with levels 3, 2 and 1 above the root on level 0 (DBMS)
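The two defining properties of the hierarchy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Interface`, its fields, and the four example interfaces are hypothetical names chosen here, not part of [Kr 75].

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Interface:
    """One node of the tree: a language interface defined in terms of a lower one."""
    name: str
    base: Optional["Interface"] = None                      # None only for the root
    derived: list = field(default_factory=list, repr=False)  # interfaces built on this one

    def __post_init__(self):
        # Register this interface with the ("lower") interface it is defined on.
        if self.base is not None:
            self.base.derived.append(self)

    @property
    def level(self) -> int:
        """Number of definition steps between this interface and the root."""
        return 0 if self.base is None else self.base.level + 1

    def path_to_root(self) -> list:
        """Names of all interfaces from this node down to the root."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.base
        return path


# Hypothetical example tree (names for illustration only).
dbms = Interface("DBMS")
set_machine = Interface("set theoretic machine", dbms)
natural = Interface("restricted natural language", set_machine)
pharmacy = Interface("pharmacy vocabulary", natural)

print(pharmacy.level)
print(pharmacy.path_to_root())
```

Because every interface holds exactly one `base`, there is exactly one path from any node to the root, which is the property the later rules on translation rely on.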
The hierarchy must be chosen such that it reflects a hierarchy of users. Level 0 corresponds to the data base specialist, while level 3 might cater to a user completely untrained in computer affairs.

The previous questions can now be restated with a little bit more precision.
(i) Can all fundamental operational and management functions be solved underneath the basis on level 0?
(ii) What are the formal criteria that allow to construct a hierarchy by defining new languages in terms of existing ones?
(iii) Up to which level in the tree should interfaces be standardized?
(iv) Suppose a given language specification is represented as a node. Can a path to an existing node be constructed, and the length of the path be "measured"? Can one determine the path with minimum length? If the path is too long, should intermediate nodes be introduced, and what would be their specifications?

At this point in time, "length" is no more than an intuitive notion for which a formal measure does not exist. However, a rough outline of the definition of one node in terms of another one may often give some insight into the amount of effort necessary and thus provide an
estimate of the length.

Language hierarchies have long been mentioned in connection with programming languages, e.g.
- Low-level programming languages (e.g. Assembler, macro languages)
- High-level programming languages (e.g. PL 360 [Wi 68], ESPOL [Bu 72])
- Very high level languages (e.g., set oriented languages [SI 74]).

However, except for macro languages these do rarely conform to the strict definition given above (e.g. COBOL is not defined in terms of a lower-level programming language), the reason being that this would entail inefficient compilation. The same argument does not hold for data base languages where language analysis is but a minor part of query processing [Kr 75].
2.2 Explications

The notion of hierarchy as introduced above is still vague and should be made more precise. Below several concepts known from the literature are introduced. Their usefulness as well as some of their deficiencies will be discussed in the remainder of the paper.

(1) Characteristics of the root.
There exist several data base schools that claim to provide the just and only basis for data base concepts. Before one may pass any judgment on these claims one ought to agree on the criteria that a basis would have to meet. It is commonly accepted that a data base is to be considered as the model of a certain reality. Hence a basis should be such that it provides concepts so primitive that any reality, be it physical or conceptual, could be adequately covered by it. Some authors [Ab 74, Su 74] have attempted to enumerate certain primitives: elementary objects, names, properties, relations, orderings, categories (or types), as well as sets of operators for creating, accessing, manipulating and deleting these. In addition, one might consider organizational questions such as parallelism and sharing of models by various users.

(2) Dependencies between successive nodes.
Since it is extremely general, the root is of little practical value to the average user. Users are invariably concerned not with all possible realities but with certain classes of realities, and wish their models to reflect the corresponding limitations. In other words, the modeling tools on level 1 will differ from those on level 0 by defining certain restrictions on the way the primitives may interact. The same obviously is true for level 2 vis-a-vis level 1, etc. These restrictions relate mainly to the manner in which objects may be composed into new objects, relations into new relations, and/or operations into new operations.

(3) Characterization of a node as an abstract machine.
Basically, the restrictions defined on the permissible compositions determine the dependencies between successive nodes. To make this a little bit more precise, the concept of abstract machine is introduced. An abstract machine is a set of object types, a set of operators for manipulating objects and defined on object types, together with a control mechanism that allows to construct and execute sequences of operations. Each node is then described in terms of an abstract machine.

(4) Dependencies between abstract machines.
By assigning an abstract machine to each node, the following properties must hold between two successive nodes A_i and A_{i+1} [Go 73]:
a) The resources and the functions provided by A_i form the complete basis on which to build A_{i+1}. There is no way to use properties of A_{i-1} in building A_{i+1}. Hence every A_i is a complete interface description in the hierarchy.
b) Resources of A_i used in defining new resources of A_{i+1} can no longer be present in A_{i+1} (i.e. they may become resources of A_{i+1} only if they are not part of a definition for another resource of A_{i+1}).
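Rules a) and b) can be read as two mechanical checks on a pair of successive machines. The following Python sketch is one possible reading, not a formalization taken from [Go 73]; the class `AbstractMachine`, the function `check_successor`, and all resource names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractMachine:
    name: str
    resources: set                                    # everything visible at this interface
    definitions: dict = field(default_factory=dict)   # new resource -> lower resources it is built from


def check_successor(lower, upper):
    """Return the violations of rules a) and b) when `upper` is built on `lower`."""
    problems = []
    used = set()
    # Rule a): a new resource may be built only from resources of the immediate predecessor.
    for new, parts in upper.definitions.items():
        used |= set(parts)
        foreign = set(parts) - lower.resources
        if foreign:
            problems.append(f"a) {new} is built from {sorted(foreign)}, which {lower.name} does not provide")
    # Rule a): every visible resource is either newly defined or passed through from below.
    for res in sorted(upper.resources):
        if res not in upper.definitions and res not in lower.resources:
            problems.append(f"a) {res} has no definition in terms of {lower.name}")
    # Rule b): a resource consumed by a definition must not remain visible one level up.
    for res in sorted(used & upper.resources):
        problems.append(f"b) {res} is used in a definition and still visible in {upper.name}")
    return problems


# Hypothetical machines: A0 offers files and records, A1 offers sets built from them.
a0 = AbstractMachine("A0", {"file", "record", "read"})
a1 = AbstractMachine("A1", {"set", "load"},
                     {"set": {"file", "record"}, "load": {"read"}})
broken = AbstractMachine("A1-broken", {"set", "record"},
                         {"set": {"file", "record"}})

print(check_successor(a0, a1))
print(check_successor(a0, broken))
```

In the second call `record` is consumed by the definition of `set` but is still exported, which is exactly the situation rule b) excludes.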
Keeping these rules in mind I shall attempt, as a matter of illustration, a tentative classification of some results discussed in the literature [Ab 74, Co 7~, We 74, Kr 75, Wo 68, Wo 73, Gr 69, Col 68].

figure 2: classification tree; recoverable node labels are SQUARE, SEQUEL, Lunar, Pharmacy, restricted natural English (twice), restricted natural German, relational model, predicate logic, semantic primitives, set theory
2.3 Consequences

The concepts and rules introduced above impose a certain discipline on the design of user languages, on their application, and on the transition between them. Some of the consequences are outlined below.

(1) If we strictly keep to the rules above, a new interface must be defined in terms of its immediate predecessor and not any arbitrarily chosen predecessor, i.e. immediate predecessors must not be bypassed ("stepwise abstraction"). On the other hand, given certain specifications and a suitable node in a tree, intermediate nodes that hopefully are of general usefulness should be introduced on the intervening path whenever the path proves too "long" ("stepwise refinement").

(2) Given a path to the root, a user should be put into position - at least in principle - to formulate his requests in any of the languages that correspond to the nodes on the path. As a matter of fact, we found this an essential prerequisite for efficient system testing since system activities may be observed and controlled to any desired level of detail [Kr 75].

(3) Queries are stated on some level and must successively be translated between levels until the root has been reached. Definition (of an abstract machine) and translation reciprocate each other: The definition of the next higher level from a given one determines the rules that govern the translation of statements on the higher level to those on the lower level.

(4) Results are produced on the lowest level but must be presented to the user on a higher level. As a consequence, following the evaluation of a query a second ("reverse") translation must be invoked in order to propagate the results to higher levels.
3 Set theoretic basis

3.1 Motivation

The rules of ch.2 have been applied to the construction of the KAIFAS question-answering system and have proven highly useful there. Hence this system will be chosen as the first example to demonstrate the practicability of the rules. For a more detailed description of the system the reader is referred to the literature [Kr 75].

Restrictions with regard to the general basis are motivated by the realities that one wishes to consider. In the case of KAIFAS we presume that relations are exclusively of the property type (sets) or are binary and, more important, that objects are selected exclusively on the basis of given properties or relations which they meet or undergo, perhaps in logical combination. Indeed one can show that the set theoretic approach may be viewed as a generalization of the inverted file technique [Kr 75].
3.2 Set theoretic machine

Object types

I  Elementary objects (individuals), e.g. Hans Maier, Bonn, Aspirin
M  Sets, e.g. city, medication. List of individuals.
R  Relations, e.g. father, contraindication. List of ordered pairs of individuals.
Z  Numbers
D  Measures, e.g. 2 years, 4 tablets/day. Ordered pairs (number, unit expression).
F  Measure functions, e.g. age, dosage. Lists of ordered n-tuples whose last components are measures.
B  Truth values

Operators

On retrieval the machine is supposed to function in the following way. Set, relation, and function names refer to objects in permanent storage. In order to manipulate the objects they must be transferred into unnamed registers of which an unlimited number is thought to exist. Hence all operations except for the load operations are register-to-register operations.

Load operators
Mw, ev, en, ef   Load a set, a relation (ev, en), and a measure function, respectively.

Set operators
MU:  MxM -> M   Union
Mn:  MxM -> M   Intersection
Km:  MxM -> M   Relative complement  $\{x \mid x \in M_1 \wedge x \notin M_2\}$
Kz:  M -> Z     Cardinality

Binary relation operators
Ko:  R -> R     Converse relation
Rb:  RxM -> R   Restriction  $\{(x,y) \mid (x,y) \in R \wedge x \in M\}$
Rp:  RxR -> R   Product  $\{(x,y) \mid \exists z: (x,z) \in R_1 \wedge (z,y) \in R_2\}$
RU:  RxR -> R   Union

Reduction of binary relations
Vo:  R -> M     Domain  $\{x \mid \exists y: (x,y) \in R\}$
Na:  R -> M     Range  $\{x \mid \exists y: (y,x) \in R\}$
Vg:  RxI -> M   Individual domain  $\{x \mid (x,I) \in R\}$
Ng:  RxI -> M   Individual range  $\{x \mid (I,x) \in R\}$
VgU: RxM -> M   Restricted domain  $\{x \mid (x,y) \in R \wedge y \in M\}$

Reduction of measure functions
Fw:  FxI -> D   (n=2)

Logical operators
e:   IxM -> B   Test on set membership
c:   MxM -> B   Test on set inclusion

In addition, the standard logical operators are available as well as the standard arithmetic and comparison operators for numbers and measures.

Control mechanism

Sequencing of operations
"Programs" for the set theoretic machine are expressed in a functional notation. Operations are performed from left to right and, for each nested argument, from inside out.

Example: A question such as "Are cities birthplaces of engineers?" would take the following form in the set theoretic machine

c(Mw(Mcity), VgU(en(Rbirthplace), Mw(Mengineer)))
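The operators above map directly onto ordinary set operations, so the example query can be tried out in a few lines of Python. This is a sketch only: the stored sets and relations are invented sample data, relations are modeled as sets of pairs, and inclusion `c` is taken to be non-strict, none of which is stated in the text.

```python
# Permanent storage (hypothetical sample data, not from the KAIFAS data base).
SETS = {
    "Mcity": {"Bonn", "Ulm"},
    "Mengineer": {"Hans Maier", "Eva Beck"},
}
RELATIONS = {
    # (x, y) in Rbirthplace is read as "x is the birthplace of y".
    "Rbirthplace": {("Bonn", "Hans Maier"), ("Ulm", "Eva Beck"), ("Bonn", "Otto Kern")},
}

# Load operators: copy a stored object into a fresh "register".
def Mw(name): return set(SETS[name])
def en(name): return set(RELATIONS[name])

# Set operators.
def MU(m1, m2): return m1 | m2
def Mn(m1, m2): return m1 & m2
def Km(m1, m2): return m1 - m2
def Kz(m): return len(m)

# Binary relation operators.
def Ko(r): return {(y, x) for x, y in r}
def Rb(r, m): return {(x, y) for x, y in r if x in m}
def Rp(r1, r2): return {(x, y) for x, z in r1 for z2, y in r2 if z == z2}
def RU(r1, r2): return r1 | r2

# Reduction of binary relations to sets.
def Vo(r): return {x for x, _ in r}
def Na(r): return {y for _, y in r}
def Vg(r, i): return {x for x, y in r if y == i}
def Ng(r, i): return {y for x, y in r if x == i}
def VgU(r, m): return {x for x, y in r if y in m}

# Logical operators.
def e(i, m): return i in m
def c(m1, m2): return m1 <= m2

# "Are cities birthplaces of engineers?"
answer = c(Mw("Mcity"), VgU(en("Rbirthplace"), Mw("Mengineer")))
print(answer)
```

Python evaluates the nested arguments before the enclosing call, which mirrors the "from inside out" rule of the control mechanism.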
Loops
Loops are introduced by the use of bounded quantifiers which have three arguments:
1) An expression resulting in a set of objects (range).
2) An expression for the condition, resulting in a truth value (scope); it may be regarded as the loop body.
3) The name of a bound variable; each of its substitutions defines an invocation of the loop.

Important quantifiers are
AL:  MxB -> B   all, every
EI:  MxB -> B   some
DB:  MxB -> M   the, which
ZB:  MxB -> Z   how many
with the left-hand M the bounding set and the left-hand B the condition.

Examples:

DB(x, Mw(Mcity), e(x, VgU(en(Rbirthplace), Mw(Mengineer))))

with the meaning of "Which cities are birthplaces of engineers".

DB(x1, Mw(Mmanuf), ZB(x2, Vg(en(Rprod), x1), DB(x3, Mw(Mailment), e(x2, Vg(en(Rmedic), x3)))))

with the meaning of "How many products of which manufacturers are medications for which ailments?"

Expressions
Set membership of an arbitrary kind is expressed by including, in the data base representation of a set, arbitrary set expressions. Example (in German):

Mrezeptpflichtig
    ISpasmocibalgin
    Vg(en(RDerivat), IOxazolidin)      (1)
    IMorphin
    Mw(MOpiate)                        (2)
    Mw(MHypnotika)
    IMethadon
    Vg(en(RDerivat), ISuccinimid)
    Vg(en(RHeilmittel), IAgitiertheit)

where (1) indicates all derivates of Oxazolidin to be prescription drugs, (2) all opiates, etc.

This concept is extended to relations and measure functions. Two of its advantages are:
- Since all objects are evaluated on request only, changes to the data base may be made locally without regard to any interrelationships that may exist.
- Expressions may be stored without regard for the existence of any individuals for it. Hence one could construct a data base consisting exclusively of higher-order relationships.

One consequence, however, is that the control mechanism must itself be defined recursively since it may be invoked on any load operation.
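The bounded quantifiers of this section behave like loops over the range with the scope as loop body. A minimal Python sketch, assuming the bound variable is modeled as the parameter of a function and using invented sample data:

```python
# Hypothetical sample data.
CITIES = {"Bonn", "Ulm", "Kiel"}
ENGINEERS = {"Hans Maier", "Eva Beck"}
BIRTHPLACE = {("Bonn", "Hans Maier"), ("Ulm", "Eva Beck"), ("Kiel", "Otto Kern")}

def VgU(r, m): return {x for x, y in r if y in m}   # restricted domain
def e(i, m): return i in m                          # set membership

# Each quantifier takes the range (a set) and the scope (a function of the bound variable).
def AL(rng, scope): return all(scope(x) for x in rng)        # all, every
def EI(rng, scope): return any(scope(x) for x in rng)        # some
def DB(rng, scope): return {x for x in rng if scope(x)}      # which
def ZB(rng, scope): return sum(1 for x in rng if scope(x))   # how many

born_here = VgU(BIRTHPLACE, ENGINEERS)

# "Which cities are birthplaces of engineers"
which = DB(CITIES, lambda x: e(x, born_here))
print(sorted(which))
print(ZB(CITIES, lambda x: e(x, born_here)))
print(AL(CITIES, lambda x: e(x, born_here)), EI(CITIES, lambda x: e(x, born_here)))
```

Each call of the scope function corresponds to one invocation of the loop for one substitution of the bound variable.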
3.3 Natural language

Few users will feel at ease with the highly stylized language introduced in sec. 3.2. One possible step of abstraction, therefore, is the definition of a new abstract machine accepting natural language input. By necessity this is a highly restricted form of natural language since its semantics, and hence its syntactic forms, can be no more than what may ultimately be reduced to a set theoretic interpretation. Moreover, it must be considered more restrictive than the set theoretic interface because while one may nest set theoretic expressions to an arbitrary depth, those beyond a certain depth simply cannot be stated in natural language in any comprehensible fashion.

To speak of objects, operators and control mechanism in connection with natural language turns out to be highly unnatural, or rather impossible. It is possible, however, to define an abstract machine on that level in terms of the syntax of the interface which in turn may still be based on object types. This is in striking similarity to Very High Level languages vis-a-vis High Level programming languages: Very High Level languages are loosely described as languages used to specify what is to be done, rather than how it is to be done [SI 74].

In accordance with sec.2.2, the object types must relate to the ones of the set theoretic machine. In this case the relationship is straightforward as indicated by the following list:
N  proper names for the objects of the universe.
A  attributes (properties of an object of the universe).
R  references from one object of the universe to a second one (e.g. Thebacon is referred to by Morphium as its derivate).
M  references to measures.
D  numbers or measures.
S  sentences. These are of two kinds: sentences to be answered by yes or no, and sentences to be answered by counting or enumerating proper names.
Some examples from KAIFAS, in which German was chosen as natural language interface:

Ist Psyquil rezeptpflichtig? [Is Psyquil a prescription drug?]
    Psyquil: N, rezeptpflichtig: A
Betraegt die Tagesdosis von Chinidin 2 Gramm? [Is the daily dose of Chinidin 2 grams?]
    Tagesdosis: M, Chinidin: N, 2 Gramm: D
Welche Derivate von Morphium sind rezeptpflichtig? [Which derivates of Morphium are prescription drugs?]

The syntax of the interface is described by a grammar with the following general properties:

(1) Syntactical variables must relate to the object types, hence they cannot be based on the traditional grammatical categories such as noun, noun phrase, adjective, etc. but on categories that are essentially semantical in nature. The variables are IN (names), ME (attributes), RE (references), ~F (references to measures), ZA (numbers), SA (sentences), QU (quantifiers).

(2) On the other hand, the traditional categories must be accounted for in some way, e.g. in order to reject incorrect inflections. As a consequence, each syntactical variable is indexed by a number of features. Examples:

MAS  masculine  )
FEM  feminine   ) gender
NEU  neuter     )
STR  strong declension
ATT  attribute apposition
NUM  number (singular/plural)
NOM  nominative )
GEN  genitive   ) case
DAT  dative     )
ACC  accusative )
ADJ  word class (adject./noun)

(3) Even for restricted natural language, grammars are known to be extremely complex because of the multitude of syntactic aspects to be observed. The application of features simplifies the grammar insofar as it can be arranged in two levels,
a) a context-free grammar in terms of the variables from (1);
b) a feature program to be associated with each production on level a).

Example:
Typical p r o d u c t i o n s of level a) are
aE
ME
-~
aE
ME - ~
RE
ME - ~ ~E -~
RE NE RE 1N
to
a p p l i c a t i o n of features s i m p l i f i e s tI~e grammar
SA -* ~IE sind ~h?
The production ME1 -> ME2 ME3 refers to the following feature program (syntactic variables are numbered for reference).

Part 1: Test of right-hand features for acceptance (reduction takes place only if the condition is true).

test (ME2, +ADJ+ATT) ∧ test (ME3, -ADJ-ATT) ∧ meq (MAS,FEM,NEU,ME2,ME3) ∧ equ (NUM,ME2,ME3) ∧ meq (NOM,GEN,DAT,ACC,ME2,ME3)

Part 2: Assignment of features to the syntactic variable on the left-hand side.

-ADJ-ATT, cop (NUM,ME2), and (MAS,FEM,NEU,ME2,ME3), and (NOM,GEN,DAT,ACC,ME2,ME3)

Feature operators are underlined. For example, test is true when the features of the first argument meet the condition specified by the second argument, meq is true whenever at least one of the listed features agree in both syntactic variables specified, cop copies the features of the syntactic variable specified.

3.4 Pharmacology

The natural language level is supposed to serve a variety of application areas. We postulate that these application areas are all served by the same natural language grammar since each must be explainable in terms of set theory. Consequently, these areas differ only in the vocabulary they assign to the object types. Level 3 is reached from level 2 simply by introducing names, and relating them to the object types. Below a few typical examples of assignment are given in the area of pharmacology.

proper names            medications, substances, companies, ailments, e.g. Thebacon, Morphium, CIBA, Angina pectoris
attributes              properties, e.g. Tablette, rezeptpflichtig
references              e.g. Indikation and Kontraindikation (from ailment to medication), Hersteller (from company to medication)
references to measures  e.g. Preis, Dosis, Haltbarkeit
numbers or measures     e.g. 5 DM, 2 Tabletten/Tag, ~ Wochen
sentences               e.g. Welche Preise haben Praeparate, die bei Angina Pectoris indiziert sind und deren Kontraindikation nicht Glaukom ist? [What are the prices of preparations that are indicated for angina pectoris and whose contraindication is not glaucoma?]
3.5 Translations

The path between adjacent nodes is traversed by translation (sec.2.3, (3) and (4)). We shall briefly illustrate this for the passage between natural and set language. In this case translation consists of the three traditional phases: lexical analysis, syntactic analysis and code generation. The sentence

"Welche Firmen sind Hersteller tablettenfoermiger Medikamente?"

shall serve as an example.

Lexical analysis
Lexical analysis includes the mapping from the pharmacological to the natural language level, and for each word encountered, with a few exceptions, proceeds in three steps:
(i) reduction of a word to its word stem;
(ii) dictionary lookup resulting in a syntactical variable, values of some of its features, a morphemic class, as well as the set level name for the word;
(iii) assignment of further features on the basis of the morphemic class and the actual morphemic ending.

The lexical analysis of the entire sentence results in

word                 syn. var.  features                             int. name
Welche               QU         +MAS+FEM+NEU-NUM+NOM+ACC             DB
Firmen               ME         +FEM-NUM+NOM+GEN+DAT+ACC             M26
sind
Hersteller           RE         +MAS+NUM+NOM+DAT+ACC                 R23
                     RE         +MAS-NUM+NOM+GEN+ACC
tablettenfoermiger   ME         +MAS+NUM+NOM+ATT+STR+ADJ             M9
                     ME         +FEM+NUM+GEN+DAT+ATT+STR+ADJ
                     ME         +MAS+FEM+NEU-NUM+GEN+ATT+STR+ADJ
Medikamente          ME         +NEU-NUM+NOM+GEN+ACC                 M22

Note the syntactic ambiguities due to the different feature combinations for "Hersteller" and "tablettenfoermiger". Note also that lexical analysis by itself cannot always determine the case (as for "Firmen", all four cases are still possible), or the gender (as for "tablettenfoermiger").
Syntactic analysis
Syntactic analysis includes three phases: reduction (level a)), feature analysis (level b)), final code manipulation. For each production applied, reduction and feature analysis follow each other immediately. Hence a production is applied in three steps:
(i) Matching of input string and right-hand side.
(ii) Test of right-hand features for acceptance.
(iii) If true, reduction to left-hand side and assignment of features.

For example, the production and feature program from sec.3.3 result in the following when applied to the phrase "tablettenfoermiger Medikamente":

ME2 ('tablettenfoermig'):
    1) +MAS+NUM+NOM+ATT+ADJ            (rejected on meq)
    2) +FEM+NUM+GEN+DAT+ATT+ADJ        (rejected on meq)
    3) +MAS+FEM+NEU-NUM+GEN+ATT+ADJ
ME3 ('Medikamente'):
    1) +NEU-NUM+NOM+GEN+ACC
ME1 (result):
    1) +NEU+GEN-NUM-ADJ-ATT            (note the disambiguation)

The syntactic analysis of the entire sentence is illustrated in figure 3. Because of the possibility of ambiguities the result is a parsing graph rather than a tree (in this case the ambiguity of the sentence is due to "Hersteller"). The numbers adjacent to the syntactic variables refer to an associated list of features.

Final code manipulation is left to the final stages of code generation, but must be considered part of the syntactic analysis because without it context-sensitive or transformational rules could not be avoided.

Code generation
Whenever a production is applied, a semantic action associated with it generates a functional set expression. Its arguments point to other such expressions unless they are individuals. Example:

(tablettenfoermiger Medikamente)
    Mw(M9)   (tablettenfoermig)
    Mw(M22)  (Medikament)
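The feature program for ME1 -> ME2 ME3 from sec.3.3, as applied above to "tablettenfoermiger Medikamente", can be sketched in Python. The sketch assumes that a reading is a set of positively marked features and that a feature marked "-" or not mentioned counts as absent; the function names `has_features` (for the operator test), `parse` and `reduce_me` are introduced here for illustration only.

```python
import re

GENDER = ("MAS", "FEM", "NEU")
CASE = ("NOM", "GEN", "DAT", "ACC")


def parse(spec):
    """'+NEU-NUM+GEN' -> {'NEU', 'GEN'}; only features marked '+' are kept."""
    return {name for sign, name in re.findall(r"([+-])([A-Z]+)", spec) if sign == "+"}


def has_features(reading, plus=(), minus=()):
    """Operator 'test': all 'plus' features present and all 'minus' features absent."""
    return all(f in reading for f in plus) and all(f not in reading for f in minus)


def meq(names, a, b):
    """True if at least one of the listed features is set in both readings."""
    return any(n in a and n in b for n in names)


def equ(name, a, b):
    """True if the feature has the same value in both readings."""
    return (name in a) == (name in b)


def reduce_me(me2, me3):
    """Apply ME1 -> ME2 ME3; return the features of ME1, or None if rejected."""
    accepted = (has_features(me2, plus=("ADJ", "ATT"))
                and has_features(me3, minus=("ADJ", "ATT"))
                and meq(GENDER, me2, me3)
                and equ("NUM", me2, me3)
                and meq(CASE, me2, me3))
    if not accepted:
        return None
    # Operator 'and': keep the gender and case features shared by both readings.
    me1 = {n for n in GENDER + CASE if n in me2 and n in me3}
    # Operator 'cop': copy NUM from ME2. ADJ and ATT stay unset (-ADJ-ATT).
    if "NUM" in me2:
        me1.add("NUM")
    return me1


me2_readings = ["+MAS+NUM+NOM+ATT+STR+ADJ",
                "+FEM+NUM+GEN+DAT+ATT+STR+ADJ",
                "+MAS+FEM+NEU-NUM+GEN+ATT+STR+ADJ"]
me3 = parse("+NEU-NUM+NOM+GEN+ACC")
results = [reduce_me(parse(r), me3) for r in me2_readings]
print(results)
```

Under these assumptions only the third reading survives, with NEU and GEN set, which agrees with the result +NEU+GEN-NUM-ADJ-ATT given in the text.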
Figure 3: parsing graph for the example sentence; each node carries a syntactic variable and the number of its feature list (e.g. SA,19; RE,8; ME,5)
WELCHE FIRMEN SIND HERSTELLER TABLETTENFOERMIGER MEDIKAMENTE ?
02300047 I0000001 15000000 01100033 04000032 16000000 15000000 01100025 14100025 15000000 15000000 15000000 16000000 15000000 15000000 16000000 15000000 16000000 16000000 16000000 26000000
15000000 01100025 140000C5 15000000 16000000 01200001 10000001 15000000 01100045 01200040 01100C30 05000027 01200044 01100033 04000033 01100033 04000026 16000000 16000000 16000000 00000000
DB X1 t ~ M26 ) ( AA ~'T (22) ( ( ( ) ( ( ) ( ) ) ) E~IRBE
Figure 4
( AA ~'T ( 5 ) ( ) £ XI ( MV* VG* £N R23 MD ~H ~2Z MW H22 ) ) ) ........
On completion of the parse, the pointer structure corresponding to the syntactic variable SA is transformed into a linear string. This string must be submitted to a further string manipulation for two reasons.
(1) Completion of the syntactic analysis. Quantifiers do not yet appear in front of the expression. Moving them there is subject to a number of rules that govern their sequence.
(2) Optimization. In many cases quantifiers (whose evaluation may be time-consuming) can be replaced by standard set or relation operators, e.g. DB by [...].

The code resulting from translation of the sentence above is shown in the printout in figure 4.

Reverse translation
Set level names may immediately be translated into the pharmaceutical level simply by again invoking the dictionary. However, under certain conditions (empty sets) set expressions may themselves be part of a result. This requires a translation into both level 2 and level 3. Examples:

Vg(R12, I14)  ->  Heilmittel fuer Psychosen
Mw(M9)        ->  tablettenfoermig
I2            ->  Verophen
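The three examples suggest a dictionary lookup for plain names plus a phrase template for set expressions. A minimal Python sketch follows; the dictionary excerpt, the splitting of "Heilmittel fuer Psychosen" into R12 and I14, and the "fuer" template are assumptions made to reproduce the examples, not the KAIFAS implementation.

```python
import re

# Hypothetical dictionary excerpt: set level name -> word on the pharmaceutical level.
DICTIONARY = {
    "R12": "Heilmittel",
    "I14": "Psychosen",
    "M9": "tablettenfoermig",
    "I2": "Verophen",
}


def reverse_translate(expr):
    """Translate a set level name or a simple set expression back into words."""
    expr = expr.replace(" ", "")
    if expr in DICTIONARY:                          # plain set level name
        return DICTIONARY[expr]
    m = re.fullmatch(r"Mw\((\w+)\)", expr)          # loaded set: use the name of the set
    if m:
        return DICTIONARY[m.group(1)]
    m = re.fullmatch(r"Vg\((\w+),(\w+)\)", expr)    # individual domain: relation plus individual
    if m:
        return f"{DICTIONARY[m.group(1)]} fuer {DICTIONARY[m.group(2)]}"
    raise ValueError(f"no reverse translation for {expr}")


for expression in ("Vg(R12, I14)", "Mw(M9)", "I2"):
    print(expression, "->", reverse_translate(expression))
```

The first case needs only the level 3 vocabulary, while the `Vg` case also needs a natural language phrase pattern, which is why the text speaks of a translation into both level 2 and level 3.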
4 Semantic primitives as a basis

4.1 Motivation

In order to study the adequacy of the rules of ch.2 and to determine whether they must be further refined or augmented it is helpful, short of constructing systems, to examine existing systems that are arranged in the form of layers. One of the oldest systems of this kind (though it was not conceived that way) is Woods' question-answering machine [Wo 68, Wo 73]. Like the set theoretic approach, Woods' universe is composed of objects and interrelationships between them. Unlike the previous approach, these are not collected into mathematical sets and relations but treated as propositions to which a procedural approach is taken. This is probably due to an orientation towards explaining the semantics of natural language rather than manipulating concrete data bases.
4.2 Semantic Primitives

Object types

O   Elementary objects, e.g. Boston, AA-57, DC-9, 8:00 a.m.
Fn  n-ary functions (n ≥ 1), e.g. departure time (of flight x1 for place x2). These need not be functions in the strict sense. If a function may yield more than one value (e.g. officer of a ship) it is defined as a successor function such that
        officer(x,0) = a1      (start)
        officer(x,a1) = a2
        ...
        officer(x,an) = END    (end)
Rn  n-ary relation (predicate) (n ≥ 1), e.g. jet (flight x1 is a jet), arrive (flight x1 goes to place x2).
Designators are either names of elementary objects or of the form Fn(x1,...,xn) where xi is a designator; e.g. departure time (AA-57, Boston) for 8:00 a.m.
Propositions Rn(x1,...,xn) where xi is a designator; e.g. jet (AA-57), place (Boston), arrive (AA-57, Chicago).
B   Truth values

Example: A set of semantic primitives for the flight schedules table (from [Wo 68]):

Primitive Predicates
CONNECT (X1, X2, X3)   Flight X1 goes from place X2 to place X3
DEPART (X1, X2)        Flight X1 leaves place X2
ARRIVE (X1, X2)        Flight X1 goes to place X2
DAY (X1, X2, X3)       Flight X1 leaves place X2 on day X3
IN (X1, X2)            Airport X1 is in city X2
SERVCLASS (X1, X2)     Flight X1 has service of class X2
MEALSERV (X1, X2)      Flight X1 has type X2 meal service
JET (X1)               Flight X1 is a jet
DAY (X1)               X1 is a day of the week (e.g. Monday)
TIME (X1)              X1 is a time (e.g. 4:00 p.m.)
FLIGHT (X1)            X1 is a flight (e.g. AA-57)
AIRLINE (X1)           X1 is an airline (e.g. American)
AIRPORT (X1)           X1 is an airport (e.g. JFK)
CITY (X1)              X1 is a city (e.g. Boston)
PLACE (X1)             X1 is an airport or a city
PLANE (X1)             X1 is a type of plane (e.g. DC-3)
CLASS (X1)             X1 is a class of service (e.g. first-class)
AND (S1, S2)           S1 and S2
OR (S1, S2)            S1 or S2
NOT (S1)               S1 is false
IFTHEN (S1, S2)        if S1 then S2
(where S1 and S2 are propositions)

Primitive Functions
DTIME (X1, X2)         the departure time of flight X1 from place X2
ATIME (X1, X2)         the arrival time of flight X1 in place X2
NUMSTOPS (X1, X2, X3)  the number of stops of flight X1 between place X2 and place X3
[...] (X1)             the airline which operates flight X1
EQUIP (X1)             the type of plane of flight X1
FARE (X1, X2, X3, X4)  the cost of an X3 type ticket from place X1 to place X2 with service of class X4 (e.g. the cost of a one-way ticket from Boston to Chicago with first-class service)
Operators
To every function and relation there exists a programmed subroutine (procedure) which determines a value of a function or the truth of a proposition. Examples (procedure names are capitalized):

JET (AA-57)               ->  true
ARRIVE (AA-57, Chicago)   ->  true
ARRIVE (AA-57, Boston)    ->  false
DTIME (AA-57, Boston)     ->  8:00 a.m.

Whereas the abstract machine of ch.3 was based on object types but specific operators, the abstract machine in this case is defined in terms of both object and operator types. Specific instances must be supplied by the user for both of them. However, with the advent of microprogramming, computer scientists should have little problems in adjusting to this kind of notion.
Control mechanism
As in the preceding example, programs are expressed in functional notation, e.g.

TEST(CONNECT (AA-57, BOSTON, CHICAGO))

would stand for "Does AA-57 go from Boston to Chicago?". Likewise, queries of any appreciable degree of complexity are based on the notion of bounded quantifier as a representative for loops. The format for a quantified expression is

FOR <quantifier> <variable> / <class> : <restriction> ; <scope>

where
<quantifier>   a type of quantifier (EACH, EVERY, SOME, THE, nMANY).
<variable>     a bound variable.
<class>        class of objects over which quantification is to range. The specification is performed by special enumeration functions, e.g. SEQ, DATALINE, NUMBER, AVERAGE. Besides enumeration these functions may perform searches or computations.
<restriction>  restriction on the range.
<scope>        scope.
<restriction> and <scope> may both be quantified expressions.

Unlike KAIFAS where the result of the evaluation of an expression is automatically retranslated and displayed, this must be explicitly requested by commands such as TEST (test truth of a proposition), PRINTOUT (print the representation for a designator).

Examples:

(FOR EVERY X1 / (SEQ TYPECS) : T ; (PRINTOUT (X1)))

prints the sample numbers for all the lunar samples which are of type C rocks, i.e. breccias (T stands for "true").

(TEST (FOR 30 MANY X1 / (SEQ FLIGHT) : JET(X1) ; DEPART (X1, BOSTON)))

"Do 30 jet flights leave Boston?"
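The procedural reading of predicates, functions and quantified expressions can be tried out with a small Python sketch. The flight table is invented except for the facts about AA-57 used in the examples above, only truth-valued quantifiers are covered (THE and PRINTOUT are left out), and "n MANY" is read as "at least n", which is an assumption.

```python
# Hypothetical flight table. Only the facts about AA-57 (a jet leaving Boston
# at 8:00 a.m. for Chicago) follow the examples in the text.
FLIGHTS = {
    "AA-57": {"from": "BOSTON", "to": "CHICAGO", "dtime": "8:00 a.m.", "jet": True},
    "AA-12": {"from": "BOSTON", "to": "NEW YORK", "dtime": "9:30 a.m.", "jet": False},
    "UA-20": {"from": "CHICAGO", "to": "BOSTON", "dtime": "1:15 p.m.", "jet": True},
}

# One programmed subroutine per primitive predicate or function.
def JET(x1): return FLIGHTS[x1]["jet"]
def DEPART(x1, x2): return FLIGHTS[x1]["from"] == x2
def ARRIVE(x1, x2): return FLIGHTS[x1]["to"] == x2
def CONNECT(x1, x2, x3): return DEPART(x1, x2) and ARRIVE(x1, x3)
def DTIME(x1, x2): return FLIGHTS[x1]["dtime"] if DEPART(x1, x2) else None


def SEQ(cls):
    """Enumeration function: all known objects of a class (only FLIGHT here)."""
    if cls != "FLIGHT":
        raise ValueError(f"unknown class {cls}")
    return sorted(FLIGHTS)


def FOR(quant, objects, restriction, scope, n=None):
    """Truth value of a bounded quantifier over the objects that pass the restriction."""
    hits = [scope(x) for x in objects if restriction(x)]
    if quant in ("EACH", "EVERY"):
        return all(hits)
    if quant == "SOME":
        return any(hits)
    if quant == "MANY":            # "n MANY", read here as "at least n"
        return sum(hits) >= n
    raise ValueError(f"unsupported quantifier {quant}")


def TEST(proposition):
    """Explicit request for the truth value of a proposition."""
    return bool(proposition)


# "Does AA-57 go from Boston to Chicago?"
print(TEST(CONNECT("AA-57", "BOSTON", "CHICAGO")))
# "Does 1 jet flight leave Boston?" (the text asks for 30; the sample table is small)
print(TEST(FOR("MANY", SEQ("FLIGHT"), JET, lambda x1: DEPART(x1, "BOSTON"), n=1)))
```

Note that nothing in these subroutines checks the kind of an argument, so `ARRIVE` would happily be called with a non-flight as first argument.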
4.3 Natural language

As a general rule, the introductory remarks to sec.3.3 apply here as well: The level of the "English-like" query language provided on level 2 is influenced by the range of expressions possible on the previously discussed level 1. In contrast to KAIFAS, inspection of the data base is not limited to the evaluation of level 1 expressions but may take place during translation from level 2 into level 1, too. The semantic actions associated with a rule of grammar impose further restrictions, e.g. they make sure that the first argument of CONNECT is indeed an instance of the class FLIGHT.
This is illustrated by the following example. In a first step a syntactic analysis is performed and a phrase marker is derived, e.g.

[phrase marker (tree diagram); recoverable labels: NP, NPR, AA-57]

Since verbs in English correspond roughly to predicates, and noun phrases are used to denote the arguments of the predicate, the verb in the phrase marker will be the primary factor in determining the predicate. In the example, the predicate will be CONNECT. For this it is necessary that the subject be a flight and that there be prepositional phrases whose objects are places representing origin (from) and destination (to). The grammatical relations among elements of a phrase marker
are defined by partial
GI:
S
/\ NP
G2;
S
t V 1 (2)
subjecl-verb
G3;
e.g.
S
i VP
VP
(I)
tree structures,
I t
VP
/ \ V 1
NP
i
{ I)
t2)
vetb-obj ect
/P\ PREP
NP
(| )
{ Z)
Pfeposffion-objec! modifying o VP
Among the three structures, G1 and G3 both match subtrees in the phrase marker. Which of these is acceptable depends on the additional rules, e.g.
(G1: FLIGHT((1)) and (2) = fly).
((1) and (2) are positional variables in the partial tree structure). This rule obviously is satisfied. More complex rules are possible; in the example, the topmost S-node of the phrase marker is matched by the rule
1-(G1: FLIGHT((1)) and (2) = fly) and
2-(G3: (1) = from and PLACE((2))) and
3-(G3: (1) = to and PLACE((2)))
==> CONNECT(1-1, 2-2, 3-2)
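The way such a rule combines pattern matches into a level 1 expression can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Match` record, the sets `FLIGHTS` and `PLACES`, and the function names are hypothetical and not part of Woods' system; the matches for G1 and G3 are taken as already found by the syntactic analysis.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the classes FLIGHT and PLACE of level 1.
FLIGHTS = {"AA-57"}
PLACES = {"BOSTON", "CHICAGO"}


@dataclass(frozen=True)
class Match:
    """One match of a partial tree structure against the phrase marker."""
    pattern: str  # "G1", "G2" or "G3"
    first: str    # value bound to positional variable (1)
    second: str   # value bound to positional variable (2)


def find(matches, pattern, test):
    """Return the first match of the given pattern that passes the test."""
    return next((m for m in matches if m.pattern == pattern and test(m)), None)


def connect_rule(matches):
    """Apply the three-condition rule; build CONNECT(1-1, 2-2, 3-2) if it fires."""
    matches = list(matches)
    c1 = find(matches, "G1", lambda m: m.first in FLIGHTS and m.second == "fly")
    c2 = find(matches, "G3", lambda m: m.first == "from" and m.second in PLACES)
    c3 = find(matches, "G3", lambda m: m.first == "to" and m.second in PLACES)
    if c1 is None or c2 is None or c3 is None:
        return None  # a semantic condition failed, so the rule does not apply
    return ("CONNECT", c1.first, c2.second, c3.second)


sentence = [
    Match("G1", "AA-57", "fly"),
    Match("G3", "from", "BOSTON"),
    Match("G3", "to", "CHICAGO"),
]
print(connect_rule(sentence))
```

If the subject were not a member of `FLIGHTS`, the first condition would fail and no CONNECT expression would be produced, which mirrors the restriction imposed by the semantic actions.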
4.4 Airline guide
The system under discussion was first applied to a flight schedules table. To illustrate the application interface, a few examples of queries shall be given below (from [Wo 68]).
Does American Airlines have a flight which goes from Boston to Chicago?
What is the departure time from Boston of every American Airlines flight that goes from Boston to Chicago?
What American Airlines flights arrive in Chicago from Boston before 1:00 p.m.?
How many airlines have more than 3 flights that go from Boston to Chicago?
4.5 Lunar geology
More recently the system has been applied to access, compare and evaluate the chemical analysis data on lunar rock and soil composition that was accumulating as a result of the Apollo missions [Wo 73].
Examples:
What is the average concentration of aluminum in high alkali rocks?
Give me all analyses of S10046!
How many breccias contain olivine?
Do any samples have greater than 13 percent aluminum?
What is the average modal concentration of ilmenite in type A rocks?
4.6 Critique
(1) The possibility of inspecting the data base both on level 1 and during translation from level 2 to level 1 introduces a note of confusion. Since, according to sec. 2.3, the translation process is directly related to definition, translation must make no reference to data in the base. The lack of separation will have practical repercussions: either certain changes on level 1 will necessitate changes in the rules of grammar, or parts of the control mechanism for level 1 must be duplicated for translation purposes.
(2) In Woods' system the subroutines do not appear to verify that their arguments are of the proper kind (e.g. ARRIVE does not check whether AA-57 is indeed a flight or Chicago a place), since this is done on translation. If one (correctly) left this to level 1, then primitive predicates and functions are related to each other. These interdependencies may be expressed by a set of axioms or, in the parlance of abstract data types, by categories that restrict the ranges of the corresponding arguments. As a consequence, abstract machines must account not only for primitive machine operators but for the relationships between those structures as well. (Note that the KAIFAS machine circumvents this problem by prescribing only unary predicates for all machine terms.)
(3) Operators (subroutines) and objects are interdependent, albeit in a one-to-one fashion. In order to make sure that the requirements governing the relationship between abstract machines are met as well, it suffices to treat a predicate or function and its corresponding procedure as two instances of the same resource.

5 Relational model
5.1 Motivation
One of the most widely discussed approaches to data bases is Codd's relational model [Co 70, Co 72, We 74] which lends itself particularly well to an interpretation in terms of abstract machines. Intuitively speaking, Codd supposes that users explain their universe in terms of table-like structures. A table consists of a number of entries that are formatted exactly the same way: an entry is a sequence of fields that may be named. Entries are not ordered but are uniquely identified by a key, i.e. the contents of certain fields. More formally, a table is a relation, an entry is called an n-tuple and, consequently, field names or headings are called attributes. A certain familiarity with the relational model is assumed on the reader's part. Only its interpretation by a machine will be examined here.

5.2 Relational algebra
Objects
$A$: attributes, each naming a set of objects (domain)
$R^n$: relations, $R^n(A_1, A_2, \ldots, A_n) \subseteq A_1 \times A_2 \times \ldots \times A_n$
Example: SUPPLIER (SUPPLIERNR, NAME, LOC), KEY = SUPPLIERNR
SUPPLIER:
SUPPLIERNR   NAME       LOC
1            Jones      New York
2            Smith      Chicago
3            Connors    Boston
4            Thompson   New York
Key attributes are indicated; keys may be composite. Hierarchical relationships are usually eliminated by normalization. Hence all relations can be assumed to be normalized.
$T^n \in R^n$: n-tuple.

Operators [We 74]
Standard relation operators
$R^{n_1} \otimes R^{n_2} \to R^{n_1+n_2}$   Direct product: $\{(T^{n_1} \frown T^{n_2}) \mid T^{n_1} \in R^{n_1} \wedge T^{n_2} \in R^{n_2}\}$ ($\frown$ concatenation operator)
$R^n \cup R^n \to R^n$   Union
$R^n \cap R^n \to R^n$   Intersection
$R^n - R^n \to R^n$   Difference
(for union, intersection and difference the attributes must be "compatible")
Special operators
$R^n[A] \to R^m$   Projection: relation $R^n$ restricted to the attributes $A = \{A_1, \ldots, A_m\}$.
$R^{n_1}[A\,\theta\,B]R^{n_2} \to R^{n_1+n_2}$   Join: $\{(T^{n_1} \frown T^{n_2}) \mid T^{n_1} \in R^{n_1} \wedge T^{n_2} \in R^{n_2} \wedge T^{n_1}[A]\,\theta\,T^{n_2}[B]\}$ where $A$, $B$ are sets of attributes and $\theta$ is one of $\{=, \neq, <, \leq, >, \geq\}$. (Slight modifications, e.g. natural join, are possible.)
$R^n[A\,\theta\,B] \to R^n$   Restriction: $\{T^n \mid T^n \in R^n \wedge T^n[A]\,\theta\,T^n[B]\}$ where $A$, $B$, $\theta$ as above.
$R^n[A \div B]R^n \to R^m$   Division: [Co 71], p.74.
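To make the operator definitions concrete, the following minimal Python sketch models a relation as a set of tuples and attributes by their 1-based positions. The function names and the positional addressing are assumptions made for illustration only; they are not notation from the paper or from Codd.

```python
import operator
from itertools import product

# The SUPPLIER relation from the example: (SUPPLIERNR, NAME, LOC).
SUPPLIER = {
    (1, "Jones", "New York"),
    (2, "Smith", "Chicago"),
    (3, "Connors", "Boston"),
    (4, "Thompson", "New York"),
}


def direct_product(r, s):
    """Concatenate every tuple of r with every tuple of s."""
    return {t + u for t, u in product(r, s)}


def project(r, cols):
    """Restrict r to the attributes at the given 1-based positions."""
    return {tuple(t[c - 1] for c in cols) for t in r}


def restrict(r, a, theta, b):
    """Keep the tuples of r whose attributes a and b satisfy theta."""
    return {t for t in r if theta(t[a - 1], t[b - 1])}


def join(r, a, theta, b, s):
    """Concatenate tuples of r and s whose attributes a (in r) and b (in s) satisfy theta."""
    return {t + u for t in r for u in s if theta(t[a - 1], u[b - 1])}


locations = project(SUPPLIER, [3])
same_city = join(SUPPLIER, 3, operator.eq, 3, SUPPLIER)
print(sorted(locations))
print(len(same_city))
```

Union, intersection and difference need no code of their own here, since Python's set operators `|`, `&` and `-` apply directly to compatible relations.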
Control mechanism
Since all operators have been defined as infix operators, relational algebra "programs" are formed by nested expressions rather than by linear sequences. For an example see sec. 5.3.

5.3 Relational calculus
In place of relational algebra Codd proposes an applied predicate calculus (relational calculus), and proceeds to show that any expression in the calculus (alpha-expression) may be reduced to an equivalent algebraic expression.
Alphabet of the calculus:
Individual constants: $a_1, a_2, a_3, \ldots$
Index constants: 1, 2, 3, 4, ... (attributes are indexed per relation instead of named)
Tuple variables: $r_1, r_2, r_3, \ldots$
Predicate constants: monadic $P_1, P_2, P_3, \ldots$; dyadic $=, \neq, <, \leq, >, \geq$
Logical symbols: $\exists, \forall, \wedge, \vee, \neg$
Delimiters.
Simple alpha-expressions have the form
$(t_1, t_2, \ldots, t_k) : w$
where
- $w$ is a well-formed formula,
- the $t_i$ are distinct terms, each consisting of an indexed or non-indexed tuple variable,
- the set of tuple variables occurring in $t_1, \ldots, t_k$ is precisely the set of free variables in $w$.
Example: Alpha-expression for "Find the name and location of all suppliers each of whom supplies all projects":
$(r_1[2], r_1[3]) : P_1 r_1 \wedge \forall P_2 r_2\, \exists P_3 r_3\, ((r_1[1] = r_3[1]) \wedge (r_3[3] = r_2[1]))$
After reduction to relation algebra:
$S_1 = R_1$, $S_2 = R_2$, $S_3 = R_3$
$S = S_1 \otimes S_2 \otimes S_3$
$T_3 = S[1=6] \cap S[8=4]$
$T_2 = T_3[1,2,3,4,5]$
$T_1 = T_2[(4,5) \div (1,2)]S_2$
$T = T_1[2,3]$
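The reduction steps can be traced on a small instance. The Python sketch below assumes the column layout implied by the index arithmetic (a supplier relation of degree 3, a project relation of degree 2, a supply relation whose first and third columns hold supplier and project numbers); the sample tuples and all function names are invented for illustration.

```python
from itertools import product

# Hypothetical sample data; only the column positions follow the text.
R1 = {(1, "Jones", "New York"), (2, "Smith", "Chicago")}      # supplier nr, name, location
R2 = {(10, "Bridge"), (20, "Tunnel")}                         # project nr, project name
R3 = {(1, 100, 10, 5), (1, 101, 20, 7), (2, 100, 10, 3)}      # supplier nr, part nr, project nr, quantity


def direct_product(*relations):
    """Concatenate one tuple from each relation in every possible way."""
    return {sum(parts, ()) for parts in product(*relations)}


def restrict_eq(rel, a, b):
    """Keep tuples whose columns a and b (1-based) are equal."""
    return {t for t in rel if t[a - 1] == t[b - 1]}


def project(rel, cols):
    """Keep only the listed columns (1-based)."""
    return {tuple(t[c - 1] for c in cols) for t in rel}


def divide(rel, a, b, divisor_rel):
    """Keep the remaining-column values of rel that occur with every divisor tuple."""
    if not rel:
        return set()
    degree = len(next(iter(rel)))
    rest = [c for c in range(1, degree + 1) if c not in a]
    divisor = project(divisor_rel, b)
    images = {}
    for t in rel:
        key = tuple(t[c - 1] for c in rest)
        images.setdefault(key, set()).add(tuple(t[c - 1] for c in a))
    return {key for key, image in images.items() if divisor <= image}


S = direct_product(R1, R2, R3)                  # S = S1 (x) S2 (x) S3, degree 9
T3 = restrict_eq(S, 1, 6) & restrict_eq(S, 8, 4)
T2 = project(T3, [1, 2, 3, 4, 5])
T1 = divide(T2, (4, 5), (1, 2), R2)
T = project(T1, [2, 3])
print(T)
```

With this data only the first supplier is connected to both projects, so only that supplier's name and location survive the division step.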
ALPHA is a language devised for alpha expressions that is slightly more appealing to the user than the predicate calculus. The example shown above may be reformulated in ALPHA as
RANGE SUPPLIER L
RANGE PROJECT P ALL
RANGE SUPPLY K SOME
GET W (L.NAME, L.LOC): ((L.SUPPLIERNR = K.SUPPLIERNR) ∧ (K.PROJNR = P.PROJNR))
(order of quantifiers must be maintained!), or, equivalently
RANGE SUPPLIER L
RANGE PROJECT P
RANGE SUPPLY K
GET W (L.NAME, L.LOC): (∀P)(∃K) ((L.SUPPLIERNR = K.SUPPLIERNR) ∧ (K.PROJNR = P.PROJNR))

5.4 Higher levels
For reasons similar to the ones offered in chs. 3 and 4, languages have been devised that do not rely on a user's formal training. One language of this kind which has been shown to be reducible to the relational calculus is SQUARE [Bo 74]. However, the view of relations offered by SQUARE is different from that offered by ALPHA:
(i) Scan a column or columns of a table looking for a value or a set of values.
(ii) For each such value found examine the corresponding row and elements of given columns in this row (as opposed to inspecting one row after another).
SQUARE expressions are of a form such as
B R A (S)   ("disjunctive mapping")
(read: "find B of R where A is S") that defines a mapping such that R is a relation, A and B are sets of attributes (domain and range, respectively), and S is an argument that may itself be an expression. Other forms, e.g. for projection, conjunctive and n-ary mappings, have a similar appearance.
Example: NAME EMP DEPT ('TOY') stands for "Find the names of employees in the toy department".
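The column-oriented reading of a mapping can be sketched as follows. This Python fragment is an illustration under assumptions: the `mapping` function, the dictionary representation of rows and the sample employees are invented here and are not SQUARE syntax or data from [Bo 74].

```python
# Hypothetical EMP relation; each row is a dictionary keyed by attribute name.
EMP = [
    {"NAME": "Adams", "DEPT": "TOY"},
    {"NAME": "Baker", "DEPT": "SHOE"},
    {"NAME": "Clark", "DEPT": "TOY"},
]


def mapping(relation, b, a, s):
    """Find B of R where A is S.

    Scan the A columns for the values in s; for each hit, collect the
    elements of the B columns from the corresponding row.
    """
    wanted = set(s)
    result = set()
    for row in relation:
        if tuple(row[attr] for attr in a) in wanted:
            result.add(tuple(row[attr] for attr in b))
    return result


# NAME EMP DEPT ('TOY'): names of employees in the toy department.
names = mapping(EMP, ["NAME"], ["DEPT"], [("TOY",)])
print(sorted(names))
```

Because the argument is itself a set of value tuples, the result of one mapping can be passed as the argument of another, which corresponds to S being an expression.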
More recently attempts have been reported [Co 74] that allow a user to engage in a dialog with a relational data base system founded on natural English. The approach differs drastically from the ones discussed in chs. 3 and 4 in that a truly two-way communication is envisioned.
5.5 Comment
It has been shown that both ALPHA and SQUARE are equivalent to the relational algebra, i.e. any query expressible in relational algebra is expressible in ALPHA and in SQUARE, and vice versa, and hence are themselves equivalent. Equivalence is a symmetric relation. This indicates that the definition of a hierarchy of abstract machines given does not preclude equivalence; the condition on the succession by restriction, however, does. From the point of user sophistication the succession relational algebra - ALPHA - SQUARE could still be a hierarchy (in the direction of increasing level). Further refinement of the notion of hierarchy is necessary.
6 Conclusions
There are some striking similarities between the examples of chs. 3, 4 and 5:
- In each the lowest level has been well formalized.
- All rely on quantification as a means for building complex expressions.
- All tend towards a less formal but stylized natural language on their higher levels.
- All three systems have been implemented and have found at least some application.
On the other hand, only one of them (ch.5) has attempted to provide a language on an intermediate level. Experiences with the KAIFAS system (ch.3) suggest that this may be necessary as well.
Of course, a few examples do not constitute proof, but they indicate that hierarchies of successive languages could meet the objectives mentioned in the introduction, at the very least in some well-defined situations. The relationship between successive levels still has to be made much more precise. Furthermore, techniques to measure the efficiency of higher levels must be explored, as a number of successive translations will perhaps raise the question of efficiency. Finally, the paper did not attend to the critical question what form the root should take; this appears to be a largely unsolved problem.
Acknowledgement
The author is grateful to G. Goos for carefully reading the manuscript and making helpful suggestions.
References
[Ab 74] J.R. Abrial, Data Semantics, in [Kl 74], 1-59
[Bo 74] R.F. Boyce, D.D. Chamberlin, W.F. King, M.M. Hammer, Specifying Queries as Relational Expressions, in [Kl 74], 169-176
[Bu 72] Burroughs Corp., B6700/7700 Executive System Programming Language (ESPOL), Information Manual, 1972
[Co 70] E.F. Codd, A Relational Model for Large Shared Data Banks, Comm. ACM 13 (1970), No. 6, 377-387
[Co 72] E.F. Codd, Relational Completeness of Data Base Sublanguages, in: R. Rustin (ed), Data Base Systems, Courant Computer Science Symp., Prentice-Hall, Inc. 1972, 65-98
[Co 74] E.F. Codd, Seven Steps to Rendezvous with the Casual User, in [Kl 74], 179-199
[Col 68] L.S. Coles, An Online Question-Answering System with Natural Language and Pictorial Input, Proc. 23rd Natl. ACM Conf. (1968), 169-181
[Go 73] G. Goos, Hierarchies, in F.L. Bauer (ed), Advanced Course on Software Engineering, Lecture Notes in Econ. and Math. Systems, vol. 81, 29-46
[Gr 69] C.C. Green, The Application of Theorem Proving to Question-Answering Systems, Tech. Rep. No. CS138, Stanford Univ. 1969
[Kl 74] J.W. Klimbie, K.L. Koffeman (eds), Data Base Management, North-Holland Publ. Co. 1974
[Kr 75] K.D. Kraegeloh, P.C. Lockemann, Hierarchies of Data Base Languages: An Example, Information Systems (in print)
[Su 74] B. Sundgren, Conceptual Foundation of the Infological Approach to Data Bases, in [Kl 74], 61-94
[SI 74] ACM SIGPLAN Symposium on Very High Level Languages, March 1974, ACM, New York 1974
[We 74] H. Wedekind, Data Base Systems I, BI-Wissenschaftsverlag, Reihe Informatik, vol. 16, 1974 (in German)
[Wi 68] N. Wirth, PL360, A Programming Language for the 360 Computers, Journ. ACM 15 (1968), No. 1, 37-74
[Wo 68] W.A. Woods, Procedural Semantics for a Question-Answering Machine, Proc. AFIPS Fall Joint Comp. Conf. 33 (1968), 457-471
[Wo 73] W.A. Woods, Progress in Natural Language Understanding - An Application to Lunar Geology, Proc. AFIPS Natl. Comp. Conf. 42 (1973), 441-450
A System for the Interactive Processing of Large Volumes of Measurement Data
Ulrich Schauer, IBM Deutschland GmbH, Wiss. Zentrum Heidelberg

Abstract
In processing measurement data one must distinguish between a standard evaluation of the measurements, which is based on a particular model, and an analysis whose aim is to recognize logical relationships and to find an explanatory model. The standard evaluation can well run in batch mode, with a data model tuned to the linkages that can be read off the model. Analysis, by contrast, calls for an interactive system with a data model that permits arbitrary linkages and with a data manipulation language that should be as descriptive as possible yet allow complex selection criteria. Available systems meet the requirements of analysis only in part; in most cases the data manipulation language lacks facilities for computational data processing.
The following describes an experimental system for processing measurement data that is being developed at the IBM Scientific Center in Heidelberg.
1. INTRODUCTION
Large collections of measurement data can be exploited to the full only if the specialists responsible for the analysis (e.g. scientists and engineers, mostly without much programming experience) are put in a position to carry out the processing themselves, without the help of programmers. This requires an interactive system that allows subsets of the data to be formed under complex selection criteria, to be fed into existing or newly written processing programs, and the results to be presented in tabular or graphical form.
Even the selection criteria may involve fairly intricate computations, which are best carried out with building blocks from a program library. The system can thus be adapted to particular fields by adapting the underlying program library. Since only a limited number of prefabricated programs can be available, data manipulation through a host language will often still be necessary. APL is particularly well suited as host language for the intended purpose because of its high degree of interactivity, its adaptability to the programming experience of the user, and its large number of data manipulation operations. Figure 1 gives an overview of the system structure.
[Figure 1: System structure. Recoverable labels: data management system, information system, data manipulation system, interactive host language (APL)]
The data base contains both problem data and descriptive data. The program library stands symbolically for a collection of programs that may be written in PL/I, FORTRAN or Assembler, that can be invoked from APL with data from the APL workspace or from the data base, and that deliver their results back into the APL workspace. User communication takes place through APL or through one of the systems, embedded in APL, for manipulating measurement data, programs and the associated documentation. The primary user stations (terminals) are display screen and typewriter. Figure 2 gives an overview of the data components to be managed by the system.
[Figure 2: Data components. Recoverable labels: catalog processing, descriptive data, problem data, programs, measurement data processing]
The system must manage three interrelated classes of data:
a) Algorithms (program library)
b) Problem data (measurement data)
c) Descriptive data (catalog of the data and the programs)
In general, when faced with a concrete processing task, the user will first determine from the catalog information which data aggregates (tables) his problem data are to be selected from and which programs can be used in processing, and only then specify the complete solution with the help of the data manipulation system.
2. OVERVIEW OF THE SYSTEM COMPONENTS
2.1 Algorithms
The available tools for processing the measurement data and for presenting results and relationships between the data fall into three classes:
a) Arithmetic and logical operations for expressing logical relations (e.g. a > 5) and for data manipulation (e.g. x ← y ÷ z - 100), for numeric data and, apart from arithmetic operations, for non-numeric data as well. The use of APL as host language in particular also allows convenient manipulation of rectangular structures of numeric and text data (vectors, matrices).
b) Subroutines for solving standardized problems from fields such as mathematics (e.g. numerical integration and differentiation) and statistics (e.g. linear regression, test procedures, presentation of frequency distributions etc.).
c) Application-oriented standard procedures (e.g. analysis of ECG recordings, classification of fingerprints etc.).
The host language APL, with its large number of available APL library programs and the possibility of initiating graphical displays from APL, already offers every facility for data manipulation. Nevertheless classes b) and c) are necessary components of the system. Class b) permits falling back on subroutines written in FORTRAN, PL/I or Assembler, which can yield better computing times, especially for large volumes of data. Programs of class c) exist predominantly in FORTRAN or PL/I because they are mostly developed for batch use.
2.2 Problem data
The system uses a relational data model; the data base consists of a collection of large tables that can be manipulated with easily understood operations (Codd /1,2,3/). Data attributes are permanently assigned to the columns of a table, as in the SEQUEL system (Boyce, Chamberlin /4,5/). Subsets of data from one or more tables are specified with a descriptive language oriented towards example entries in the tables concerned, which lends itself equally to building subroutine calls into the program flow (Zloof /6/).
The data elements in a table column may be dimensioned data (e.g. vectors representing a series of measurements, or matrices that may represent several series of measurements or a function of two variables, etc.). The obvious ambiguity is resolved by an interpretation assigned to the table column.
a) Interpretation attribute: governs the reading of a matrix, e.g. as values of a function of two variables at the points of an equidistant grid. The grid points
$(x_0 + i \cdot h,\; y_0 + j \cdot k)$, $i = 0, 1, \ldots, m-1$, $j = 0, 1, \ldots, n-1$
are defined by specifying $x_0$, $y_0$, $h$, $k$ and $m$, $n$.
b) Representation attribute: allows compression mechanisms for data representations to be specified, in addition to e.g. 1, 2, 4 byte integer.
c) Storage attribute: most data are stored in the XRM data base (Lorie /7/). Large data elements (e.g. digitized images) may, however, also be placed in tape or disk files managed by CMS (Conversational Monitor System) and be made known to XRM merely by their file name and an access routine.
The system takes care of automatic conversion of physical units and of automatic data conversion according to interpretation, representation and storage attribute, and observes consistency rules, defined by logical conditions, on new entries or changes in a table.
2.3 Descriptive data
The system for manipulating the unformatted catalog information is an independent component with facilities for generation, maintenance and computer-assisted retrieval of the relevant catalog entries on data and algorithms (Erbe, Walch /8/). Formatted data description is stored in the XRM data base and comprises a directory each of:
a) conversion tables for physical units,
b) methods with program identification,
c) data attributes with table and column identifiers.
By means of b) and c), programs and tables can be identified quickly if the name of the method or the attributes of the table column in question are known.
3. THE DATA MANIPULATION LANGUAGE
For the time being two language levels are provided.
3.1 Procedural language level
Procedural data manipulation is characterized by the following properties:
a) Data access is through APL statements (Lorie, Symonds /9/).
b) Conversions between the external data representation in the XRM data base and the internal data representation (e.g. representation and storage) are carried out automatically.
c) Consistency rules are checked automatically on data insertions or changes.
d) The data are processed table by table or row by row.
e) The user is responsible for correct processing of the data with respect to physical units and interpretation.
3.2 Descriptive language level
The non-procedural language EQBE is an extension of QBE (Query by Example, Zloof /6/). It is also suitable for users with little knowledge of APL (experience in using APL as a desk calculator suffices) and without programming experience. The language is descriptive to a high degree. Relations and the subroutines available in the program library are represented as tables, and the user formulates his data selection by making appropriate row entries, marking the output values and, where necessary, defining selection criteria by APL statements. EQBE is best explained by means of examples.
3.3 Examples of EQBE
R | R1 | R2
r | x  | y
is a skeleton for a table named R with two columns labelled R1 and R2. The values x, y represent a table row; r is an identifier for this row. r, x, y are entered by the user into the skeleton
R | R1 | R2
  |    |
which is supplied by the system when table R is requested. The data variables x, y can take on all tuple values stored in R:
$\{(x, y) \mid (x, y) \in R\}$
1. Selection of a column (projection)
⎕ ← x
No row identifier need be given. ⎕ is to be understood as the symbol for output. The query reads: wanted is the set of x values from R1. A possible formulation in predicate calculus would be
$\{x \mid \exists y\; (x, y) \in R\}$
Naturally the domain of y extends only over values from the R2 column of R. In the following we also write more briefly
$\{x \mid u(x,)\}$
and regard u(x,) as a predicate that is true if a tuple exists in R whose first component equals x.
2. Simple query with restricting conditions that are formulated in the host language
[Table skeleton: R with row u and entries x, y, z]
x > 5 + y × y
(z < 25) ∨ (z > 50)
⎕ ← x
$\{x \mid \exists y\, \exists z\; u(x,y,z) \wedge (x > 5 + y \times y) \wedge ((z < 25) \vee (z > 50))\}$
3. Intersection
[Table skeletons: R with row r and entries x, y; S with row s and entries x, z]
x > y
z = 10
⎕ ← x
If the constant value 10 is entered in S instead of z, the APL statement z = 10 can be dropped.
$\{x \mid \exists y\, \exists z\; r(x,y) \wedge s(x,z) \wedge (x > y) \wedge (z = 10)\}$
4. Union
[Table skeletons with rows u and v]
x > y
z = 10
⎕ ← x
$\{x \mid \exists y\; u(x,y,) \wedge (x > y)\} \cup \{x \mid \exists z\; v(x,,z) \wedge (z = 10)\}$
or
$\{x \mid (\exists y\; u(x,y,) \wedge (x > y)) \vee (\exists z\; v(x,,z) \wedge (z = 10))\}$
5. Difference
[Table skeletons with rows r and s]
⎕ ← x
$\{x \mid \exists y\; r(x,y) \wedge \neg s(,x)\}$
Naturally every data variable that occurs in a negated tuple variable must also occur in a non-negated tuple variable (or be known as a global variable).
6. Cartesian product
[Table skeletons: R with row r and entries x, y; S with row s and entries x1, z]
⎕ ← x, y, x1, z
$\{(x, y, x_1, z) \mid r(x,y) \wedge s(x_1,z)\}$
7. Equijoin (restriction in the Cartesian product)
[Table skeletons: R with row r and entries x, y; S with row s and entries x, z]
⎕ ← x, y, z
$\{(x, y, z) \mid r(x,y) \wedge s(x,z)\}$
8. Generalized join with subsequent projection
R | R1 | R2      S | S1 | S2
r | x  | y       s | x1 | z
x ≥ y
⎕ ← z
$\{z \mid \exists x\, \exists x_1\, \exists y\; r(x,y) \wedge s(x_1,z) \wedge (x \geq y)\}$
Any Boolean function could take the place of the ≥ operator.
9. Division
R | R1 | R2      S | S1 | S2      T | T1 | T2
r | x  | y       s | x  | z       t | .y | z
⎕ ← x
$\{x \mid \exists z\; \forall y \in .y\;\; r(x,y) \wedge s(x,z) \wedge t(y,z)\}$
Here .y stands for $\{y \mid \exists_x\, \exists z\; r(x,y) \wedge s(x,z)\}$, where the quantifier written with x beneath it is to mean that x is to be held fixed, and the occurrence of .y in t is to be understood as requiring $\forall y \in .y\;\; t(y,z)$.
10. Grouping
$\{x \mid \forall y\, \exists z\; r(x,y) \wedge s(x,z) \wedge t(y,z)\}$ (with z chosen depending on y)
cannot be formulated with the means introduced so far. A device is needed for stating dependency between variables. With the convention that y.z is to mean an existential quantifier over z that depends on y, the corresponding entries are:
[Table skeletons: R with row r and entries x, y; S with row s; T with row t; recoverable entries include .y, y. and z]
⎕ ← x
We are now in a position to carry out every operation of relational algebra. The completeness of QBE in the extended form presented here is thereby established for simple queries that comprise only one operation of relational algebra.
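The set-builder formulas of the simpler examples translate almost literally into set comprehensions. The following Python sketch covers examples 1, 3, 5 and 7 only; the contents of the relations r and s are invented sample data, and Python merely stands in for the APL host language.

```python
# Hypothetical contents of R (columns R1, R2) and S (columns S1, S2).
r = {(1, 0), (2, 5), (3, 1)}
s = {(1, 10), (3, 10), (3, 4), (9, 2)}

# Example 1, projection: {x | there is a y with (x, y) in R}
projection = {x for (x, _y) in r}

# Example 3, intersection with conditions: r(x, y) and s(x, z) and x > y and z = 10
intersection = {x for (x, y) in r for (x2, z) in s if x == x2 and x > y and z == 10}

# Example 5, difference: r(x, y) holds and no tuple of S has x as second component
second_components = {z for (_x, z) in s}
difference = {x for (x, _y) in r if x not in second_components}

# Example 7, equijoin: {(x, y, z) | r(x, y) and s(x, z)}
equijoin = {(x, y, z) for (x, y) in r for (x2, z) in s if x == x2}

print(sorted(projection), sorted(intersection), sorted(difference), sorted(equijoin))
```

In each comprehension a shared variable name in the formula becomes an explicit equality test, which is what entering the same example element in two table skeletons expresses.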
Completeness also follows for arbitrarily composed operations: every QBE query establishes, when it is defined, a logical data view that corresponds to the result table. The result table itself comes into being only when an APL program, generated from the logical data view by a query processor, is executed. A new query can be built on the logical data view of queries already defined, and in this way a complex query can be broken down into individual steps.
3.4 Discussion of the extensions to QBE
The extensions described below permit the handling of fairly complex queries, as are to be expected with measurement data, without impairing the simplicity of elementary queries.
a) Algorithms held in a program library (APL functions, FORTRAN subroutines, PL/1 procedures or Assembler routines) can be used for data evaluation or data selection within a query.
b) Arbitrary APL statements can be used within a query for data selection and evaluation. Apart from the comparison operations, QBE allows only a limited number of built-in functions such as COUNT, SUM etc.
c) The result table of a query can be presented in a wide variety of ways by specifying format-describing forms, also in graphical form and repeatedly with changing forms.
d) Every query defines a logical data view that can be used to decouple complex queries into a sequence of simpler queries.
e) Every query can be executed repeatedly. The values of global variables may be changed from one execution to the next. For users experienced in APL this opens up interesting possibilities for data processing with adaptable building blocks.
f) The decoupling effect of QBE, namely that row entries can be made in any order, has been further strengthened (use of the grouping facility).
g) Thanks to the grouping facility, queries that are beyond the reach of QBE can be handled without being broken down into successive steps.
h) The counterpart of QBE's ALL D operator (all different) in EQBE is a prefixed dot; the counterpart of the ALL operator (all, with repetitions) is a prefixed dot together with the tuple identifier in parentheses. A pseudo-variable such as .y or .x(r) can be used in APL statements and stands for a range of values of the same kind.
4. PROCESSING OF MEASUREMENT DATA
4.1 The data processing system
APL is outstandingly suited to the interactive analysis of measurement data that fit into the APL workspace (Schatzoff /10/). With large volumes of data APL loses its attraction, because data selection from tables can then, for reasons of space, no longer be expressed in APL style as one operation over a dimensioned array, but only as a recurrence over all table rows. A procedural language level with APL as host language is therefore not yet fully satisfactory.
A further consideration with measurement data is that a measurement frequently stands for the aggregation of many individual values (e.g. a digitized measurement curve). For processing such measurements it is desirable to be able to call, from the host language APL, programs that were developed in another language (FORTRAN, PL/I, Assembler).
Other experimental data base systems that use APL as host language are mostly designed only for small volumes of data (Palermo /11/, Klebanoff, Lochovsky, Tsichritzis /12/) and either do not permit the use of programs not written in APL at all, or only with inefficient data communication (via external files).
With the architecture described in Figure 3 we obtain a system for problem solving with
- data base access on two language levels (procedural and descriptive)
- the possibility of using prefabricated programs from an easily extensible program library (FORTRAN, PL/I or Assembler programs)
- tools for managing the documentation on data and programs
- automatic conversion of data into the desired physical units
- automatic data conversion, as far as required by implementation, representation and storage
- support for graphical input/output devices
- availability of programs for graphical display
- an interface for easy substitution of input/output devices
[Figure 3: System architecture. Recoverable labels: VM/370 Conversational Monitor System; CP/CMS commands; information system (data, methods); non-procedural language level (EQBE); procedural language level (DB service); file access; spooling; XRM DB system; program library (FORTRAN, Assembler, PL/I); interface for auxiliary processors and input/output devices; menu technique etc.; station; display screen]
4.2
Be , i s p i e l e
zur D a t e n b e a r b e i t u n $
Die folgenden zwei Beispiele sollen die Fghigkeiten zur Probleml~sung illustrieren.
Im ersten Beispiel wird die Verbindung mit Programmen aus
einer Programmbibliothek gezeigt, im zweiten Beispiel unter anderem die Bengtzung von globalen Variablen. 1. Welches in der Datenbank erfaBte Material hat einen mittleren Reflexionsbeiwert
.~TERIAL~PEKTREN
(zwischen 250 und 300 nm) gr6ger als 60?
~{¢TERIALNAME
REFLEXIONSSPEKTRUM
material
reflexion
AUSGABE
SIMPSONREGEL
INTEGRALWERT
integral
xl
÷
250
x2
÷
300
STARTWERT 150 NM
SCItRITTWEITE 5 NM
,,EINGABE iNTEGRAND
150
GRENZEN
reflexion
xl
x2
60
The entries above in the table schema of the material spectra, together with a schematic representation of the program SIMPSON-REGEL and a few APL statements, define the result list of the materials sought. If, as an additional condition, the product of specific weight γ [kg/dm³], specific heat c [cal/(deg·g)] and thermal conductivity λ [cal/(cm·deg·sec)] must be greater than 0.5 for the material sought, the schema above is to be extended as follows:

Table MATERIALWERTE:

SPEZ. GEWICHT    SPEZ. WÄRME    WÄRMELEITFÄHIGKEIT    MATERIALNAME
gamma            c              lambda                material

0.5 < gamma[KG÷DM*3] × c[CAL÷GRAD×G] × lambda[CAL÷CM×GRAD×SEC]

With this formulation the existence of an entry in the table MATERIALWERTE is guaranteed. A contradictory entry could exist in addition (if MATERIALNAME is not a key attribute). With the following modification the additional condition is either satisfied or undecidable (because no entry of the material values exists):

Table MATERIALWERTE:

SPEZ. GEWICHT    SPEZ. WÄRME    WÄRMELEITFÄHIGKEIT    MATERIALNAME
gamma            c              lambda                material

0.5 < gamma[KG÷DM*3] × c[CAL÷GRAD×G] × lambda[CAL÷CM×GRAD×SEC]

[The markings that distinguish this second schema from the first are not legible in the source.]

2. Let the data base contain records on the effect of different kinds of treatment as well as data on the persons treated. To obtain a first overview, a frequency table is wanted that shows the relationship between effect and kind of treatment for a particular group of test persons (smokers, male, older than 40 and overweight).

Table BEHANDLUNG:

BEHANDLUNGSART    WIRKUNG    PERSON
x                 y          name

Table PERSON:

NAME    GEWICHT    GRÖSSE    MÄNNLICH    ALTER    RAUCHER
name    g          h         1           a        1

40 < a[JAHR]
1 < g[KG] ÷ (h[CM] - 100)
I ← x(b)
J ← y(b)
P[I;J] ← P[I;J] + 1
Appending (b) to x and y has the effect that all pairs x, y occurring in b are taken into account (including repetitions). The global variable P, which is built up from the values x(b), y(b), must be initialized before the query is executed, e.g. by

P ← 5 10 ρ 0

if 5 kinds of treatment and 10 levels of effect are distinguished.

If treatment and effect were characterized by continuous values, an interval partition would also have to be specified for x and y, e.g.

x < x1, x1 ≤ x < x2, ..., x3 ≤ x < x4, x4 ≤ x
y < y1, y1 ≤ y < y2, ..., y8 ≤ y < y9, y9 ≤ y

by giving the number sequences IX ← x1, ..., x4 and IY ← y1, ..., y9. IX and IY can likewise be passed as global variables. I ← x(b) and J ← y(b) are then replaced by I ← IX INDEX x(b) and J ← IY INDEX y(b). Here INDEX is an APL function that determines the interval number:

∇ I ← IX INDEX X
[1] I ← 1 + +/X ≥ IX
∇

By changing the global variables IX, IY and P, a completely new result table can be produced without recompiling the query.
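The interval numbering and the accumulation into P can be mirrored in a short Python sketch. It is only an illustration of the APL fragments above, not part of the described system; the boundary values and the list of pairs are hypothetical.

```python
def index(boundaries, value):
    """Interval number of value: 1 + number of boundaries <= value (cf. INDEX)."""
    return 1 + sum(1 for b in boundaries if value >= b)


def frequency_table(pairs, ix, iy):
    """Count (x, y) pairs per interval combination, repetitions included."""
    # Counterpart of initializing P with zeros: one row per x interval,
    # one column per y interval.
    p = [[0] * (len(iy) + 1) for _ in range(len(ix) + 1)]
    for x, y in pairs:
        i = index(ix, x)
        j = index(iy, y)
        p[i - 1][j - 1] += 1      # APL indices start at 1, Python lists at 0
    return p


ix = [10, 20, 30, 40]                 # hypothetical boundaries x1..x4
iy = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # hypothetical boundaries y1..y9
pairs = [(5, 0.5), (15, 2.5), (15, 2.5), (45, 9)]
table = frequency_table(pairs, ix, iy)
```

Four boundaries for x and nine for y give the 5 by 10 table of the example; changing ix, iy or the pairs yields a new table without touching the functions, which corresponds to the remark about avoiding recompilation.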
4.3 Possible Applications

The system presented here is designed primarily for the processing of measurement data in science and engineering. Applications can certainly also be found in the commercial field, e.g. for interactive data analysis aimed at recognizing relationships that permit meaningful predictions. Computer-aided design (CAD) as a special field of application of this system is being investigated at the Wissenschaftliches Zentrum Heidelberg (Kantorowitz /13/).

As Figure 4 shows, the system can be characterized, depending on the user's point of view, as:
- an extension of APL (data base access, use of compiled programs, information system on methods, programs and data),
- an extension of a data base system into a problem-solving system (interactive data manipulation language able to use a collection of programs, information system on methods, programs and data),
- an extension of a collection of programs into a problem-solving system (data base access, interactive data manipulation language, information system on methods, programs and data),
- an extension of an information system on methods, programs and data into a problem-solving system (data base access, interactive data manipulation language with access to a program library).
Function                                              Component
Data management system                                XRM (Extended Relational Memory)
Interactive host language                             APL
Procedural language level (interface to XRM)          APL functions
Descriptive language level (translated into APL)      EQBE (Extended Query by Example)
Program library                                       Scientific Subroutine Package (user programs)
Program documentation                                 Method bank (implemented in APL)
Command language for program execution                EQBE
Interactive graphic data manipulation                 GRAPHPAK
Sequential files                                      CMS/APL
Interface APL - PL/I                                  Auxiliary processor
Support of input/output devices                       APL functions

FIGURE 4: Components of the problem-solving system
None of the components mentioned above is a novelty in itself, except perhaps the information system on data, methods and programs. In combination they yield a system for the interactive processing of large volumes of measurement data, in which the problem solver himself (without programmers as intermediaries) can retrieve the information important to him from a data base under application-specific selection criteria and use it for equally application-specific computations. In our experimental system APL is used as implementation and host language so that subsystems implemented in it can be added easily. Because of the choice of components, some aspects of a data management system remain unconsidered for the time being. Nevertheless, the system can help to gain experience of the requirements that the processing of measurement data places on the data base system, the data manipulation language and the information system if the result is to be a problem-solving system that is also attractive to non-programmers.
References

[1] E.F. Codd, "A Relational Model of Data for Large Shared Data Banks", CACM, Vol. 13, No. 6, June 1970, pp. 377-387
[2] E.F. Codd, "Normalized Data Base Structure: A Brief Tutorial", Proc. 1971 ACM SIGFIDET Workshop
[3] E.F. Codd, "Interactive Support for Non-Programmers: The Relational and Network Approaches", Proc. 1974 ACM SIGFIDET Workshop
[4] R.F. Boyce, D.D. Chamberlin, "SEQUEL: A Structured English Query Language", Proc. 1974 ACM SIGFIDET Workshop
[5] R.F. Boyce, D.D. Chamberlin, "Using a Structured English Query Language as a Data Definition Facility", IBM Research Report RJ 1318, Dec. 1973
[6] M.M. Zloof, "Query by Example", IBM Research Report RC 4917, July 1974
[7] R.A. Lorie, "XRM an Extended (n-ary) Relational Memory", IBM Technical Report 320-2096, Jan. 1974
[8] R. Erbe, G. Walch, "An Interactive Guidance System for Method Libraries", IBM Germany, Wissenschaftliches Zentrum Heidelberg, Technical Report 75.04.001, April 1975
[9] R.A. Lorie, A.J. Symonds, "A Relational Access Method for Interactive Applications", Courant Computer Science Symposia 6, "Data Base Systems", 1971, Prentice Hall
[10] M. Schatzoff, "Interactive Statistical Data Analysis - APL Style", IBM Technical Report 320-2079, April 1972
[11] F.P. Palermo, "An APL Environment for Testing Relational Operators and Search Algorithms", Proc. APL 75
[12] J. Klebanoff, F. Lochovsky, D. Tsichritzis, "Teaching Data Base Concepts Using APL", Proc. APL 75
[13] E. Kantorowitz, "A Computer Aided Design Front End for the Measurement Data Base", IBM Germany, Wissenschaftliches Zentrum Heidelberg, Technical Note 75.07, July 1975
DATA BASE ORGANIZATION AT HOECHST AKTIENGESELLSCHAFT

Otmar Saal, Diplom-Volkswirt,
HOECHST AKTIENGESELLSCHAFT

Summary

Within a general framework, the paper first sets out the conditions and considerations from which HOECHST proceeds when planning data base systems. Using the requirements of highly integrated accounting and order-processing systems as an example, it then discusses selected questions from the practical use of data base and data communication systems.
DATA PROCESSING IN THE SYSTEM NETWORK

To make the framework of the later remarks understandable, it seems useful to begin with a brief overview of the structure of our company.

For the business year 1974 HOECHST AG presented world accounts consolidating more than 400 domestic and foreign companies in which the company holds at least 50 %. Worldwide sales amounted to DM 20.2 billion. With about 50,000 different products, the product range covers almost the entire field of chemistry.

The HOECHST enterprise as a whole is viewed in 3 groups, namely HOECHST World, HOECHST Group and HOECHST AG. The following remarks refer predominantly to the parent company, with a total of 13 domestic plants and sales of DM 9.7 billion in 1974.
If we consider the interaction of the individual units of HOECHST AG (plants, group companies, foreign companies) with the functional areas (departments, divisions, corporate management) with regard to the data processing tasks that arise, it becomes obvious that the data needed for accounting systems, order-processing and scheduling systems, and information and planning systems can be captured, supplied, prepared, stored and evaluated in a uniform way only by comprehensive data processing systems.

Accordingly, a system of data processing facilities matched to the respective central and/or decentralized tasks, cooperating in a quasi-hierarchical organization, helps to capture and process the required data and/or to make them available for a further stage of processing in the overall system.
Under the name Systemverbund HOECHST (HOECHST system network) we are working on the realization of a concept intended to help solve the problems of central and decentralized or local data processing as efficiently and reliably as possible. Our multi-computer network of 3 large computers at headquarters is sensibly complemented by a network of suitably matched decentralized computer or terminal intelligence. The emphasis is less on distributing the system than on using the most suitable equipment for local data capture and processing, with as direct a communication link to the central system as possible. Where possible and sensible, the central resources are also used from the local computer by means of "Remote Job Processing", for which a corresponding workstation program is available. This network also enables the plants and branch offices to satisfy their own plant-related information needs, which could otherwise normally be met there only by a data processing installation exceeding the economically sensible size.

This overall data processing system in our company cannot, however, be confined to the passing on of data for the higher-level central data processing tasks by the locally or functionally separate subunits; particularly in data storage and information evaluation it demands a uniform architecture.
It followed almost inevitably that data processing at HOECHST, in consistently pursuing the concept of integrated data processing, had to arrive at a data base organization that can ensure the general usability of the data both across the individual application areas and beyond the physical boundaries of a single computing centre.

Like every company that began using data processing very early, HOECHST initially aimed mainly at the integration of data capture and an efficient flow of data between the application areas.
This is by no means to say that integrated planning and attention to the overall context were neglected when the central files were built up. It was simply that, at the beginning, the projects at hand were predominantly mass data processing projects in the accounting and administration areas. These were built primarily for the department concerned and, because of batch-oriented data processing, their file organization was also mostly geared to the needs of the immediate departments. Data going beyond this and not typical of the department served mainly the integrated capture of data and the most expedient forwarding of data that had been checked as thoroughly as possible.

From a purely technical point of view alone, we had at that time no means whatsoever of achieving a comprehensive, yet highly adaptable and easily handled integrated data base organization.
Only the technological developments in data processing, namely suitable external storage with random access and the new forms of communication within real-time processing, allowed us from about 1967 to approach the planning and realization of more comprehensive application systems that are more strongly integrated in their file organization.
TASKS OF THE DATA BASES IN THE OVERALL INFORMATION SYSTEM

For about 6 to 7 years the nature and structure of our applications have been changing considerably. Through data base concepts and direct access via teleprocessing facilities, the pure accounting and administration systems could now be extended into scheduling and information systems. Data storage can be transferred from its hitherto predominantly inactive state on magnetic tape into an active form on magnetic disk storage that users can address at any time. The data capture and administration systems for the departments themselves thereby became much more efficient in their machine execution, through real-time processing, and also more informative, through direct linkage to other data bases. For now the information needed by a department could more easily be built up into effective partial information systems by access to the data base organization of other departments or to centrally maintained data bases. In addition, the integrated data base organization permits direct retrieval of data from department-related subsystems for processing in key systems spanning several functional areas or even in central information systems.
There is a reason why we use the term "central information system" or "central reporting system" instead of the widely used "Management Information System". To us, MIS seems too strongly tied to obtaining information for higher management levels only, which creates the impression that the corresponding system of data processing and data storage was designed primarily from this point of view. We hold instead that the information needs from the operational level up to the highest management level must without fail be covered from common, centrally maintained data bases. These data bases themselves can then be created by different partial information systems and serve first of all to handle the tasks in systems built for the operational level. That these data bases must in addition be able to meet the requirements of higher-level information systems is essentially a question of a well-planned and flexible data base structure.
Data storage that is well planned and directed at meeting all central information needs requires first of all, however, a uniform system of data definition and data coding. Accordingly, HOECHST has from the beginning of data processing attached great importance to the central definition, development and maintenance of all key terms and ordering criteria, which are laid down and supplemented in a central data processing code book that is binding for the entire company.

Given clear data definition and coding, it is in principle no longer a very difficult problem to provide, condense, link and select the data according to the respective requirements of the information systems.
If a suitable data base management system is available, a universal structure of the data bases allows a much more flexible and immediate response to changing information needs of management than would be possible within the rigid framework of an MIS conceived once and for all and fixed in its file contents.

At HOECHST we are thus clearly taking the path of first building very comprehensive partial information systems and of keeping the actual MIS, as a quasi higher-level "central reporting system", informative at all times through the common, central organization of the data bases.
DATA BASES IN AN INTEGRATED PARTIAL INFORMATION SYSTEM

I would like to make the general statements so far somewhat more concrete by means of practical examples. To make the overall context understandable here as well, I shall first give an overview of an extensive and closely interlocked system in the area of sales and production, which in turn consists of a number of subsystems that are each effective on their own.

At present the closest interlocking in the data network exists between the subsystems for order processing, shipping scheduling and handling, production data capture, inventory management and scheduling, purchase scheduling, and current-account keeping. In their essential parts all these subsystems work in real-time mode and draw as far as possible on central data bases. Order processing, from order acceptance through the various scheduling and handling stages to the posting of the invoice in the current account, can be carried out fully automatically only if the relevant and truly current data from the other subsystems affected can be made available by direct access.

Accordingly, the subsystem for order processing and invoicing alone needs comprehensive access to information from the areas of customers, products, inventory management, production planning, means of transport and others.

This mutual provision of data from and for other fields of work must by no means take place only within the narrow framework of a locally oriented system; it must take account of the overall network of processing and accounting across individual company units.
Thus customer orders can be defined both at the head office and at the branch offices, and delivery is possible from external warehouses, from central warehouses or from the various production sites in Germany.

Using a few greatly simplified questions and tasks from this area of order processing at HOECHST, I shall now try to clarify the considerations that led to the corresponding data base organization. For this purpose I single out three highly integrated data bases, namely:

A = orders (warehouses, plants, internal deliveries)
B = stocks (actual stock, available stock, production plan, purchase order)
C = customers and suppliers (open items, purchase orders)
First, a new entry is to be created in data base A by defining an order. Even at order acceptance, however, this requires the following to be established:

- Can the goods be delivered at all at present, and on what terms; or if not, when and from which warehouse or production plant can delivery resume?

- Can the customer still be supplied with regard to his credit limit; or, if the customer is at the same time a supplier, what is the difference between the customer's exposure and our liabilities or order values?

To answer these questions, information from file B must be available for the first question and data from data base C for the second.
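The two checks at order acceptance can be sketched as follows. This is a greatly simplified Python model, not the HOECHST implementation; the dictionaries standing in for data bases A, B and C, all field names and the netting rule in net_exposure are assumptions made for the illustration.

```python
def can_deliver(stocks, product, quantity):
    """First question: does data base B show enough available stock?"""
    return stocks.get(product, 0) >= quantity


def net_exposure(partner):
    """Second question: customer exposure minus what we owe the same partner."""
    return partner["open_items"] - partner.get("our_liabilities", 0)


def accept_order(orders, stocks, partners, customer, product, quantity, value):
    """Create a new entry in A only if the checks against B and C succeed."""
    partner = partners[customer]
    if not can_deliver(stocks, product, quantity):
        return "not deliverable at present"
    if net_exposure(partner) + value > partner["credit_limit"]:
        return "credit limit exceeded"
    orders.append({"customer": customer, "product": product,
                   "quantity": quantity, "value": value})   # new entry in A
    stocks[product] -= quantity                              # reserve stock in B
    partner["open_items"] += value                           # raise exposure in C
    return "accepted"


orders = []                                                  # A (hypothetical)
stocks = {"P1": 100}                                         # B (hypothetical)
partners = {"K1": {"credit_limit": 1000, "open_items": 800,
                   "our_liabilities": 300}}                  # C (hypothetical)
```

Even in this toy form, one order touches all three data bases, which is the interlocking the text describes.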
As far as plant stocks are concerned, changes to the stock file (B) are brought about by activities caused not only by sales but equally by receipts and issues in the production process at the production warehouses; in order processing the customer order is only one of many status-changing events. Constant readiness for scheduling requires a number of further data. Depending on whether production is order-oriented, stock-oriented, continuous or discontinuous, the corresponding scheduling systems must also be able to access current stock data, order data, production plan data, requirements from production and purchase order data.
Starting from a simple question about the availability of a product, we can already see a considerable linkage of different subsystems, all of which have a common point of integration in the stock data base.

In answering the other question, about the customer's creditworthiness, we likewise see dependencies on incoming payments, on order acceptance by schedulers in other sales areas and finally on our own purchase requisitions and the settlement of our supplier invoices, if this customer is at the same time a supplier to us.
In my remarks so far I have deliberately included this organizational environment and, in outline, the functional relationships of the subsystems, in order to make clear that file organization must be seen above all in terms of the overall context. Data base organization must differ from the formerly prevailing file organization in that the data can be used universally for a multitude of applications, even beyond the primary application area.
If in addition one wants to fulfil the well-known postulates for the use of data bases, namely currency of the data for all users, freedom from redundancy and access to the data by different criteria, then this was exactly the starting point of our considerations when, around the beginning of 1968, we began system planning for our first real-time order entry and order-processing system and looked for suitable data base software. Our requirements for this software did not concern only a tool for data base management proper; we were looking for a system that was flexible and extensible as a whole and whose further development was also assured.

Since our first data base applications, too, could be designed only as subsystems, and even within the individual systems can be realized only in development phases, the data base system to be used likewise had to be quite adaptable and extensible in its structure, and its data base management part had to permit the easy and safe integration of further application programs.
IMS as Data Base Software

Even before we learned of IMS, we tried at the beginning of 1968, with the knowledge and means then available, to develop a data base processor of our own. Starting from the file organization of the bill-of-materials processor, the necessary structuring of the files was to be achieved as [...] as possible, and universal access to the data elements was to be realized through corresponding macros.

While this data base processor of our own was still under development, however, we received advance information on the Information Management System (IMS) and decided, after an extensive system test, to use its data base part (DL/I).
What spoke for data base organization with IMS was above all the flexible structure provided by the tree structure, with a maximum of 256 segment types and a variable number of segments on 15 different levels. In addition, through the program-independent data base descriptions and the facility of making only the segments required by each user program accessible to it as sensitive, IMS offered what seemed to us an ideal way of building data bases that are program-independent and readily adaptable to a growing degree of integration.
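The idea of a tree of segments from which each program sees only the segment types it is sensitive to can be pictured with a small Python model. This is a conceptual sketch only and does not reproduce IMS or DL/I syntax; the class, the function and the segment type names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    type_name: str
    data: dict
    children: list = field(default_factory=list)


def program_view(segment, sensitive_types):
    """Return the part of the tree a program may see, or None if hidden."""
    if segment.type_name not in sensitive_types:
        return None                       # hidden together with everything below it
    visible_children = []
    for child in segment.children:
        view = program_view(child, sensitive_types)
        if view is not None:
            visible_children.append(view)
    return Segment(segment.type_name, segment.data, visible_children)


# Hypothetical three-level hierarchy with one root segment.
customer = Segment("CUSTOMER", {"no": 4711}, [
    Segment("ORDER", {"no": 1}, [Segment("ITEM", {"product": "P1"})]),
    Segment("CREDIT", {"limit": 1000}),
])
shipping_view = program_view(customer, {"CUSTOMER", "ORDER", "ITEM"})
```

In this model a shipping program never sees the CREDIT segment, so adding or changing that segment type would not affect it, which illustrates the program independence mentioned in the text.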
While in the first two years of use we could still be content with the strongly linear structure of our IMS data bases, the growing degree of integration through different application systems and the rising demand for information made it necessary to be able to structure the data bases logically as well, independently of their physical storage, as then became possible from IMS Version 2 onwards.

Also with Version 2, the data communication part of IMS was taken over as our central system for the acceptance, control and administration of messages from what are now 140 terminals, so that today a System/370-168 already has to be used almost exclusively for IMS operation. Moreover, because of its central data base management and message control, IMS is now also used in conjunction with GIS and is linked with STAIRS. HOECHST therefore regards IMS today as a comprehensive data base management and message control system.
Because of the central importance of the data bases organized with IMS and a rather high message volume with a large share of transactions that update the data bases, we had to attach great importance to speed and safety, particularly in data base design, in program structure and in the choice of access calls.

IMS leaves the user wide latitude in organizing and structuring the data bases. Their design, however, decisively influences the processing speed of the associated application programs and can also make itself clearly felt in the overall throughput of the IMS system.

Since at the start of our use of IMS there was neither any experience nor any method for simulating the timing behaviour of the data bases under different structures, we first had to gather many fundamental insights in special test investigations and then supplement and adapt them in practical operation. In the course of time, though, some of the rules we had arrived at had to be reconsidered and changed because of substantial changes in hardware and/or software.
Thus the availability of cheaper disk storage with considerably improved capacity on the one hand, and the increasingly costly communication between user system and management system under the new operating systems on the other, have led us to move away again from the concept of deep structuring with fine segmentation. This inevitably means working with larger units of information (segments), which are often not fully used and need correspondingly more external storage. The extra effort of providing data to the application program for a larger rather than a smaller segment is negligible compared with communicating twice between application program and control program.

Similar savings are made possible by the syntax improvements introduced in the course of the further development of IMS. Whereas formerly several segments were as a rule requested and the selection had to be made in the application program, the Boolean combination of different criteria in the search statements now allows the system to select the desired information with fewer calls.
In the organization of the IMS files we now predominantly aim at methods in which locating a record no longer requires the costly searching of index tables; instead a direct address can be computed from the sort key by a conversion method. It is often difficult, however, to find a method that distributes the records evenly over the given area. This applies particularly to so-called "speaking" (meaningful) keys, which take no account whatever of storage organization. Although the sort sequence is generally lost with these conversion methods, the faster access for real-time processing matters considerably more to us than the increased effort for an occasional sequential processing of these data.
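The principle of computing an address from the sort key can be sketched in Python. The polynomial hash below is a generic stand-in chosen for the illustration; it is not the conversion routine used with IMS at HOECHST, and the key format and the number of blocks are hypothetical.

```python
def key_to_address(key, n_blocks):
    """Map a sort key to a block number in range(n_blocks)."""
    value = 0
    for ch in key:
        # Polynomial hash, reduced modulo the number of blocks at every step.
        value = (value * 31 + ord(ch)) % n_blocks
    return value


def block_loads(keys, n_blocks):
    """Count how many keys land in each block, to judge how even the spread is."""
    loads = [0] * n_blocks
    for key in keys:
        loads[key_to_address(key, n_blocks)] += 1
    return loads


# Hypothetical "speaking" keys: location code, year and a running number.
keys = ["FFM-75-%04d" % n for n in range(200)]
loads = block_loads(keys, 17)
```

Inspecting loads for realistic key sets shows whether a chosen method spreads the records evenly; neighbouring keys generally end up in unrelated blocks, which is why the sort sequence is lost.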
Today about 60 % of all online files at HOECHST are organized by direct access methods. The remaining files could not yet be converted because of their key structure and frequent sequential processing. New entries are a problem with the index-oriented methods, since IMS builds overflow chains for them, which reduces performance very considerably. In an online system in particular, new records are checked, processed and passed on in several phases, each of which requires a costly read. Every night we reorganize most of these files; in the process the required evaluations and statistics are produced and a copy is obtained as a backup.

VSAM, as an improved index-oriented access method of the operating system, is currently being tested by us. Productive use can be considered, however, only once we are convinced that it works safely and without errors in combination with IMS.
Practical experience has also brought about a certain change in the design of large central data bases, especially where different users make quite different demands regarding the content and volume of the data to be included. The information to be stored in such central master data bases, for example the customer and supplier data base, for individual special processing or accounting programs can often go beyond anything that will ever be of importance to the remaining users.

We had this problem above all with typical industry-specific data in our customer file; for pharmaceutical customers, for example, the uniform structural framework was filled in completely differently than for customers in the industrial sector. Added to this are frequently differing requirements regarding the updating of the information, responsibility for maintenance and the inclusion of new data. As a result, despite all the convenience of data base management systems, unrest is repeatedly carried even into those user groups of such data bases that are not primarily affected by the extension or by a minor change in the structure.
For predominantly pragmatic reasons we have therefore in some cases adhered less to the pure theory of a universally usable and redundancy-free data base concept and have instead put flexible, user-friendly handling and safe behaviour of the data base in its system environment at the centre of our considerations.

For this reason some data bases that were already quite complex were no longer extended to the extent that newly added application areas would in themselves have required. We moved increasingly to the principle of moving special data out into dedicated files. Logically, these subfiles remain to a certain extent dependent on the mother data base, because the master part of the information is fed in from there. On the other hand, it must be possible to take over changed data from the subfile completely independently of the mother data base.
Physically, these sub-files are maintained completely independently of the mother database; there they are represented only by notes in the root segment that record opening, change indications, or deletions. Logically the two thereby remain dependent on each other, because a sub-file cannot be opened without a master part that has already been opened in the mother file. Likewise, base data in the mother file must not be deleted as long as corresponding data in the sub-files are still needed.

Such a division of a database into a mother file and sub-files need not be made for organizational reasons alone; it must also be considered because of the behavior of the hardware and software systems. With databases that cover a great many application areas, the access frequency on the same disk pack alone can lead to considerable bottlenecks. In addition, the temporary locking of the databases during update runs in batch mode causes us some timing problems, especially when the database changes are sensibly carried out on a different computer.
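The dependency rules between mother file and sub-files described above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the HOECHST implementation and not IMS code: the class names `RootSegment`, `MotherFile`, and `SubFile` and the note field `open_subfiles` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RootSegment:
    """Master part in the mother file, plus notes about its sub-files."""
    key: str
    master_data: dict
    # Hypothetical note field: names of sub-files holding data for this key.
    open_subfiles: set = field(default_factory=set)


class MotherFile:
    def __init__(self):
        self.roots = {}

    def open_master(self, key, master_data):
        self.roots[key] = RootSegment(key, dict(master_data))

    def delete_master(self, key):
        root = self.roots[key]
        if root.open_subfiles:
            # Base data must not be deleted while a sub-file still needs it.
            raise ValueError(f"{key} still used by {sorted(root.open_subfiles)}")
        del self.roots[key]


class SubFile:
    def __init__(self, name, mother):
        self.name = name
        self.mother = mother
        self.records = {}

    def open_record(self, key, special_data):
        root = self.mother.roots.get(key)
        if root is None:
            # No sub-file record without an opened master part.
            raise KeyError(f"no master part for {key}")
        # The master part is fed in from the mother file; special data is local.
        self.records[key] = {**root.master_data, **special_data}
        root.open_subfiles.add(self.name)

    def delete_record(self, key):
        del self.records[key]
        self.mother.roots[key].open_subfiles.discard(self.name)
```

In this sketch the only coupling between the two files is the note in the root segment, which mirrors the idea that the files are physically separate while remaining logically dependent.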
By dividing the data into dedicated sub-files, however, we make it possible for different application programs to work largely undisturbed and concurrently on those data of a logical overall database that are independent of one another. Synchronization points and a coordinated system of cross-reference notes in the individual files then ensure that the overall coherence of the database is preserved.
Another advantage of dedicated sub-databases, not to be underestimated in practical work, is the considerably better ability to react to error situations or other disturbances in the system. Within this sub-database system we would therefore rather accept a certain amount of data redundancy than have important programs, after an input/output error or some other technical obstruction to access, wait too long for extensive recovery measures on the overall database to be completed. In principle this also applies when the change and reorganization runs for central databases, which are carried out at night in batch mode, are disrupted and the databases can no longer be made available in time for the start of real-time operation. The associated sub-databases, by contrast, often remain completely unaffected by such disturbances or can continue to be used even without the pending database changes.
On the other hand, dedicated sub-files also help limit the effects of program crashes on an overall database system. It is then not necessarily required to stop the entire database and with it all affected application programs; because the mutual dependencies are small, far more targeted measures can be initiated to remedy the fault quickly.

It is precisely the problem of avoiding a certain susceptibility to disturbances, caused by the user systems as well as, still, by software and hardware, to which the manufacturer today pays far too little attention. For what advantages can a database organization offer that is theoretically very sensibly structured and set up for all information needs, if it is not designed to be absolutely user-friendly and reliable? Rather, the goal of integrated database and information systems must be that, while the overall relationships are preserved as far as desirable, the subsystems remain operational to the greatest possible extent in their particular function, unimpaired by disturbances in individual application programs or connected systems.

Since the effects of building and using databases, in terms of system behavior, load on the overall system, and fundamental availability aspects, can now only be judged from a higher-level perspective, a special coordination office was set up at HOECHST. Its task is, already during the planning phase, to take account of the overall database aspects and to influence the individual database design, and also, by recommending suitable structures for the application programs, to contribute in good time to a run in the overall system that is favorable in terms of both timing and safety.
In addition, these specialists develop generally valid standards and application examples for database design and make them available to the users of database systems in whatever form is suitable, through information sheets and information seminars.
TOOLS FOR DATABASE DESIGN AND ADMINISTRATION
Whereas in the past we approached the final structure of a database very tentatively and with test runs that were in part quite costly, today we strive in design both to apply generally established insights from practical experience and to use suitable tools for effective database modeling.

On the one hand, a number of utility programs help us toward this goal; in purely organizational terms they make the structure of the database and its content more transparent and, as a modeling aid, make necessary design changes very easy. On the other hand, programs for simulating the databases can additionally be used, which help find a suitable structure with respect to data volumes, access frequency, and the ability to combine data elements. I must add the qualification, however, that we have not yet carried out comprehensive simulations, because of the very costly preparatory work of obtaining the quantitative data and the effort of describing the manifold access functions of the user programs. In the future, however, more reliable planning using improved simulation methods will become indispensable, if only because data processing cannot keep expanding the overall system, given not only the general cost pressure but also the excessive system load caused by particular application programs.
Other urgently needed tools, both for the design phase and for the ongoing administration of the databases, are suitable documentation programs. Without such systems, known in the English-speaking world as "Data Dictionary and Directories", effective database planning and a continuous overview of structure, cross-connections in the data and user areas, and the current status of the database can no longer be achieved reliably.
All the more must we, as users of complex database systems, regret that the supplier of the database systems has so far dealt with this difficult problem of database dictionary systems so sluggishly, and that users have mostly had to develop their own partial solutions for database documentation and administration, which are often inadequate because of the high effort involved. On this question, which is so decisive for the further and secure development of database applications, we must urgently call on IBM to relieve customers in their database administration work through a more comprehensive and user-friendly "Data Dictionary System" and to contribute to largely machine-based documentation of the databases in use.
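What such a dictionary has to keep track of can be made concrete with a very small sketch. It is an assumption-laden illustration only: the class `DataDictionary` and its methods are hypothetical and do not describe any IBM product. It records which database defines which data element and which program uses it, so that cross-connections in the data and user areas can be queried.

```python
from collections import defaultdict


class DataDictionary:
    """Hypothetical minimal dictionary of data elements and their users."""

    def __init__(self):
        self._element_to_databases = defaultdict(set)
        self._element_to_programs = defaultdict(set)

    def define(self, database, element):
        # Register that a database holds this data element.
        self._element_to_databases[element].add(database)

    def record_use(self, program, element):
        # Only elements known to the dictionary may be referenced.
        if element not in self._element_to_databases:
            raise KeyError(f"undefined data element: {element}")
        self._element_to_programs[element].add(program)

    def where_used(self, element):
        """Programs affected if this element changes."""
        return sorted(self._element_to_programs.get(element, ()))

    def redundant_elements(self):
        """Elements held in more than one database."""
        return sorted(
            element
            for element, databases in self._element_to_databases.items()
            if len(databases) > 1
        )
```

A query such as `where_used("customer_no")` then answers which programs a structural change would touch, which is the kind of overview the text says can no longer be maintained reliably by hand.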
Alongside these administrative aids for easier design and administration of database systems, the permanent use of utility programs to observe the running system and to evaluate statistical parameters is also indispensable, so that both the operation of individual programs and overall system behavior can be assessed and adjusted. For this purpose, however, we can draw on sufficient data from IMS and SMF and use suitable monitors to evaluate them.
With IMS, because the processes within the online control region are interlocked, a low load from individual users leads to better performance overall. It is therefore necessary to check both the behavior of individual programs and their interaction with one another. For this we use programs based on the evaluation of log tape records. The daily statistics show the activities of a program over an entire day. From them one can see whether individual programs have an above-average rate of database accesses in relation to the messages processed.

The daily statistics give no impression of the behavior of a program within a particular environment (e.g. at peak times), of the sequence of activities within a program run, or of the duration of individual database accesses. For these purposes there is the DC Monitor, which records the corresponding information on special request. System specialists evaluate the resulting listings and can prompt the responsible programmer to make more skillful database calls, or issue general guidelines. In this way we have already achieved substantial improvements in the running of the programs. In an online system in which about 40 application programs compete with one another, it is of course hardly possible to repeat identical constellations or execution sequences. The influence of small changes can therefore usually not be expressed in exact figures. The evaluation criteria are thus at best the total number of accesses, total computer time consumed, the throughput rate of messages at particular times of day, and so on.
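The kind of evaluation the daily statistics perform can be sketched as follows. This is a minimal illustration under assumptions: the record layout (a program name paired with the kind of event) and the function names are hypothetical and do not reproduce the IMS log format or the evaluation programs actually used.

```python
from collections import defaultdict


def daily_statistics(log_records):
    """Count messages and database calls per program for one day.

    Each record is a (program, kind) pair, where kind is "message" or
    "db_call". This layout is assumed for illustration only.
    """
    stats = defaultdict(lambda: {"messages": 0, "db_calls": 0})
    for program, kind in log_records:
        if kind == "message":
            stats[program]["messages"] += 1
        elif kind == "db_call":
            stats[program]["db_calls"] += 1
    return dict(stats)


def above_average_programs(stats):
    """Programs whose calls-per-message rate exceeds the overall rate."""
    total_messages = sum(s["messages"] for s in stats.values())
    total_calls = sum(s["db_calls"] for s in stats.values())
    if total_messages == 0:
        return []
    overall_rate = total_calls / total_messages
    flagged = []
    for program, s in sorted(stats.items()):
        if s["messages"] and s["db_calls"] / s["messages"] > overall_rate:
            flagged.append(program)
    return flagged
```

Like the daily statistics described in the text, this aggregate says nothing about peak-time behavior, call sequence, or call duration; those would need timestamps in the records.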
EXISTING AND REQUIRED FORMS OF COMMUNICATION
It will have become clear from the preceding remarks that the decisive need to build a comprehensive database organization was triggered by the real-time projects. Real-time processing, which opens data processing directly to the workplace of the actual system user, who is to communicate interactively with the system and its databases, requires the use of a comprehensive message control and management system.
The main applications presented here are, by type, so-called Teilhaber systems (shared-application transaction systems), in which, within a given application system, permanently assigned procedures are activated by the individual transactions that the user initiates. For this type of message control and program control, IMS with its data communication part offers the complement to the database part that an information system needs. The safety facilities that IMS provides for the overall system, with extensive logging of both messages and database-affecting activities and an effective checkpoint and restart procedure, further strengthen our view that in IMS we have, in principle, the right software product for our information systems.
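The principle of a Teilhaber system, in which each user-initiated transaction activates a permanently assigned procedure and every message is logged, can be sketched in a few lines. The transaction codes, procedure names, and message layout below are hypothetical and serve only as an illustration; this is not IMS code.

```python
def show_stock(text):
    # Stand-in for an application procedure that reads a database.
    return f"stock for {text}"


def enter_order(text):
    # Stand-in for an application procedure that updates a database.
    return f"order accepted: {text}"


# Hypothetical transaction codes, each bound to exactly one procedure.
PROCEDURES = {"STOCK": show_stock, "ORDER": enter_order}


def handle_message(message, log):
    """Dispatch one terminal message by its leading transaction code."""
    code, _, text = message.partition(" ")
    procedure = PROCEDURES.get(code)
    if procedure is None:
        reply = f"unknown transaction {code}"
    else:
        reply = procedure(text)
    # Every message and its reply are logged, as a basis for restart.
    log.append((message, reply))
    return reply
```

The user cannot start arbitrary programs; the fixed table is what distinguishes this mode of operation from one in which each user runs independent programs.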
The system control of IMS is, rightly, designed quite comprehensively, so that we now run under its control program not only the transaction-driven "Message Control Programs" but also batch-oriented teleprocessing programs, GIS, and the "Information Retrieval System" STAIRS with extensive documentation databases.
GIS, mentioned above, is currently still of entirely minor importance and is to be included in future plans only after a successful trial period. Nevertheless, the first trial applications already show that it can be an interesting complement to the database systems. Whether it is fully suited to letting end users quickly formulate sporadic queries against existing databases themselves, however, still seems uncertain. In our view it would be better to have a simpler query language for end users, one that can be formulated in German, and to develop GIS further for experienced users, so that convenient queries against IMS databases are also possible, for example through field and variable indexing. It would also be an advantage if the IMS file description were generated automatically from the existing IMS database descriptions, or if IMS and GIS were served jointly by a higher-level database management.
STAIRS, also mentioned here, is used at HOECHST as a comprehensive documentation and information retrieval system. Under the message and program control of IMS, STAIRS has so far been used for unformatted databases in medical literature documentation, research documentation, and patent documentation. All queries and searches in the documents, currently stored on 16 IBM 3330-11 magnetic disk units, take place in real time via display terminals, directly in dialog between system and user.
The comprehensive use of IMS as an overall control system described so far does, however, bring a disproportionate increase in CPU load through increased internal administrative overhead. We are therefore now considering up to what point IMS, given our planned increase in the number of terminals, will still be able to serve all requests in one system, even on a /370-168.
Should IBM therefore not decisively change the performance and the present mode of operation of IMS, the only remaining option would be a relatively uneconomical division of the IMS applications between two systems. Clear limits are set to this, however, both by the database management of IMS and by the application systems. A considerably more sensible path, it seems to us, lies in realizing a concept that allows suitable functions to be moved out into intelligent terminals equipped with dedicated files. In this way the central system can be relieved of avoidable transactions, and the operability of the system can also be improved because independent work at the peripheral terminals is possible at least temporarily.
Concepts pointing in this direction have indeed been announced, for example with the SNA systems IBM 3790 and 3770. We then also expect, however, that IMS will fully integrate the capabilities of these terminal computers, both in its database concept and in message control, in the sense of a genuine hierarchically structured system network. What would be desirable is full inclusion of the data held by the front-end computer in the database management system of IMS, so that every relevant file change in the mother database would immediately be triggered for the dedicated file as well, and vice versa.
If we consider, with regard to a comprehensive database organization in our company, the usual definition of a Teilhaber system, in which the various participants in the system depend on one another and are connected through a common information system, then much of the work carried out from the plants by remote job processing within the system network belongs to Teilhaber systems rather than Teilnehmer systems (in which each user runs independent programs). Although data transmission is batch-oriented and, as a rule, completely independent programs are called, there are nevertheless many applications in RJE operation that access common central procedures and databases.
At present, however, this work can only be handled within the RJE procedures on our ASP system, which runs on an entirely different machine from IMS and also uses a completely different transmission technique, so that IMS and RJE cannot even share lines, although both systems are components of the same overall information system and of the common database organization. Here it is urgently necessary, for both economic and organizational reasons, to bring about without delay common line control and, if possible, common database management for IMS and RJE.

Such a package of procedures for uniform data transmission control has meanwhile been announced by IBM as "System Network Architecture". To us, however, this name seems, at least so far, to be a promising catchword that still has to be filled with a great deal of substance, above all with regard to the word "Network". Within our system network we need not only a uniform concept for data transmission control and hierarchically ordered communication between the computer and subordinate terminals; above all, for reasons of increased security and availability, we expect a network system between systems of equal rank, with program libraries and databases that can be used jointly.
I hope that, although this paper has been kept largely general and is oriented only toward practical experience or concrete planning approaches at HOECHST, I have been able to give the scientists and software architects present some confirmation of their views, or some suggestions, on the subject of databases and data communication.
Use of Databases in the Non-Academic Areas of a University
Eckhard Edelhoff, Universität Dortmund
Abstract
The aim of this paper is to show to what extent and for what purpose databases can be used in the administration and library of a university. Data-processing questions relating to a university library are treated in more detail, in particular the effects of different data processing techniques.
Contents
1.   The Gesamthochschulbereich Dortmund (Dortmund comprehensive university region)
2.   Projects in the Libraries and Administrations
3.   The Library Project
3.1  State of Automation in Libraries
3.2  Book Processing in a Conventional Library
3.3  Book Processing with an On-line System
3.4  The Display Screens of DOBIS
3.5  The Database in the Dortmund Library System
4.   The Administration Project
1. THE GESAMTHOCHSCHULBEREICH DORTMUND

The Gesamthochschulbereich Dortmund (Dortmund comprehensive university region) consists of
- the Universität Dortmund,
- the Pädagogische Hochschule Ruhr,
- the Fachhochschule Dortmund,
- the Fachhochschule Hagen.

According to the intention of the legislature, these institutions are to be integrated into one comprehensive university (Gesamthochschule).

Since 1973 the computing center at the Universität Dortmund has used an IBM/370-158 to supply the institutions of the Gesamthochschulbereich Dortmund, the Universität Bielefeld and, since 1975, the Fernuniversität in Hagen with data processing capacity and the associated services.

From 1976, after moving into a new building, the libraries of the Gesamthochschulbereich will be combined into a single unit. This means that from 1976 the library system in Dortmund will consist of one central library and about 25 departmental libraries.

The individual institutions in the Gesamthochschulbereich have independent administrations.
2. PROJECTS IN THE LIBRARIES AND ADMINISTRATIONS

In 1971/72 it was decided to include the libraries and administrations of the Gesamthochschulbereich Dortmund in the services of the mainframe computer that was to be procured. The reasons were as follows:
- There was no prospect of procuring, in addition to a mainframe, a dedicated computer for the libraries and administrations.
- Given a sufficiently powerful mainframe, the applications of the libraries and administrations, together with the applications from research and teaching, lead to balanced utilization.
As a consequence of this decision,
- in 1972 a project group on the 'organization of work processes in the libraries, taking data processing into account' was founded jointly by the libraries and the computing center, with the participation of staff from IBM, and
- in 1974 a project group with a corresponding brief was founded jointly by the administrations and the computing center, with the participation of staff from IBM and from Hochschulinformationssysteme GmbH.
3. THE LIBRARY PROJECT

The brief, 'organization of work processes taking into account the use of data processing', requires explanation. The processing of library items for use in a university passes through numerous stages:
- literature selection,
- stock check,
- ordering,
- receipt processing,
- invoice processing,
- subject cataloging,
- alphabetical cataloging,
- binding,
- final check,
- information service,
- lending.

For orderly processing and unambiguous identification of library items, conventional libraries have to maintain numerous catalogs, card files, registers, lists for statistics, and the like. The redundancy of the data kept at each workplace is considerable.

The aim of introducing data processing into the libraries must therefore be
- to link the individual work steps with one another,
- to make manual maintenance of the various reference aids superfluous, to eliminate their redundancy, and to reduce the walking distances that result from where they are housed,
- to ease and improve communication between the library and its suppliers and users, and
- to automate the keeping of the required statistics and the printing of vouchers, notices, and the like.

For this purpose the following data processing facilities were held out in prospect:
- magnetic disk storage for recording and retrieving all data that arise during literature processing,
- display units that make it possible to retrieve or supplement the stored data on-line at the workplace at any time,
- use of the IBM monitor CICS.

The following considerations were taken into account in this decision:
- Input/output-intensive applications from the libraries and administrations are complements that fit into the task profile, determined by teaching and research, of the mainframe to be procured.
- The library's tasks arise spread over its entire opening hours.
- The stated aims of introducing data processing into the library can be achieved only if the computer is available at the workplace at all times as a reference aid.
3.1 State of Automation in Libraries

Attempts to rationalize library work with the aid of electronic data processing have a history of more than ten years. Among those to be mentioned are the systems based on batch processing (off-line systems)
- of the library of the Universität Bochum,
- of the library of the Washington University School of Medicine,
- of the library of the University of Illinois
and, more recently, the systems and experiments based on real-time processing (on-line systems)
- of the library of Stanford University,
- of the Ohio College Library Center,
- of the library of the Universität Bielefeld,
- of the IBM laboratory in Los Gatos.

Apart from the data processing technique, these systems differ greatly in approach and scope: some are designed to automate part of the work steps, others cover all of them. The Dortmund library system was built on the experience gained with the system developed at the IBM laboratory.

Off-line systems
Off-line systems, regardless of the technology used, are not able to fundamentally change or ease the procedures of a conventional library. With this technique, essentially only those work steps can be rationalized whose flow can be adapted to the turnaround time of the computer in question:
- Maintaining the catalogs, card files, and registers can be made easier as far as the necessary changes are concerned. At the workplace, the librarian still needs them as before.
- Reuse of data once captured is possible in principle, but as a rule it depends on external data carriers such as punched tape or punched cards. Corrections, additions, and extensive category schemes lead to complications, difficulties, and duplicated work.
- There is no way, beyond conventional handling, of keeping information available about the current processing status of an item.
On-line systems

On-line systems are able to reduce the complicated conventional procedures of book processing:
- The librarian physically needs no card files, registers, or catalogs at the workplace.
- Data once captured can reliably be reused, and likewise changed and supplemented.
- There is a simple way of obtaining information about the processing status of a library item. There are no information gaps.

The two kinds of system exploit the technology of the available computer in completely different ways. Whereas on-line systems need a mainframe as host because of the greater software facilities they require, off-line systems can in principle also be run on medium-scale computers (mittlere Datentechnik) specialized for this purpose.
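The two properties claimed for on-line systems, reuse of data once captured and status information at any time, can be illustrated with a small sketch. It is not the Dortmund system: the class `ItemRegister`, the English stage names, and the simplification that every item passes through every stage in the order listed in section 3 are assumptions made for illustration.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Processing stages in the order of section 3 (English names assumed)."""
    SELECTION = 1
    STOCK_CHECK = 2
    ORDERING = 3
    RECEIPT = 4
    INVOICE = 5
    SUBJECT_CATALOGING = 6
    ALPHABETICAL_CATALOGING = 7
    BINDING = 8
    FINAL_CHECK = 9
    AVAILABLE = 10  # shelved; information service and lending can use it


class ItemRegister:
    """One shared record per library item instead of many card files."""

    def __init__(self):
        self._items = {}

    def capture(self, item_id, **data):
        self._items[item_id] = {"stage": Stage.SELECTION, "data": dict(data)}

    def advance(self, item_id, **new_data):
        # Move to the next stage and add only the data that is new there;
        # everything captured earlier stays available and is not re-keyed.
        item = self._items[item_id]
        if item["stage"] < Stage.AVAILABLE:
            item["stage"] = Stage(item["stage"] + 1)
        item["data"].update(new_data)
        return item["stage"]

    def status(self, item_id):
        return self._items[item_id]["stage"].name

    def data(self, item_id):
        return dict(self._items[item_id]["data"])
```

Because every workplace reads the same record, a status query needs no routing slip and no walk to a card file, which is the contrast with the off-line case drawn above.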
3.2 Book Processing in a Conventional Library

As a basis for the following sections, the work steps required during book processing in a conventional library are described here. The account is necessarily incomplete and limited to monographs.

o Book selection
- The subject specialists select literature on the basis of bibliographies, prospectuses, the weekly lists of the Deutsche Bibliothek, and the like, taking user requests and lending statistics into account.
- Together with the order data to be specified by the subject specialists, such as order type, number of copies, location, and subject group, the marked documents mentioned above normally go to the acquisitions department.
o Book ordering
- In the acquisitions department the specialists' purchase proposals are checked. This may require bibliographic searches; afterwards a stock check is carried out against the alphabetical catalog, the interim file, and the order file.
- The order is placed according to the result of the stock check. The following data, among others, are required:
  author (for works with a named author)
  title
  publisher
  place, year of publication
  edition
  order type
  series title (for series)
  volume
  budget title
  supplier
  source
- Copies of the order are filed in the order file and in the bookseller file.
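The ordering step above, with its list of required data and the stock check against three files, can be sketched as a record plus a lookup. This is an illustrative assumption, not a description of any actual system: the field names are English renderings of the listed data, and the matching rule on author, title, and year is a hypothetical simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRequest:
    """Order data as listed above; field names are assumed renderings."""
    author: str
    title: str
    publisher: str
    place: str
    year: int
    edition: str
    order_type: str
    budget_title: str
    supplier: str
    source: str
    series_title: str = ""  # only for series
    volume: str = ""


def match_key(author, title, year):
    # Hypothetical matching rule: normalized author and title plus year.
    return (author.strip().lower(), title.strip().lower(), year)


def stock_check(request, catalog, interim_file, order_file):
    """Return the name of the file that already holds the item, or None."""
    key = match_key(request.author, request.title, request.year)
    for name, keys in (
        ("alphabetical catalog", catalog),
        ("interim file", interim_file),
        ("order file", order_file),
    ):
        if key in keys:
            return name
    return None
```

In the conventional library each of the three lookups is a separate manual search; here they are three membership tests on the same key.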
o Book receipt
- In the acquisitions department the book is checked for damage or misprints. A routing slip is enclosed, which serves to monitor the workflow and to record data arising during it, such as shelf mark and subject cataloging data.
- A stock check is carried out for unsolicited and on-approval consignments. Unsolicited consignments and on-approval orders are submitted to the subject specialists for a purchase decision.
- For all books with a positive purchase decision, the bookseller file, the order file, and the interim file are brought up to date.
- The books are entered in the inventory. The subject specialists receive the books for further processing.
o Subject cataloging
- The subject specialists classify the books by content and determine the content of the new acquisitions list.
- The books are passed on for title recording for the alphabetical catalogs.

o Title recording
- In accordance with the rules for alphabetical cataloging, the data relevant for identifying the books are recorded. This is a renewed recording of data that in part already arose during ordering.
- The cards for the alphabetical catalogs and the subject catalog are produced and filed in the central alphabetical catalog, the central subject catalog, and the shelf list. For books with a special location, additional cards are required for the alphabetical catalogs of the departments, the reading room catalog, the catalog of the textbook collection, and the catalog of the reserve collections (Handapparate). Special locations essentially hold only current stock, so books have to be moved out to the central library on a large scale. When the location of a book changes, the corresponding corrections must be made in all catalogs.
- The books are passed on to the binding desk.

o Binding
- Where necessary, the books are sent to the bookbinder. This requires a bookbinder file.
- After the binding work, the books are passed on to the labeling desk.

o Labeling
- The shelf marks and spine titles are applied and the books are stamped.
- The books are passed to the final check.

o Final check
- The final check is a formal check that verifies the workflow against the routing slip and verifies the book data and the data on the catalog cards.

o Use
- The books are shelved at their locations.
- For lending purposes, a user file and a coupon register are kept.
3.3 Book Flow Using an On-line System

This section presents, restricted to monographs, possible changes in the handling of the book flow when an on-line system is used. Here, as in the preceding section, the presentation remains incomplete because of the level of detail that would be possible.
The essential features of the organizational changes are:
- Data recorded once can be retrieved at any time in edited form.
- The system holds information on the processing status of all recorded objects at all times.
- Checks and monitoring tied to particular attributes can be carried out or supported automatically.
The effects of these changes on the book flow are illustrated below by a few examples:
- Data determined in the acquisitions department for library items to be ordered can be used for alphabetical cataloguing. This shifts cataloguing work into the acquisitions department. The shift is further favored by the possibility of using the magnetic tapes of the Deutsche Bibliothek for acquisition and cataloguing at the same time.
- External services (Fremdleistungen) that do not arrive regularly usually lead to a title being processed several times. With an on-line system, data collected once can be reused as circumstances require, without additional manual effort.
- Unsolicited items sent on approval (zur Ansicht-Sendungen) can be processed individually after the subject specialist (Fachreferent) has made the purchase decision.
- Processing carried out once for acquisition can be reused for cataloguing, so that a second cataloguing pass is superfluous and duplicates are avoided. In the conventional system this is not possible for technical reasons.
- Order documents can be produced automatically in line with the differing requirements of the departmental libraries (Bereichsbibliotheken).
- Reminders concerning unfulfilled orders, reminder lists (among other things for cataloguing work not carried out because catalogue data are to be taken over from external tapes), binder orders and the like can be generated automatically.
- The cataloguing process can in turn be supported by the use of external services.
- Together with the shift of cataloguing work into the acquisitions department, the complete transparency of the book flow leads to faster availability of the library item.
- At the information desk, statements can be made about the existence and the availability of library items, including their current status.
- In circulation, reservations, blocks, reminders and fee charges can be carried out locally and non-locally on the basis of the current status.
3.4 The DOBIS Screens

The Dortmund Library System (Dortmunder Bibliothekssystem, DOBIS) was developed with the aim of using the computer as the store for all data that arise in the library during the book flow and can be used for it. Display terminals are available for entering and retrieving data. Three kinds of processing steps are distinguished:
- Steps that make no use of data processing. Bibliographic searches are one example.
- Steps that use the on-line functions of the system. These are:
  register search
  ordering
  accession
  serials processing
  cataloguing
  invoice processing
  binding
  circulation
  interlibrary loan
- Steps that use the off-line functions of the system. These consist of printing, among other things:
  orders
  catalogues
  reminders
  monitoring lists and statistics

At the workplace, the availability of terminals replaces the manually maintained catalogues, registers and card files that the librarian used before. Translating the workflows into screen dialogues does, however, require a large number of different screen displays. The difficulties that could result are avoided by giving all displays a uniform structure:
Data of the same type always appear at the same place on the screen. The screen is divided into three parts with different functions:
  header
  area for changing information
  instruction part
The header contains information about the DOBIS function addressed, the subfunction currently running and a characterization of the screen mask displayed. The area for changing information contains different data depending on the current work step, for example parts of a register, information about a supplier or the data of a document. The instruction part tells the operator how the dialogue can be continued within the running function. In a text line the operator is prompted for the next action, for example to enter certain data or to call a further function.
As already stressed several times, information about a particular document has to be entered only once. From then on it is available at all workplaces. As a consequence of this principle, all data that are essential for retrieving a library item in the system, such as
  ISBN, ISSN
  titles
  persons, corporate bodies
  call numbers
  numbers, abbreviations
  subject catalogue data
must be checked carefully for errors. This avoids storing the same data in different versions, and the processing errors that would follow from it. It is achieved as follows: for every item of data used to enter or change a library item, either the system leads automatically into the relevant register (the data named above are stored in these registers) or the operator is prompted to search or inspect the register. The operator is thus forced to use the forms already stored. For numerous data, certain standardized abbreviations must be used. To make it easier for the operator to deal with these formal rules, the abbreviations are stored in the system and can be used when entering a search term.
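The register check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name Register, its methods and the normalization rule (ignore case and extra blanks) are hypothetical and are not taken from DOBIS. The sketch shows the principle that an entry already present in a register is reused instead of being stored a second time in a different spelling.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Register:
    """One access register, e.g. persons or titles (hypothetical model)."""
    name: str
    entries: Dict[str, int] = field(default_factory=dict)  # normalized form -> entry id

    @staticmethod
    def normalize(text: str) -> str:
        # Assumed rule: collapse runs of blanks and ignore case.
        return " ".join(text.split()).casefold()

    def lookup(self, text: str) -> Optional[int]:
        """Return the id of an existing entry, or None if there is none."""
        return self.entries.get(self.normalize(text))

    def enter(self, text: str) -> Tuple[int, bool]:
        """Return (entry id, created). An existing form is reused, never duplicated."""
        key = self.normalize(text)
        if key in self.entries:
            return self.entries[key], False
        entry_id = len(self.entries) + 1
        self.entries[key] = entry_id
        return entry_id, True


persons = Register("persons")
first_id, created_first = persons.enter("Jedwabski, Barbara")
second_id, created_second = persons.enter("  jedwabski,  barbara ")
```

In this sketch the second call returns the id of the first entry, so both spellings end up pointing at one register entry.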
3.5 The Database in the Dortmund Library System

Under the premises already named in Chapter 3, 'Bibliotheksprojekt' (the library project),
- replacement of all catalogues, card files etc.
- reuse of data once they have been collected
- transparency of the book flow
the entire book stock, i.e. the content of all conventional information and documentation tools, has to be made available on-line in integrated form. For a stock of 1 million volumes this means secondary storage on the order of 6-8 x 10^8 bytes, i.e. roughly 600 to 800 bytes per volume. This high storage requirement is matched by very complex data structures and query requirements that are fixed by a multitude of rules and constraints. Because libraries exchange data closely at national and international level, these rules cannot be abandoned without harm to the library itself.
The database in the Dortmund Library System distinguishes the following files by content:
- Main files holding, among other things:
  bibliographic information
  order information
  loan information
  invoice information
  print and deadline queues
  The bibliographic information is stored in two main files. The logical records in these files are numbered consecutively. Corresponding records are linked to one another. Each physical bibliographic unit corresponds to at least one logical record in each of the two files. In keeping with the complexity of bibliographic entities, the logical records within a file are also linked to one another. The following are distinguished:
  monographs
  monographs with appended works
  multi-volume works with and without individual volume titles (Stücktitel)
  series
  periodicals
  Monographs are the normal case. In each of the two main files, every copy can be assigned one logical record without links to other records. In all other cases, each physical unit requires several logical records or links to other logical records. An example of a complex structure from the area of series is shown below:

  [Figure: record structure of a series (Schriftenreihe, Serie). Node labels: Stücktitel 1, 2 and 3; Unterreihe A (unselbständig), Chemie, with Band 1, Band 2, Band 3; Unterreihe B (unselbständig), with Band 1, Band 2; untergeordnete Schriftenreihe mit selbständigem Titel; Teil 1 (unselbständig); Teil 2 (Stücktitel) mit beigefügtem Werk.]

  Note: Abridged example from the publication cited as reference 12.
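The linking of logical records within one main file can be illustrated with a small Python sketch. It is an assumption-laden model, not the DOBIS record format: the names LogicalRecord and MainFile, the parent/child link fields and the sample hierarchy are all hypothetical, and the second main file with its cross-file links is left out. The sketch only shows consecutively numbered records that point to a superordinate record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LogicalRecord:
    number: int                       # consecutive record number within the file
    label: str
    parent: Optional[int] = None      # link to the superordinate record, if any
    children: List[int] = field(default_factory=list)


class MainFile:
    """Hypothetical main file holding linked logical records."""

    def __init__(self) -> None:
        self.records: Dict[int, LogicalRecord] = {}

    def add(self, label: str, parent: Optional[int] = None) -> int:
        number = len(self.records) + 1          # records are numbered consecutively
        self.records[number] = LogicalRecord(number, label, parent)
        if parent is not None:
            self.records[parent].children.append(number)
        return number

    def path(self, number: int) -> List[str]:
        """Labels from the top-level record down to the given record."""
        labels: List[str] = []
        current: Optional[int] = number
        while current is not None:
            record = self.records[current]
            labels.append(record.label)
            current = record.parent
        return labels[::-1]


# Assumed sample hierarchy: a series with one dependent subseries and two volumes.
bib = MainFile()
series = bib.add("Schriftenreihe")
subseries = bib.add("Unterreihe A", parent=series)
volume1 = bib.add("Band 1", parent=subseries)
volume2 = bib.add("Band 2", parent=subseries)
```

A monograph would appear in this model as a single record without parent or children, which corresponds to the normal case described in the text.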
- Access registers holding, among other things:
  persons/corporate bodies
  titles
  subject headings
  call numbers
  publishers
  ISBN/ISSN
  user names
  suppliers
  On the one hand, the access registers serve to retrieve the bibliographic units. In this respect they offer considerably more possibilities than the conventional information tools, because of their great variety and, for example, the possibility of extending the title register by permuted titles with corresponding references. On the other hand, the access registers serve data entry. For example, the first two registers store not only the heading form (Ansetzungsform, used for filing) but also the form found on the item (Vorlageform, used for description). Both registers together also serve as an aid: the alphabetical catalogue can be produced from their sort order.
  All main files and access registers have a VSAM-like index, which may comprise several index levels. This ensures fast access to the required physical record or variable-length logical record.
- Files holding code tables. Among other things, these tables save space in the main files.
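The idea of extending the title register by permuted titles can be sketched as follows. This is a minimal illustration, assuming a simple rotation scheme (each significant word is moved to the front once) and a small stop-word list; the function names and the stop-word list are hypothetical, and the text does not say how DOBIS forms its permuted titles.

```python
from typing import List, Tuple

# Assumed stop words; a real register would use a much longer list.
STOPWORDS = {"a", "an", "and", "das", "der", "die", "ein", "eine",
             "in", "of", "the", "und", "von"}


def permuted_titles(title: str) -> List[Tuple[str, str]]:
    """Return (permuted form, original title) pairs, one per significant word."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.casefold() in STOPWORDS:
            continue                      # no entry starts with a stop word
        rotated = " ".join(words[i:] + words[:i])
        entries.append((rotated, title))  # the original title acts as the reference
    return entries


def build_title_register(titles: List[str]) -> List[Tuple[str, str]]:
    """Collect all permuted entries and sort them for alphabetical access."""
    entries = [entry for title in titles for entry in permuted_titles(title)]
    return sorted(entries, key=lambda entry: entry[0].casefold())


register = build_title_register(["Data Base Systems"])
```

Each permuted form keeps a reference to the original title, so a search under any significant word leads back to the same bibliographic unit.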
4. The Administration Project

As already mentioned at the beginning, a project group was founded in 1974 with the task of 'organizing the workflows in the administrations with data processing taken into account'. It differs from the library project in the following respects:
- Because of the amount of work involved, the various functions of the university administrations cannot be organized at the same time. Their weaker internal cohesion moreover makes later integration possible, provided a suitable framework has been created.
- Because of the situation of the universities, the individual administrations are to be able to work independently of one another once the system, or parts of it, has been completed.
- The organizational work is made more difficult by the different ways in which administrative procedures are handled at the individual universities.
The personnel area and the student area were tackled first. The standards for terms and keys developed by Hochschulinformationssysteme GmbH and the statistical offices were taken into account.
The following data processing facilities are available to the project group:
- The IMS database system from IBM.
- Disk storage to the extent required.
- The use of the computer is to be planned for the individual functions in such a way that the applications can be run predominantly in the second and third shift.
References

1 Elektronische Datenverarbeitung in der Universitätsbibliothek Bochum. Hrsg. von Günther Pflug u. Bernhard Adams. Bochum 1968.
2 Alexander, R.W.: Library Management System (LMS): Descriptive specifications for an on-line, real-time integrated system. Los Gatos, Cal.: IBM o.J.
3 Experimental Library Management System (ELMS): Librarian's User Manual. Los Gatos, Cal.: IBM 1972.
4 Datenerfassung und Datenverarbeitung in der Universitätsbibliothek Bielefeld. Hrsg. von Elke Bonneß u. Harro Heim. Pullach bei München 1972. (Bibliotheksstudien. Bd. IA.)
5 Bibliographic Automation of large library operations using a time-sharing system (BALLOTS): Phase I. Final Report. Stanford, Cal. 1971.
6 Bibliographic Automation of large library operations using a time-sharing system (BALLOTS): Phase 2, part I. Final Report. Stanford, Cal. 1972.
7 First annual report of the BALLOTS project to the National Endowment for the Humanities. Stanford, Cal. (1973).
8 The Shared Cataloging System of the Ohio College Library Center. Frederick G. Kilgour, Philip L. Long u.a. In: Journal of Library Automation. Vol. 5, No. 3, 1972.
9 An automated on-line circulation system. Ed.: Irene Braden Hoadley and A. Robert Thorson. The Ohio State University Libraries 1973. (Proceedings and Papers of an Institute Held at The Ohio State University Sept. 13-14, 1971.)
10 Ohio College Library Center. Annual Report 1973/1974.
11 Bibliotheksautomatisierung in den USA und in Kanada. Hrsg. von Walter Lingenberg. Pullach bei München 1973. (Bibliothekspraxis. Bd. 10.)
12 Deutsche Forschungsgemeinschaft. Bibliotheksausschuß. Maschinelles Austauschformat für Bibliotheken (MAB 1). Berlin 1974.
13 Empfehlungen für den Einsatz der Datenverarbeitung in den Hochschulbibliotheken des Landes Nordrhein-Westfalen. Hrsg. von der Planungsgruppe Bibliothekswesen im Hochschulbereich NRW. Düsseldorf 1974.
14 Jedwabski, Barbara: DOBIS - ein integriertes On-line-Bibliothekssystem. In: 10 Jahre Universitätsbibliothek Dortmund. Zum 1.6.1975 hrsg. von Valentin Wehefritz. Dortmund 1975.
15 DOBIS. Anwendungsbeschreibung. (GAP. Application Guide.) IBM 1975. (To appear at the end of 1975.)
16 DOBIS. Systembeschreibung. (GAP. Systems Guide.) IBM 1975. (To appear at the end of 1975.)
Use of a Database System at the Hessisches Landeskriminalamt

Rolf Heitmüller, 62 Wiesbaden, Am Hochfeld 12

1. THE HESSIAN POLICE

Since 1 January 1974 the Hessian police has been a state police force, i.e. all police institutions are maintained by the State of Hesse. Until 31 December 1973 there were, alongside the state police, municipal police stations in certain cities.
In the field of crime control, the Hessisches Landeskriminalamt (the state criminal police office of Hesse) has the special statutory task of collecting and evaluating the information needed to fight crime and of passing the resulting findings on to all interested and authorized bodies. Besides this function, the Hessisches Landeskriminalamt also has executive duties and maintains central units for certain special tasks (for example for combating and solving white-collar crime and for investigating the causes of fires). In addition, the Hessisches Landeskriminalamt has the task of operating data processing for the Hessian police centrally.

2. A LOOK BACK AT THE DEVELOPMENT

The introduction of electronic data processing (EDP) in the Hessian police cannot be seen in isolation from the corresponding considerations in the other federal states and in the federal administration. As early as 1964, and, if only very vaguely, even before that, first thought was given to how police information could be processed with the modern tool of EDP. At the end of these considerations stood the demand that the identification of offenders must be automated first. In the following years this demand was elevated almost to a precondition for introducing EDP in the police of the Federal Republic of Germany.
An analysis of police work had not yet been carried out; demands to automate individual areas simply took its place. A working group set up at federal level worked out first approaches to an analysis. The value of individual areas of information, and in part of individual terms, was discussed.
For the Hessian police, this period saw the first talks with manufacturers of EDP equipment and with scientific institutions. The result was rather discouraging. After a rough description of the task to be solved, all those consulted declared that a generally valid, ready-made data processing solution for police tasks did not exist and could not readily be developed from existing procedures.
Even then the considerations centred on the problems of storing mass data with direct access, of retrieving stored data quickly, of updating the information quickly and, not least, on the problem of the high availability the police demands of its data processing facilities.
The outcome of this period was the recognition that introducing EDP in the police first required an intensive analysis of police workflows, and that a police data processing system would have to be developed on the basis of this analysis.
This recognition was acted upon at various levels:
At the Bundeskriminalamt an "Arbeitsgruppe EDV" (EDP working group) was set up on 1 January 1968, to which every federal state was to send a suitable staff member. The task of this working group was to develop a uniform police data processing system for the entire police of the Federal Republic. The group was to collect and evaluate all considerations on police data processing made in the Federal Republic so far, in order to develop a uniform system on that basis. The Arbeitsgruppe EDV was active until the beginning of 1970 with changing membership. The results of its work can be summarized as follows:
- The working group was given too little authority to fulfil its task even approximately.
- The required procedure for automatic identification of offenders could not be produced within the given limits of staff and time.
- In the view of the working group it seemed more sensible, because simpler and solvable within manageable limits of time and scope, to begin with a system that could make the information available on a known offender accessible on request as quickly and comprehensively as possible. The areas and volumes of information required for this were compiled in systematic form. Part of the search for wanted persons, the so-called office search (Bürofahndung), was also to be included in this system. This is a procedure for preparing search requests in such a way that a printing shop could produce index cards for the daily updating of wanted-persons card files and a monthly printed wanted list.
The survey and analysis of the existing situation at the Hessisches Landeskriminalamt, carried out in parallel with the work of the "Arbeitsgruppe EDV", produced interesting results. Although it was claimed again and again that the card file system for identifying offenders was the most important police collection of information, it turned out that of about 60 card files only 20 managed without the ordering criterion "personal data" (Personalien). Information retrieval was therefore carried out mainly through the personal data of the offenders and not through other descriptive features. This finding agreed in essence with the considerations of the Arbeitsgruppe EDV at the BKA.
For the construction of an information system for the Hessian police, this led to fundamentally new lines of thought, which can be summarized as follows:
- At the centre of all police activity in the field of crime control is the case, as the very occasion for taking action. It followed that a system that did not take case information into account would be not only incomplete but wrongly constructed.
- Since the personal data of the offender play an essential part in information retrieval, they were to have a prominent place in the system, but (and this was new) only once for each person. The relationship between case and person was to be represented by links.
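The design idea stated above, personal data stored once per person and related to cases through links, can be sketched in Python. This is a hypothetical model for illustration only: the class CasePersonStore, its field names and the sample data are assumptions and do not describe the HEPOLIS data structures.

```python
from collections import defaultdict
from typing import Dict, Set


class CasePersonStore:
    """Persons are stored once; cases refer to them through links (assumed model)."""

    def __init__(self) -> None:
        self.persons: Dict[int, Dict[str, str]] = {}
        self.cases: Dict[int, str] = {}
        self.cases_of_person: Dict[int, Set[int]] = defaultdict(set)
        self.persons_of_case: Dict[int, Set[int]] = defaultdict(set)

    def add_person(self, person_no: int, name: str, birth_date: str) -> None:
        if person_no in self.persons:
            raise ValueError("personal data are stored only once per person")
        self.persons[person_no] = {"name": name, "birth_date": birth_date}

    def add_case(self, case_no: int, description: str) -> None:
        self.cases[case_no] = description

    def link(self, case_no: int, person_no: int) -> None:
        """Record the case/person relationship in both directions."""
        if case_no not in self.cases or person_no not in self.persons:
            raise KeyError("both case and person must exist before linking")
        self.cases_of_person[person_no].add(case_no)
        self.persons_of_case[case_no].add(person_no)


store = CasePersonStore()
store.add_person(1, "Mustermann, Max", "1940-01-01")   # invented sample data
store.add_case(10, "burglary")
store.add_case(11, "fraud")
store.link(10, 1)
store.link(11, 1)
```

Two cases refer to the same person here, yet the personal data exist only once; the links can be followed from the person to the cases and from a case to its persons.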
- The system was to make it possible to add extensions of a substantive kind at any time.
A closer look at these requirements showed that the whole problem could not be solved in a single development step. The effects on police work would probably have been so severe that it seemed at least questionable whether the work would not have been paralysed. This was reason enough to build the Hessian police information system (Hessisches Polizeiinformations-System, HEPOLIS) in stages and, in the first stage, to implement only what would have the least drastic consequences for police work.
Various options were considered for updating the stored information. The best solution seemed to be to decentralize data preparation and data capture. Studies of the capacity of a data transmission network, at first planned only roughly, showed that the capacity would permit decentralized updating of the data in addition to the inquiry service.
In summary, the basic requirements for the system to be built were:
- The first stage is to centre on person inquiries;
- Information is to be transmitted by remote data transmission;
- Information is to be stored centrally but prepared and entered decentrally;
- Updating of the data is to be possible at any time from the place where the information is obtained; and
- the following tactical police requirements are to be met: information must be available on request, in whole or in part, quickly and at any time; it must be reachable from as many places as possible, simple to handle and as comprehensive as possible.
On the basis of these requirements a call for tenders was issued, inviting offers for hardware and software. IBM was awarded the contract because, besides the best price/performance ratio, it was able to offer the most balanced relationship between hardware and software.

3. THE IMPLEMENTATION PHASE

After the contract negotiations with IBM had been concluded, work first concentrated on the configuration of the data processing equipment. The demand for high availability led to the use of two computers (IBM /370-145 with 768 KB of main storage), identical in equipment and capabilities, so that either of them alone can keep HEPOLIS running.
The attached external units (magnetic disk storage, i.e. 16 IBM 3330 disk drives with a total of 2.2 billion characters in direct access, magnetic tape units, high-speed printers, punched card readers and data transmission control units) are designed so that each computer can access each of these units at any desired time. The data transmission control units, like the computers, are installed in duplicate. All other units are present in sufficient numbers so that technical failures can be bridged here too, as far as possible without major waiting times.
The operating systems used are OS MFT II, currently at Release 27.7, and OS-VS. IMS 2 Level 4 is used to manage the stored data and to operate the teleprocessing facilities.
Only IBM 3270 terminals are used as data stations. They are installed as single stations or as clusters, as required, and all have a buffer for 1,920 characters. These relatively large screens were chosen because HEPOLIS was to be made as user-friendly as at all possible, and smaller screens would automatically have led to restrictions in the organization of the screen layout. Only with the large screen was it possible
- to build up a unit of information on a single screen in almost all cases without losing clarity,
- to address every single data field with a field label of 10 characters, so that in most cases statements can be made without abbreviations, and,
- what is very important for the user, to give the screen formats for the inquiry service (Auskunftsdienst) and the update service (Änderungsdienst) an almost identical structure.
It may be regarded as a waste of space when almost every line of the screen contains only one data field. Nevertheless such a layout serves clarity and makes the system user-friendly.
The data held under IMS control are at present divided into three databases. These are:
- person-related data
- case data
- motor vehicle (KFZ) data.
The motor vehicle area has not yet been implemented in programs but is already mapped in the IMS system.
Within these files, all information on an object (person, case, vehicle) is represented in a single record. Every record can be addressed by an ordering key, a record-specific number. In this way the information on an object can be retrieved if the corresponding number is known. Since in police practice this number is not always known in every situation, it was necessary to reach the desired information by other routes as well. In general this is possible only through inversion. Since IMS 2 offers no automatic inversion, the IMS concept of the "logical database" was taken up and the required inverted lists are built by application programs. Records can now be accessed by identifying features, without knowledge of the record-specific number, in the following ways:
- by name and date of birth, or
- by the phonetic form of the name, or
- by an offence key
for person-related data;
- by authority and file reference
for case-related data;
- by the official registration number, or
- by the chassis number, or
- by the engine number, or
- by a combination of the two
for vehicle-related data.
In addition, links have been established between
- person-related data and case-related data,
- person-related data and vehicle-related data,
in both directions in each case, and
- within the person-related data, between the personal data and the wanted-person data.
These far-reaching links permit almost any evaluation of the entire data stock that is of interest to the police. It goes without saying that in the first stage, which has now been implemented, by no means all theoretical possibilities can be used.
A further problem of police data processing that could not readily be solved with IMS 2 was that of variable-length data fields. By using dependent segments this problem too was solved satisfactorily: a segment is given a size that can hold a statistically reasonable amount of the data to be expected. If the actual amount of data exceeds the capacity of the segment type addressed, the data that no longer fit are placed in a first dependent segment. If that is not enough either, any number of segments of this type can be filled.
A short description of the inquiry service and the update service is intended to show how the system is used in practice:
- Inquiry service
The police officer needs information about a person he is currently checking. Using the means of communication suited to the situation, he contacts the nearest data station, gives the operator the necessary identifying features and explains that he wants to know whether the person so described is wanted. The operator keys the message key AFO2 and the identifying features passed to him into the first line of the empty screen and presses a function key.
On the basis of these values HEPOLIS tells him within 10 seconds whether or not this person is wanted. For this purpose HEPOLIS outputs the complete personal data including the alias data, special notes on the person and all wanted-person data. With a different message key, correspondingly different data could be retrieved. The screen content displayed can be printed if required.
- Update service
Besides the functions of changing and deleting existing data and of adding further data to existing records, the update service also comprises the function of entering new data. The latter always involves opening a new record. This function meets the demand for decentralized data capture, since in principle all functions of the update service can be carried out from all data stations.
Two principles have been applied in the update service:
- With the exception of entering new data, the functions are possible only with the appropriate message key and the record-specific number. This is necessary to ensure that a change is made to exactly the record to which it is meant to be made.
- All functions are format-controlled, i.e. every function is preceded by a request for a corresponding update format. For the functions "change" or "delete", the output format is filled with the data addressed by the record-specific number. For the function "add to existing data", a format request can come first in which the number, kind and extent of the data fields to be added can be specified. If this is not wanted, the system outputs a standard format in which every data field that may be added has room once. The function "enter new records" differs somewhat from the other update functions. The user can always act as if he were the first to enter data on a particular object. The format request required for this is made with the message key and the corresponding identifying features.
Findet
274
das System unter diesen Merkmalen ausgegeben; Benutzer
keinen Bestand,
ist Bestand vorhanden,
kann nun entscheiden,
zuordnen will oder nicht.
6ffnung
eines neuen Datensatzes
des Nnderungsdienstes abgenommen
0her eine Sonderfunktion
sind schwieriger
Das liegt daran, gleichzeitig
erreicht werden mu~. Alle Programme Kontrollfunktion, genommen wird.
zu handhaben,
aber ein hohes Ma6 an Sicherheit haben eine
~ber die eine Pr~fung der ~nderungsberechtigung da~ nur der Besitzer
im Bestand gespeichert
kann. Nnderungsversuche
als
im Bereich
yon der DV-Anlage
des Nnderungsdienstes
So kann erreicht werden,
dessen Kennzeichen
Da-
kann die Er-
da~ dem Benutzer
so viel wie m6glich Formalismen
werden sollen,
Der
erzwungen werden.
des ~nderungsdienstes
die des Auskunftsdienstes.
angezeigt.
ob er seine Daten einem vorhandenen
tensatz
Die Funktionen
wird ein Leerformat
wird er vollst~ndig
ist - Nnderungen
yon Nichtbesitzern
vor-
der Daten -
vornehmen
werden programmgesteuert
ab-
gewiesen. Auskunftsdienst aus jederzeit
und ~nderungsdienst
und in beliebiger
zu erforderlichen
Programme
k6nnen yon derselben
Reihenfolge
Datenstation
durchgeffihrt werden.
und Datenbest~nde
sind in HEPOLIS
Die da-
jederzeit
~erfNgbar.
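The ownership check described above can be sketched in Python. This is only an illustration under stated assumptions: the record layout, the `owner_id` field and the function name `apply_update` are hypothetical and are not taken from HEPOLIS.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical data record; the owner's identifier is stored with the data."""
    record_number: int
    owner_id: str
    fields: dict = field(default_factory=dict)


def apply_update(record: Record, requester_id: str, changes: dict) -> bool:
    """Apply the changes only if the requester owns the record; otherwise reject."""
    if requester_id != record.owner_id:
        return False  # attempted change by a non-owner is rejected by the program
    record.fields.update(changes)
    return True


rec = Record(record_number=4711, owner_id="station-A", fields={"name": "X"})
accepted = apply_update(rec, "station-A", {"name": "Y"})
rejected = apply_update(rec, "station-B", {"name": "Z"})
```

In this sketch the second call leaves the record untouched, which mirrors the rule that only the owner of the data may change them.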
4. THE LINK TO THE POLICE INFORMATION SYSTEM (INPOL)

INPOL denotes the interconnection of police data-processing systems. INPOL is needed to link the police information systems built up at the level of the Länder (federal states) with one another and with the system of the Bundeskriminalamt (Federal Criminal Police Office). Data are exchanged over dedicated lines in a star network. Besides running its own information system, the Bundeskriminalamt takes on the function of a central message-switching centre.

Since computers from different manufacturers, with differing hardware and software, are operated in INPOL, certain agreements on networked operation were necessary.

- Data transmission procedure
Here an agreement based on the DIN standards was reached that all participants were able to implement. The procedure is proving entirely satisfactory in daily operation.
- Data exchange record
Certain message formats were laid down for data exchange, which the network participants must apply uniformly. In addition, an acknowledgement procedure was developed with which the network partners exchange data about the manner in which a message was received, both when sending and when receiving messages. In this way it is possible to draw the message sender's attention to errors in the transmitted data record. Error acknowledgements can also point to incorrect content of a data record or to an incorrect state of the database on receipt of a message.
- Message header
Every exchange data record is preceded by a message header, which is processed by application programs. This message header contains information about the sender and recipient of the message as well as details of its type and length. The recipient returns the message header to the sender in the acknowledgement message. The data needed to match the two are likewise contained in the message header.

In HEPOLIS, network control and the conversion of send and receive data into the format required in each case are performed by a special program that is permanently resident in the computer under IMS control. To convert the data into the required format, this program uses a table module that works in both directions: a single table entry serves to translate from the HEPOLIS format into the send format or from the receive format into the HEPOLIS format.

A total of about 2000 messages per day, plus the necessary acknowledgements, are exchanged over the network in both directions. These are exclusively update messages.

For security reasons and to speed up inquiries, certain data areas are stored in parallel in the systems connected to INPOL. To ensure that these data stocks really are identical, they are reconciled at certain intervals. For this purpose the data stocks are excluded from update at a given point in time and unloaded. When this is complete, the unloaded stocks are compared with one another. Discrepancies are logged, investigated as to their cause, and eliminated.

It goes without saying that the safeguard described earlier for preventing unauthorised updates also applies in the network.
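The periodic reconciliation of parallel data stocks can be illustrated with a short Python sketch. It assumes, purely for illustration, that each unloaded stock is a mapping from a record key to its content; the function name `reconcile` and the sample keys are hypothetical.

```python
def reconcile(local: dict, remote: dict) -> list:
    """Compare two unloaded data stocks keyed by record identifier.

    Returns a log of discrepancies as (key, kind) tuples, ordered by key.
    """
    log = []
    for key in sorted(local.keys() | remote.keys()):
        if key not in remote:
            log.append((key, "missing in remote"))
        elif key not in local:
            log.append((key, "missing in local"))
        elif local[key] != remote[key]:
            log.append((key, "content differs"))
    return log


# Hypothetical snapshots taken after both stocks were excluded from update
hepolis_snapshot = {"P1": "wanted", "P2": "cleared", "P3": "wanted"}
inpol_snapshot = {"P1": "wanted", "P2": "wanted", "P4": "wanted"}
discrepancies = reconcile(hepolis_snapshot, inpol_snapshot)
```

The resulting log is only the first step of the procedure in the text; investigating the cause and eliminating each discrepancy remain manual tasks.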
5. HEPOLIS IN DAILY OPERATION

The system was loaded with the initial data in spring 1974. The loading process was not carried out as an "initial load"; instead, the individual data records were brought into the system under program control. Each new entry was checked against the data already present via the search lists built up in parallel. This procedure served to detect duplicate holdings from the manually kept card files, which had not been cleaned up until then, and to keep them out of the system. During this initial load about 390,000 personal data records were stored, and duplicate holdings were detected in about 15,000 cases. The data stock built up in this way was released for the inquiry service soon afterwards.

In November 1974 the current stock of wanted-persons data was taken over from the Bundeskriminalamt for parallel storage and merged into the data stock by the procedure described above. Of 150,000 records taken over, about 12,500 matched existing records. In these cases only the data still missing were added to the existing record.

Since January 1975 HEPOLIS has been running fully in 24-hour operation with online update and online inquiry service. The volume of work orders handled via the terminals currently averages 9,500 per day, with peaks around 12,200 per day. Since in the update service 2 IMS transactions belong to each work order, the number of transactions to be processed averages 12,000 per day, with peaks of 16,000, plus the transactions of the network. The heaviest load so far was 1,200 work orders, or about 1,600 transactions, in one hour. The system handled this workload with a mean response time of 3 seconds in the inquiry service and 5 seconds in the update service, with operation running in 3 IMS regions.

The work orders of the update service require 2 IMS transactions because building the input mask and the subsequent update are handled by different programs. This was planned in order to avoid conversational programming, which would also have been possible. Currently 78 TP programs and 8 BMP programs with a total of 172 transaction codes are in operation.
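The relation between work orders and transactions quoted above can be checked with a little arithmetic. The sketch below assumes, as the text does not say so explicitly, that an inquiry work order needs exactly one IMS transaction while an update work order needs two; under that assumption the share of update orders follows directly from the two totals. The function name `update_share` is hypothetical.

```python
def update_share(work_orders: int, transactions: int) -> float:
    """Fraction of work orders that are update orders.

    Assumes one transaction per inquiry order and two per update order,
    so transactions = work_orders * (1 + share).
    """
    return transactions / work_orders - 1


average = update_share(9_500, 12_000)     # daily average
peak_day = update_share(12_200, 16_000)   # daily peak
peak_hour = update_share(1_200, 1_600)    # heaviest hour so far
```

Under this assumption the rounded figures in the text would correspond to roughly a quarter to a third of all work orders being update orders.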
All programs use the same plausibility-checking module and the same error-handling routines. This ensures that the data stock is as correct as possible and that errors are displayed to the user on the screen in a uniform way. The checking logic is designed so that every incoming message is checked for errors through to its end. At the end of the check, any errors found are displayed in an error format. If processing is possible despite the errors, it is carried out and the result is displayed. If processing is not possible, the error display says so.

For security reasons the system is terminated once every 24 hours. This is necessary to keep restart times after an abnormal end as short as possible. Including the planned shutdowns mentioned above, the mean downtime of the system is currently 2.2% of the available time (based on 24 hours a day). Restart times after an abnormal end are between 45 minutes and 2 hours, depending on the severity of the error. A major contribution to faster restarts is that a complete copy of the database is saved to magnetic disk every week, so that long restore runs from the backup tapes, which are also kept, are avoided. On error recovery in general, the restart and recovery routines of IMS have fully proved themselves in practice.

A reorganisation of the database has been necessary only once so far. It took 92 hours in total and, after initial difficulties, ran smoothly. Since the update service had to be interrupted during this time and the information given by the inquiry service became increasingly out of date, priority is currently being given to developing a program system that allows the update service to continue during a reorganisation.

In general it can be said that HEPOLIS, despite the short time it has been in use, has already been accepted by its users and that the further expansion stages are eagerly awaited.
6. EFFORT

In conclusion, a few remarks on the effort that had to be expended to build HEPOLIS in its first expansion stage. It should not be overlooked that the construction of HEPOLIS was the first use of EDP by the Hessian police. This means that the EDP staff had to be recruited and trained. This process is still going on.

Since the system could not be built with the police's own staff in the time available, external staff had to be employed on a considerable scale. In the last weeks of 1974, 26 external organisers and programmers were employed at times. In total the effort so far amounts to 75 to 80 man-years. This includes planning, analysis, programming, training and data capture, but not data preparation. Measured against what has been achieved, the effort appears to lie within reasonable bounds.
Relational Data Dictionary Implementation
I A Clark, IBM United Kingdom Scientific Centre, Peterlee, UK
Abstract
The paper presupposes a team of application developers using an application generator served by a relational database (RDB). The application grows by including not only routines for input/output, but by accumulating new relations, the latter representing data-definition activity by the developers. A data dictionary (DD) is needed
(1) to interrelate relations,
(2) to relate these to routines, input streams and reports,
(3) to produce auditing reports and clerical procedures manuals.
The benefits and technical problems of maintaining the DD itself as a RDB are treated.
INTRODUCTION
This paper assumes a development team using an application generator served by a relational database (RDB). The application grows not only by adding I/O and processing routines, but also by accumulating new relations. Such relations may be derived from already existing relations in the database, as well as being inserted independently as a set of tuples.
We do not want to argue here why we consider an application generator together with a relational database to be an attractive combination. Suffice to say that we believe this combination to be an attractive one for use by a team of non-data-processing professionals. By 'non-DP professionals' we shall mean a group of highly skilled individuals who wish to innovate within their own discipline by making use of the computer, without being diverted from their true purpose by questions of a purely technical nature to do with data-processing. In particular we consider that such individuals will not wish to be deflected by considerations of choosing standard programming techniques (hence the application generator), nor get involved with the wide choice of data pathways or the best data structure for their particular purpose within a data base (hence the relational database).

We shall consider a relational database essentially similar to a research prototype called PRTV (1), being developed and used at the IBM UK Scientific Centre, Peterlee. It is not our purpose to describe how PRTV is implemented. The chief feature of PRTV for our purpose is that of deferred operation, that is, a new relation derived from existing relations is not materialised into a set of tuples until called for; eg, to open the relation as a read-only file, or to find out how many tuples it contains.

A new relation can be defined during a terminal session by entering an expression which contains names of existing relations, acted on by the relational operations: UNION, INTERSECTION, DIFFERENCE, JOIN, PROJECTION, SELECTION. The result is a named entity within the user's workspace, which we shall call a '(relational) entity'. It contains what we shall briefly but conveniently call a 'relational value': in fact, a character string which specifies to the routines which materialise the tuples just how to go about doing so. Suffice to say that within this value there exist, as intact substrings, either the names, or the values, of the relations from which it was derived.

However, note that by the term 'derived relation' we shall mean specifically one whose relational value contains the name of another relation, say 'A', rather than just the value of A. This is because, in PRTV, there is no way of effectively recognising that, say, B has been obtained from A in the latter case. If for instance relation A were bulk-loaded from cards, next relation B created and simply assigned the value of A, there would be nothing inherently different about A and B.
Indeed, in PRTV as it stands there would be no way of telling which came first. Moreover either A or B could be reassigned another value, leaving the other unchanged. This is clearly not the case if B were derived from A. Then whenever A changed its value, B would change correspondingly.
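The difference between assigning a value and deriving by name can be made concrete with a toy model. The Python sketch below is not how PRTV is implemented; the class `Workspace` and its method names are hypothetical, and a relation's definition is modelled simply as a function that produces its tuples when called for.

```python
class Workspace:
    """Toy workspace: each relation name maps to a zero-argument function
    that yields the relation's tuples only when it is materialised."""

    def __init__(self):
        self._defs = {}

    def assign_tuples(self, name, tuples):
        """Bind a name to a fixed set of tuples (e.g. bulk-loaded data)."""
        frozen = frozenset(tuples)
        self._defs[name] = lambda: frozen

    def assign_value(self, name, source):
        """Bind by value: copy the source's current definition, keeping no link to its name."""
        self._defs[name] = self._defs[source]

    def assign_name(self, name, source):
        """Bind by name: look the source up again each time the relation is materialised."""
        self._defs[name] = lambda: self.materialise(source)

    def materialise(self, name):
        """Evaluate the definition only now, when the tuples are called for."""
        return self._defs[name]()


ws = Workspace()
ws.assign_tuples("A", {(1,), (2,)})
ws.assign_value("B", "A")        # B is assigned the value of A
ws.assign_name("C", "A")         # C is derived from the name of A
ws.assign_tuples("A", {(3,)})    # A is reassigned another value

b_after = ws.materialise("B")    # unchanged by the reassignment of A
c_after = ws.materialise("C")    # follows A
```

In this sketch B keeps the old tuples while C follows A, which is the distinction the text draws between a relation merely assigned the value of A and one derived from A.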
Since relational values are relatively small entities compared with the large sets of tuples they can potentially represent, one must not think that a computer process which forms new relations at run time out of existing relations is necessarily going to be extravagant. Thus PRTV allows one to formulate as much of one's application as one cares to in a relational algebra, which on the face of it performs set-theoretic operations upon whole sets of tuples. However, the operations are really performed on the relational values we have just described, with the result that the operation of forming the union, say, of two large sets of tuples is deferred until one actually lists a relation, or opens a sequential file based on that relation and scans the file. We are going to formulate, in a relational algebra, processes which experienced programmers would not consider handling in terms of elementary operations which combine entire sets of tuples, or as they would see them, sets of records.
Instead of a relational algebra, a relational calculus may of course be used, eg the ALPHA language of E F Codd (2). PRTV does not yet support ALPHA, nor any such relational calculus. However, as Codd has shown elsewhere, it is in principle feasible to translate from one to the other in a natural way. An ALPHA expression resembles a theorem in the Propositional Calculus. To a logician, this represents a natural and general way of making an assertion about a given computer process. Other professionals have their own languages within their own disciplines. Whether or not they can understand a Propositional Calculus expression does not matter: their own languages are likewise amenable to machine translation into the relational algebra.
Consider an application which accepts a batch of input and produces reports (invoices, cheques, etc). It is conceivable in principle to load the input straight into a number of relations, then print out the reports directly from relations derived from the input relations. How far one progresses towards this limit depends in practice on whether it appears easier to implement a given step using the relational algebra, or a conventional programming language. A non-DP professional is unlikely to be predisposed towards the programming solution, particularly if provided with an application generator which constructs the relational algebra for him out of more familiar specifications.
The main problems which the application generator will have to handle are those of making the work of one team member available to another in an orderly fashion, and of stopping them unsuspectingly cutting the ground away from under each other's feet.

This can so easily happen if the result of one individual's work, embodied in a relation, is passed to another, who incorporates it into a derived relation which is in turn passed on. It becomes a heavy administrative task to keep track of what changes to the original relation are safe, permissible, or are nonsense in terms of the real-world application. Note that with this remark we do not distinguish between application development and operational running of the application.
One possible way of coping with this task is for the application generator to administer a data dictionary. Since the task involves much cross-indexing, and the application generator is already served with a relational database, it is attractive to investigate maintaining the data dictionary itself as a relational database.
A range of tasks may be undertaken by the data dictionary, from the simplest to the most ambitious. Examples are:
(1) reporting upon all relations which are affected by updating a given relation,
(2) preventing or otherwise qualifying an order to destroy a relation upon which further relations are defined,
(3) enforcing semantic constraints imposed by the nature of the application at either application development time, eg, to prevent the insertion of 'nonsense' relations into the database, or at run-time, eg, to ensure that tuples are not inserted into a given relation without corresponding tuples being present in another relation,
(4) producing listings of all routines and reports relating to a given database relation, for auditing purposes,
(5) maintaining an up-to-date clerical procedures manual. This often requires cross-referenced reports, lists of fields on input documents, and domains in the database.
These tasks are presented in order of increasing severity. We shall treat the first three only, discussing some theoretical and technical problems which the data dictionary has to face. The remaining two topics, although ambitious in practice, are theoretically much simpler than the first three.
(1) REPORTING UPON UPDATE DEPENDENCIES
For the moment we are primarily concerned with update dependencies between relations in the course of application development. The other sort of update dependency, that between records, or tuples in our case, will be treated later under the heading of 'semantic constraints'.
This facility is straightforwardly achieved by maintaining a DD-relation, call it RDEPEND, on the domains RELID1, RELID2, DEPTYPE. By 'DD-relation', we mean 'data dictionary' relation, to distinguish it from the relations belonging to the application itself. DD-relations may or may not be kept in the same database as application relations: for research convenience the former is recommended due to the facility for bootstrapping the data dictionary, the latter advisable however for security.
Note that we require some means of referring to distinct occurrences of the same domain within the component list of a relation. We do this here by postfixing 1, 2, etc, to the domain name (eg, RELID1, RELID2, etc, for the domain name RELID). RDEPEND contains a tuple for each derived relation, stating what relation it depends on (RELID2) and in what capacity (DEPTYPE). Where a relation is derived from a number of other relations, that number of tuples is present in RDEPEND. Furthermore, if the relation uses another in more than one capacity, more than one tuple for that pair of 'RELIDs' occurs.
Now comes the advantage of using a relational database for the data dictionary. The relation RDEPEND is transitive in a logical sense. Thus by joining it to itself repeatedly we recover a relational value which carries a tuple for all the implicit dependencies, as well as those appearing explicitly in RDEPEND.

Let us introduce a notation in order to present an example. This notation is based on the relational algebra, ISBL, used by PRTV, although we modify it freely to make it better illustrate our points.

In PRTV a user manipulates relations within his workspace by expressions of the form:

    C = A * B
    C = N!A * B

The first command would construct a RELID 'C' (the named entity introduced earlier with its symbolic 'value'; no tuples are accessed as yet) and a value equal to the 'join' of the values of 'A' and 'B'. The second command would incorporate the RELID 'A' into the relational value formed for 'C' instead of the value of A. 'N!A' should be read as 'name-A'.

Suppose we have defined a relation 'F' by the following sequence of commands:

    C = N!A
    D = N!B
    E = N!C
    F = N!E * D

Then RDEPEND would contain the following tuples:

    RDEPEND ( RELID1  RELID2  DEPTYPE
              C       A       N
              D       B       N
              E       C       N
              F       E       N
              F       D       V       )

In order to obtain the dependency tuples for every operand of F one might join RDEPEND with itself repeatedly until no further tuples appeared (detected by testing its cardinality). The type of 'join' operation required is one called an 'equi-join'. This means that the tuples from each relational operand which are to be concatenated are chosen by collating equal values within certain specified domains. It is a matter of notational design to specify an equi-join elegantly. Here we show the required 'overlap' of components by placing component names beneath each other. Thus:

      RDEPEND  RELID1  RELID2  DEPTYPE
    * RDEPEND          RELID1           RELID2  DEPTYPE

represents a relational value with five domain occurrences. Each tuple in the set so defined is formed by taking a pair of tuples from RDEPEND for which RELID1 in one tuple equals RELID2 in the other. There is a combined tuple for all such pairs.
We may further join to this a relation, DTRANS, which contains a tuple matching each pair of values of DEPTYPE which turns up in the above relational value. Each tuple of DTRANS contains a third value from the domain DEPTYPE, representing the resulting (ie, transitive) dependency. After that, we can project out just those domains we wish to see, renaming them in the process.

Note that in a relation, all duplicates of a given tuple are suppressed. A relation simply records that, say, three given objects are related in a given way. The ordered set of these three objects is what comprises the 'tuple' (3-tuple, or 'triple' in this case). Thus it makes no sense to talk about more than one 'occurrence' of this tuple. The three objects are either related, or they are not.
We may thus construct the relational assignment statement:

    RR =   RDEPEND  RELID1  RELID2  DEPTYPE
         * RDEPEND          RELID1            RELID2  DEPTYPE
         * DTRANS                   DEPTYPE1          DEPTYPE2  DEPTYPE3
         %          RELID1                    RELID2            DEPTYPE

The resulting relation RR has precisely the domains and domain-IDs of RDEPEND (the final 'project', %, has seen to that), but relates RELIDs once-removed. Thus RR contains the following tuples only:

    RR ( RELID1  RELID2  DEPTYPE
         E       A       NN
         F       C       NN
         F       B       V       )
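The construction of RR can be retraced outside PRTV. The Python sketch below evaluates the join eagerly on plain sets of tuples, whereas PRTV would defer it; the function name `once_removed` is hypothetical, and DTRANS is written as a mapping holding only the DEPTYPE pairs that this example needs.

```python
# RDEPEND tuples from the example: (RELID1, RELID2, DEPTYPE)
RDEPEND = {
    ("C", "A", "N"),
    ("D", "B", "N"),
    ("E", "C", "N"),
    ("F", "E", "N"),
    ("F", "D", "V"),
}

# DTRANS as a mapping (DEPTYPE1, DEPTYPE2) -> DEPTYPE3; only the pairs needed here
DTRANS = {
    ("N", "N"): "NN",
    ("N", "V"): "V",
    ("V", "N"): "V",
}


def once_removed(rdepend, dtrans):
    """Equi-join RDEPEND with itself where RELID2 of one tuple equals RELID1
    of the other, join DTRANS on the two DEPTYPEs, then project three domains."""
    result = set()
    for rel1, mid1, type1 in rdepend:
        for mid2, rel2, type2 in rdepend:
            if mid1 == mid2 and (type1, type2) in dtrans:
                result.add((rel1, rel2, dtrans[(type1, type2)]))
    return result


RR = once_removed(RDEPEND, DTRANS)
```

Because the result is a Python set, duplicate tuples are suppressed automatically, in keeping with the remark that a relation records each tuple at most once.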
The relation DTRANS can be visualised as a function with two arguments, DEPTYPE1 and DEPTYPE2, returning the corresponding object in the domain DEPTYPE3. Indeed in PRTV it can be implemented either as a PL/I function or as an ordinary relation, with a tuple for every pair of values of DEPTYPE1 and DEPTYPE2. Thus DTRANS might contain the following tuples (among others):

    DTRANS ( DEPTYPE1  DEPTYPE2  DEPTYPE3
             N         N         NN
             N         NN        NNN
             NN        N         NNN
             NN        NN        NNNN
             N         V         V
             V         N         V        )

Note that the last two tuples say, in effect, that if A depends on the name 'B', and B has the value of C, then A has only a current-value connection with C. If C is changed, A will not change! On the other hand, if B depends on the name 'C', and one assigns the (current) value of B to A, this connection will be lost: one is effectively assigning the current value of C to A. This choice is a matter of convention.

RR can be incorporated back into RDEPEND (eg, by the expression:)

    RDEPEND = RDEPEND + RR

and the process repeated until the cardinality of RDEPEND grows no more. On the other hand it may be better to derive a new relation, FULL RDEPEND, by this process each time it is called for, so that RDEPEND may be maintained more easily by simple insertion and deletion of tuples.
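The repeated incorporation of once-removed dependencies, stopping when the cardinality no longer grows, can be sketched as a fixed-point loop in Python. The names `compose` and `full_rdepend` are hypothetical. Here DTRANS is written as a function; its treatment of pairs not listed in the text (for example two value-dependencies in a row) is an assumption, and the loop assumes the dependencies contain no cycles.

```python
# RDEPEND tuples from the example: (RELID1, RELID2, DEPTYPE)
RDEPEND = {
    ("C", "A", "N"),
    ("D", "B", "N"),
    ("E", "C", "N"),
    ("F", "E", "N"),
    ("F", "D", "V"),
}


def compose(type1, type2):
    """DTRANS as a function. Assumption: any chain that passes through a
    value-dependency is itself a value-dependency; otherwise the N's accumulate."""
    if "V" in (type1, type2):
        return "V"
    return type1 + type2


def full_rdepend(rdepend):
    """Union once-removed dependencies into the relation until it stops growing.
    Assumes acyclic dependencies; a cycle of name-dependencies would never settle."""
    full = set(rdepend)
    while True:
        rr = {
            (rel1, rel2, compose(type1, type2))
            for rel1, mid1, type1 in full
            for mid2, rel2, type2 in full
            if mid1 == mid2
        }
        grown = full | rr
        if len(grown) == len(full):  # the cardinality grows no more
            return full
        full = grown


FULL = full_rdepend(RDEPEND)
depends_of_f = {t for t in FULL if t[0] == "F"}  # selection on RELID1 = 'F'
```

Deriving the full relation on demand, as in this sketch, leaves the base relation RDEPEND free to be maintained by simple insertion and deletion.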
When the owner of the catalogued relation, F, wishes to modify it, the command:

    List FULL RDEPEND: RELID1 = 'F'

might be issued. This lists a selection of just those tuples in FULL RDEPEND such that RELID1 is equal to 'F'. The relational operator ':' stands for 'SELECT'. Thus:

    FULL RDEPEND: RELID1 = 'F' ( RELID1  RELID2  DEPTYPE
                                 F       E       N
                                 F       D       V
                                 F       C       NN
                                 F       B       V
                                 F       A       NNN     )

(2) QUALIFYING AN ORDER TO DESTROY A RELATION
This might be considered to be a special case of enforcing semantic constraints imposed by the application model upon the developers themselves, a very general topic. However it can also be viewed as a basic facility to be expected of an application development system which claims to inhibit members of an application development team from cutting the ground from under each other's feet. There is a temptation to build such a facility rigidly into the system itself. This ignores the possibility that what is satisfactory for one application development team may not be so for another.
The simplest such 'qualification' is of course to refuse to destroy any relation from which another relation has been derived, ie, upon which there is a name-dependency, until those dependencies have been eliminated.
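This simplest qualification amounts to a lookup in RDEPEND before a destroy order is honoured. The Python sketch below is an illustration only: the function name `can_destroy` is hypothetical, and it assumes that a DEPTYPE made up solely of N's marks a name-dependency while V marks a value-dependency that does not block destruction.

```python
# RDEPEND tuples from the earlier example: (RELID1, RELID2, DEPTYPE)
RDEPEND = {
    ("C", "A", "N"),
    ("D", "B", "N"),
    ("E", "C", "N"),
    ("F", "E", "N"),
    ("F", "D", "V"),
}


def can_destroy(relid, rdepend):
    """Return (allowed, blockers): the order is refused while any other
    relation still depends on the name of the relation to be destroyed."""
    blockers = sorted(
        rel1
        for rel1, rel2, deptype in rdepend
        if rel2 == relid and set(deptype) == {"N"}
    )
    return (not blockers, blockers)


destroy_c = can_destroy("C", RDEPEND)  # E is derived from the name of C
destroy_d = can_destroy("D", RDEPEND)  # F only used the value of D
destroy_f = can_destroy("F", RDEPEND)  # nothing depends on F
```

Returning the list of blocking relations, rather than a bare refusal, would let a team decide for itself how rigidly the rule should be applied.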
(3) ENFORCING SEMANTIC CONSTRAINTS IMPOSED BY THE APPLICATION MODEL
To the theorist this is probably the most interesting use to which a relational data dictionary might be put.
One objection to the use of relational databases stems from the fact that certain properties of conventional files, such as demanding a unique value in the key field, or being hierarchical, are absent. In conventional programming, these 'structural' properties are exploited to enforce certain semantic constraints arising out of the application model, such as a particular child segment having a single parent. However the skills of a database specialist are often needed to exploit such restrictions inherent in the available structures. It is up to him to ensure that his model of the application in terms of key fields and segment deletion rules behaves like the real-world counterpart: yet it is often rather hard for a business to find a man with intimate knowledge of both realms. Thus it is attractive for our purpose that the traditional restrictions of key-fields and many-one mappings have to be modelled explicitly in PRTV, since the problem of enforcing semantic constraints can then be split off from that of providing a structure capable of holding the data in the first place.
How can one use the relational algebra here discussed to model these sorts of update constraints?

Suppose we have a standing relation, X, in the database, and a transient relation, UPD_X, holding today's new additions to X. We want to insert into X just those tuples of UPD_X whose values of the key-domain, KEY, do not already occur as values of KEY in X. X % (KEY) is a relational value, with just one domain, of current keys occurring in X. By joining it to UPD_X we express just those tuples of UPD_X whose keys already occur in X:

    X % (KEY) * UPD_X
         KEY     KEY

By forming the 'DIFFERENCE' of this expression with the original UPD_X we express all those tuples of UPD_X whose keys do not already occur in X. We now simply 'UNION' these with X to get NEW_X. Ignoring the special domain-overlapping notation, NEW_X is given by:

    NEW_X = N!X + (N!UPD_X - (N!UPD_X * (N!X % KEY)))
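The expression for NEW_X can be traced step by step on small sets of tuples. The Python sketch below evaluates eagerly, whereas PRTV would defer evaluation until NEW_X is materialised; the helper names and the sample tuples are hypothetical, and the key is assumed to be the first component of each tuple.

```python
def project_key(relation, key_index=0):
    """X % KEY: the set of key values occurring in the relation."""
    return {t[key_index] for t in relation}


def join_on_key(keys, relation, key_index=0):
    """(X % KEY) * UPD_X: tuples of the relation whose key is among the given keys."""
    return {t for t in relation if t[key_index] in keys}


def new_x(x, upd_x, key_index=0):
    """NEW_X = X + (UPD_X - (UPD_X * (X % KEY)))."""
    clashing = join_on_key(project_key(x, key_index), upd_x, key_index)
    return x | (upd_x - clashing)


X = {("k1", "old"), ("k2", "old")}
UPD_X = {("k2", "new"), ("k3", "new")}
NEW_X = new_x(X, UPD_X)
REJECTED = UPD_X - NEW_X  # additions whose key already occurred in X
```

In this sketch the addition with key k2 is held back because that key already occurs in X, while the tuple with the new key k3 is admitted.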
Note that we have made NEW_X a derived relation by quoting the names of relations (N!) instead of their current values. NEW_X, upon being materialised, will contain the desired set of tuples, which may be used to replace the current value of X in the database. We must then ensure that X is only ever updated in this way. A crude way of doing this is to have the data dictionary keep a list of permissible assignments into given RELIDs, so that the application generator will not accept a command changing X except those, explicitly catalogued, which assign NEW_X into X.
Clearly a similar technique can be used to insert only those tuples into X whose KEYs occur in another relation, W. The tuples which fail to get into X can of course be recovered in the expression: UPD_X - X.

It is a critical business designing facilities for an application developer to impose constraints upon himself or his colleagues. It presupposes that both he and we know what sort of security we are supposed to be offering him. If this question is not resolved, it is so easy to end up with a security system which neither deters deliberate abuse, nor protects adequately against accidental misuse, but appears designed solely to encumber lawful operations. Our intentions with the data dictionary are primarily to reduce the incidence of subsequent mis-modifications to an application. As time goes on, or one gets further away from the designer of a particular component, the modifier of the component is unavoidably less well-informed as to the side-effects such modifications may have. On the other hand we make no attempt as yet to protect against deliberate wrecking.
Compare this with the 'facility' sometimes found in database packages which simply refuses to accept data with duplicate keys. The first non-DP user of such software is inevitably engaged in research, even if he does not know it, and is in the typical research predicament of having a file of grubby data he wishes to load up, precisely to use the sophisticated query facilities the database package may offer to report on such things as duplicate keys. He quickly has to learn a few DP skills, like how to manoeuvre around the trap for duplicate keys.
The relational data dictionary approach allows just that structure to be put up first inside the database, which suffices to store the raw data, and adequate protection to be devised later for the use of the various relations of the application.
Compared to the task of defining relations to hold and manipulate the data of the application, as exemplified by X, it is much harder to write a satisfactory UPD_X to constrain its use. The latter is as demanding as writing a foolproof macro, which is really what UPD_X is. Later work might concentrate on providing relations like UPD_X 'off the peg', so to speak, that is as a result of some non-procedural specification by the application developer, rather than require him to develop the skill to construct them himself. Such 'constraining' relations may resemble the facilities available with CODASYL/DBTG (3), or IMS. Alternatively the sort of 'semantic constraint' which would be useful in practice might be thought out completely afresh.
Our approach contrasts with the CODASYL/DBTG approach of submitting the update constraints as part of the 'data definition', to be carefully separated from the 'data manipulation' activity. In the environment described it is difficult to distinguish between the two.
SUMMARY
We have tried to indicate a relational approach to many well-known problems of so-called data definitions. The key to this approach is the use of a data dictionary, itself maintained as a relational database. Many details of this relational data dictionary clearly remain to be finalised. However the essence of a relational data dictionary is that it can be implemented even before such questions need be resolved. Thus the basic structure of, and facilities offered by, such a data dictionary can be changed extensively without the need to reload the stored data. This allows of considerable experimentation within a particular project.
REFERENCES
(1) S J P TODD: PRTV Overview, IBM UK Scientific Centre report No 75, 1975.
(2) E F CODD: A Database Sublanguage founded on the Relational Calculus, Proceedings of the 1971 ACM SIGFIDET Workshop on Data Description, Access and Control.
(3) CODASYL DBTG: Data Base Task Group Report, April 1971. Available from ACM, New York.
Data Base System Evaluation
Harry L. Hill, IBM
The evaluation of data base systems embraces four very significant fields. The first is the design of the resource management that must be built into the product to give it the performance attributes that make it an attractive, saleable item. The second is the prediction of performance for a given configuration and workload. The third is the ability to measure the performance and confirm or deny the expectation obtained from the predictive process; and finally the ability to tune the system to accommodate changes made either in the configuration that exists or the user workload that is currently presented to the system.
To cover these four elements of data base evaluation, I have chosen to describe within this paper these topics:
1. Concepts of system performance
2. Performance and the development process
3. Predicting and measuring system performance
4. System performance tuning
1. CONCEPTS OF SYSTEMS PERFORMANCE
Let us look at some of the basic concepts behind system performance. The key question is one of systems performance sensitivity - the problem is always to find what is in the critical path. Fig. 1 describes clearly the approach that is taken, given that one can identify the bottleneck in the system; the key question is that if I remove that bottleneck, at what point and under what conditions do I hit the next one (because there is always a next one).
When we talk about the goodness of performance, i.e. how well a system performs, it is necessary to establish measures of goodness. We talk about performance in the following ways, as shown in fig. 2 - in terms of throughput, jobs per unit time, system data rate, number of accesses per second to a storage device, etc. There are perhaps more sophisticated and better ways of describing performance. For example, throughput per rental, dollars per second per access to a storage device, cost per job, cost per transaction. These latter measures of performance tend to be more revealing of the 'value to the user' as we sometimes call it, i.e. the cost performance trade-off.
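The cost-based measures can be made concrete with a small Python sketch. The rental and transaction figures below are hypothetical and chosen only to show how a system with a higher rental can still offer a lower cost per transaction.

```python
def throughput_per_rental(jobs_per_hour, rental_per_hour):
    """Jobs completed per unit of rental paid."""
    return jobs_per_hour / rental_per_hour

def cost_per_transaction(rental_per_hour, transactions_per_second):
    """Rental cost attributed to each transaction processed."""
    return rental_per_hour / (transactions_per_second * 3600)

# Hypothetical configurations (not taken from the paper).
small = cost_per_transaction(rental_per_hour=90.0, transactions_per_second=5.0)
large = cost_per_transaction(rental_per_hour=180.0, transactions_per_second=20.0)
```

Here the larger configuration costs twice as much to rent but carries four times the traffic, so its cost per transaction is half that of the smaller one.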
It should be observed, as in fig. 3, that there are some very significant trends in performance evaluation. In the early days when we described performance in terms of component or device productivity, you will recall the measures of CPU goodness were in terms of add time, subtract time, multiply time, etc. We have emerged from that somewhat primitive measure of performance and today we talk about performance in terms of systems productivity, where the system is the sum of the hardware, the software and the workload effects. Tomorrow I am confident that we will be talking about systems performance not so much in terms of just the system but in terms of the user relationship to that system. I call that 'people productivity', where people's productivity is geared to maximise the objectives of a given enterprise or business. The computing system is then but one key element in meeting a business objective. This is particularly important for live terminal systems where the business of a company may be totally dependent on the availability and usability of the total system.
System performance is really best described in terms of the management of time spent waiting for systems resources. Fig. 4 describes a representation of systems resources because that is what performance is all about, the management of resources within a system allocated to a given profile of work. Every single system that has been constructed to date behaves in this way. The element of work is offered to the central processing unit or work engine and that work is executed by merging data with a program to a point where more data or programs are required. At that point in time the processing ceases and a request is queued in front of a storage device (i.e. a resource) in order to obtain additional data or programs to continue or complete the processing. When that work is completed, the processing engine proceeds on to another task. What we have is a serial processing engine operating on elements of work whose data and programs are queued in parallel against system resources. By placing a 'meter' in the line between the storage and the queue for processing one can get a measure in terms of transactions per second or system data rate.
Fig. 5 shows a plot of system performance against the number of tasks, that is, the depth or level of multiprocessing and the consequence on the system of these tasks executing work. Notice that as you increase the number of tasks, the system performance increases to the point where a bottleneck is reached and I have chosen in this case to show the channel as the first bottleneck. If I were to add channels to the system I would relieve that bottleneck within the system and I would hit the next one which I have, in this case, shown to be storage devices. So performance progresses through 'ceilings' or bottlenecks.
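The progression through ceilings can be sketched in Python. This is a simplified model under stated assumptions: performance is taken to grow linearly with the number of tasks until the lowest resource ceiling is reached, and the capacity figures are hypothetical.

```python
def system_rate(n_tasks, rate_per_task, ceilings):
    """Return (achieved rate, limiting resource or None).

    Assumes linear growth with task count up to the lowest ceiling.
    """
    offered = n_tasks * rate_per_task
    name, cap = min(ceilings.items(), key=lambda kv: kv[1])
    if offered <= cap:
        return offered, None
    return cap, name

# Hypothetical ceilings in arbitrary units of system data rate.
ceilings = {"channel": 30.0, "storage devices": 45.0, "CPU": 60.0}
before = [system_rate(n, 10.0, ceilings) for n in range(1, 7)]

# Adding channels lifts the first ceiling and exposes the next one.
ceilings["channel"] = 60.0
after = system_rate(6, 10.0, ceilings)
```

With these figures the channel caps performance from the fourth task onwards; once channels are added, the same six tasks are limited by the storage devices instead.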
Work that is presented to a computing system does not represent a constant load on all resources. In fig. 6 I have shown diagrammatically a time varying workload effect on the system where the height of each pedestal represents 100% utilisation of that resource; notice that I am showing only 3 resources, a channel, a CPU and a drive device. The point is that not all of the time is any one resource the bottleneck, but the bottleneck changes from resource to resource depending upon the demand of the time varying workload placed against it. When that resource is 100% utilised, it clearly forms a black mark on top of the pedestal, so by removing that bottleneck, that is, by putting a more powerful CPU in or a larger number of channels, this serves to improve the overall system performance. Clearly we are seeking an economic design where the number of black marks on top of the pedestal is reasonably balanced, that is, resources are not wasted. Fig. 7 depicts a system transaction rate versus a time varying workload, and a similar argument applies.
All transaction-based systems tend to behave in a similar way and fig. 8 shows a three dimensional plot of response times versus real storage versus transaction traffic rate. Notice that as the real storage available for processing is decreased, the response time increases. Similarly, as the transaction traffic rate increases, the response time increases and all systems tend to behave this way. It should be realised that in virtual operating systems the decrease of storage causes an increase in paging rate. Under these conditions the CPU utilization generally decreases and the system gradually becomes I/O bound.
2. PERFORMANCE AND THE DEVELOPMENT PROCESS
As data base systems have grown and become sophisticated, it is necessary to achieve not only good performance, but predictable performance. This has to be built into the development process of the product. I should like to take as an example the development of storage which is a key resource in any data base system. Fig. 9 shows a typical development process which, in the early days of the computer industry, started off with the research and development of what I would describe as the basic parameters of the storage device. These parameters were offered to engineering groups who designed them into products and we developed on that basis the well-known disk drive. The drives were offered to the CPUs and were integrated with software systems which in turn were offered to industries to configure and use on behalf of that industry, and those industries designed those systems together with their applications to generate useful data processing facilities.
The point is that in the early days we started off with the basic technology and we did what is described as a 'bottom-up' design - that is how the technology of the industry grew up. If we look today at the basic relationship of the direct access storage device (fig. 10) you will see that only certain combinations of those basic parameters are of interest to the systems designer, such as data rate and access times; areal density is frankly not very significant to the system designer. Similarly as block size decreases data rate becomes less important than access time. The consequence of this 'bottom-up' development process has been that we have decreased in a rather dramatic way the effective cost to the user of storage.
The decrease in storage cost as seen by the user is shown in fig. 11, i.e. the relationship between dollars per megabyte per month for a variety of products versus the year of announcement. In fig. 12 you will also notice the access rate characteristics where the accesses per dollar and the accesses per second are shown for the same range of products. If we are to look now at fig. 13 we will see that the storage technology spans a range of access times, storage capacities and cost per bit. This figure is interesting - observe the gap in the continuum of storage devices. This gap occupies the same time domain as task switching in several of the medium and high speed processors. The technology for storage and data base systems is rich - rich in function and rich in performance and in cost choices. There is in fact sufficient technology to reverse the process and instead of doing a 'bottom-up' design, to take the requirements of modern applications and do a 'top-down' design (again see fig. 9), that is, to define the systems and the applications that are required in a business or enterprise and to map them into the technology.
3. PREDICTING AND MEASURING SYSTEM PERFORMANCE
The timely development of performance tools forms an essential part of developing a computing system. It has two major characteristics. One, it is important to be able to predict the performance of a complex data base/data communications system prior to either the hardware or the software being in existence and two, having predicted it and built it, it is important to be able to measure it and validate the prediction. The learning process is being able to describe differences.
The essential objective in developing performance tools is to be able to establish a discipline both for developers and subsequently for users of avoiding surprises in performance, since late discoveries are hard to correct. Fig. 14 describes this objective and describes the methods that are generally used to achieve it, that is, to develop models, to validate those models, to be able to track the instruction path length within a system and, as knowledge is gained, to be able to document that experience and construct a vocabulary that communicates both the predictive and the measurement processes. Fig. 15 shows the process. There are really two types of predictive capabilities, one is analytic and the other is simulative. In the measurement area there are two types of facilities required to produce the data necessary for measurement; one is hardware and the other is software monitors.
Measurement is both time consuming and expensive, therefore there has been significant emphasis and progress placed upon the development of models in order to determine the performance of a system, while measurement techniques are increasingly used to validate these models so that performance information and guidelines can be generated spanning a range of applications, configurations and workload demands. It should be recognised, however, that multiple sub-systems operating within one operating system are often hard to handle by conventional analytic means, and one is forced to consider hybrids of analytic and simulative techniques. It is most important that the developer or user of a model has clearly in his mind the question he wants the model to answer. Rarely is a general purpose model sensitive to questions that were not known at the time the model was developed.
It is perhaps useful to examine a data base/data communication system from a performance standpoint, and for this I have chosen IMS/VS and have constructed a flow chart for the main processing blocks of that system. Fig. 16 shows the flow of such a transaction; notice that it divides itself into three major parts. The communication part where message switching and message queues are handled; the processing of that message against program and data and the multiple calls to that data base for that particular transaction; the completion of that transaction and the generation of the output message in the message queue, and the handling of that message through a terminal access method to a terminal. That is, if you like, the life of a transaction; it is born at the terminal where it enters the system and it dies at the terminal when the transaction is completed. If we were to place 'meters' in the lines joining those functions to queues and libraries, etc., we could in fact measure the activity that is going on within the system. As we pass multiple messages into such a system, we see that the problem of performance resolves down to the allocation of resources, CPUs, channels, programs and data to handle the requirements of each different transaction. The job, then, is to define algorithms for using resources and for waiting for resources. These algorithms start with what priorities are associated with each transaction type and must include recovery strategies in the event that a resource, a data path or a queue discipline fails. Availability and performance are becoming increasingly dependent upon recovery schemes designed into the product.
There are really only two ways of improving the performance of a data base/data communication system. One is to shorten the transaction path length and the other is to provide either faster or parallel processing resources. It is thus often desirable to be able to calculate the number of instructions executed on behalf of an IMS transaction. Fig. 17 shows a typical approach to such a problem, where T is the total instructions executed for the IMS transaction, K1 through K5 are coefficients representing various IMS and VS releases; Q, U, N and C represent major parameters of most importance and significance in terms of overall systems performance.
Now if we were to take these transactions and were to apply values to those parameters, it is conceivable that one could divide the instruction processing capability of the machine by the path length of the transaction and come up with a theoretical maximum number of transactions per second that that resource could process, given that the processing unit was in fact the major bottleneck in the system. This has been done in fig. 18 and shows the difference in transactions per second processed for an 85% utilised 158 and 168. It should be clear that these are not measured values, they are predicted values, and are shown merely to demonstrate the sensitivity of system performance to changes in the key parameter values that affect it.
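The division described above can be written out as a short Python sketch. The instruction rates and path length used here are hypothetical placeholders, not figures for the 158 or 168, and the 85% utilisation is the only value taken from the text.

```python
def max_transactions_per_second(instructions_per_second, path_length,
                                utilisation=0.85):
    """Theoretical ceiling when the processing unit is the bottleneck.

    path_length is the number of instructions executed per transaction.
    """
    return instructions_per_second * utilisation / path_length

# Hypothetical processors and path length, for illustration only.
slower = max_transactions_per_second(1_000_000, 200_000)
faster = max_transactions_per_second(2_500_000, 200_000)

# Halving the path length doubles the ceiling on the same processor.
tuned = max_transactions_per_second(1_000_000, 100_000)
```

The sketch shows both levers named in the text: a faster processing resource raises the ceiling, and so does a shorter transaction path length.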
Fig. 18 is, then, designed to show the sensitivity of a system to changes in the major parameters that affect the system performance. Again this is not a measured environment, this is a predicted environment and it is probably not possible to accurately reproduce this in a measurement environment without rigorously defining several other important system and user dependent factors. It does, however, also show on the same theoretical basis the difference in path length between an MVS system and an MVT system. Traditionally, it is thought that the systems that have higher sophistication have longer path lengths and whereas in general this is true, it is clear that in the MVS system, as the data base call structure becomes more complex, the difference in path length diminishes significantly in favour of MVS.
Independent of the investment made in developing and using models of the system, it is essential to measure the real thing as rapidly as possible. One method used in IBM is shown in fig. 19, where a simulated network is represented in both hardware and software and a data base is constructed to represent the application and system data bases. The simulated network is programmed to generate scripts at a given interval and with a given think time, or range of think times, such that the system under test appears to be loaded with transactions as though they were coming from real terminals. By the application of suitable hardware probes and suitable software probes, we are able to measure the utilisation of resources occurring within the system under a variety of transaction rates, types and call structures. A typical measurement is shown in fig. 20, in this case an IMS/VS 1.0.1 system running under VS2 release 2. Notice the linear CPU utilisation as transaction rate goes up on this 158 CPU with 2400 Baud lines and 4800 Baud lines. The measurement in question is designed to explore the sensitivity of system performance to line speeds. Note that in the 2400 Baud lines case, with ten lines, the line utilisation became a significant bottleneck in the system and this is evidenced by the response times starting to rise rather rapidly, whereas at 4800 Baud line speed, the response time is well contained.
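The effect of line speed can be approximated with a back-of-envelope Python sketch. It assumes that one baud carries one bit per second, that traffic is spread evenly over the lines, and that each transaction moves a hypothetical 3000 bits of input and output; none of these figures come from the measurement in fig. 20.

```python
def line_utilisation(transactions_per_second, bits_per_transaction,
                     n_lines, baud):
    """Fraction of total line capacity consumed by the traffic.

    Assumes 1 baud = 1 bit/second and an even spread across lines.
    """
    demand = transactions_per_second * bits_per_transaction
    capacity = n_lines * baud
    return demand / capacity

# Same hypothetical traffic over ten lines at the two speeds.
at_2400 = line_utilisation(6.0, 3000, n_lines=10, baud=2400)
at_4800 = line_utilisation(6.0, 3000, n_lines=10, baud=4800)
```

Under these assumptions the slower lines are three-quarters busy while the faster lines are well under half, which is the kind of contrast the response-time curves are said to reveal.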
System performance can be viewed in two ways and fig. 21 shows that we are either using a resource or we are waiting for it. Let us now take the flow chart (fig. 16) that we developed to show the life of an IMS transaction. Let us look at that flow chart with respect to the time we spend waiting for a resource, that is, waiting for a line, waiting for buffers, waiting for a processing region, waiting for an application program to be brought in, waiting for I/O, that is, storage accesses to bring data or programs into the system, waiting for lines to handle the output message and waiting for services to transmit that message to the terminal. Let us also look at the amount of time using the resources. Fig. 22 shows, and it is drawn to scale, where if this were 8 inches long, the response time from beginning to end would be 1 second, making 3 loops around the DL1 call. It is also clear, as we approach a 100% utilised system, the units of processing occupy a smaller and smaller portion of the total response time. This chart shows the waiting time and processing time for only one transaction within a 75% loaded system.
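One rough way to see why processing shrinks as a share of response time is the Python sketch below. It assumes, and this is an assumption of the sketch rather than a model given in the paper, that the system behaves like a simple single-server queue in which response time equals service time divided by (1 - utilisation). The service time is hypothetical.

```python
def response_time(service_time, utilisation):
    """Response time under the assumed single-server queueing model."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time / (1.0 - utilisation)

def processing_share(service_time, utilisation):
    """Fraction of the response time spent actually using the resource."""
    return service_time / response_time(service_time, utilisation)

# Hypothetical 0.25 s of processing per transaction.
r_75 = response_time(0.25, 0.75)          # 75% loaded system
share_75 = processing_share(0.25, 0.75)
share_95 = processing_share(0.25, 0.95)   # approaching saturation
```

Under this assumed model the share of time spent processing falls from a quarter at 75% load to about a twentieth at 95% load, with the remainder spent waiting.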
4. SYSTEM PERFORMANCE TUNING
The goodness of performance then, of a data base/data communication system is balancing or tuning two things. It is balancing the supply of resources with the demand on them, because we are either waiting for that supply or we are using that supply. Fig. 23 shows this balancing scheme. If we have a high supply with respect to the demand, then we are wasting resources. If we have a high demand with respect to the supply of resources we are going to suffer poor response times. In general, performance is a user option since it requires the addition of resources and these generally cost money; but not always is that the case. In some cases, it is necessary and possible that the resources be tuned to meet the demand of the workload. Performance tuning is concerned primarily with the elements shown in fig. 24, being data base profiles, transaction profiles, profiles of the IMS system, of the processing requirements of the region, of the hardware and software configuration, of the overall teleprocessing configuration, and importantly, the use of tools to measure these resources.
Fig. 25 shows the primary factors affecting the performance and the design of the system. The number of transactions per second is typically in the range of 1 to 50, although within the next five years I am confident that you will see that range grow towards 200 transactions per second. In terms of EXCPs per call, we are looking today in the range 0.1 to 5 per data base call. In terms of calls per transaction, we typically find anywhere from 5 to 50 calls with several transaction types exceeding 50 and reaching close to 100 calls per transaction, so the data base designer is faced with designing a system of resources which can efficiently and economically accommodate the range of performance critical factors.
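Multiplying the three factors together gives a feel for the spread the designer must accommodate. The Python sketch below simply combines the ends of the quoted ranges; it is an illustration of the arithmetic, and real workloads would not necessarily sit at all three extremes at once.

```python
def excp_rate(transactions_per_second, calls_per_transaction,
              excps_per_call):
    """EXCPs per second implied by the three design factors."""
    return transactions_per_second * calls_per_transaction * excps_per_call

# Low and high ends of the ranges quoted in the text.
low = excp_rate(1, 5, 0.1)
high = excp_rate(50, 50, 5)
```

The two ends differ by a factor of 25,000, from half an EXCP per second to 12,500 per second.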
The tuning of data base systems is clearly a complex matter involving firstly an awareness of utilisation of resources, and secondly the understanding and knowledge about the sensitivity of changing the resource allocation to achieve an overall system performance level. The objective then is shown in fig. 26 - either minimise the transaction path length and/or invoke parallelism of key resources. The method recommended is firstly to quantify the profiles of the transaction and of the system; understand the behaviour of the system in response to changes in the workload; use software monitors to quantify that behaviour and resort to hardware monitors which do not interfere with the processing characteristics of the system; to define experiments to uncover and order the bottlenecks; and to make changes, one at a time, to the system and measure the effects. Only by measurements do we really get smart.
Performance tuning can be an iterative process because what one is trying to do is to optimise the utilisation of resources and match them against the workload. Frequently that workload is changing and one's job is not done until one has resolved the differences between what one expects, that is the expectation of performance, and what one has actually got. If there are significant differences between those two elements, then clearly there must be an explanation which always seems to lie in better understanding of what the system is doing. I mentioned the complexity of tuning a data base/data communication system. It is certainly not true that every one behaves differently. There are some typical causes of bottlenecks which are frequently uncovered and those really fall into three categories, as shown in fig. 27 - resources of a teleprocessing network - balancing of those resources and the selection of buffer sizes and message format buffers; region resources, that is the amount of program loading that is done; the structure and the size of application programs; the structure and the size of the data base; the use of extended function within that data base structure; and lastly, the CPU resources, where its use is determined largely by the amount of system and user I/O and the use of bufferpool services.
Finally, I should like to discuss trends within data base/data communication system performance. Those trends really fall into three broad areas - trends in prediction, trends in measurements and trends in tuning. I think that over the next five years we are going to see generalised use of analytic tools for dedicated systems and some guidelines based on analytic tools for mixed systems. We are going to see the specific use of simulation and hybrid tools for mixed or complex systems. We are also going to see the availability of tools at an early point in the design of systems to help users choose amongst different configurations which have different price performance characteristics.
In terms of measurement trends, we are going to see integrated software performance monitors, because basically performance is a user option and it is proper that the user understands what the system is doing and what choices he has to change it. Where a software monitor impacts the basic behaviour of the system, we are going to see integrated hardware built into the product to facilitate measurement and so be able to monitor the performance with little or zero overhead. We are going to see selective performance report generation, and we are going to see dynamic performance information and monitoring of key resources, so that information can be made available to a user to permit him to manage his system in line with some overall strategic direction that has known cost performance trade-offs.
Lastly, in performance tuning, I believe that we are going to see a family of tools available for the design of major components. That is, the design of TP networks, of data bases, of multiprocessing systems to permit the designer at an early stage to become familiar with the behaviour of those elements of the system that are likely to be a system bottleneck. We are going to see system-managed performance generation reports, and tuning controls that are made available on an open loop basis. It is conceivable that in the next five to ten years many of the tuning controls can be architected into a closed loop system so that the system is able to tune itself, and at this point I refer to tuning of the system in terms of allocating resources in accordance with a predetermined set of performance strategies. Some of these can be determined by the manufacturer and some will be determined and selected by the end user.
T h i s concludes my presentation on the Evaluation of Data Base Systems.
Concepts of System Performance Sensitivity
The Problem: Find what's in the critical path, i.e., what's the bottleneck, and what's the payoff when I remove that bottleneck and hit the next one. Because... there always is a next one.
Fig. 1
Performance Measures of Goodness
How can we talk about performance?
• Thruput (Jobs/Unit Time)
• System Data Rate
• # Accesses/Sec
• # Terminals Supported
• Terminal Response Time
Or perhaps: Thruput/Rental $/Sec/Access, Cost/Job, Cost/Transaction
Fig. 2
Trends in Performance Evaluation
Notice the trend from:
• Component or Device Productivity
to
• System Productivity (System = Hardware + Software + Workload)
to
• People Productivity (People Productivity = Maximized Enterprise Objectives)
Fig. 3
[Fig. 4. A Representation of System Resources. Diagram of work demand (transactions/sec) flowing through queues to channels and devices. Key: CH = Channel, D = Device, Q = Queue.]
[Fig. 5. A Way to Think About Bottlenecks. Plot of system performance (e.g., system data rate) against the number of tasks n (1 to 5), with a region labelled CPU.]
[Fig. 6. Systems Performance vs. Time for a Time-Varying Workload. Plot of systems performance against time, with regions labelled CPU bound and channel bound.]
[Fig. 7. Transaction Rate vs. Time for a Time-Varying DBDC Workload. Plot of transaction rate (T/sec) and resource utilization against time; resources labelled DASD, R.O.T.P. and CPU.]
[Fig. 8. DBDC Performance Relationships. Plot with the labels real storage and transaction traffic rate.]
[Fig. 9. The Development Process: A View of the Development Process. Flow diagram with the stages Research and Develop, Engineer, Configure and Integrate. Parameters (bits/inch, tracks/inch, access time, rotation speed, capacity) lead to products and CPUs (135, 145, 158, 168, 155/165), operating systems (VS2, VM/370, VS2/2, among others) and applications.]
[Fig. 10. DASD Parameter Relationships. Diagram relating density, band and rotation period to capacity, data rate and access time to data.]
[Fig. 11. The Cost of Attached Storage. Plot of $/MB/month (0 to 160) against year of announcement (1954 to 1974) for the 305, 1311, 2314, 3330, 3340 and 3330-11.]
[Fig. 12. Access Rate Characteristics. Plot of accesses/$ (× 10^3) and accesses/sec to 1200 bytes against year of announcement (1954 to 1974) for the 1311, 2311, 2314, 3330 and 3340.]
[Fig. 13. Present Storage Technologies. Plot of cost (c/bit, from 100 down to .000001) against average access time (10 ns to 10 s), with storage capacity (bits, 1M to 1T) as a further scale.]

OBJECTIVES AND METHODS
Objective
• DON'T CREATE SURPRISES IN PERFORMANCE. LATE DISCOVERIES ARE HARD TO CORRECT.
Method
• DESIGN TOOLS (MODELS) TO ASK/ANSWER QUESTIONS IN A DISCIPLINED WAY
• DO IT EARLY TO INFLUENCE DESIGNERS
• SPECIFY AND TRACK PATH LENGTHS
• VALIDATE MODELS AND MEASURE TO GET SMART
• WHEN YOU'RE SMART, DOCUMENT IT
Fig. 14
[Fig. 15. Performance Tool Development. Diagram: prediction rests on models (analytic, simulative) and measurement on monitors (hardware, software); the two are validated against each other and yield performance information and guidelines.]

[Fig. 16. Main Processing Blocks of a Transaction, IMS/VS. Block diagram; the legible labels include the message queues.]
IMS PATH LENGTH ANALYSIS
HOW MANY INSTRUCTIONS ARE EXECUTED ON BEHALF OF AN IMS TRANSACTION?

$T = (K_1 + K_{11}) + (K_2 \times Q) + (K_3 \times U) + N\,[K_4 + (C \times K_5)]$

$K_1, \ldots, K_5$ ARE COEFFICIENTS REPRESENTING VARIOUS IMS AND VS RELEASES.
Q = FRACTION OF INQUIRY TRANSACTIONS
U = FRACTION OF UPDATE TRANSACTIONS
N = NUMBER OF DATA BASE CALLS/TRANSACTION
C = NUMBER OF DATA BASE I/Os/CALL
T = TOTAL INSTRUCTIONS EXECUTED FOR ONE IMS TRANSACTION
Fig. 17
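The path length formula of Fig. 17 can be evaluated mechanically. The following is a minimal sketch, not the speaker's tool: the coefficient values are purely hypothetical (the figure gives none), the constant term `k_fixed` stands for the whole first bracket of the formula, and the instruction rate of the processor is likewise an assumed number. The second function turns a path length into the transaction rate that a given CPU utilization would support.

```python
def ims_path_length(k_fixed, k2, k3, k4, k5, q, u, n, c):
    """Instructions per transaction: T = k_fixed + k2*q + k3*u + n*(k4 + c*k5).

    k_fixed stands for the constant first bracket of the Fig. 17 formula;
    q, u are the inquiry/update fractions, n the data base calls per
    transaction, c the data base I/Os per call.
    """
    return k_fixed + k2 * q + k3 * u + n * (k4 + c * k5)


def max_transaction_rate(path_length, instructions_per_sec, cpu_utilization=0.85):
    """Transactions per second sustainable at the given CPU utilization."""
    return cpu_utilization * instructions_per_sec / path_length


# Hypothetical coefficients and a hypothetical 1,000,000 instructions/sec CPU.
t = ims_path_length(50_000, 10_000, 20_000, 3_000, 2_000,
                    q=0.5, u=0.5, n=10, c=1.0)
rate = max_transaction_rate(t, 1_000_000)
print(f"path length {t:.0f} instructions, about {rate:.1f} transactions/sec")
```

With real coefficients for a given IMS and VS release, the same two functions would show how the supportable transaction rate falls as the number of data base calls per transaction rises.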
[Fig. 18. IMS Path Length Analysis. Table of IMS transaction path length (instructions × 10^3) under VS and MVT, and bar chart of IMS/MVS transactions per second for 85% CPU utilization on the 158 and 168, for the eight workload cases below.]

CALLS/TRANSACTION    3     5     10    30    3     5     10    30
I/Os/CALL            3.3   2.0   1.0   0.3   3.3   2.0   1.0   0.3
% INQUIRY            0.5   0.5   0.5   0.5   0     0     0     0
% UPDATE             0.5   0.5   0.5   0.5   1.0   1.0   1.0   1.0

Path length values as printed, in extraction order (assignment to VS/MVT not recoverable): 154, 160, 176, 239; 162, 169, 185, 247; 114, 124, 148, 243; 114, 124, 148, 243.
[Fig. 19. Performance Measurement Environment. Diagram: a simulated network (M control units/line, N terminals/control unit) drives the system under test; DASD holds the system and the data base. Further labels: TEST/360, CHECK.]
[Fig. 20. IMS Performance Measurement, Line Comparison. IMS/VS 1.0.1 on VS2/2, 10 lines, 300 terminals, 2400 and 4800 baud lines, 158 CPU. Plot of response (scale 0 to 100) against transactions per second (1 to 7).]
WHAT IS PERFORMANCE
A SYSTEM OF RESOURCES (CPU, CHANNEL, DASD, TP, STORAGE, PROGRAM, QUEUE, LOCKS, ...)
• USE OF RESOURCES (UTILIZATION)
• WAITING FOR RESOURCES (WAIT/RESPONSE TIME)
DEFINE WHAT YOU MEAN BY PERFORMANCE
Fig. 21

[Fig. 22. Timing an IMS Transaction (IMS/VS 1.0.1, 370/158, 4800 baud, 3270). Elapsed-time chart by IMS function. Input: wait for TP line, input terminal, input message handling (MSG Q, MFS, LOG). Processing: wait for MPP, preparation of application program (ACB, APPL PGM LIB, MSG Q), processing per DL/1 call (data bases, DYN LOG). Output: wait for TP line, output message handling (MSG Q, MFS), output terminal.]
[Fig. 23. DBDC Performance Tuning. Diagram balancing resource supply (CPU, TP, storage, devices) against resource demand (applications, transaction rate, DB design, DB calls). Imbalance leads to wasted resources or poor performance; the goal is a balanced system. Caption: TUNING > BALANCE RESOURCE SUPPLY AND DEMAND.]
DBDC PERFORMANCE TUNING
Primarily concerned with:
• DATABASE PROFILES
• TRANSACTIONS PROFILES
• IMS PROFILES
• MPP PROCESSING REQUIREMENTS
• HARDWARE CONFIGURATION
• OPERATING SYSTEM PROFILE
• TELEPROCESSING CONFIGURATION
• OTHER
and the use of tools to measure critical parameters
Fig. 24
PRIMARY FACTORS AFFECTING PERFORMANCE/DESIGN

PARAMETER           TYPICAL VALUES
# TRANSACTIONS      1 - 50
# EXCPS/CALL        0.1 - 5
# CALLS/TRANS       5.0 - 50
Fig. 25
A DBDC TUNING APPROACH
Objective
• MINIMIZE THE TRANSACTION PATH LENGTH.
• INVOKE PARALLELISM OF KEY RESOURCES.
Method
• QUANTIFY PROFILES - TRANSACTIONS, SYSTEM CONFIGURATION AND PERFORMANCE GOODNESS.
• UNDERSTAND SYSTEM BEHAVIOR IN RESPONSE TO WORKLOAD.
• USE SOFTWARE MONITORS TO QUANTIFY BEHAVIOR (4 TIME), MAYBE HARDWARE MONITORS AND DETAILED TRACE.
• DEFINE EXPERIMENTS TO UNCOVER AND ORDER BOTTLENECKS.
• FORM IMPROVEMENT HYPOTHESIS, MAKE CHANGE, MEASURE EFFECT.
• DOCUMENT EXPERIMENT AND RESULTS. GET SMART.
Result
• OPTIMUM UTILIZATION OF SYSTEM RESOURCES TO MATCH WORKLOAD.
• RESOLVE DIFFERENCE BETWEEN EXPECTED AND ACTUAL PERFORMANCE.
Fig. 26
TYPICAL CAUSES OF DBDC RESOURCE BOTTLENECKS
TP RESOURCES
• BALANCING NETWORK LOADING
• SIZE OF TP BUFFERS
• SIZE OF MESSAGE FORMAT BUFFERS
REGION RESOURCES
• AMOUNT OF PROGRAM LOADING
• STRUCTURE AND SIZE OF APPLICATION PROGRAMS
• DATA BASE STRUCTURE AND # CALLS
• USE OF EXTENDED IMS FUNCTIONS
• AMOUNT OF I/O
CPU RESOURCES
• AMOUNT OF SYSTEM AND USER I/O
• USE OF BUFFER POOL SERVICES
Fig. 27
Data Security in Data Base Systems (Datensicherheit in Datenbanksystemen)
Hartmut Wedekind, Technische Hochschule Darmstadt

Summary
The terms "data protection" (Datenschutz), "data security" (Datensicherheit) and "data integrity" (Datenintegrität) are delimited against one another in the introduction. The first main part treats the security measures that concern technical and organizational matters. It centres on the processes of identification and authentication, the organizational formation of layers, compartments and authorization matrices, and cryptographic methods. The second main part deals with security models. By a security model we mean the linguistic formulation of the security conditions, so that they can be introduced into a data management system. A data base comprises all stored data; a data management system comprises all procedures for handling them. We distinguish descriptive (non-procedural, declarative) security models, which have been proposed for relational data base systems, from procedural models such as those provided in the DBTG of the CODASYL group for hierarchical data base systems.

1. Introduction
The pairs of terms data protection and data security on the one hand, and data security and data integrity on the other, lie so close together that the terms must be delimited against one another before details are treated. Under the heading "data protection" (Datenschutz) the question to be answered is "What is to be protected, and against what?" (15). This discipline works out legal norms and organizational rules that lay down what, for ethical, social, economic or intelligence reasons, should not be accessible to everyone or must not be entered into a data base. Data protection is particularly important with regard to personal data. But there is also an interest in protecting company data (e.g. customer master data, data that can be patented or licensed) and data of public administration (e.g. data on the planning of building land). In America a data protection act has been passed; in Germany a bill has been drafted by the Federal Government.

Data security is concerned with the question "How is protection to be achieved?" In this paper we restrict ourselves to guaranteeing access security, by which unauthorized accesses are repelled. What constitutes an unauthorized access is laid down by data protection rules. We exclude physical data security. This covers structural measures in computing centres, the distribution of locks and keys, the fitting of write rings on tapes, fire protection, and the recovery of destroyed files. Physical data security is concerned with safeguarding against loss of data.

Data integrity concerns the accuracy of the data. The data must satisfy integrity conditions. In the simple case these refer to data fields with a data type declaration; in complicated cases an integrity condition extends over many files of a data flow plan. The part numbers of an order file, for example, must be a subset of the part numbers listed in the part master file. The reconciliation checks of commercial practice are integrity conditions that can be of a very complicated nature. Integrity conditions are quality conditions of the data base.

Date (9) rightly calls data security and data integrity twin problems. Data integrity is the demand that the file be free of errors; data security, by contrast, is oriented towards access. Both problems require additional conditions to be formulated and introduced. Whereas the conditions of data security are derived from the abstract norms of data protection, the integrity conditions take account of the data model used and the concrete specifications of the users' miniworld. Data integrity also covers the problem of a possible loss of integrity through simultaneous updating access (shared access).
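The part-number condition mentioned in the introduction (every part number in the order file must also appear in the part master file) is a subset test. A minimal sketch follows; the function name and the sample part numbers are hypothetical and serve only as illustration.

```python
def dangling_part_numbers(order_part_numbers, master_part_numbers):
    """Return the part numbers used in orders but missing from the master file.

    The integrity condition holds exactly when the result is empty,
    i.e. when the order part numbers form a subset of the master part numbers.
    """
    return set(order_part_numbers) - set(master_part_numbers)


# Hypothetical sample data.
master = [101, 102, 103]
orders = [101, 102, 999]
violations = dangling_part_numbers(orders, master)
print("integrity condition holds" if not violations else f"violations: {violations}")
```

A check of this kind spans two files, which is why the paper counts it among the more complicated integrity conditions.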
2. Security Measures
2.1 Identification and Authentication

A user who wants access to a data base (DB) must first identify himself, i.e. he must tell the system who he is. This identification must be checked for correctness; it must be authenticated. A widespread means of identification, and also of authentication, in DB systems is the password. Every user is given a password. The passwords are kept by the DB system in a password table. The table may contain further personal master data such as personnel number, name and the date on which the password was issued. The table may also record which terminals may be used at all. It can thus be checked whether the user may work at a previously identified terminal. At some terminals the identification process is today simplified, and also made more secure, by requiring machine-readable identity cards. Identification of the person goes hand in hand with identification of the terminal, so that the system knows where the terminal session is taking place.

If one can assume that the passwords and their storage remain secret, the identification process is at the same time an authentication process. To make this assumption more realistic, special methods of issuing passwords can be introduced. A password that can be used only once (one time password) is very secure, but also costly. Petersen and Turn (20) point out that even this kind of password does not protect against intruders who insert a terminal of their own into the traffic between the authorized user and the system (infiltration between the lines). The password table must in any case remain secret unless it is enciphered. A further way of letting identification and authentication coincide would be the checking of fingerprints, voice analysis, or signature verification. For methods that are also economically justifiable, separate identification and authentication processes are necessary.

Authentication has to exploit information known only to the person that the user claimed to be during the identification procedure. The user may, for example, have to give a second password. But this password, too, can become generally known merely by someone "looking over the shoulder". It is therefore more appropriate for the system to ask the user who wants access a question that only he can answer. Following a suggestion by L. Earnest, Hoffman (16, p. 92) recommends the following procedure. At "log-in" the user identifies himself; the system then offers him a pseudo-random number, which we shall call x. By a simple transformation T, which the user carries out in his head, a number y is determined. The result y = T(x) is typed in. The system likewise performs the transformation T(x) and checks whether the result is in fact y. A potential intruder can at most see x and y. He can hardly find out the transformation T if the procedure is protected inside the computer. The procedure is protected, however, since only someone who has been authenticated has access to it. For T the following transformation, which a third party can hardly determine, may be proposed as an example:

$T(x) = \Bigl(\sum_{i\ \mathrm{odd}} (i\text{-th digit of } x)\Bigr)^2 + (\text{hour of the day})$

That is, the digits in the odd positions are summed; the sum is squared and added to the hour of the day.

The authentication method described is very simple and inexpensive.
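The challenge-response exchange with the example transformation T can be sketched as follows. This is an illustration under stated assumptions, not Hoffman's implementation: the paper does not say from which end the digit positions are counted, so the sketch counts from the left starting at 1, and the function names and the five-digit challenge length are hypothetical.

```python
import secrets


def make_challenge(n_digits=5):
    """System side: draw a random n-digit number x to show to the user."""
    low = 10 ** (n_digits - 1)
    return secrets.randbelow(9 * low) + low


def transform(x, hour):
    """T(x): square the sum of the digits in odd positions, add the hour.

    Assumption: positions are counted from the left, starting at 1.
    """
    digits = [int(d) for d in str(x)]
    odd_position_sum = sum(digits[0::2])  # positions 1, 3, 5, ...
    return odd_position_sum ** 2 + hour


def authenticate(x, response, hour):
    """System side: recompute T(x) and compare it with the user's answer y."""
    return response == transform(x, hour)


x = make_challenge()
y = transform(x, hour=14)          # what the user would work out in his head
print(x, y, authenticate(x, y, hour=14))
```

An eavesdropper sees only the pair (x, y); because the hour of the day enters the result, the same x yields different answers at different times.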
A further advantage of this method is that the password table may be known to everyone, since it is needed only for identification. Only T(x) has to remain secret.

Further methods of authentication are proposed by Evans et al. (10) and Purdy (21). The two methods are very similar and build on results developed within cryptography (secret writing or the science of ciphers; kryptos = secret). Because of the great importance of cryptography for the security of DB systems, we shall deal with cryptographic methods in a separate section. The methods of Evans and Purdy mentioned above, which build on the "one way cipher" of Wilkes (27), will not be discussed within the scope of this paper.

2.2 Stratification, Compartmentalization and Authorization Tables
There are three simple structures for organizing a security system: stratification, compartmentalization (the formation of compartments or sections), and the arrangement of access rights in an authorization table.

In stratification the users are divided into groups, in the sense of a hierarchy, according to their access rights. The layers of data, or the user groups, are named from top to bottom, for example, as follows:

top secret               1st layer
secret                   2nd layer
strictly confidential    3rd layer
confidential             4th layer
unclassified             5th layer

A person who, for example, has access to top secret data in order to read, delete or change them also has access to data in the layers below. If, on the other hand, a person can access only strictly confidential data, the layers "top secret" and "secret" remain inaccessible to him. In general: a person may access only the data of the layer for which he has been classified and the data in the layers below it.

The stratification of persons and data for reasons of security comes from the military domain. In civilian security systems this form of organization is uncommon. But even in the military domain stratification is frequently combined with compartmentalization. In compartmentalization the data are divided into disjoint subsets. A subset or compartment (section) is assigned to a person or to a group of persons. Data may be present exactly once, in one compartment. The security system must guarantee that there are barriers between the compartments that cannot be broken through. Martin (19, p. 151) regards compartmentalization as a vertical partitioning of the data. He also calls stratification a horizontal partitioning.

[Figure: vertical compartments crossed by horizontal layers.]
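The general stratification rule, combined with compartments, can be sketched as a two-part check. The layer names follow the list given in the text; the function name, the representation of compartments as a set of strings and the sample values are assumptions made for illustration.

```python
# Layers from lowest to highest, following the list in the text.
LAYERS = ["unclassified", "confidential", "strictly confidential",
          "secret", "top secret"]
RANK = {name: position for position, name in enumerate(LAYERS)}


def may_access(user_layer, user_compartments, data_layer, data_compartment):
    """Grant access only if both conditions hold.

    Stratification: the user's layer is at or above the layer of the data.
    Compartmentalization: the data lie in a compartment assigned to the user.
    """
    layer_ok = RANK[user_layer] >= RANK[data_layer]
    compartment_ok = data_compartment in user_compartments
    return layer_ok and compartment_ok


# Hypothetical user cleared for "strictly confidential" in one compartment.
print(may_access("strictly confidential", {"personnel"}, "confidential", "personnel"))
print(may_access("strictly confidential", {"personnel"}, "secret", "personnel"))
print(may_access("strictly confidential", {"personnel"}, "confidential", "finance"))
```

The first call is granted; the second fails the layer test and the third the compartment test, which mirrors the "need to know" idea of restricting a person to the data actually required.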
Within a vertical compartment assigned to a group of persons, horizontal layers are also conceivable. This form is often found in military security systems. Here, too, the aim is that a person has access only to the data he really needs. Friedman (14, p. 269) calls compartmentalization an implementation of the military postulate of "Need-To-Know": everyone should know only what he really requires.

A very well-known application of compartmentalization is the storage-protected division of main storage among the individual programs in multiprogramming. The operating system guarantees that a program cannot address the storage area of another program. The operating system is frequently supported in this by hardware in the form of bounds registers (base limit register). A register of this kind holds a base address and the length of the area. By comparing the program's addresses with the register contents, an attempt to go beyond the area can be detected.

The third of the simple structures for a security system is the authorization table. The table contains the password, the personnel number and an n-bit field for the authorization. If the i-th bit is 1, access to the protected object D_i is permitted; if it is 0, access is denied. The table is often also called a user security profile. Its disadvantage is the binary rule "either access or no access". In data base systems this table is frequently laid out as a matrix in which the rows represent the users and the columns the protected objects.

An authorization matrix will be explained by an example that can be found in a similar form in Conway et al. (8, p. 212). We start from the relation PERSONAL (PNR, LSTG, GHT, LMB, VST), which, besides the personnel number (PNR), contains the very sensitive personnel data performance (LSTG), salary (GHT), latest medical findings (LMB) and previous convictions (VST). The password table already discussed serves to identify the rows.
Password    PNR    LSTG   GHT    LMB    VST    Remark
13 C 151    R      R,W    R,W    R      R,W    Head of personnel
74 Q 028    R,W    R      R      N      N      Head of organization
43 F 974    R      N      N      N      N      Programmer
14 Z 234    R      N      N      R,W    N      Physician
28 R 862    N      R      R      R      R      Statistician

R = reading permitted, W = changing permitted, N = neither reading nor changing permitted
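A lookup in such an authorization matrix can be sketched as follows. Only the three rows that are unambiguous in the printed table (programmer, physician, statistician) are used; the dictionary layout and the function name are assumptions made for illustration and are not taken from the paper.

```python
ATTRIBUTES = ("PNR", "LSTG", "GHT", "LMB", "VST")

# Rows keyed by password; entries use R (read), W (change), N (no access).
MATRIX = {
    "43 F 974": ("R", "N", "N", "N", "N"),    # programmer
    "14 Z 234": ("R", "N", "N", "RW", "N"),   # physician
    "28 R 862": ("N", "R", "R", "R", "R"),    # statistician
}


def is_allowed(password, attribute, operation):
    """Return True if the row for this password grants 'R' or 'W' on the attribute."""
    row = MATRIX.get(password)
    if row is None:          # unknown password: no row, no access
        return False
    entry = row[ATTRIBUTES.index(attribute)]
    return operation in entry


print(is_allowed("14 Z 234", "LMB", "W"))   # physician changing medical findings
print(is_allowed("43 F 974", "GHT", "R"))   # programmer reading salaries
```

Because the decision depends only on user and attribute, never on the stored values, a check of this kind is value-independent in the sense used in the following paragraphs.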
To keep the authorization matrices for sets of entities from requiring too much storage, it is recommended that zones and categories be formed (19, p. 6). A zone is the combination of at least two sets of protected objects. From the sets sales part, purchasing part and manufacturing part the zone "part" is formed. A category is the combination of several attributes. From customer name, place of residence and turnover the category "customer information" can be formed. Storage can be reduced further by forming user groups. All members of a user group have exactly the same access rights. For ease of distinction, Friedman (14, p. 269) calls a user group formed for security reasons a "clique". Zones, categories and cliques are three very memorable terms.

Compared with stratification and compartmentalization, the authorization matrix already allows considerably more subtle security conditions to be represented. The security conditions must not, however, be value-dependent. A tabular representation in the form of the data schema "matrix" is then possible only with great difficulty, and one has to move to a linguistic formulation of the security conditions. A condition is value-dependent if attribute values are needed to formulate it. Value-dependent security conditions of any complexity can be stated. Data protection rules frequently require only simple value-dependent conditions to be observed, such as the personnel record condition*). Value-independent conditions can be checked at compile time, value-dependent conditions only at execution time. Value-dependent conditions are very time-consuming.

*) i.e. everyone may read only his own personnel master record.

2.3 Circumventing the Security Precautions

This section describes methods of circumventing the security measures. At the same time it treats the question of which security measures are needed to repel intruders who act with intent. The methods of the intruders are "villainies" that strike the "naive" and "upright" user as very esoteric. For many of the intruders' attacks, enciphering the data base in the sense of cryptography is an effective countermeasure. The attacks on a data processing system and the defences against them are presented excellently in the much-noted paper by Petersen and Turn (20).
The aims of a deliberate intrusion into a DB system may be: 1) to obtain information, 2) to find out what information a user is interested in, 3) to change and destroy information, 4) to use resources of the system free of charge or at the expense of someone else.

Petersen and Turn (20) divide the methods of deliberate intrusion into two categories. Passive infiltration means that the intruder taps into the data processing system in some way in order to learn what is going on. Active infiltration means that the intruder either wants to use system resources or deliberately wants to obtain, change or destroy information.

The methods of passive infiltration are the tapping of transmission lines (wiretapping) between system and terminal and the placing of probes (electromagnetic pickups) in the CPU and the various storage devices. The transmission lines are regarded as the most vulnerable part of the overall system (20, p. 291). According to Petersen and Turn, the best defence against both attacks is encipherment. The intruder is then burdened with "cracking" the cipher. If the effort (work factor) of breaking the cipher is greater than the value of the information gained, these attacks do not pay. This statement is very abstract, since the effort of breaking the code can be estimated, but the value of the information to an intruder cannot.

With regard to active infiltration, the following methods can be listed:

1) "Masquerading". The intruder has obtained a user's password, e.g. by tapping the line, and now "masks" himself with it. Encipherment can prevent the intruder from learning the password by tapping. Enciphering and deciphering must of course take place at the terminal.

2) "Browsing". The intruder is a legitimate user who passes the identification and authentication process successfully. He tries, however, to read or change data to which he has no right of access. Well-functioning access control is the best defence against this kind of infiltration.
3) The intruder inserts a terminal of his own into the transmission line between user and system. While the legitimate user is at his terminal, the following can happen:
a) The intruder deletes the "sign-off" command and continues to operate the terminal in the user's name. The user believes that the terminal session has ended.
b) While the legitimate user's terminal is inactive, the intruder cuts in to work with the data base ("between the lines").
c) The intruder picks specific information out of the traffic between the legitimate user and the system, alters it, and has the modified, erroneous information transmitted to the user's terminal ("piggy-back entry").
Encipherment is the best defence in all three cases.

4) Theft of a removable data carrier. Besides physical protection by specially lockable rooms, encipherment is to be recommended here as well.

5) The intruders are systems programmers with detailed knowledge of storage protection, of programming in privileged mode and of the operating system. Intruders of this type are by nature the most dangerous. They can deliberately build leaks into system programs (trap doors) or have main storage dumped from time to time. Systems today are so complicated that only a team of intruders can work successfully, which offers a certain protection, since an entire team will only rarely lend itself to a "villainy" of this kind. By logging certain operations, such as the dumping of main storage, inadmissible interventions can be discovered afterwards. Petersen and Turn call these measures "threat monitoring" and attach great importance to them. Particularly hard to detect are attacks that come about through software modifications, e.g. through changes to compilers, precompilers, text editors etc. Since almost all programs are processed by other programs, Bayer (1, p. 78) rightly states the principle: "No program is more secure than the programs by which it is processed."
Systems programmers with detailed knowledge can also use the method of the "planted record" to break the cipher more quickly. The intruder inserts fragments of plaintext into the file and then tracks down their enciphered form in order to break the code. Bayer (1) in particular has drawn attention to this procedure.

2.4 Cryptographic Methods

Cryptography is the study of producing a secret text from an original text and of recovering an original text from a secret text. The first process is called enciphering; its inverse is called deciphering. Other names for "original text" and "secret text" are "plaintext" and "cryptogram" or "cipher". A cipher is an unintelligible sequence of characters. The security of an enciphering method is judged by the resistance or effort (work factor) it presents to the unauthorized intruder. In a cryptographic code, which is not a cipher, parts of the character sequence can be understood by a third party but not interpreted correctly. Words and whole phrases are exchanged fairly arbitrarily. How this exchange is to be carried out is recorded in a dictionary called the code book. As a rule, cryptographic codes are also meant to achieve data compression. Because of the storage required by the code book, cryptographic codes are out of the question for DB systems. We need algorithmic, not tabular, enciphering and deciphering methods. Three kinds of algorithmic methods can be distinguished:
a) substitution methods,
b) transposition methods,
c) block cipher methods.

Cryptography has a long history. Lovers and thieves have always concealed their connections as well as they could, remarks Feistel (13, p. 21), referring jokingly to pre-scientific cryptography. Only around the middle of the last century did cryptography slowly become a science. Secret communication nevertheless remained confined to pencil and paper until well into this century. The computer then gave cryptography a new and hardly expected impetus. All historical remarks made in the course of this presentation are taken from Kahn's famous book "The Codebreakers" (18).

a) Substitution methods

In this cryptographic method a character of the plaintext is replaced by a character from the alphabet. In contrast to the simpler transposition method, the identity of a plaintext character is not preserved in the cipher. The substitution can be carried out by means of a table or algorithmically, which in this context means algebraically. Within data base cryptography the additive substitution methods have gained particular importance because of their high performance on large volumes of data. Among the additive methods we single out "addition modulo q", or the Vigenère-Vernam ciphers.

In the early 16th century the Benedictine monk Trithemius published the very first printed book on cryptography. In it Trithemius describes a square matrix with letters as elements, whose rows are each shifted by one position from top to bottom. He used this matrix as the key for enciphering in an additive substitution method. In the late 16th century this idea was taken up again and improved by Blaise de Vigenère. Not quite correctly in historical terms, the method presented below is called the Vigenère method.

                   Plaintext
              0   1   2   3  ...  25
              A   B   C   D  ...  Z
Key   0   A   A   B   C   D  ...  Z
      1   B   B   C   D   E  ...  A
      2   C   C   D   E   F  ...  B
      .   .   .   .   .   .       .
     25   Z   Z   A   B   C  ...  Y

Key matrix for the Vigenère cipher

The column labels (A, B, C, etc.) apply to the plaintext. The row labels are needed for the key. The cipher character is found at the intersection of row and column. If a D in the plaintext is paired with a B in the key, we find E as the cipher character. Deciphering proceeds the other way round.
We demonstrate "classical" enciphering with the Vigenère method by an example. Let the plaintext be "KEIN VERRAETER" and the key "KAISERBALL".

Plaintext:  K E I N V E R R A E T E R
Key:        K A I S E R B A L L K A I
Cipher:     U E Q F Z V S R L P D E Z
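The table lookup of this example can be reproduced in a few lines. The following is a minimal sketch, assuming that the letters A to Z are numbered 0 to 25 and that the key is repeated cyclically; the function name `vigenere` and its `decipher` flag are illustrative choices, not part of the source.

```python
def vigenere(text, key, decipher=False):
    """Add (or subtract) key letters to text letters modulo 26, with A..Z as 0..25."""
    sign = -1 if decipher else 1
    out = []
    for i, ch in enumerate(text):
        t = ord(ch) - ord("A")                     # plaintext or cipher letter as a number
        k = ord(key[i % len(key)]) - ord("A")      # key letter, repeated cyclically
        out.append(chr((t + sign * k) % 26 + ord("A")))
    return "".join(out)


plaintext = "KEINVERRAETER"    # the blank is dropped, as in the worked example
key = "KAISERBALL"
cipher = vigenere(plaintext, key)
print(cipher)                                   # UEQFZVSRLPDEZ
print(vigenere(cipher, key, decipher=True))     # KEINVERRAETER
```

Subtracting the key modulo 26 corresponds to reading the key matrix in the reverse direction.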
If we assign the numbers 0 to 25 to the letters A to Z, as was done in the figure, the enciphering process can be regarded as addition modulo 26. For the example above we then find, without having to use the matrix:

K + K = 10 + 10 = 20 = 20 mod 26 = U
E + A =  4 +  0 =  4 =  4 mod 26 = E
I + I =  8 +  8 = 16 = 16 mod 26 = Q
N + S = 13 + 18 = 31 =  5 mod 26 = F
  .
  .
R + I = 17 +  8 = 25 = 25 mod 26 = Z

In the last century the Vigenère cipher was called "le chiffre indéchiffrable". Tuckerman (23) is able to show that breaking this cipher with computer methods is no great problem if enough ciphertext is available. A ciphertext cannot be broken if the key is used only once and is produced by a random number generator that does not yield periodic pseudo-random numbers. The longer the key, the more difficult it becomes to break the cipher.

In 1917 the American communications engineer Gilbert Vernam applied the Vigenère cipher to a digitized data stream for the first time. Vernam was faced with the task of enciphering an alphabet that consisted of 2^5 = 32 characters and was presented in a digital code for teleprinters. The result of his investigations was an addition modulo 2.
Example

Enciphering:   Plaintext:  0 1 0 0 1   1 1 1 0 0
               Key:        0 1 0 1 0   1 0 1 0 1
               Cipher:     0 0 0 1 1   0 1 0 0 1

Deciphering:   Cipher:     0 0 0 1 1   0 1 0 0 1
               Key:        0 1 0 1 0   1 0 1 0 1
               Plaintext:  0 1 0 0 1   1 1 1 0 0
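The bit example can be checked mechanically. The sketch below assumes that data and key are given as strings of the characters 0 and 1; the function name `vernam` is illustrative. The same function serves for enciphering and for deciphering.

```python
def vernam(bits, key):
    """Bitwise addition modulo 2 (exclusive OR) of two bit strings of equal length."""
    if len(bits) != len(key):
        raise ValueError("key must be as long as the data")
    return "".join(str(int(b) ^ int(k)) for b, k in zip(bits, key))


plain = "0100111100"
key = "0101010101"
cipher = vernam(plain, key)
print(cipher)                 # 0001101001
print(vernam(cipher, key))    # 0100111100, the plaintext again
```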
Addition modulo 2 and the logical operation "exclusive OR" are identical. The "exclusive OR" has a unique inverse: deciphering is again done with the "exclusive OR". Vernam enciphering plays a special role in data base cryptography. It is a special cipher of the "addition modulo q" kind, which Tuckerman calls the Vigenère-Vernam system, or V-V system for short.

b) Transposition methods

When transposition methods are applied, the identity of a character is preserved; only its position changes. Children, for example, practise one of the many transposition methods when they write their names backwards. A common method for a transposition is to divide the plaintext into blocks and then to arrange them in a matrix. Some operations are applied to the matrix. The resulting text is then brought back into linear form according to a fixed rule. Since transposition methods applied alone offer too little security (a frequency analysis of the characters may already lead to success), we do not describe them in more detail.

c) Block cipher methods

An enciphering method that transforms n information bits into n cipher bits is called a block cipher method. The bitwise Vernam transformation is a block cipher method in this sense. Today, however, when the term "block cipher method" is used without further qualification, one thinks of a method with the following properties:

1. Substitution and transposition are applied stage by stage, one after the other. Since connecting processes in series means a "multiplicative" composition, block cipher methods are also called product cipher methods.

2. In contrast to the "bit by bit" or "letter by letter" substitution methods, in which there are no dependencies in the cipher between the individual bits or letters, the block of n bits is treated as a whole. There is a symbol dependency in the cipher. If one symbol in the cipher depends on another, a transmission error in a single symbol can make an entire block undecipherable.

3. The "substitution" stage is a nonlinear transformation.

We will go into these properties in more detail below.

Block cipher methods for computer applications were developed in particular by Feistel (11), (12), (13). They are distinguished by great security, i.e. the effort needed to break a block cipher is considerable. Block cipher methods were already used by the German army in the First World War under the name ADFGVX (18). Between the World Wars cipher machines were then developed that worked with a pseudo-random key. Only computer applications revived the interest in block cipher methods in the mid-1960s (13, p. 99). We present the block cipher methods here as if the transformations were carried out by devices. We begin with the description of nonlinear substitution.
[Figure: Nonlinear substitution after Feistel (13, p. 25); device S with n = 3 and 2^n = 8.]

It is assumed that the plaintext is entered in blocks of n = 3 bits. Unlike the Vernam method, no zero or one is added to the input digits; instead an entire block of input digits is replaced by an arbitrary block of output digits. The substitution device (device S) essentially consists of two base transformers. The input transformer takes a block in base-2 representation and converts it into an octal number. The output transformer proceeds in reverse. For the n input bits there are 2^n substitution symbols, which are found in a nonlinear way by base transformation. Base transformation is also known as a good method for generating hash addresses. Before a substitution in the reverse direction takes place, a transposition is carried out, which can be thought of as realized by simple wiring. One possible wiring is shown in the figure. Altogether there are (2^n)! = 8! = 40320 such wirings (hardware) or tables (software).

For n = 3 or 4 a substitution device with an arbitrary wiring can still be realized. By entering certain "trick messages", however, an intruder can break the cipher (13, p. 26). This is no longer the case if, for example, n = 128 is chosen. With 128 inputs and outputs the intruder would have to enter 2^128 ≈ 10^38 different blocks in order to explore how the device works. That is not feasible. This advantage is offset by the decisive disadvantage that for large n, e.g. n = 128, the method cannot be realized technically. This is why several smaller substitution devices, e.g. with n = 3, were arranged in parallel in one stage. Since a substitution stage with several S devices can still be overcome very easily, a transposition device was connected after it. Because a transposition device performs a permutation of symbols, it is also called a permutation device (P device).

A cipher is hardly breakable when several P devices and S devices are arranged one after another. The "scrambling" of the bits by a complicated transformation H is so thorough that an inversion can hardly be carried out by anyone who is not initiated. The figure shows how, from a single 1, repeated nonlinear substitution and permutation (transposition) can produce an "avalanche" of ones. For deciphering, the stages are passed through in the reverse direction. To make the block cipher method even harder for the adversary, a different key can be specified for each S device, so that we can distinguish between the devices S1, S2, ..., S20.
[Figure: Block enciphering after Feistel (13, p. 100): alternating stages of P devices and S devices leading from the input block to the cipher.]
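The interplay of S devices and P devices can be sketched in software. The following is an illustration only, not Feistel's actual design: the block length of 12 bits, the substitution table `SBOX`, the wiring `PBOX`, and the number of rounds are all hypothetical choices made for this example. Four 3-bit S devices work in parallel in each stage and are followed by one P device.

```python
import math

SBOX = [3, 6, 0, 5, 7, 1, 4, 2]                 # hypothetical table: one of 8! possible wirings
INV_SBOX = [SBOX.index(v) for v in range(8)]    # inverse table for deciphering
N_BITS = 12                                     # four S devices of n = 3 bits each
PBOX = [(5 * i + 1) % N_BITS for i in range(N_BITS)]   # hypothetical wiring: bit i moves to PBOX[i]


def substitute(bits, table):
    """Apply a 3-bit substitution table to each group of three bits."""
    out = []
    for i in range(0, len(bits), 3):
        value = bits[i] * 4 + bits[i + 1] * 2 + bits[i + 2]    # base 2 -> octal digit
        value = table[value]                                   # the "wiring"
        out += [value >> 2 & 1, value >> 1 & 1, value & 1]     # octal digit -> base 2
    return out


def permute(bits):
    """P device: move bit i to position PBOX[i]."""
    out = [0] * len(bits)
    for i, b in enumerate(bits):
        out[PBOX[i]] = b
    return out


def unpermute(bits):
    """Inverse of permute."""
    return [bits[PBOX[i]] for i in range(len(bits))]


def encipher(bits, rounds=4):
    for _ in range(rounds):
        bits = permute(substitute(bits, SBOX))
    return bits


def decipher(bits, rounds=4):
    for _ in range(rounds):                     # stages in the reverse direction
        bits = substitute(unpermute(bits), INV_SBOX)
    return bits


zero = [0] * N_BITS
single_one = [1] + [0] * (N_BITS - 1)
changed = sum(a != b for a, b in zip(encipher(zero), encipher(single_one)))
print("cipher bits changed by a single input 1:", changed)
print("number of 3-bit substitution tables:", math.factorial(2 ** 3))   # 40320
assert decipher(encipher(single_one)) == single_one
```

With such small devices the whole mapping can be tabulated by an intruder; the sketch only shows the mechanics of alternating substitution and permutation.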
3. Security models in data base systems

3.1 Descriptive models

Relational DB systems differ from other DB systems in that several design levels are clearly recognizable. Three levels can be named that must be present at least in a relational DB system:
1. the logical level,
2. the level of access paths,
3. the level of physical storage on the devices.

In DB systems based on the relational model of data it is assumed that security constraints belong to the user's mini-world and that they must be formalized descriptively on the logical level. Security constraints, and integrity constraints as well, are in principle treated in exactly the same way as user queries.
Security and integrity constraints can be very complex. No security organization in the form of layers, compartments, or authorization tables is presupposed.

Let a relational data base with n relations R1, R2, ..., Rn be given. With a data manipulation language (DML) based on the relational model (for instance ALPHA (7), SQUARE (3), or SEQUEL (4)) it is possible to formulate logical conditions such that the result of the qualification is the answer to a query. In ALPHA the answer to a query was called the target list. Following Chamberlin (5) and Boyce (2), we call the result of a query a "view". A view is a relation that is not stored but only defined by logical conditions. Views are virtual relations. When the base relations R1, R2, ..., Rn, which are actually stored, are changed, the derived virtual relations change as well. In the sense of Chamberlin (5), a view is a dynamic result of a query in which built-in functions such as COUNT, SUM, etc. may also be used.

The term "view" is due to Boyce (2). It is introduced in particular to present the language SEQUEL also as a data description language (DDL). In the sense of a DDL, a complete view exists only if all descriptive parameters of a base relation are declared. Views in the sense of a DDL are illustrated by the base relations

    LAGER_GUT (NR, BEZ, MENGE, PREIS)
    LIEFERUNG (NR, LNR, DATUM)

where NR = number of the stock item, LNR = supplier number, BEZ = description.
The formal description of LAGER_GUT reads:

    DEFINE LAGER_GUT TABLE AS:
        NR    (SCOPE=POSINT, REPR=DEC (6))
        BEZ   (SCOPE=ALPHA, DOMAIN=NAME, REPR=CHAR(~))
        MENGE (SCOPE=REAL, DOMAIN=SCHUETTGUT, UNITS=TONNE, REPR=FLOAT DEC (15,4))
        PREIS (SCOPE=REAL, DOMAIN=GELD, UNITS=DM PRO TONNE, REPR=FLOAT DEC (8,2))
        KEY=NR, ORDER=ASCENDING NR
        INDEX,BEZ

    DEFINE LIEFERUNG TABLE AS:
        NR LIKE LAGER_GUT.NR
        LNR LIKE NR EXCEPT (REPR=DEC (8))
        DATUM (SCOPE=POSINT, REPR=DEC (3))
        KEY NR,LNR
        ORDER=DESCENDING NR, ASCENDING LNR

A table is described first by its table name, the names of its columns and, where necessary, by the order of its rows. A column (attribute) can be characterized by a name, a range of values (SCOPE), e.g. positive integer (POSINT), a comparability (DOMAIN), which states whether two values are comparable, a unit of measure (UNITS), e.g. tonne, and a representation (REPR). The terms SCOPE, UNITS, and REPR are self-explanatory. It should only be noted that certain standard values such as POSINT, REAL, etc. for SCOPE and, for instance, FIXED BINARY, DECIMAL, etc. for REPR should be provided. As far as comparability is concerned, two values are comparable only if they come from columns for which the parameter DOMAIN is the same. "Bulk goods" can be compared only with "bulk goods", and "money" only with "money".
Comparability plays an important role especially when a join is to be formed from two relations.

To illustrate the concept of a "view" with regard to security constraints, an example in the language SEQUEL follows. From the base relation LAGER_GUT the view VERKAUFS_GUT is to be developed. A sales item differs from a stock item in that the bulk material is packed into sacks and counted by the piece. The view VERKAUFS_GUT contains the stock in a saleable, packaged state. One piece, i.e. one sack, is to weigh 1/100 tonne.

    DEFINE VERKAUFS_GUT TABLE AS:
        LIKE LAGER_GUT EXCEPT (MENGE.DOMAIN=SAECKE, MENGE.UNITS=STUECK,
                               PREIS.UNITS=DM PRO STUECK)

The following declaration tells the system how to convert tonnes into pieces.

    DEFINE CONVERT (TONNE TO STUECK):
        1/100 TONNE

CONVERT can be regarded as a conversion routine. The user of the view VERKAUFS_GUT need not concern himself with the conversion itself. To complete the derived view VERKAUFS_GUT into a security constraint, it must be settled what the user of a view is allowed to do. We imagine that we are the owner of the base relation LAGER_GUT and have full power of disposal over this relation. The term "owner" is defined in this sense.
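That a view such as VERKAUFS_GUT is virtual, and that the conversion is hidden from its user, can be illustrated outside of SEQUEL. The following sketch is an assumption-laden illustration: the sample rows of `LAGER_GUT` are invented, and the function `verkaufs_gut` merely stands in for the view definition and the CONVERT declaration. The view is recomputed from the base relation on every call, so it follows every change of the base relation.

```python
# Base relation with invented sample rows: MENGE in tonnes, PREIS in DM per tonne.
LAGER_GUT = [
    {"NR": 1, "BEZ": "KIES", "MENGE": 12.5, "PREIS": 40.0},
    {"NR": 2, "BEZ": "SAND", "MENGE": 3.0, "PREIS": 25.0},
]

STUECK_PRO_TONNE = 100    # one piece (sack) weighs 1/100 tonne


def verkaufs_gut():
    """Derive the view on demand; nothing is stored for it."""
    return [
        {
            "NR": row["NR"],
            "BEZ": row["BEZ"],
            "MENGE": row["MENGE"] * STUECK_PRO_TONNE,    # pieces
            "PREIS": row["PREIS"] / STUECK_PRO_TONNE,    # DM per piece
        }
        for row in LAGER_GUT
    ]


print(verkaufs_gut()[0])          # stock item 1 counted in sacks
LAGER_GUT[0]["MENGE"] = 10.0      # a change of the base relation ...
print(verkaufs_gut()[0])          # ... shows up in the view immediately
```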
Chamberlin et al. (5) propose the following rights of disposal:

1) GRANT: This decrees that the user of the derived view may show this view to any other user. In other words, passing on the read permission is granted.
2) REVOKE: The disposal over the view is revoked.
3) DESTROY: This grants permission to destroy the view.
4) INSERT: It is conceded that tuples may be inserted into the virtual relation (view).
5) DELETE: Tuples may be deleted.
6) UPDATE: Attributes may be changed.

A view together with rights of disposal constitutes a security constraint. The user with number X is to be granted the following rights:

    GRANT VERKAUFS_GUT TO BENUTZER.BNR = 'X'
        (GRANT = 'NO', REVOKE = 'NO', DESTROY = 'NO',
         INSERT = 'NO', DELETE = 'NO', PREIS.UPDATE = 'YES')
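How a system might consult such a security constraint before carrying out an operation can be sketched as follows. This is only an illustration: the table `RIGHTS` and the function `is_allowed` are hypothetical names, and the source does not prescribe any particular implementation. The entry mirrors the rights granted to user X above, where only the column PREIS may be updated.

```python
# Rights of disposal per (user, view); the entry mirrors the GRANT statement for user X.
RIGHTS = {
    ("X", "VERKAUFS_GUT"): {
        "GRANT": False,
        "REVOKE": False,
        "DESTROY": False,
        "INSERT": False,
        "DELETE": False,
        "UPDATE": {"PREIS"},      # PREIS.UPDATE = 'YES'
    },
}


def is_allowed(user, view, operation, column=None):
    """Return True if the user holds the right for this operation on the view."""
    rights = RIGHTS.get((user, view))
    if rights is None:
        return False                          # no security constraint, no access
    if operation == "UPDATE":
        return column in rights["UPDATE"]     # update rights are granted per column
    return rights.get(operation, False)


print(is_allowed("X", "VERKAUFS_GUT", "UPDATE", "PREIS"))   # True
print(is_allowed("X", "VERKAUFS_GUT", "UPDATE", "MENGE"))   # False
print(is_allowed("X", "VERKAUFS_GUT", "DELETE"))            # False
```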
We want to point out the following characteristics of views and security constraints:

1. Different views can be built up hierarchically like "generations of a family". We distinguish "father views" from "son views". At the root of the tree stands the "ancestor", the base relation, which belongs to the owner. All changes of a "father view" are reflected in the various "son views". The principle has already been stated that changes are in general to be made only at the "ancestor".

2. The rights of disposal or authorizations to which a "son view" is entitled must always be smaller than or equal in extent to those of the "father view". It must be possible to produce the "son view" from the "father view". (Nemo plus juris transferre potest, quam ipse habet.)

3. When a view is revoked, the associated "son views" are destroyed.

In a further example a view is to be developed from the view VERKAUFS_GUT for the user Y, so that he can read only the fields NR and BEZ. In the language SEQUEL we write:

    DEFINE LISTE_FOR_Y TABLE AS:
        SELECT NR, BEZ
        FROM VERKAUFS_GUT

Then follows:

    GRANT LISTE_FOR_Y TO BENUTZER.BNR = 'Y'
        (GRANT = 'NO', REVOKE = 'NO', DESTROY = 'NO',
         INSERT = 'NO', DELETE = 'NO', UPDATE = 'NO')

3.2 Procedural models
Before we turn to a brief presentation of procedural models, as they can be found in more detail in the DBTG report (6) and in Hoffman (16), we point here to the security features of the system IMS (see (26)). All accesses to an IMS data base go through an associated PCB (Program Communication Block), which describes a subschema. The architecture thereby provides a special security level. First, a program can access only data that have been made sensitive by a PCB. Second, the program may carry out only those operations that have been defined in the field PROC-OPTIONS of the PCB. The smallest unit of protection is a segment. An authorization matrix can be implemented in IMS via PROC-OPTIONS. IMS offers further protection facilities when the data communication part is installed. It can then be specified, for example, that programs may be invoked only when passwords are given, and that a password is also required in order to use certain commands at terminals.

For data base systems, the DBTG system (6) and the "Formulary Model" (16) developed by Hoffman are two important representatives of security models in which the security constraints must be converted by a central authority directly into security procedures (called "formularies" by Hoffman). A host language, such as PL/1 or COBOL, is available to the programmer for this purpose. In our presentation we confine ourselves to the most important features of the DBTG system.

In the language of the DBTG the smallest data unit that cannot be resolved further is a "data-item". A name and a value make up a "data-item". Several "data-items" with a common name are a "data-aggregate". Several "data-items" or "aggregates" form a "data-record", which, like all units, must be named. The essential structure in the data model of the DBTG is the "set". In a set there are records called "owner" (father record) and records called "member" (son record). An occurrence of a "set" (set occurrence) consists of exactly one "owner record" with no, one, or several "member records". A "member record" cannot exist without an "owner record" (integrity constraint). Within one "set" a "record" type is either "owner" or "member" (but not both). A "member record" can be a "member record" in several "sets". In this way hierarchical and network structures can be represented. A "set" must be strictly distinguished from the mathematical concept of a set.

A further organizational unit of a DBTG data base is an "area". The entire storage space is divided into a number of named "areas". A "schema" contains the entire formalized description of a data base. A "sub-schema" is, roughly speaking, a subset of a "schema" selected by the user. The "sub-schema" can be regarded as a view concept. As in IMS, where the "sub-schema" is fixed by a PCB, a certain security is also guaranteed in the DBTG at a first level by the "sub-schema": a program can access only data that are described in a "sub-schema". The second level is defined in the following.

The DBTG system supports security procedures for all the organizational units mentioned, from the "data-item" up to the "schema". To present only the essentials here, we confine ourselves to the "record level". The DBTG provides two clauses: on the one hand a PRIVACY LOCK in the "schema", and on the other a PRIVACY KEY in the user's program. By means of the data description language (DDL) for the "schema", a "lock" (PRIVACY LOCK) can be "hung" in front of the data, which can be opened with a "key" (PRIVACY KEY) formulated in the data manipulation language (DML). In the simplest case the key is a password; in more complicated cases a procedure is invoked.

We first present the case in which the key is a password. In the following the user's program is assumed to be written with COBOL as host language. Let the security constraint be: only someone who has the password 'KAISERBALL' may delete an arbitrary record with the name PERSONAL.

Schema:
    ...
    RECORD NAME IS PERSONAL
    ...
    PRIVACY LOCK FOR DELETE IS 'KAISERBALL'

Program:
    IDENTIFICATION DIVISION
    ...
    PRIVACY KEY FOR DELETE OF PERSONAL RECORD IS 'KAISERBALL'
    PROCEDURE DIVISION
    ...
    DELETE PERSONAL

The DBTG system ensures that the character string 'KAISERBALL' in the clause PRIVACY LOCK is compared with the character string 'KAISERBALL' in the clause PRIVACY KEY. If the character strings are equal, the operation DELETE PERSONAL is permitted for an arbitrary record PERSONAL. If they are unequal, the operation is suppressed and an error status indicator is set.

We now come to the more complicated case, which arises when the execution of an operation depends not on a password but on conditions. Consider the following example: a personnel manager may delete records with the name PERSONAL if
1. the content of the field salary is less than 10,000, or
2. the content of the field department number equals 15.
Let the first condition be realized in the procedure named GAMMA and the second in the procedure named DELTA. In the following statements it is assumed that the personnel manager requests access only on the basis of the first condition.

Schema:
    RECORD NAME IS PERSONAL
    ...
    PRIVACY LOCK FOR DELETE IS PROCEDURE GAMMA OR PROCEDURE DELTA

Program:
    IDENTIFICATION DIVISION
    ...
    PRIVACY KEY FOR DELETE OF PERSONAL RECORD IS PROCEDURE GAMMA
    PROCEDURE DIVISION
    ...
    DELETE PERSONAL

Several PRIVACY LOCKS can be defined together in one statement by means of OR. The access authorization is checked in a procedure. The procedure passes the parameter 'yes' or 'no' to the DBTG system. The procedures themselves are not specified further in the DBTG report. Hoffman (16), however, gives the structure of some security procedures. It can be assumed that a procedure must be provided for every view, together with its associated actions, in the sense of the preceding section. In a complicated security system with many intricate security constraints, a considerable amount of work is imposed on the installation if the procedural solution is chosen.
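The lock-and-key mechanism described in this section can be sketched in a few lines. This is a reading of the text under stated assumptions, not the DBTG specification: a lock is modelled as a list of alternatives joined by OR, each alternative being either a password or a procedure; the key presented by the program must name one alternative, and a procedure must additionally answer 'yes' for the record in question. The field names `GEHALT` and `ABTEILUNG` and the function names are hypothetical.

```python
def gamma(record):
    """Condition 1: the salary is less than 10,000."""
    return record["GEHALT"] < 10000


def delta(record):
    """Condition 2: the department number equals 15."""
    return record["ABTEILUNG"] == 15


PASSWORD_LOCK = ["KAISERBALL"]       # PRIVACY LOCK FOR DELETE IS 'KAISERBALL'
PROCEDURE_LOCK = [gamma, delta]      # ... IS PROCEDURE GAMMA OR PROCEDURE DELTA


def may_delete(lock, key, record):
    """Return True if the key presented by the program opens the lock for this record."""
    for alternative in lock:
        if callable(alternative):
            # The key must name this procedure, and the procedure must answer 'yes'.
            if key is alternative and alternative(record):
                return True
        elif key == alternative:     # simple password comparison
            return True
    return False


record = {"GEHALT": 8000, "ABTEILUNG": 3}
print(may_delete(PASSWORD_LOCK, "KAISERBALL", record))   # True
print(may_delete(PASSWORD_LOCK, "OPERNBALL", record))    # False: operation suppressed
print(may_delete(PROCEDURE_LOCK, gamma, record))         # True: salary below 10,000
print(may_delete(PROCEDURE_LOCK, delta, record))         # False: department is not 15
```

In a real system a failed check would also set an error status indicator, which the sketch omits.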
References

1) Bayer, R. and Metzger, J. U.: On the Encipherment of Search Trees and Random Access Files, Institut für Informatik, TU München, March 1975.
2) Boyce, R. F. and Chamberlin, D. D.: Using a structured English query language as a data definition facility, IBM Research Report RJ 1318, San Jose, Dec. 10, 1973.
3) Boyce, R. F. et al.: Specifying Queries as Relational Expressions: SQUARE, in: Proc. ACM SIGPLAN-SIGIR Interface Meeting, Gaithersburg, Maryland, Nov. 4-6, 1973.
4) Chamberlin, D. D. et al.: A Structured English Query Language, in: Proc. ACM SIGFIDET Workshop on Data Description, Access and Control, Ann Arbor, Mich., May 1-3, 1974.
5) Chamberlin, D. D., Gray, J. N. and Traiger, I. L.: Views, Authorization and Locking in a Relational Data Base System, IBM Research Report RJ 1486, San Jose, Dec. 19, 1974.
6) CODASYL Data Base Task Group (DBTG) Report, April 1971, available from IFIP Administrative Data Processing Group, 40 Paulus Potterstraat, Amsterdam.
7) Codd, E. F.: A data base sublanguage founded on the relational calculus, in: Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access and Control, San Diego, Nov. 11, 1971, pp. 35-68.
8) Conway, R. W., Maxwell, W. L. and Morgan, H. L.: On the Implementation of Security Measures in Information Systems, in: Comm. ACM, Vol. 15 (1972), No. 4, pp. 211-220.
9) Date, C. J.: An Introduction to Database Systems, Addison-Wesley, Reading (Mass.), 1975.
10) Evans, A. and Kantrowitz, W.: A User Authentication Scheme not Requiring Secrecy in the Computer, in: Comm. ACM, Vol. 17 (1974), No. 8, pp. 437-442.
11) Feistel, H., Notz, W. A. and Smith, J. L.: Cryptographic techniques for machine to machine data communication, IBM Research Report RC 3663, Yorktown Heights, Dec. 27, 1971.
12) Feistel, H.: Cryptographic coding for data-bank privacy, IBM Research Report RC 2827, Yorktown Heights, 1970.
13) Feistel, H.: Chiffriermethoden und Datenschutz, in: IBM Nachrichten, Part 1, Vol. 24 (1974), No. 219, pp. 21-26; Part 2, Vol. 24 (1974), No. 220, pp. 99-102. Translation from the English: Feistel, H.: Cryptography and computer privacy, in: Scientific American, Vol. 228 (1973), No. 5, pp. 15-23.
14) Friedman, T. D.: The authorization problem in shared files, in: IBM Systems Journal, Vol. 9 (1970), No. 4, pp. 258-280.
15) Hentschel, B., Gliss, H., Bayer, R. and Dierstein, B.: Datenschutzfibel, Verlag J. P. Bachem, Köln, 1974.
16) Hoffman, L. J.: Computers and Privacy: A Survey, in: Computing Surveys, Vol. 1 (1969), No. 2, pp. 85-103.
17) IBM brochure: The Consideration of Physical Security in a Computer Environment, October 1972, Fr. Nr. 6520-2700-0.
18) Kahn, D.: The Codebreakers, Macmillan, New York, 1967.
19) Martin, J.: Security, Accuracy and Privacy, Prentice Hall, Englewood Cliffs, 1973.
20) Petersen, H. E. and Turn, R.: System Implications of Information Privacy, in: AFIPS Conf. Proc., Vol. 30 (1967), SJCC, Thompson Book, New York, pp. 291-300.
21) Purdy, G. B.: A High Security Log-in Procedure, in: Comm. ACM, Vol. 17 (1974), No. 8, pp. 442-445.
22) Stonebraker, M. and Wong, E.: Access Control in a Relational Data Base Management System by Query Modification, University of California (Berkeley), Research Report ERL-M438, May 14, 1974.
23) Tuckerman, B.: A Study of the Vigenère-Vernam Single and Multiple Loop Enciphering Systems, IBM Research Report RC 2879, Yorktown Heights, May 14, 1970.
24) Turn, R.: Privacy Transformations for Databank Systems, Rand Corporation, research report for the National Science Foundation, AD-761563, March 1973; also published in: AFIPS Conf. Proc., Vol. 42 (1973), pp. 589-601.
25) Turn, R.: Privacy and Security in Personal Information Databank Systems (prepared for the National Science Foundation), Rand Corporation, R-1044-NSF, March 1974.
26) Wedekind, H. and Härder, Th.: Datenbanksysteme II, Bibliographisches Institut, Mannheim, 1975 (not yet published).
27) Wilkes, M. V.: Time-Sharing Betriebe bei digitalen Rechenanlagen (translation from the English), Carl Hanser Verlag, München, 1970.
28) Scherf, J. A.: Computer and Data Security: A Comprehensive Annotated Bibliography, MIT Project MAC, January 1974.
On the Integrity of Data Bases and Resource Locking

Rudolf Bayer, Technische Universität München
Abstract

The problem of providing operational integrity of data bases, as opposed to operating systems, is discussed. Techniques of resource locking, mainly individual object locking and predicate locking, are surveyed, improved, and unified. An efficient on-line transitive closure algorithm for deadlock discovery is presented and analyzed. Several strategies for preventing indefinite delay of transactions are proposed. Phantoms and the need for predicate locking are surveyed and reconsidered. Several strategies for handling phantoms are proposed: one without predicate locking and two in which predicate locking is needed for writing transactions only, and in which individual object locking suffices for pure readers.
I. INTRODUCTION

Providing data base integrity means guaranteeing the correctness of the data (more precisely their accuracy, consistency, and timeliness) through 1) the proper operation of the hardware, 2) the proper operation of the software, and 3) the proper use of the system. This paper covers only part of the software aspect of integrity. The problem of guarding data bases against hardware failures has been covered extensively by M.V. Wilkes [Wil 72]. Proper use of the system is mainly concerned with quality control in data acquisition and with the prevention of accidental or mischievous misuse, i.e. with the security of computer systems.
As opposed to many other computing environments, data bases give rise to especially high integrity requirements for at least the following reasons:

1) Longevity: Even rare errors will in the long run lead to a certain contamination and degradation of the quality of a data base. Completely purging erroneous data and all their consequences from a data base is difficult.

2) Limited repeatability: Even if data or processing errors are discovered, it may be impossible or useless to rectify the situation due to time constraints, unavailability of the correct source data, or unavailability of a correct system state preceding the fault.

3) The need for immediate and permanent availability: This prevents a practice often used elsewhere, namely running a program and then checking by careful inspection and analysis whether the result is, or at least "looks", right, and correcting and rerunning the program otherwise.

4) Multiaccess: Data bases are manipulated by many users with probably quite different quality standards. It is infeasible to entrust the quality control completely to these users, and difficult to track the source and the proliferation of errors.
II.
SEMANTIC AND OPERATIONAL INTEGRITY
We wish to distinguish between semantic and operational integrity of data bases: By semantic intesritv we mean the compliance of the data base contents with constraints data. Semantic
derived from our knowledge about the meaning of the
integrity might be enforced by allowing on certain data
only a limited set of precisely
specified meaningful operations, by
adopting a set of programming and interaction conventions, by dynamically checking the results of updates,
or by proving for each program
manipulating the data base, that the semantic integrity constraints are satisfied. Little is known about how to describe, such semantic integrity constraints.
to enforce~ and to implement
Still we believe,
that semantic
integrity is of a much more basic nature than operational integrity, and that a better understanding of semantic integrity would greatly
341
help the solution of other integrity problems
as well.
An approach has been described in [Bay 74] to obtain semantic integrity via the definition of "aggregates" which limit the processing of data to the use of a set of carefully designed operations directly associated with the data.

Operational Integrity: Most work to date concerned with integrity has been limited to those integrity problems arising from the activity of the operating system [CBT 74], [EGLT 74], [Eve 74], [KiC 73], in particular

1) the effort to schedule transactions to be processed in parallel as far as possible,
2) the need to acquire resources, individual data objects (also called "records" in [CBT 74] and "entities" in [EGLT 74]) or sets of data objects, for exclusive or shared use by a transaction and to lock those resources accordingly,
3) the induced problems of deadlock among locking transactions, of deadlock discovery, of deadlock prevention, and of preemption of resources from transactions to resolve deadlocks.

For the purpose of this discussion let a "transaction" [EGLT 74] be the unit of processing for scheduling purposes and for external data base manipulation. A transaction is a sequence of more primitive "actions".

III. OPERATING SYSTEMS AND OPERATIONAL INTEGRITY

As opposed to semantic integrity there is at least a brute force, straightforward solution for operational integrity, namely to avoid parallelism between transactions completely and to sequence the execution of transactions in time. This is unsatisfactory for many reasons, and better solutions have been developed for use in operating systems. We will survey these solutions briefly and indicate why they are not satisfactory for data base applications. As usual in this field we use "process" as the analogon for "transaction". The list of techniques is adopted from G.C. Everest [Eve 74]:

Presequence Processes: Processes potentially competing for resources must be presequenced and must execute one after the other. For data base transactions it is often not known a priori which data resources will be needed. This means that any two transactions will be potentially competing and must be sequenced. As a consequence, no parallelism is possible and we have the unsatisfactory brute force method mentioned before.
Still presequencing transactions, e.g. through time-stamping, may be useful for other purposes, like preventing indefinite delay of transactions by introducing an aging mechanism to increase the priorities of transactions.

Preempt Processes: This technique relies on discovering deadlocks after they have occurred. It then terminates one of the processes involved in the deadlock (or backs it up to an earlier state), and the resources locked by that process are freed. As we shall see, this technique plays an important role in data base locking, too, but there its application is much more difficult due to the large number of transactions and resources involved. This makes deadlock discovery and preemption quite complicated and expensive.

Preorder all System Resources: The processes are then required to claim their resources according to such a total order. It has been shown that more general than linear orders, e.g. hierarchical orders, are sufficient to support a deadlock-free locking strategy [Ram 7~]. In data bases the resources are data objects, which often do not have such a natural order. Furthermore a process might not be able to claim resources according to such an order, since its needed resources might be data dependent [EGLT 74], [CBT 74].
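The preordering technique can be illustrated with a minimal sketch. It assumes a small fixed set of resources with a known total order; the names `locks`, `ORDER`, `claim_in_order` and `worker` are hypothetical and serve only this illustration. Two processes ask for the same resources in opposite order, but both actually claim them in ascending order, so no cyclic wait can form.

```python
import threading

# Hypothetical resources; the integer rank stands in for the assumed total order.
locks = {name: threading.Lock() for name in ("r1", "r2", "r3")}
ORDER = {"r1": 1, "r2": 2, "r3": 3}

def claim_in_order(needed):
    """Acquire the needed locks in ascending ORDER; return them in claim order."""
    ordered = sorted(needed, key=ORDER.__getitem__)
    for name in ordered:
        locks[name].acquire()
    return ordered

def release(held):
    """Release locks in reverse claim order."""
    for name in reversed(held):
        locks[name].release()

def worker(needed, log):
    held = claim_in_order(needed)
    log.append(tuple(held))  # record the order in which locks were claimed
    release(held)

log = []
threads = [
    threading.Thread(target=worker, args=(["r2", "r1"], log)),
    threading.Thread(target=worker, args=(["r1", "r2"], log)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The sketch presupposes exactly what the text says is often missing in data bases: the full set of needed resources is known before claiming starts, and a natural order exists.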
Preclaim needed Resources: Before starting to execute, a process has to claim all the resources it will ever need. Typically they are specified on the control cards preceding a job or job-step, and the process is not started until the operating system has granted to it all the requested resources. This is probably the most common technique for assigning non-sharable resources. In a data base environment this technique requires considerable modifications to become feasible. Claiming resources may itself be a complicated and lengthy task requiring searching through large areas of a data base. These searches should run concurrently if possible.

Deadlock Prevention Algorithms: They often rely on too special properties of resources - like Habermann's banker's algorithm [Hab 69] - or on too special models of computation - like Schroff's algorithm [Sch 74] - to be generally applicable here.
IV. THE CHAMBERLIN, BOYCE, TRAIGER METHOD
In [CBT 74] a technique is proposed to provide operational integrity for data bases. The technique can be considered as a modification and combination of several methods described in section III.

Integrity of the data base must be guaranteed at the beginning and again at the end of a transaction; it may be - and generally must be - violated by the single actions. Due to the potential interference of two or more transactions executing in parallel, transactions must lock certain parts of the data base for exclusive or shared use. The scheme proposed in [CBT 74] therefore requires each transaction to lock all its resources (parts of a data base, e.g. individual records or fields of records) during a so-called "seize phase" before starting the "execution phase". During the seize phase the data base must not be modified by the seizing transaction and therefore

1) preemption of locked resources from a transaction still in its seize phase is feasible, and
2) backing a transaction in its seize phase up to wait for the preempted resource is rather easy.

Once a transaction has started its execution phase, it is not allowed to claim more resources, thus no backup will be necessary. At the end of an execution phase a transaction must free all its resources before starting a new seize phase.

The seize phase may be a rather complicated task, thus seize phases of transactions should be run in parallel. This raises the deadlock problem again as usual: Let t1, t2 be two transactions. t2 trying to seize resource r1 already locked by t1 must wait until r1 is freed by t1. But since resources are not locked in any particular order, t1 may wish to lock first r1, then r2, while t2 locks first r2, then r1. If t1 successfully seizes r1 and t2 successfully seizes r2, then a deadlock has occurred. Such deadlocks must be discovered and a resource must be preempted from a transaction involved in the deadlock, say r2 from t2, causing t2 to wait for t1 on r2.
In [CBT 74] an aging mechanism is attached to transactions to avoid deadlock due to indefinite delay of transactions. It is then shown in [CBT 74] that the scheme described is deadlock-free in the sense that each transaction will eventually be processed. This requires, of course, the proper algorithms for discovery of deadlocks between transactions in their seize phases, for preemption of resources, and for backing up transactions to certain points within their seize phases.

It is now clear that the scheme proposed in [CBT 74] is a shrewd modification and combination of the following:

1) Try to preclaim needed resources.
2) If 1) would lead to deadlock, preempt resources as useful.
3) Superimpose a presequencing scheme for transactions - e.g. through timestamping - to enforce an aging mechanism and to avoid deadlock due to indefinite delay of transactions.

V. SOME MODIFICATIONS AND AN ON-LINE TRANSITIVE CLOSURE ALGORITHM

The deadlock discovery algorithm mentioned in [CBT 74] is not really applicable, since it requires that a transaction t1 may wait for at most one other transaction to release resources. In the CBT-scheme, however, t1 may be waiting for resources to be released by arbitrarily many transactions $t_{w_1}, t_{w_2}, \ldots, t_{w_k}$ as the result of arbitrarily many preemptions of resources from t1:

Fig. 1: Transaction t1 waiting for other transactions.

The resource state of a transaction $t_i$ is determined by the set

$A_i = \{r_{i1}, \ldots, r_{iq_i}\}$

of resources which it has so far acquired, and the set of request pairs

$B_i = \{(r'_{i1}, t_{i1}), \ldots, (r'_{ip_i}, t_{ip_i})\}$

where $(r'_{ij}, t_{ij})$ indicates that resource $r'_{ij}$ is desired from transaction $t_{ij}$. Any transaction $t_i$ for which $B_i$ is non-empty is in a wait state.
We may then define the wait relation $w \subseteq T \times T$, where T is the set of transactions, such that

$(t_i, t_j) \in w$ iff $\exists r : (r, t_j) \in B_i$.

We say that $t_i$ is waiting for $t_j$ (to release r). $t_i$ may be waiting for several transactions, as noted above, and for several resources from the same transaction.

The wait graph $G_w$ is the directed graph $G_w = (T, w)$. Deadlock discovery amounts to finding cycles in $G_w$ or, equivalently, to finding pairs (t, t) in the transitive (but not reflexive) closure $w^+$ of w. Thus deadlock exists iff $\exists t \in T : (t,t) \in w^+$.
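The definitions above can be made concrete with a short sketch. It assumes that the request sets are held in a dictionary `B` mapping each transaction name to a set of (resource, transaction) pairs; the function names are illustrative only. The closure is computed with a Warshall-style triple loop, and a transaction is reported as deadlocked when the pair (t, t) appears in the closure.

```python
def wait_relation(B):
    """Derive w from the request sets: (ti, tj) is in w iff some (r, tj) is in B[ti]."""
    return {(ti, tj) for ti, pairs in B.items() for (_r, tj) in pairs}

def transitive_closure(nodes, w):
    """Warshall-style O(n^3) computation of the non-reflexive closure w+."""
    closure = set(w)
    for k in nodes:
        for i in nodes:
            if (i, k) in closure:
                for j in nodes:
                    if (k, j) in closure:
                        closure.add((i, j))
    return closure

def deadlocked(nodes, w):
    """Transactions t with (t, t) in w+, i.e. those lying on a cycle of the wait graph."""
    wplus = transitive_closure(nodes, w)
    return sorted(t for t in nodes if (t, t) in wplus)

# Hypothetical state: t1 and t2 wait for each other, t3 waits for t1.
T = ["t1", "t2", "t3"]
B = {"t1": {("r2", "t2")}, "t2": {("r1", "t1")}, "t3": {("r1", "t1")}}
cycle_members = deadlocked(T, wait_relation(B))
```

Here t3 is blocked behind the deadlock but is not itself on the cycle, so only t1 and t2 are reported.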
Maintaining w is trivial, since something like the $B_i$'s will have to be maintained in any case. Calculating $w^+$ from w is, on the other hand, quite expensive, the best known algorithms requiring $O(n^3)$ [War 62] or $O(n \cdot m)$ [Bay 74] steps, where n is the number of nodes in $G_w$ and m the number of arcs.

It would be sufficient, however, to have a good "on-line" transitive closure algorithm, since $w^+$ need only be partly modified as arcs are added to and deleted from w. More precisely, "on-line" transitive closure algorithm means an algorithm solving the following problem:

Given $w, w^+$, calculate $w', w'^+$, where $w' = w \cup \{(t_i,t_j)\}$ or $w' = w \setminus \{(t_i,t_j)\}$.

Although it is quite simple to add an arbitrary arc and calculate $w'^+$ from $w^+$, it seems in the general case notoriously difficult to delete an arbitrary arc and calculate $w'^+$ from $w^+$. No better alternative seems to be known than calculating $w'^+$ from scratch, i.e. starting with $w'$ and ignoring the fact that we already have $w^+$.
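The asymmetry between adding and deleting an arc can be seen in a small sketch. It assumes that w and its closure are stored as Python sets of pairs; `add_arc` and `closure_from_scratch` are illustrative names. Adding (i, k) only requires connecting every predecessor of i (and i itself) to every successor of k (and k itself), whereas for deletion the sketch falls back on recomputation.

```python
def closure_from_scratch(w):
    """Reference computation of w+ (Warshall-style), used for deletion and checking."""
    nodes = {x for arc in w for x in arc}
    wplus = set(w)
    for k in nodes:
        for i in nodes:
            if (i, k) in wplus:
                for j in nodes:
                    if (k, j) in wplus:
                        wplus.add((i, j))
    return wplus

def add_arc(wplus, i, k):
    """Return the closure of w plus the arc (i, k), given the closure wplus of w."""
    preds = {j for (j, x) in wplus if x == i} | {i}   # nodes that reach i, and i
    succs = {l for (x, l) in wplus if x == k} | {k}   # nodes reachable from k, and k
    return wplus | {(j, l) for j in preds for l in succs}

w, wplus, agree = set(), set(), []
for arc in [("t1", "t2"), ("t2", "t3"), ("t4", "t1"), ("t3", "t4")]:
    w.add(arc)
    wplus = add_arc(wplus, *arc)                      # incremental update
    agree.append(wplus == closure_from_scratch(w))    # compare with recomputation

# Deleting an arbitrary arc: no incremental shortcut is used here.
w.discard(("t3", "t4"))
wplus_after_delete = closure_from_scratch(w)
```

After the fourth arc the graph is a single cycle, so the closure contains all 16 pairs; removing one arc of the cycle shrinks it to 6 pairs, which the stored closure alone does not reveal.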
For our purpose, we need a highly simplified version of the on-line algorithm for the transitive closure only. By closer inspection one observes that we need to delete sinks of $G_w$ and the arcs leading into sinks of $G_w$ only. This is the decisive property which makes the difficult general problem tractable in our special case. To get $w'^+$ from $w^+$ now simply amounts to deleting or zeroing out a column from the Boolean matrix describing $w^+$.
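The following sketch illustrates why sinks are the easy case, assuming the closure is stored as a list-of-lists Boolean matrix `K` indexed by transaction number (a representation chosen for this illustration). Since a sink has no outgoing arcs, no path passes through it, so removing it only removes the pairs that end in it, which is exactly its column.

```python
def closure_matrix(n, arcs):
    """Boolean matrix K with K[i][j] true iff (t_i, t_j) is in w+ (Warshall)."""
    K = [[False] * n for _ in range(n)]
    for i, j in arcs:
        K[i][j] = True
    for k in range(n):
        for i in range(n):
            if K[i][k]:
                for j in range(n):
                    if K[k][j]:
                        K[i][j] = True
    return K

def delete_sink(K, s):
    """Remove sink s and all arcs leading into it by zeroing column s in place."""
    if any(K[s]):
        raise ValueError("node is not a sink")  # row s must be empty for a sink
    for row in K:
        row[s] = False

arcs = [(0, 1), (1, 2), (3, 2), (0, 3)]            # node 2 is a sink
K = closure_matrix(4, arcs)
delete_sink(K, 2)                                  # O(n) update of the closure
remaining = [(i, j) for (i, j) in arcs if j != 2]  # w without the arcs into the sink
```

The updated matrix coincides with the closure recomputed from the remaining arcs, at a cost of n assignments instead of a full recomputation.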
We will now develop such an on-line transitive closure algorithm in more detail. We assume that transactions will wait in queue q(r) for an already locked resource r. The first transaction on a queue has successfully locked (or seized) the resource; it may be in its seize or execution phase. All other transactions on the queue are waiting (or blocked). We indicate this as in Fig. 2.

Fig. 2: Transactions waiting for resource r.

t1 has locked r, $t_{i+1}$ is waiting for $t_i$ to release (or free) r, $i = 1, 2, \ldots, k-1$. When $t_i$ eventually releases r (and no preemptions have occurred in the meantime), then $t_{i+1}$ will seize r.

Let us first consider the state transition diagram of a transaction (Fig. 3) and the operations relevant to that diagram, which a transaction may perform:

Fig. 3: The state transition diagram of a transaction.

A transaction $t_i$ can perform the following operations involving resource r and another transaction $t_k$:
Seize r:
    $A_i := A_i \cup \{r\}$; update q(r);

Free r, $\forall r \in A_i$:
    $\forall r \in A_i$ do
        if $t_k$ is next in queue for r then
        begin
            $(r, t_i)$ must be in $B_k$;
            $B_k := B_k \setminus \{(r, t_i)\}$;
            $A_k := A_k \cup \{r\}$; update q(r);
            if $B_k = \emptyset$ then make $t_k$ continue to seize
        end;
    $A_i := \emptyset$;
    update w and $w^+$;

Seize unsuccessfully: $t_i$ is still in the seize state, let $t_k$ be last in queue for r:

    Case 1: no deadlock arises, if $t_i$ is queued behind $t_k$ in q(r):
        $B_i := \{(r, t_k)\}$; put $t_i$ into wait state;
        update w and $w^+$;

    Case 2: A deadlock would arise, if $t_i$ were queued as in Case 1. This deadlock is discovered by tentatively, but not definitely, queueing $t_i$ as in Case 1, updating $w^+$ and checking whether $w^+$ contains cycles. In this case $t_i$ might have to preempt r from $t_k$. $t_k$ must be in wait state, since we have a cycle. In this situation $t_i$ should move forward in q(r) until it can be inserted and no deadlock arises; update q(r) accordingly. Let $t_\ell$ be the first transaction in q(r) (starting from $t_k$) such that inserting $t_i$ between $t_\ell$ and $t_{\ell-1}$ causes no deadlock, then we have Case 2a. If there is no such $t_\ell$ then we have Case 2b.

    Note: In [CBT 74] $t_i$ is always inserted as close to the head of the queue as possible. This strategy favors the younger transactions and must rely heavily on an aging mechanism to prevent indefinite delay. The processing costs of this aging mechanism are not analyzed.

    Case 2a:
        $B_\ell := (B_\ell \setminus \{(r, t_{\ell-1})\}) \cup \{(r, t_i)\}$;
        $B_i := \{(r, t_{\ell-1})\}$; update q(r);
        $t_i$ goes into wait state;
        update w and $w^+$;

    Case 2b: t1 cannot be executing, otherwise $t_i$ would queue behind t1 according to Case 2a. Therefore t1 is seizing or waiting. We make $t_i$ preempt r from t1, i.e. we queue $t_i$ in front of t1;
        if $B_1 = \emptyset$, then make t1 wait;
        $B_1 := B_1 \cup \{(r, t_i)\}$;
        $A_1 := A_1 \setminus \{r\}$;
        $A_i := A_i \cup \{r\}$;
        update w and $w^+$;
Necessary Changes to w and $w^+$ and Analysis of their Complexity

For the following analysis we assume that $w^+$ is represented in an n×n Boolean matrix K with the meaning $K[i,j] \Leftrightarrow (t_i,t_j) \in w^+$. For each operation we give a description and, in brackets, the complexity of the change to $w^+$.

Seize r:
    No change to w or $w^+$.   [0]

Free r, $\forall r \in A_i$:
    Since $t_i$ frees all its resources at the end of its execution phase, we can remove all arcs $(t_k, t_i)$ from w, and delete or zero out column i of K.   [O(n)]

For the analysis of the following operations we need two auxiliary procedures first. Let $t_i$ be in its seize state. To insert an arc $(t_i, t_k)$ into w and to update $w^+$ accordingly we need the procedure INSERT1. To insert $(t_\ell, t_i)$ we need the procedure INSERT2.

INSERT1 $(t_i, t_k)$: Comment $t_i$ is in seize state;
    $w := w \cup \{(t_i, t_k)\}$;   [constant]
    $\forall t_j$ with $(t_j, t_i) \in w^+$, $\forall t_\ell$ with $(t_k, t_\ell) \in w^+$: $w^+ := w^+ \cup \{(t_j, t_\ell)\}$;   [$O(n^2)$ at worst, O(m) average, see later analysis]
    $\forall t_\ell$ with $(t_k, t_\ell) \in w^+$: $w^+ := w^+ \cup \{(t_i, t_\ell)\}$;   [O(n)]
    $\forall t_j$ with $(t_j, t_i) \in w^+$: $w^+ := w^+ \cup \{(t_j, t_k)\}$;   [O(n)]
    $w^+ := w^+ \cup \{(t_i, t_k)\}$;   [constant]

INSERT2 $(t_\ell, t_i)$: Comment $t_i$ is in seize state;
    $w := w \cup \{(t_\ell, t_i)\}$;   [constant]
    $\forall t_j$ with $(t_j, t_\ell) \in w^+$: $w^+ := w^+ \cup \{(t_j, t_i)\}$;   [O(n)]
    $w^+ := w^+ \cup \{(t_\ell, t_i)\}$;   [constant]

    Note: Since $t_i$ is in the seize state, there is no t such that $(t_i, t) \in w^+$. Consequently no cycle in $w^+$, and therefore no deadlock, can arise due to the operation INSERT2 $(t_\ell, t_i)$.

Seize r unsuccessfully: As before, let $t_k$ be last in queue for r:
    for j := k step -1 until 1 do
    begin
        tentatively INSERT1 $(t_i, t_j)$;
        if no deadlock then begin $\ell := j+1$; exit to perform Case 2a end
    end;
    perform Case 2b;   [for each deadlock $O(n^2)$ or O(n+m) resp.]

    Case 2a: make last INSERT1 operation definite;   [at worst $O(n^2)$ or O(n+m)]
        if $\ell \neq k+1$ then
        begin
            INSERT2 $(t_\ell, t_i)$;   [O(n)]
            if $\exists r' \neq r : (r', t_{\ell-1}) \in B_\ell$ then else $w := w \setminus \{(t_\ell, t_{\ell-1})\}$   [search of $B_\ell$]
        end;

    Case 2b: INSERT2 $(t_1, t_i)$;   [O(n)]

Analysis of INSERT1: Adding a single arc to w, according to INSERT1, say $(t_i, t_k)$, requires oring row k of the Boolean matrix K to all rows j with $(t_j, t_i) \in w^+$. At worst this part of INSERT1 requires $O(n^2)$ operations. If, however, there are m arcs in $w^+$, then each node on the average will have m/n arcs into it and m/n arcs out of it. Accordingly the average number of operations will be $O(n \cdot (m/n)) = O(m)$.

VI. FOUR STRATEGIES FOR PREVENTING INDEFINITE DELAY

With the locking and preemption schemes proposed it is still conceivable that a transaction is delayed indefinitely from its execution phase. To deal with this problem, we propose four increasingly effective, but also increasingly costly strategies. It seems quite reasonable to employ several strategies within one system successively in order to force transactions which have passed a certain age threshold into their execution phase and out of the system.

Strategy 1: Let $t_e$ be the eldest transaction. Schedule all transactions t, such that $t_e \, w^+ \, t$, with highest priority. This clearly has a tendency to speed up the processing of $t_e$. It is easy to find those t from the $t_e$-row of the Boolean matrix describing $w^+$.
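Reading the relevant transactions off the $t_e$-row can be sketched as follows, assuming the closure matrix is a list of Boolean rows and that a parallel list `NAMES` gives the transaction for each index (both are assumptions of this illustration, as is the sample state).

```python
NAMES = ["t1", "t2", "t3", "t4"]

# Hypothetical closure matrix: K[i][j] is True iff t_i (transitively) waits for t_j.
K = [
    [False, True,  True,  False],   # t1 waits for t2 and, through t2, for t3
    [False, False, True,  False],   # t2 waits for t3
    [False, False, False, False],   # t3 waits for nobody
    [False, False, True,  False],   # t4 waits for t3
]

def blockers_of(K, names, eldest):
    """All t with (eldest, t) in w+, read directly from the row of the eldest."""
    e = names.index(eldest)
    return {names[j] for j, bit in enumerate(K[e]) if bit}

high_priority = blockers_of(K, NAMES, "t1")   # to be scheduled with highest priority
```

A single row scan costs O(n), so selecting the transactions to favor adds little to the cost of maintaining the matrix itself.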
Strategy 2: Stop all transactions in seize phases from further seizing except those t for which $t_e \, w^+ \, t$.

Strategy 3: For all r such that $t_e$ is waiting in q(r) let $t_r$ be the transaction that has locked r. If $t_r$ is seizing or waiting, preempt r from $t_r$ and give r to $t_e$. If $t_r$ is executing, insert $t_e$ in q(r) directly behind $t_r$. No new deadlocks can arise if we assume that all these preemptions are performed together in one step. Then recalculate the new $w^+$.

Strategy 4: Stop all transactions which are not executing from seizing further. Then apply strategy 3 for $t_e$ until $t_e$ has reached its execution phase. Then let the other transactions proceed.

Some Observations on Strategies 1, 2, 3, 4: It is clear that all strategies will tend to bring $t_e$ closer to its execution phase. Strategy 1 can be generalized to establish a partition of the transactions into a linearly ordered set of priority classes, which can serve as the basis for a general scheduling strategy. Strategies 1 and 3 might still allow indefinite delay. It is easy to construct a plausibility argument that strategies 2 and 4 will prevent indefinite delay of transactions.
VII. AN ALTERNATIVE APPROACH: PREEMPTION AND PARTIAL BACKUP

Although it seems feasible to maintain the basic locking and preempting mechanism proposed in [CBT 74] using the special algorithms described in the preceding sections, there is another argument supporting a more radical preemption than that proposed in the CBT-scheme: Let us assume that r1 is preempted from t1 by t2, which probably updates r1. Depending on the value of r1, t1 might have locked other resources r1', r1'', ... already. But since the value of r1 changes, the decision of t1 to lock r1', r1'', ... should be reconsidered. In other words, t1 should be backed up within its seize phase to precisely the state it was in just before seizing r1; it should then be waiting for t2 on r1, and the resources r1', r1'', ... locked by t1 should be freed again.

In such a preemption scheme a transaction t1 will generally be waiting for at most one other transaction t2 on precisely one resource r1. The wait relation t1 w t2 shall now mean that t1 waits for the holder t2 of r1 and not for the predecessor in q(r1), since we do not need to maintain such queues. The resulting $G_w$ is obviously a forest of oriented trees, the arcs pointing towards the roots. Only roots are processing in the execution or seize phases. All other transactions are waiting.
Since a transaction t1 is waiting for t2 on precisely one resource r1, we may label that arc with r1. The following simple algorithms then describe the necessary operations.

Seize unsuccessfully:

Case 1, no deadlock arises: t1 trying to lock r already locked by t2 means that the tree with root t1, i.e. T(t1), is appended as a subtree to t2, the new arc being labelled with r.

Case 2, deadlock arises: Deadlock discovery is quite simple: Each seizing or executing transaction is the root node of one tree. Deadlock arises precisely when t1 is also the root node of the tree in which t2 is. To find this out, just follow the arcs from t2 to the root. In this case a cycle would be generated by inserting an arc (t1,t2). The deadlock is resolved by preempting the resource r from t2.

Preemption works as follows: t2 must free r and all resources it locked after r. In the process - see the Free operation - corresponding subtrees of t2 will be detached - allowing their roots to continue seizing - and the arc (t2,t3) from t2 to its father t3 in T(t1) will be deleted. The tree T'(t2) remaining after pruning T(t2) will be attached as a subtree of t1 by introducing the new arc (t2,t1) with label r.

Free r': If t2 frees a resource r' either due to being backed up in its seize phase or due to finishing an execution phase and there is an arc (t4,t2) labelled r', then this arc can be deleted, thereby t4 becomes a root and can proceed in its seize phase. To free such arcs one must represent these trees by data structures in which it is possible to follow arcs in both directions.
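Deadlock discovery in the forest can be sketched in a few lines. The sketch assumes the forest is stored as a dictionary `waits_for` mapping each waiting transaction to the pair (holder, resource label); roots have no entry. These names are illustrative, and only Case 1 and the discovery step of Case 2 are shown, not the preemption and pruning that follow.

```python
def root_of(waits_for, t):
    """Follow the arcs from t towards the root of its tree."""
    while t in waits_for:
        t = waits_for[t][0]
    return t

def seize_unsuccessfully(waits_for, t1, r, t2):
    """t1 (a root) tries to lock r, which is held by t2.

    Returns "deadlock" if the arc (t1, t2) would close a cycle, otherwise
    appends T(t1) as a subtree of t2 with label r and returns "wait".
    """
    if root_of(waits_for, t2) == t1:
        return "deadlock"          # Case 2: t1 is the root of the tree containing t2
    waits_for[t1] = (t2, r)        # Case 1: new arc (t1, t2) labelled r
    return "wait"

# Hypothetical forest: t2 waits for t3 on ra, t3 waits for t1 on rb; t1 and t4 are roots.
waits_for = {"t2": ("t3", "ra"), "t3": ("t1", "rb")}
first = seize_unsuccessfully(waits_for, "t1", "rc", "t2")    # would close a cycle
second = seize_unsuccessfully(waits_for, "t4", "rc", "t2")   # harmless wait
```

The check costs one walk along a single path, in contrast to maintaining the full closure matrix of section V. A complete implementation would also keep child links so that the Free operation can detach subtrees, as the text requires.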
VIII. PREVENTING INDEFINITE DELAY

It is possible that for individual transactions t a situation similar to a deadlock might again arise due to t being preempted and backed up in its seize phase again and again. Strategies 1 and 2 of section VI are easily adapted to work for the preemption and backup technique of section VII.

The analogon to strategy 3 of section VI is much easier to implement now: Let $t_e$ be the eldest transaction in the system again. Assume that $t_e$ is waiting for t on r and t is not executing. (If t is executing, nothing can be done except scheduling t with highest priority until t has finished executing.) Then $t_e$ will preempt r from t and t will be backed up in its seize phase to a state just before seizing r. $t_e$ becomes a root and continues seizing. A new arc $(t, t_e)$ labelled r is introduced. The preemption process works precisely as described in section VII. The main difficulty of strategy 3 of section VI has disappeared, since we do not explicitly store $w^+$. Instead, cycles are discovered by just following the path from an arbitrary node of a tree to its root, a simple and fast operation.

To prevent pathological cases of data bases changing faster than $t_e$ being able to catch up in its own seize phase, we can apply an analogon to strategy 4 of section VI again. Instead, however, it suffices to prevent that transactions will enter from their seize phases into their execution phases. Let this be strategy 5. Since only finitely many transactions are in the system at any one time, and since each executing transaction will run only a finite time, $t_e$ will eventually be able to finish both its seize and execution phase, and indefinite delay of $t_e$ cannot occur.

IX. PHANTOMS AND PREDICATE LOCKS

In [EGLT 74] a technique is described to use so called predicate locks ("predicate locking") for locking logical, i.e. existing as well as potential, subsets of a data base instead of locking individual data objects ("individual object locking"). This technique also solves the "phantom problem". To explain briefly what phantoms are, let us assume that there is a universe $\mathcal{E}$ of data objects (called "entities" in [EGLT 74] and "records" in [CBT 74]) which are the potential data objects in the data base B. Thus $B \subseteq \mathcal{E}$. Two transactions t1, t2 may have successfully locked all their needed resources, and they may be executing. t1 may add a new object $r_1 \in \mathcal{E}$ to B and t2 may add a new object $r_2 \in \mathcal{E}$ to B, such that t1 would have locked r2 and t2 would have locked r1, if t1 or t2 would have seen r2 or r1 resp. during their seize phases. r1 and r2 are called "phantoms", since they might, but not necessarily will, appear in B (materialize) while t1 or t2 are in their execution phases.
The appearance of just a single phantom, say r1, does not cause any difficulty, since this has the same effect as running the transactions t1, t2 serially, namely in the order t2 followed by t1. In this case also t2 would not see the object r1 created by t1 and therefore t2 would not be able to lock r1. It is the goal of predicate locking to schedule transactions in parallel as far as possible under the restriction that the parallel schedule is equivalent to - i.e. has exactly the same total effect on the data base as - a serial schedule. One also says that such a schedule is a "consistent schedule", or that each transaction sees a "consistent view" of the data base.

To enforce consistent schedules each transaction t is required to lock (for read or write access) all data objects $E(t) \subseteq \mathcal{E}$ - irrespective of whether they are in B or are just phantoms - which might in any way influence or be influenced by the effect of t on B. E(t) shall be locked by specifying a predicate P defined on the universe $\mathcal{E}$ (or on a part of $\mathcal{E}$, e.g. on a relation [Cod 70]) such that $E(t) \subseteq S(P)$, where S(P) is the subset of elements of $\mathcal{E}$ satisfying P.

Two transactions t1, t2 are then said to be in conflict, if for their predicates P1, P2 it is true that $\exists r \in S(P_1) \cap S(P_2)$ and t1 or t2 performs a write action on r. Thus conflict can arise even if r is a phantom. In this case t1, t2 cannot run in parallel, but must be run serially. The order in which they are run is irrelevant for consistency. This order might be important for other reasons which are not of interest here.
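The conflict definition can be tried out on a toy universe. The sketch below assumes a finite universe of hypothetical (branch, balance) records so that S(P) can be enumerated; in general the intersection test is not computable this way. Each transaction is modelled by a lock predicate `P` and a second predicate `W` that says which objects it may write; both are assumptions of this illustration.

```python
from itertools import product

# Hypothetical finite universe of potential records and the current data base B.
UNIVERSE = set(product(["A", "B"], [50, 150]))
DB = {("A", 50), ("B", 50)}

def S(pred):
    """Subset of the universe satisfying the predicate, phantoms included."""
    return {r for r in UNIVERSE if pred(r)}

def in_conflict(P1, W1, P2, W2):
    """True iff some r lies in S(P1) and S(P2) and is written by either transaction."""
    return any(W1(r) or W2(r) for r in S(P1) & S(P2))

# t1 only reads branch A; t2 inserts or updates large accounts of branch A;
# t3 updates branch B only.
P1 = lambda r: r[0] == "A"
W1 = lambda r: False
P2 = lambda r: r[0] == "A" and r[1] > 100
W2 = P2
P3 = lambda r: r[0] == "B"
W3 = P3

common_12 = S(P1) & S(P2)
```

The only object common to t1 and t2 is ("A", 150), which is not in `DB`: the conflict is caused by a phantom alone.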
The main difficulties in using such a locking and scheduling method seem to be the following:

1) Find a suitable predicate $P_t$ for t. Ideally $E(t) = S(P_t)$ should hold, but then $P_t$ might be too complicated. If $P_t$ is chosen in a very simple way, then $S(P_t)$ might be intolerably large, increasing the danger of phantoms which are really artificial phantoms.

2) The problem "$S(P_1) \cap S(P_2) = \emptyset$" may be very hard. In general this problem is even undecidable. Thus for practical applications and a given universe $\mathcal{E}$ of data objects it is necessary to find a suitable class of locking predicates, for which the problem "$S(P_1) \cap S(P_2) = \emptyset$" is not only decidable, but for which a very efficient decision procedure is known. For more details and a candidate class for suitable locking predicates see [EGLT 74].

3) Phantoms might turn out to be a very serious but mostly artificial obstacle to parallel processing in the following sense: phantoms in $S(P_1) \cap S(P_2)$ prohibit t1 and t2 from being run in parallel. But if these phantoms do not materialize, and if furthermore $S(P_1) \cap S(P_2) \cap B = \emptyset$, then, of course, t1 and t2 could have been run in parallel. How much of an artificial obstacle phantoms are to parallel processing seems to be unknown and can probably be answered only for concrete instances of data bases.
X. A UNIFICATION OF INDIVIDUAL OBJECT LOCKING AND PREDICATE LOCKING

Let us start with the crucial observation for this section: "Transactions which are pure readers do not need to lock phantoms". A transaction is a "pure reader", if it is composed of read actions only. Obviously for many data base applications the pure readers are a very important class of transactions.

To understand our observation, consider two pure readers t1, t2 first. Since there are no write actions at all, there is no possibility for phantoms to materialize, thus they need not be locked. Phantom locking is only necessary to control the interaction with a transaction, say t3, which also performs write operations. We call t3 a "writer". Consider the interaction between t1 and t3. Let us assume that there is a phantom $r \in S(P_1) \cap S(P_3)$ such that t3 might perform a write on r. Then t1 and t3 could not run concurrently, if t1 would use predicate locking. If however, t1 uses individual object locking and successfully terminates its seize phase, then t1 can run in parallel with t3 provided that

$S'(P_1) \cap S'(P_3) = \emptyset$

where $S'(P_1) = S(P_1) \cap B$, i.e. the set of real data objects (without phantoms) in B which t1 needs to lock in order to see a consistent view of B. But now $S'(P_1)$ can be locked by t1 using conventional "individual object locking" as e.g. described in [CBT 74] instead of predicate locking. If t3 should materialize phantoms, then running t1 and t3 in parallel still is consistent and has the same effect as the serial schedule t1 followed by t3.
The following observation should also be clear now: To control the interaction between the writer t3 and the pure reader t1 it suffices that t3 use individual object locking according to [CBT 74]. t3 need not lock its phantoms, since t1 is not interested in phantoms anyway. We can conclude that the problem of phantoms - and therefore predicate locking - arises only between writers.

The preceding observations suggest several alternative approaches for handling the phantom problem:

Strategy 1 - Serialize Writers: Since, as we just observed, phantoms cause difficulties only between writers, the simplest solution is not to schedule any writers to run concurrently. Concurrency is possible between arbitrarily many pure readers and at most one writer. Consistency is guaranteed by individual object locking and by handling deadlocks and preemptions as described in the earlier part of this paper. The problem of phantoms does not arise.

As mentioned before, in many applications most transactions are pure readers. Serializing writers in those cases should not cause a significant loss of concurrency and has the advantage that predicate locking with its associated difficulties is not needed.

Strategy 2 - Predicate Locks between Writers: Use predicate locks as described in [EGLT 74] only to determine whether two writers t3, t4 can run in parallel. After a writer is allowed to proceed on account of his predicate locks, he then starts individual object locking to compete for further processing with other transactions, which are pure readers, exactly as in strategy 1. For more details on the individual object locking phase, in particular the types of locks, see strategy 3.

Using predicate locking and individual object locking at this point allows a more general notion of conflict than that used in [EGLT 74]. Let U1 or R1 be the set of objects including phantoms which are updated or only read respectively by a transaction t1. Define U2 and R2 for t2 analogously. Obviously $U_1 \cap R_1 = \emptyset$ and $U_2 \cap R_2 = \emptyset$. Then define

$B_1 := U_1 \cap U_2$
$B_2 := U_1 \cap R_2$        (X.1)
$B_3 := R_1 \cap U_2$
$B_4 := R_1 \cap R_2$

Diagrammatically this can be shown as in Fig. 4.
Fig. 4: Possible intersections of update and read-only sets.

For t1 and t2 to proceed in parallel with individual object locking the following conditions must hold:

$B_1 = \emptyset$
$B_2 = \emptyset \lor B_3 = \emptyset$        (X.2)

Without individual object locking the stronger condition $B_2 = \emptyset \land B_3 = \emptyset$ is required in [EGLT 74]. To see that our weakened condition suffices let us assume without loss of generality that $B_2 = \emptyset$ and $B_3 \neq \emptyset$. B3 is read only by t1, but is updated by t2. Also B3 may contain phantoms which are materialized by t2. Let us assume that both t1 and t2 are successful in their seize phases, i.e. while locking individual objects excluding phantoms, and then continue to run in parallel. We claim that this is equivalent to the serial schedule t1 followed by t2. Since B1 and B2 are both empty, the effect of t1 on B cannot in any way influence t2, thus t2 has the same effect on B if it is run after t1 or parallel to t1. B3 is not empty, but t1 successfully locked all the resources it needed to see a consistent view of the data base. t1 may have missed phantoms materialized by t2, thus the effect of t1 will be the same as in the serial schedule t1 t2. Consequently running t1 and t2 in parallel is equivalent to the serial schedule t1 t2, and is therefore consistent.

The conditions (X.2) for t1 and t2 to proceed in parallel can be generalized for $t_1, t_2, \ldots, t_n$ to proceed concurrently. This is left to the reader.
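The difference between condition (X.2) and the stronger condition can be checked mechanically on small examples. The sketch below assumes the update and read-only sets are given as finite Python sets; the function names are illustrative, and the stronger condition is taken together with B1 = ∅, which is our reading of the text.

```python
def intersections(U1, R1, U2, R2):
    """The four sets B1..B4 of (X.1)."""
    return U1 & U2, U1 & R2, R1 & U2, R1 & R2

def parallel_with_object_locking(U1, R1, U2, R2):
    """Condition (X.2): B1 empty, and B2 empty or B3 empty."""
    B1, B2, B3, _B4 = intersections(U1, R1, U2, R2)
    return not B1 and (not B2 or not B3)

def parallel_with_predicates_only(U1, R1, U2, R2):
    """Stronger condition: B2 and B3 both empty (B1 empty is assumed as well)."""
    B1, B2, B3, _B4 = intersections(U1, R1, U2, R2)
    return not B1 and not B2 and not B3

# Hypothetical sets: t2 updates object "p", which t1 only reads (possibly a phantom).
U1, R1 = {"a"}, {"b", "p"}
U2, R2 = {"p"}, {"c"}

B1, B2, B3, B4 = intersections(U1, R1, U2, R2)
weak_ok = parallel_with_object_locking(U1, R1, U2, R2)
strong_ok = parallel_with_predicates_only(U1, R1, U2, R2)
```

In this example B2 is empty and B3 is not, which is the case treated in the argument above: the pair may run in parallel under (X.2), equivalent to the serial schedule t1 followed by t2, while the stronger condition would serialize it.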
Strategy 3: This strategy sacrifices some concurrency, but is much simpler to implement than strategy 2. There a writer t_i was required to perform individual object locking both in the sets U_i and R_i. It turns out that with the conflict condition of [EGLT 74] between writers, writers need perform object locking only within U_i, but they need not set any read locks. Assume that a writer t_i first locks the sets U_i and R_i by specifying the predicates P_U^i and P_R^i. The condition for two writers t_i and t_j to run concurrently then is:

$$S(P_U^i) \cap S(P_U^j) = \emptyset$$
$$S(P_U^i) \cap S(P_R^j) = \emptyset \qquad (X.3)$$
$$S(P_R^i) \cap S(P_U^j) = \emptyset$$

After successfully locking U_i and R_i the writer then proceeds to perform individual object locking within U_i by setting "u-locks" for exclusive use of data objects to be updated. These u-locks are necessary for preventing pure readers from reading those objects while they are being updated. Since the sets S(P_U^i) are pairwise disjoint, there is never any possibility for conflict between u-locks of different writers. We observe that writers need not set any individual read locks, called "r-locks", since S(P_R^i) ∩ S(P_U^j) = ∅, and conflict of u-locks of one writer and r-locks of another would not be possible anyway. Furthermore, several r-locks of readers and writers on the same data object would be allowed, since data objects are shareable as long as they are only read.

The only potential conflict still remaining is between read-access of pure readers and update-access of a writer to the same data object s, which is not a phantom. To control this we require pure readers to set r-locks on individual data objects s to be read. This must happen during a seize phase. Several r-locks can be on s, but not both r-locks of pure readers and a u-lock of a writer. Thus if a reader (writer) sets an r-lock (u-lock) first then a writer (reader) trying to set a u-lock (r-lock) on the same data object must wait for the reader (writer) to release s. This leads to the usual wait situations with the possibility for deadlock and the need for preemption and backup as described in the first part of this paper. If a deadlock is discovered then a reader or a writer is backed up within its seize phase for setting r-locks or u-locks resp. as described before. For simplicity we can assume that locking with the predicates P_U and P_R is one indivisible operation, thus deadlock between writers is not possible during this phase of predicate locking.

To summarize, a writer t_i proceeds as follows:
1) Lock predicates P_U^i and P_R^i. If conditions (X.3) are satisfied for all other writers t_j which have successfully locked their predicates P_U^j and P_R^j then proceed with step 2), otherwise wait, until P_U^i and P_R^i can be successfully locked, then proceed with step 2).
2) Start a seize phase setting u-locks on individual data objects to be updated within S(P_U^i). In case of conflict with r-locks wait or be backed up within this seize phase.
3) A pure reader performs a seize phase setting r-locks on data objects to be read. In case of conflict with u-locks the reader must wait or be backed up within this seize phase.

Summarizing the main advantages of strategy 3 we observe:
o Only writers use predicate locking to handle phantoms.
o Concurrency between writers is possible.
o Writers need an individual object locking phase for setting u-locks in their update areas only. In this phase phantoms are ignored.
o Pure readers do not use predicate locking, they set r-locks during an individual object locking phase only and ignore phantoms completely.

Note: Since predicate locking is now needed for writers only, it might be quite feasible to replace arbitrary predicates by a fixed partitioning of the data base or by a fixed family of subsets of B, whose intersection properties are known once and for all and recorded in a Boolean matrix (intersection between two subsets is empty or not). Instead of locking predicates the above subsets are then locked by writers.

Acknowledgement: I wish to thank Mr. John Metzger, with whom I had many useful discussions during the writing of this paper.
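
The Note on strategy 3 above, which replaces arbitrary predicates by a fixed family of subsets whose intersections are recorded in a Boolean matrix, can be sketched as follows. This is only an illustration under stated assumptions: the subset names, their contents, and the function names are hypothetical, and the three tests mirror conditions (X.3) as read here (update/update, update/read and read/update areas of two writers must not intersect).

```python
# Hypothetical fixed family of subsets of the data base; names and
# contents are illustrative only and do not come from the paper.
SUBSETS = {
    "low": frozenset(range(0, 100)),
    "high": frozenset(range(100, 200)),
    "middle": frozenset(range(50, 150)),
}


def intersection_matrix(subsets):
    """Record once and for all whether two subsets share an element."""
    names = list(subsets)
    return {
        a: {b: not subsets[a].isdisjoint(subsets[b]) for b in names}
        for a in names
    }


MATRIX = intersection_matrix(SUBSETS)


def may_run_concurrently(writer_i, writer_j, matrix=MATRIX):
    """Check conditions (X.3) for two writers.

    Each writer is a pair (update_subsets, read_subsets) of subset names.
    Only the precomputed Boolean matrix is consulted, never the data.
    """
    u_i, r_i = writer_i
    u_j, r_j = writer_j

    def overlap(xs, ys):
        return any(matrix[x][y] for x in xs for y in ys)

    return not (overlap(u_i, u_j) or overlap(u_i, r_j) or overlap(r_i, u_j))
```

For example, a writer updating "low" and a writer updating "high" may proceed together, whereas a writer updating "middle" conflicts with either of them. Two writers that merely read the same subset are not kept apart, since read areas are not compared with each other.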
Bibliography

[Bay 74] Bayer, R., "AGGREGATES: A Software Design Method and its Application to a Family of Transitive Closure Algorithms". TUM-Math. Report No. 7432, Technische Universität München, Sept. 1974
[Bjo 73] Bjork, L.A., "Recovery Semantics for a DB/DC System". Proceedings ACM Nat'l. Conference 1973, 142-146
[CBT 74] Chamberlin, D.D., Boyce, R.F., Traiger, I.L., "A Deadlock-free Scheme for Resource Locking in a Data Base Environment". Information Processing 1974, 340-343
[Cod 70] Codd, E.F., "A Relational Model for Large Shared Data Banks". Comm. ACM 13, 6 (June 1970), 377-387
[CES 71] Coffman, E.G. Jr., Elphick, M.J., Shoshani, A., "System Deadlocks". Computing Surveys 3, 2 (June 1971), 67-78
[Dav 73] Davies, C.T., "Recovery Semantics for a DB/DC System". Proceedings ACM Nat'l. Conference 1973, 136-141
[EGLT 74] Eswaran, K.P., Gray, J.N., Lorie, R.A., Traiger, I.L., "On the Notions of Consistency and Predicate Locks in a Data Base System". IBM Research Report RJ 1487, Dec. 30, 1974
[Eve 74] Everest, G.C., "Concurrent Update Control and Data Base Integrity". In: Data Base Management (ed. Klimbie, J.W., and Koffeman, K.L.), North Holland 1974, 241-270
[Fos 74] Fossum, B.M., "Data Base Integrity as Provided for by a Particular Data Base Management System". In: Data Base Management (ed. Klimbie, J.W., and Koffeman, K.L.), North Holland 1974, 271-288
[Hab 69] Habermann, A.N., "Prevention of System Deadlocks". Comm. ACM 12, 7 (July 1969), 373-377, 385
[KiC 73] King, P.F., Collmeyer, A.J., "Database Sharing - an Efficient Mechanism for Supporting Concurrent Processes". AFIPS Nat'l. Comp. Conf. Proceedings 1973, 271-275
[Oll 74] Olle, T.W., "Current and Future Trends in Data Base Management Systems". Information Processing 1974, 998-1006
[Ram 74] Ramsperger, N., "Verringerung von Prozeßbehinderungen in Rechensystemen". Dissertation, Technische Universität München, 1974
[Sch 74] Schroff, R., "Vermeidung von totalen Verklemmungen in bewerteten Petrinetzen". Dissertation, Technische Universität München, 1974
[War 62] Warshall, S., "A Theorem on Boolean Matrices". Journal ACM 9, 1 (January 1962), 11-12
[Wil 72] Wilkes, M.V., "On Preserving the Integrity of Data Bases". The Computer Journal, 15, 3 (August 1972), 191-194
DATA BASE STANDARDIZATION
A STATUS REPORT

Thomas B. Steel, Jr.
Equitable Life Assurance Society
New York, N.Y., USA
This paper is a report on the current
(1975 September)
status of the
Study Group on Data Base Management Systems in the United States, together with some remarks on the ISO activity in the area.
While
the official purpose of this Study Group is an investigation of standardization potential in the area of data base management systems,
an important by-product of the work of the Group has been
the development of a set of requirements for effective data base management systems.
As no existing or proposed implementation of a
data base management system satisfies these requirements,
it is
appropriate to expose these ideas as widely as possible for evaluation.

Among the responsibilities of the Standards Planning and Requirements Committee (SPARC) of the American National Standards Committee on Computers and Information Processing (ANSI/X3) is the generation of recommendations for action by the parent Committee on
appropriate areas for the initiation of standards development.
For
some time it has been evident that data base management systems are in the process of becoming central elements of information processing systems,
and that there is less than full agreement on
appropriate design.
In addition to the existence of a number of
implementations of such systems
(CODASYL 1969), there are several
documents generated out of the collective wisdom of some segment of
the information processing community which are either proposals for specific systems
(CODASYL 1971) or statements of requirements
(GUIDE-SHARE 1970),
(CMSAG 1971).
As is well known there is a
debate in the community on whether existing and proposed implementations meet the indicated requirements or whether the requirements as drawn are all really necessary.
Further,
there are
serious questions about the economics of meeting all the stated requirements. In addition to the above considerations there is argument on the appropriate data model to use:
relational,
hierarchical,
network.
This particular debate has been referred to as the "theological" discussion of the data base management system theorists. There has been criticism of the use of this word; I can only respond to that criticism by quoting Hilaire Belloc: "All political questions are ultimately theological". Indeed, such it seems to be, from which it follows that the correct answer to the question of what data model to use is necessarily "all of the above". One of the outcomes of the work reported in this paper is a mechanism that permits this answer in a meaningful sense.
In the autumn of 1972, responding to the clearly perceived need to rationalize the growing confusion,
SPARC, then under the
Chairmanship of the author, took formal action to initiate investigation of the subject of data base management systems in the context of potential standardization. Consistent with its normal practice when confronted with a complex subject, SPARC established an ad hoc Study Group on Data Base Management Systems, initially under the Chairmanship of D. M. Smith of the EXXON Corporation and now under the Chairmanship of the author.
This Study Group was convened with a charge to investigate the subject of data base management systems with the objective of determining which, if any, aspects of such systems are at present suitable candidates for the development of American National Standards. The "if any" qualification is important because a negative response is just as meaningful as a positive response in a standards context. The "at present" qualification is equally significant, indicating the continuing need for review as the requirements, technologies and economics change over time.

The eventual result of the deliberations of this Study Group will be a series of reports in a specified format (SPARC 1974), identifying potentially standardizable elements of data base management systems and recommending whether or not there is a need, technological feasibility and economic justifications for the initiation of a standards development project in the area. The first interface to be examined is 7 with respect to COBOL. The present target date for completion of this work is the beginning of 1976. As an Interim Report the Study Group has prepared a document (SPARC 1975) which has had wide circulation and is soon to be generally published.
It is appropriate at this juncture to provide a list of the members of the Study Group and their affiliations to indicate the breadth of representation.
It is worth noting the extent to which the user
community is participating in this effort, a rare event in data processing
standardization on any continent.
Bachman, C.W.        Honeywell Information Systems
Cohn, L.             IBM Corporation
Florance, W.E.       Eastman Kodak Company
Kirshenbaum, F.      Equitable Life
Kunecke, H.          Boeing Computer Services
Lavin, M.            Sperry Univac
Mairet, C.E.         Deere and Company
Sibley, E.H.         University of Maryland
Steel, T.B., Jr.     Equitable Life
Turner, J.A.         Columbia University
Yormark, B.          The RAND Corporation
The initial tasks of the Study Group were the difficult ones of understanding and coming to respect the varying views of the different individuals--all
theologies were
(and still are)
represented--and developing a vocabulary that was consistent and mutually comprehensible.
It is not clear whether this last task has
yet been fully accomplished,
although considerable closure has been
attained.
In the course of the early discussions it emerged that what any standardization
should treat is interfaces.
There is no merit and
potential disaster in developing standards that specify how components are to work.
What is potentially proper for standards
specification is how the components are meshed together; in other words, the interfaces. With this notion in mind a generalized model of a data base management system has been developed that highlights the interfaces and the kind of information and data passing across them.

Figure 1 is a simplified diagrammatic view of this model. It should be noted that, except for the man-system interfaces, the technological nature of the interface is not determined; it could be hardware, software, firmware or some mixture. Indeed, some of the interfaces could be man-man, although pursuit of that notion is not germane to what follows. The important point is that the implementation of the system is not prescribed, only the requirements that must be satisfied. As was noted above, this is a simplified diagram, but in order to maintain consistency with the detailed picture, the numerical identifications of the exhibited interfaces have not been changed so there are some numbers missing.
The hexagonal boxes depict people in specific roles. The rectangular boxes represent processing functions, the arrow terminated lines represent flow of data, control information, programs and descriptions, and the dashed boxes represent program preparation and execution subsystems (including compilation and interpretation functions). Finally, the solid bars represent essential interfaces, the ultimate subject matter of the Study Group's deliberations. These interfaces are numbered rather than conventionally named for simplicity of discussion and to avoid confusion.

Among the processes and interfaces omitted on this cut down version of the diagram are the various ways that system programmers and machine operators can invade the system to make ad hoc repairs, certain bypasses of the system mechanism that are asserted to promote efficiency but of debatable desirability in view of their impact on data independence,
integrity and security,
and the entire
structure of physical mapping of data onto specific storage devices. All of the latter structure is to be found to the left of interface 21, much of it will be dictated by the laws of physics and, as such, is of little concern to the current investigation.
The principal
elements of the Study Group's view of a data base management system are displayed and, in particular, the three schema approach,
reflecting the new element introduced by this work,
is illustrated.
The lower right hand side of the diagram, the hexagon labelled "application programmer",
the dashed rectangle labelled "application
program subsystem" and the two interfaces labelled "7" and "12" comprise the entire non-data base activity of preparing and executing an application program.
This structure may be viewed as
replicated into a variety of subsystems,
all interfacing with the
data base management system through interface
12, differing in the
nature of the language used by the programmer to communicate across the man-machine interface.
This language may be a conventional
procedure language such as COBOL, ALGOL or PL/I, recognizable special languages like report generators,
inquiry languages or
update specifiers,
or some potentially new type of procedure or
problem language.
The critical thing to note here is that all data
description passes into the application program subsystem across interface
12 from the data base system itself.
This, of course, is nothing new.
The lower left hand side of the diagram, the hexagon labelled "system programmer", the dashed rectangle labelled "system program subsystem" and the two interfaces labelled "16" and "18" comprise the entire normal interface available to the system programmer when it is necessary to bypass the ordinary mode of access to the system. Routine system maintenance and modification will occur through this subsystem. There are some exceptions, as noted above, but they do not concern the thrust of this paper. It should also be noted that there is clearly available the installation option of permitting application programmers to operate across this interface, potentially dangerous as that may be. Again, there is nothing new in this construction.
It is the upper portion of the diagram that is of concern in this paper.
Current data base systems envision a two level structure;
the data as seen by the machine and the data as seen by the programmer.
A plethora of confusing terminology has been employed
to distinguish between these views.
The Study Group has chosen to
employ the terms "internal" and "external" to make this distinction. In addition,
the Study Group has taken note of the reality of a
third level, which we chose to call the "conceptual",
that has
always been present but never before called out explicitly.
It
represents the enterprise's view of the structure it is attempting to model in the data base.
This view is that which is informally
invoked when there is a dispute between the user and the programmer over exactly what was meant by program specifications.
The Study
Group contends that in the data base world it must be made explicit and, in fact, made known to the data base management system. The proposed mechanism for doing this is the conceptual schema. The other two views of data, internal and external, must necessarily be consistent with the view expressed by the conceptual schema.

This required consistency can be maintained and verified in a reasonably fail safe manner only if the conceptual schema is machine processable. The bulk of the remainder of this paper will discuss the nature of the conceptual schema and how it may be made explicit to the system. However, it is worth examining what its presence means to the dynamics of the data base management system operation in terms of the diagrammatic representation of Figure 1.

Ignoring the system programmers, who are extraneous to normal operation, there are four human roles identified: the enterprise administrator, the data base administrator, the application administrator(s) and the application programmer(s). Notice that
these are roles as opposed to individuals.
The same individual may
function in different roles and one role may involve several individuals simultaneously.
It is critical, however, that there is
only one enterprise administrator and one data base administrator (viewed as roles) while there may be several application administrators and several application programmers.
This leads to
the notion that there can be several external schemas, each representing a different view of the data, provided each is consistent with and derivable from the single conceptual schema. By extension there can be several application programmers, not necessarily working on the same program, that use the same external schema.

Each "administrator" is responsible for providing to the system a particular view of the necessary data and the relevant relationships among that data. The central view, as noted above, is that of the enterprise administrator who provides the conceptual schema. It must be emphasized, and apparently with repetition as this point seems to be the most frequently missed by those not on the Study Group who have examined its work, that the conceptual schema is a real, tangible item, made most explicit in machine readable form, couched in some well defined and potentially standardizable syntax. Much of the remainder of this paper is concerned with conceptual schemas and the author's view of the possibilities for the semantics of such schemas.
In order to provide a context, however, a
preliminary examination of the dynamics of the process envisioned is appropriate.
The enterprise administrator defines the conceptual schema and, to the extent possible and practicable, validates it.
Some, but in general not all, of this schema can be checked for consistency by mechanical means. As the conceptual schema is a formal model of the interesting (for the data base management system) aspects of the enterprise, if the situation is at all complex then the problem of logical incompletability will be encountered (Godel 1931). The conceptual schema will contain, among other things, definitions of all the entities to be comprehended--up to the isomorphism determined by identity of those properties defined in the schema as relevant. Relationships amongst these entities will also be explicated, as will the constraints on allowable values of "data".
By defining those persons with some access to the data base management system as entities of interest,
it is possible to
directly model the rules of access and, thus, provide security control at the level of the conceptual
schema.
This is a key point.
It is well known that there are substantial problems with security control and the importance of a centralized point having a view of the entire system must not be overlooked.

The data base administrator (a definition of this role somewhat at variance with the conventional conception of the task) is responsible for defining the internal schema. This schema contains an abstract description of the storage strategy currently employed by the data base management system. Whether the data is actually stored flat, hierarchical, networked, inverted or otherwise, including any meaningful combination, is contained in the internal schema. The "internal syntax" of the data values will also be found in the internal schema; such items as the radix for numeric values, coding schemes used, units of measure, and the like. Access paths and the relational connectivity between data representations will be defined. All of this must be consistent with and derivable from the conceptual schema, which, therefore, must be available for display to the data base administrator. The internal schema processor (see Figure 1) provides a mechanical check on this consistency. Within
the limits imposed by this requirement of consistency with the conceptual schema,
the data base administrator is free to alter the
internal schema in any way appropriate to optimization of the data base management system operation.
Indeed, by use of suitable
interpreters it will be possible to reorganize the internal structure of the data base dynamically while normal operations continue.
In view of the massive size of some data bases currently
contemplated,
this is an essential requirement,
and it would seem
that only the guarantee of separation of the users' view and the system's view of data provided by interposition of the conceptual schema permits this. The third "administrator"
role, the application administrators,
provide the external schemas
(analogues of the DBTG "sub-schemas")
which define the application programmers'
views of the data.
These
external schemas are a multiplicity in concept and will, in general, only encompass the portion of the data base relevant to a particular application.
It is envisioned that each general application area
will have its own application administrator who provides the appropriate schemas for that area. These are the only data descriptions (schemas) seen by an application program and provide the only avenue of data name resolution. It would carry this essay too far afield to discuss the complexities of name resolution and symbol binding; suffice it to say that all external name resolution, whether performed at compile time, program invocation time, or module execution time, is done across interfaces 7, 12 and 31 through the intermediation of the appropriate external schema across interface 5.

Exactly the same remarks about the consistency of the various
external schemas with respect to the conceptual schema as was noted about the internal schema are to be understood, with the qualification
that one external schema may be a true subset of
another and, under the hypothesis that consistency in this sense is transitive,
the external schema processor may only validate one
external schema against a more comprehensive one known to be consistent with the conceptual
schema.
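
The shortcut described in the preceding paragraph, validating an external schema against a more comprehensive one already known to be consistent, can be illustrated with a deliberately crude model. The sketch assumes that a schema is just a set of (entity, property) pairs and that "consistent with" means "is a subset of"; both are simplifying assumptions made here, and all names are hypothetical.

```python
# Hypothetical schemas, each modeled as a set of (entity, property) pairs.
CONCEPTUAL = {
    ("employee", "name"),
    ("employee", "salary"),
    ("employee", "department"),
    ("department", "budget"),
}
PAYROLL = {
    ("employee", "name"),
    ("employee", "salary"),
    ("employee", "department"),
}
PAYSLIP = {("employee", "name"), ("employee", "salary")}


def consistent_with(schema, reference):
    """In this toy model, consistency is plain set inclusion."""
    return schema <= reference


# PAYROLL is validated directly against the conceptual schema.
payroll_ok = consistent_with(PAYROLL, CONCEPTUAL)

# PAYSLIP is validated only against PAYROLL; because set inclusion is
# transitive, consistency with CONCEPTUAL follows without a second check.
payslip_ok = payroll_ok and consistent_with(PAYSLIP, PAYROLL)
```

Set inclusion is transitive, which is what makes the shortcut sound in this model; whether a richer notion of consistency is transitive is exactly the hypothesis the text leaves open.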
After the appropriate schemas are defined,
the system dynamics
becomes quite straightforward and little different from current systems. The application programmer (report specifier, inquiry specifier, etc.) does his job in the usual way, using the provided external schema, both explicitly and implicitly, as his set of data declarations, providing procedural input across interface 7 and invoking compilation, generation or other relevant processes through the application program subsystem. Upon entry to execution mode, requests for data are passed across interface 12 to the conceptual/external transformer which computes the mapping between the external data description and the conceptual data description. This description passes across interface 31 to the conceptual/internal transformer which in its turn computes the mapping between the conceptual data description and the internal data description. In general, the internal and conceptual schemas will be static, so, depending upon the mapping complexity and the nature of the implementation, it may well be possible to collapse the two transformers (into and out of the conceptual data description) by computing the composite mapping function. This should not obscure the fact that in order to maintain true data independence it must always remain possible to force this process to occur in two steps.
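
The relation between the two-step transformation and the collapsed, composite mapping can be sketched as follows. The sketch assumes, purely for illustration, that each schema-to-schema mapping is a simple renaming table; the field names and function names are hypothetical and are not taken from the Study Group's reports.

```python
# Hypothetical renaming tables standing in for the two transformers.
EXTERNAL_TO_CONCEPTUAL = {
    "emp_name": "employee.name",
    "emp_pay": "employee.salary",
}
CONCEPTUAL_TO_INTERNAL = {
    "employee.name": "EMPFILE.F1",
    "employee.salary": "EMPFILE.F7",
}


def external_to_conceptual(request):
    """Conceptual/external transformer: external names to conceptual names."""
    return [EXTERNAL_TO_CONCEPTUAL[name] for name in request]


def conceptual_to_internal(request):
    """Conceptual/internal transformer: conceptual names to internal names."""
    return [CONCEPTUAL_TO_INTERNAL[name] for name in request]


def two_step(request):
    """The process forced to occur in two steps."""
    return conceptual_to_internal(external_to_conceptual(request))


def compose(first, second):
    """Precompute the composite mapping while both schemas stay static."""
    return {ext: second[con] for ext, con in first.items()}


COMPOSITE = compose(EXTERNAL_TO_CONCEPTUAL, CONCEPTUAL_TO_INTERNAL)


def one_step(request):
    """The collapsed transformer using the composite mapping."""
    return [COMPOSITE[name] for name in request]
```

In this model the composite table is only a cache: if the internal schema is reorganized, `compose` is run again against the unchanged external table, and the two-step path remains available throughout.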
Finally,
the data request as transformed is passed across interface
30 to the internal/storage transformer.
The internal schema will
recognize storage as something like a linear, multiorigined address space, and it will be necessary to remap this abstract model of storage onto hardware constructs such as tracks, cylinders and the like.
This "dirty" description then is passed across interface 21 into the bowels of the machine (and may go through other transformations therein) until actual data is obtained and the process reversed. This brief description has been couched in terms of obtaining data but, of course, storage of data proceeds in the same way, mutatis mutandis.

Questions of locks, avoidance of "deadly embrace", security, integrity and other data base management system problems all have their place in this scheme of things, but it is beyond the scope of this paper to consider them.
By and large they present no distinct
aspects in this three level view from those found in conventional approaches,
except that in some instances--security,
for
example--the solutions may be both easier and more assured.

Before turning to a discussion of the conceptual schema it is appropriate to insert a brief excursus on the status of data base management system standardization in ISO. At the Eighth Plenary Meeting of ISO/TC97, held 1974 May 14-17 in Geneva, Resolution 11, passed with 14 affirmative and two negative (Canada, France) votes, assigned responsibility for data base management to Subcommittee 5 (Programming Languages) and instructed SC5 to establish a study group on the subject (ISO 1974). Such a Study Group was established by SC5 and several countries submitted position papers. The USA position paper was the SPARC Interim Report. On 1975 June 24-26, the Study Group met in Washington, DC with delegations from France, Germany, Sweden and the USA. Written input was also available from Switzerland and the United Kingdom. The following six points are the conclusions of
that meeting:

1. The Study Group concludes in response to the Netherlands Proposal on Data Base Management (ISO/TC 97/598), that any standardization action in the area of data base management systems based on existing proposals is premature in the absence of criteria against which to measure such proposals.

2. The Interim Report of the ANSI/X3/SPARC Study Group on Data Base Management Systems (ISO/TC 97/SC 5 (USA-75) N359) is accepted by the ISO/TC 97/SC 5 Study Group on Data Base Management Systems as an initial basis for discussion on a gross architecture of data base management systems.

3. The Study Group acknowledges the need to identify all types of data base management systems users and to specify their requirements.

4. The Study Group proposes to review and augment the terminology used in N359 and the concepts therein. As the initial effort, the Study Group will establish priorities in terms of the interfaces identified in N359 for further investigation. These priorities will be chosen to optimize the benefits derived from standardization.

5. As a parallel activity to those identified above, the current CODASYL data base specifications will be evaluated. The Study Group notes at this time that preliminary studies by various national and international bodies have indicated that the CODASYL specifications are not suitable for standardization as they stand.

6. The Study Group will recommend development work for those interfaces appropriate for standardization for which no adequate candidate exists.
The next meeting of this Study Group will be in Paris,
1976 January
12-15.

The underlying notion behind the conceptual schema as envisioned by the Study Group is the "entity-property-value" trinity made explicit in the GUIDE-SHARE requirements study (GUIDE-SHARE 1970). There is
general agreement among the members of the Study Group on the overall nature and objectives of the conceptual schema, but in my judgment there is less real agreement on its exact place in the scheme of things than might seem the case from the Study Group reports.
To a considerable extent this lack of agreement does not
hamper progress,
and may well not matter in the long run provided
the distinct views are carefully articulated.
What follows is the
author's view of the conceptual schema notion and some indications on how it can be formalized.
Figure 2 is a schematic illustration of how one can proceed from "reality" to the data models actually used by application programs. It is derived from a metaphysics that may not be wholly congenial to everyone but should at the very least be familiar to those acquainted with the principles of scientific explanation (Braithwaite
1953).
It is assumed that a "real world" exists in
some meaningful sense.
Subordinate to this "true" reality can be
found the "perceived" reality obtained through our sensory inputs as transformed by our brains.
This immediate, primitive image of
reality is, or at least can be, transformed into a rational mental
model of reality by a process known as scientific abstraction. This process can be roughly described as: (1) observation (noting one's perceptions); (2) experimentation (stimulation of the perceived reality to generate new perceptions); (3) generalization (intuiting that similar stimulation will generate similar perceptions); (4) theorizing (identifying fundamental generalizations); (5) prediction (inferring that new and different stimulations will produce new, albeit expected, perceptions); and, finally, (6) verification (initiating these new stimuli and observing the results). Repeated iteration of this sequence leads to a gradually more refined mental model of the real world.
In order to communicate this model to someone--or something--else, it is necessary to use a language.
As is well known, natural
languages are unsatisfactory media for precise communication of the content of scientific models. At present the best available vehicle for such precise communication is that of formal languages (Tarski 1930). While there are complications in the reduction of scientific descriptions of reality to existing formalisms, most of these problems are to be found on the outer limits of the models.

Generally one does not really wish to describe a total model of all reality--the "best" model whose boundary is fuzzy and moves with the growth and modification of scientific knowledge.
What is desired is
to describe some limited model of a portion of reality,
extracted
from the "best" model by a process we can call "engineering abstraction".
While it may be the case that the universe is "best"
described by the interactions of 3.10 ~0 quarks,
the typical engineer
is more apt to build his bridge by combining girders, and rivets.
cross braces
The molecular biologist may view the human being as a
complex structure of water, protein molecules,
DNA and other,
assorted chemicals, but to the insurance agent a human being is not much more than an age, sex and checkbook. For any application one abstracts those aspects of "reality" considered relevant and ignores the rest. Thus, formal descriptions need only deal with the appropriate level of abstraction.

This resultant formalism--the "symbolic" model--is derived from the limited, "engineering" model of the interesting subset of reality as embodied in the mind of the perceiver by a process we will call "symbolic abstraction", and is the linguistic expression in some conventional, predetermined syntax of a set of forms to which suitable semantic content is given by the adoption of rules of designation and rules of truth (Carnap 1942). It expresses the totality of what is known and interesting about the enterprise being modeled. It is the conceptual schema. The processes of mapping from this formal model to the data models we call "internal schema" and "external schemas" may be complex and difficult in practice, but they are straightforward in principle, providing only that the conceptual schema has sufficient detail to permit all necessary
expression. In the author's view the proper choice of formalism--indeed, only acceptable
choice--is
that of modern symbolic logic;
order predicate calculus with identity together with a suitable axiomatic
1958), augmented by appropriate modal logics finally,
supplemented
associated
by "individuals"
non-logical
predicates
formalisms
of symbolic
invocation of all the analysis
(Bernays
1938),
& Fraenkel
(von Wright
1951), and,
(Quine 1961) and the
and the axioms for their behavior.
The reasoning behind this position conventional
the first
(Hilbert & Ackermann
set theory
the
is quite simple.
Use of the
logic and set theory permit the
that has been devoted to this topic
378
by three generations of logicians.
Both the pitfalls and
possibilities are well understood and the limitations clearly defined.
Further, it is in some sense the most general scheme available.
If one accepts Church's Thesis (Kleene 1952), as do most
contemporary logicians,
it is the most general scheme that can be
contemplated for use with digital machinery.
From this it is
possible to deduce that anything expressible to a machine with precision at all is necessarily expressible in this fashion. As an aside let me emphasize a point which should be obvious but is, perhaps, worth making explicit for clarity.
Whenever in this paper
I use the word "set" I intend it in the strictly logical sense as a synonym for "collection" or the German "Menge" or the French "ensemble",
not in any way as that linguistic atrocity perpetrated
by the DBTG Report wherein the nineteenth,
fifth and twentieth
letters of the Roman alphabet are used in that order as the name of a peculiar object.
This may seem harsh, but the point at issue
represents a prize example of the manner in which the information processing sciences generate confusion for themselves and others by casual misuse of words.
Indeed, it reminds me of Orwell's Newspeak.
In a paper of this character it is not possible to probe the possibilities of the sketch above in any depth.
However, certain
examples may clarify the power of the approach.
It is unequivocally
precise in any modern version of set theory as to what is meant by a "relation".
A relation is a set of ordered pairs (the ordered pair <x,y> being
definable as {{x},{x,y}}) and one can say that x bears the
relationship R to y provided that <x,y> ∈ R ("∈" being the predicate
of set membership). Thus, the distinction between a "relation" and a
"relationship", the confusion of which is another example of
terminological idiocy, is made quite precise.
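
The definition just given can be sketched in executable form. The following Python fragment is an illustration only and assumes that a frozenset is an adequate stand-in for a pure set; the names pair, bears and INSURES, and the sample data, are hypothetical and do not come from the paper.

```python
def pair(x, y):
    """Ordered pair <x, y> built as the set {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def bears(x, relation, y):
    """True when x bears the relationship `relation` to y, i.e. <x, y> is a member."""
    return pair(x, y) in relation


# A relation is nothing more than a set of ordered pairs (hypothetical data).
INSURES = {
    pair("Acme Mutual", "Jones"),
    pair("Acme Mutual", "Smith"),
}

print(bears("Acme Mutual", INSURES, "Jones"))   # True
print(bears("Jones", INSURES, "Acme Mutual"))   # False: order matters
print(pair(1, 2) == pair(2, 1))                 # False
```

Here the "relation" is the set INSURES itself, while the "relationship" is the predicate that holds of x and y exactly when their ordered pair is a member of that set.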
Relations of interest can be given names and defined either by enumeration of their members or by any property that must be possessed by a pair to enjoy membership,
in exactly the same fashion
that any other set is completely defined by its members. The equally troublesome concept of "order" can be explicitly defined.
A partial ordering is any relation having the properties
of reflexivity,
anti-symmetry and transitivity.
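
For a finite relation these properties can be tested mechanically. The sketch below is illustrative only: it assumes native Python tuples as ordered pairs, defines two relations by a property of their members rather than by enumeration, and also includes a comparability test, which is the extra condition imposed on a linear ordering. All names (FIELD, DIVIDES, AT_MOST and the functions) are hypothetical.

```python
from itertools import product


def is_reflexive(rel, field):
    return all((x, x) in rel for x in field)


def is_antisymmetric(rel):
    # Whenever both <x, y> and <y, x> are members, x and y must be identical.
    return all(x == y for (x, y) in rel if (y, x) in rel)


def is_transitive(rel):
    return all((x, z) in rel
               for (x, y) in rel
               for (w, z) in rel
               if y == w)


def is_partial_ordering(rel, field):
    return (is_reflexive(rel, field)
            and is_antisymmetric(rel)
            and is_transitive(rel))


def is_linear_ordering(rel, field):
    # A partial ordering in which any two elements of the field are comparable.
    comparable = all((x, y) in rel or (y, x) in rel
                     for x, y in product(field, repeat=2))
    return is_partial_ordering(rel, field) and comparable


FIELD = {1, 2, 3, 4, 6, 12}
# Relations defined by a property that a pair must possess to be a member.
DIVIDES = {(x, y) for x, y in product(FIELD, repeat=2) if y % x == 0}
AT_MOST = {(x, y) for x, y in product(FIELD, repeat=2) if x <= y}

print(is_partial_ordering(DIVIDES, FIELD))  # True
print(is_linear_ordering(DIVIDES, FIELD))   # False: 4 and 6 are not comparable
print(is_linear_ordering(AT_MOST, FIELD))   # True
```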
A linear ordering
is a partial ordering where any two elements in its field are
comparable, and a well-ordering is a linear ordering in which every
non-empty subset of the field has a least element (such an ordering
is nowhere dense).

Structures of arbitrary complexity can be constructed. The concept
of a general array (Steel 1964) developed out of some early data
structure studies, and it can be shown that any nondense complex is expressible as a general array so defined.
As digital computers
cannot deal with dense structures except in finite approximation, this would seem to be sufficient.

The modal predicate of deontic logic, "O" (for "obliged to"), and
its derived predicates "O¬" ("obliged to not" ≡ "forbidden to"), and
"¬O¬" ("not forbidden to" ≡ "permitted to") provide the required
paradigm for expressing either legal constraints in the model or defining the rules of access.
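
How the two derived predicates follow from the single primitive "obliged to" can be shown in a few lines. This Python sketch is an illustration under simplifying assumptions (acts are plain labels, and obligations are given as a finite set); the class and method names and the sample access rules are hypothetical and are not taken from von Wright or from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Not:
    """The negation of an act, e.g. Not("read_salary")."""
    act: object


def negate(act):
    """Return the negation of an act, cancelling a double negation."""
    return act.act if isinstance(act, Not) else Not(act)


class Norms:
    """A finite set of obligations; the other predicates are derived from it."""

    def __init__(self, obliged):
        self.obliged = frozenset(obliged)
        clashes = {a for a in self.obliged if negate(a) in self.obliged}
        if clashes:
            raise ValueError(f"inconsistent obligations: {clashes}")

    def obliged_to(self, act):        # O
        return act in self.obliged

    def forbidden_to(self, act):      # obliged to not
        return self.obliged_to(negate(act))

    def permitted_to(self, act):      # not forbidden to
        return not self.forbidden_to(act)


# Hypothetical rules of access.
RULES = Norms({"log_access", Not("read_salary")})

print(RULES.forbidden_to("read_salary"))   # True
print(RULES.permitted_to("read_salary"))   # False
print(RULES.permitted_to("read_name"))     # True
```

Only the set of obligations is stored; "forbidden" and "permitted" are computed from it, mirroring the way the derived predicates are defined from the primitive one.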
These examples could be multiplied at considerable length, but should be sufficient to illustrate the point.
From a theoretical point of
view there is no more suitable vehicle for expressing a conceptual schema.
This is, of course, not the whole story.
First, theoretical possibility and practical possibility are not identical.
There is the danger that the necessary expressions get
too large and cumbersome for effective use. In an age where we deal
with million instruction operating systems, this is not a fully
persuasive argument in any event.
It is, however, moot.
The number
and character of the necessary expressions do not get excessive; unlike,
say, the contrast between conventional procedure languages
and Turing machines.
On the contrary, nearly a century of search
for compact notation has resulted in definitional sequences that provide more compact expression than one typically finds in programming language data descriptions (or sub-schemas) which perform less of the task.
Some of this is due, of course, to the
use of large character sets, but in any case economy of notation is not a problem. A second potential difficulty is the actual use of the tools to construct the desired models, which is a task that is necessarily an art rather than a science.
Clearly,
if the process of constructing
a model could be itself formalized one would already have the model in the input.
To this point I can only say that I have personally
been partially successful
in constructing models of relatively
complex insurance procedures,
and in a matter of a few days,
inventing notation as I went along.
This effort was only partially
successful in the sense that, while I was able to generate static models with no difficulty,
the problem with time and the dynamic
behavior of the model caused difficulties of two types.
First,
there was the philosophical problem of the potential as opposed to the actual.
How does one treat the property "age at death" prior to
the actual death of the individual?
Formally,
of course, this is
trivial, but obtaining some assurance that the formalism does not hide an ambiguity or paradox is far from trivial.

The second problem with time has to do with the inelegance of making
the variable denoting time distinguished and, therefore, a special
case. While there is nothing inherently wrong with mathematical
inelegance per se, several thousand years of logical and mathematical
history suggest intuitively that something is wrong. Some recent work
(Thomason 1974) on the reduction of tense logic to modal logic hints
at a solution to this problem.

I have gone far enough with this work to become convinced that the
approach is sound and no fundamental invention is required; only some
hard work to refine the ideas. There remains, however, one further
potential criticism of this approach with which it is necessary to deal.
It is a criticism to which I would prefer to
comment "a pox on those who raise it" and then ignore the matter. As a practical consideration,
however,
it will not go away.
It is
much the same argument that has been raised in the past against every programming language except COBOL;
i.e., the language is too
much like algebra, only the mathematicians can use it.
The argument
is irrefutable for if people believe they cannot understand something,
they won't!
However,
there is one difference between
this situation and the programming language situation.
The only one
who must construct models is the enterprise administrator and only the data base administrator and the applications administrators need to read such models.
These individuals are presumably senior and well compensated.
They can be required to have a little education.
Furthermore, while I have no proof, it is my belief that once the barrier of belief in its esoteric character is overcome,
it is no
harder to teach reasonably intelligent people the relevant logic than it is to teach them COBOL and the DDL.
To summarize this personal view of the nature of a conceptual schema, any alternative is either equivalent and therefore equally complex while being less understood for lack of familiarity,
or it
is not equivalent and therefore can only model a subset of that
reality otherwise amenable to modelling.
The only real issue is
whether some less powerful but more acceptable formalism exists that is adequate for modelling anticipated enterprises for a reasonable future.
In my view neither data structure diagrams (Bachman 1969)
nor normalized relations (Codd 1970) nor the CODASYL DDL
(CODASYL 1971) being discussed at this Working Conference are candidates for such an alternative.
As overlaid structures for internal and
external schemas they may be quite suitable;
the criteria for
acceptability being different.

In conclusion,
let me reiterate that the latter portion of this
paper is my personal view of the appropriate structure for a conceptual
schema and does not necessarily represent the view of
other members of the ANSI/SPARC Study Group on Data Base Management Systems.
On the other hand, the general principle of the three
level approach and the essential requirement for the conceptual schema is fundamental to the deliberations of the Study Group.
It
is reasonable to claim that this position will be maintained in the Final Report of the Study Group and will continue to characterize the official position taken by ANSI on behalf of the USA in any deliberations on data base management systems in the ISO.
REFERENCES

Bachman, C. W.: "Data Structure Diagrams", Data Base, 1:2 (1969).
Bernays, P. and Fraenkel, A. A.: "Axiomatic Set Theory", North-Holland (Amsterdam 1958).
Braithwaite, R. B.: "Scientific Explanation", Cambridge University Press (London 1953).
Carnap, R.: "Introduction to Semantics", Harvard University Press (Cambridge, MA 1942).
CMSAG Joint Utilities Project: "Data Management Systems Requirements", CMSAG (Orlando, FL 1971).
CODASYL: "A Survey of Generalized Data Base Management Systems", available from NTIS (Washington, DC 1969).
CODASYL: "Data Base Task Group Report", ACM (New York 1971).
Codd, E. F.: "A Relational Model of Data for Large Shared Data Banks", CACM, 13:6 (1970), pp. 377-387.
Gödel, K.: "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I", Monatshefte für Mathematik und Physik, 38 (1931), pp. 173-198.
GUIDE/SHARE: "Data Base Management System Requirements", SHARE Inc. (New York, N. Y. 1970).
Hilbert, D. and Ackermann, W.: "Grundzüge der Theoretischen Logik", Julius Springer (Berlin, 1938).
ISO: ISO/TC97 (Geneva-3) 669.
Kleene, S. C.: "Introduction to Metamathematics", van Nostrand (Princeton, N. J. 1952).
Quine, W. V. O.: "Mathematical Logic", rev. ed., Harvard University Press (Cambridge, MA 1961).
SPARC: "Outline for Preparation of Proposals for Standardization", document SPARC/90, CBEMA (Washington, DC 1974).
SPARC: "Interim Report: Study Committee on Data Base Management Systems", SIGMOD NEWSLETTER (forthcoming).
Steel, T. B., Jr.: "Beginnings of a Theory of Information Handling", CACM, 7:2 (1964), pp. 97-103.
Tarski, A.: "Fundamentale Begriffe der Methodologie der deduktiven Wissenschaften I", Monatshefte für Mathematik und Physik, 37 (1930), pp. 361-404.
Thomason, S. K.: "Reduction of tense logic to modal logic, I", J. Symbolic Logic, 39:3 (1974), pp. 549-551.
Von Wright, G. H.: "An Essay in Modal Logic", North-Holland (Amsterdam 1951).
[Figure 1. Diagram not reproduced. Legible labels: Enterprise
Administrator; Data Base Administrator; a third administrator (label
partly illegible); Conceptual Schema Processor; External Schema
Processor; Internal Schema Processor; Storage; Internal (System)
Program Subsystem; Conceptual/Internal Transformer; External
(Application) Program Subsystem; Application Programmer; System
Programmer.]

Figure 1
[Figure 2. Diagram not reproduced. Legible labels: "Reality";
Scientific abstraction; Model; Scientific progress; Engineering
abstractions; Limited Models; Perceived; Mental Model; Conceptual
Realm; Symbolic abstraction; Symbolic Model (Conceptual Schema);
Internal Schema; External Schema(s).]

Figure 2