A Series-Parallel Poset (SPP) over an alphabet Σ is defined inductively:
• The empty poset, 1, is an SPP
• For each σ ∈ Σ, the singleton labeled σ is an SPP
• If P and Q are SPPs, so are P•Q and P⊗Q
The set of all series-parallel posets formed from 1 and the singletons, closed under concatenation (•) and shuffle (⊗), forms a bimonoid denoted SP(Σ*) [15]. For our purposes, the alphabet Σ consists of all distinct events occurring during a run of the system under consideration. Let each event e_i occurring in a system be represented by a singleton poset e_i. Then the fact that event e_i precedes event e_j is represented by e_i•e_j. On the other hand, the independence of events e_i and e_j is represented by e_i⊗e_j. This extends naturally to sets of events. What does it mean for two sets of events to be dependent or independent? Two sets of events, P and Q, are independent if no event in P triggers a chain of events leading to the occurrence of an event in Q, and vice versa. In other words, P and Q are independent if the set of events which are predecessors of P does not involve any event from Q, and vice versa. A set of events P always precedes a set of events Q if all events in P occur before any event in Q does, i.e., when each event in Q has all events in P as predecessors. A set of events P partially precedes a set of events Q if P sometimes occurs before Q. This is so when each event in Q has at least one predecessor from P, or when P and Q are independent. Let us illustrate the definitions with an example. Consider the following series-parallel poset:
B = ((e1 ⊗ e2) • e3) ⊗ e4

Figure 2  A Simple Series-Parallel Poset Example
Consider now the sets of events P1 = {e1, e4} and P2 = {e3}. Clearly, P1 and P2 are not independent, since e1 must occur before e3. P1 does not always precede P2, since a possible event sequence is e1, e2, e3, and then e4. But P1 may sometimes precede P2, since another possible event sequence is, for example, e1, e2, e4, e3.
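The following Java sketch (ours, not part of the paper; the class and method names are illustrative) shows one way to represent series-parallel posets and enumerate their linearizations, reproducing the event sequences discussed in the example above:

// Hypothetical sketch: series-parallel posets over single-character event labels
// and the set of event sequences (linearizations) each poset allows.
import java.util.*;

abstract class Poset {
    abstract Set<String> linearizations();   // all orderings consistent with the poset
}

class Event extends Poset {
    final String label;
    Event(String label) { this.label = label; }
    Set<String> linearizations() { return Set.of(label); }
}

class Series extends Poset {      // P • Q : every event of P precedes every event of Q
    final Poset p, q;
    Series(Poset p, Poset q) { this.p = p; this.q = q; }
    Set<String> linearizations() {
        Set<String> out = new HashSet<>();
        for (String a : p.linearizations())
            for (String b : q.linearizations()) out.add(a + b);
        return out;
    }
}

class Parallel extends Poset {    // P ⊗ Q : the events of P and Q are independent
    final Poset p, q;
    Parallel(Poset p, Poset q) { this.p = p; this.q = q; }
    Set<String> linearizations() {
        Set<String> out = new HashSet<>();
        for (String a : p.linearizations())
            for (String b : q.linearizations()) shuffle(a, b, "", out);
        return out;
    }
    private static void shuffle(String a, String b, String prefix, Set<String> out) {
        if (a.isEmpty() && b.isEmpty()) { out.add(prefix); return; }
        if (!a.isEmpty()) shuffle(a.substring(1), b, prefix + a.charAt(0), out);
        if (!b.isEmpty()) shuffle(a, b.substring(1), prefix + b.charAt(0), out);
    }
}

class Demo {
    public static void main(String[] args) {
        // B = ((e1 ⊗ e2) • e3) ⊗ e4, with events written as the characters 1..4
        Poset B = new Parallel(new Series(new Parallel(new Event("1"), new Event("2")),
                                          new Event("3")),
                               new Event("4"));
        System.out.println(B.linearizations()); // contains 1234 and 1243, never 3 before 1
    }
}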
Interpreting series-parallel posets as descriptions of the dependence or independence of sets of events allows us to model system behavior in terms of the sequences of events occurring during its operation. In [8] we presented a methodology for modeling the behavior and properties of non-iterated systems with series-parallel posets. A non-iterated system is one in which the events are distinct and not repeated.³
Figure 3  A Non-Iterated System: 4-bit Binary Adder

We presented an algorithm which can be used to verify that a particular property is always satisfied or sometimes satisfied within a given behavior. In [9], the methodology was further expanded to deal with globally iterated/locally non-iterated systems. These systems consist of non-iterated sub-systems operating in series or in parallel, such that the global output is fed back for another iteration.
Figure 4  A Simple Globally-Iterated/Locally Non-Iterated System

The verification algorithms have a low-order polynomial time and space complexity, which is further improved by the introduction of the behavior reduction methodology in [10]. An iterated system is one in which some or all events are repeated. It consists of a number of components, which function in series or independently, so that each component is either an iterated or a non-iterated system. A wide variety of systems can be considered iterated:
• Communication, Interconnect, or Cache Protocols
• Asynchronous Sequential Circuits
• Feedback Control Systems, etc.
³ Not all non-iterated systems can be expressed with series-parallel posets. See the section on Contributions and Limitations.
Concrete examples of iterated systems to which we have applied our methodology are the Peripheral Component Interconnect (PCI) bus protocol, used in all Pentium®-based PCs, and the Modified/Exclusive/Shared/Invalid (MESI) cache coherence protocol, used to synchronize the operation of cache controllers in shared-memory MIMD systems and to maintain data consistency between the level-1 and level-2 caches of the Pentium® microprocessor [12, 13]. Here is a simple example of an iterated system at the logic gate level:
Figure 5  A Simple Iterated System (gate level)
The notion of a series-parallel poset is not sufficient to describe the behavior of a system with iteration. We therefore need to introduce a new structure: the star-shuffle semiring S = (S, +, •, ⊗, *, 0, 1) of series-parallel posets, defined as follows:
• S is the set of finite subsets of SP(Σ*), closed under the semiring operations
• If K ∈ S and L ∈ S, K + L = {P | P ∈ K ∨ P ∈ L} ∈ S
• If K ∈ S and L ∈ S, K • L = {P•Q | P ∈ K ∧ Q ∈ L} ∈ S
• If K ∈ S and L ∈ S, K ⊗ L = {P⊗Q | P ∈ K ∧ Q ∈ L} ∈ S
• If K ∈ S, then K* = 1 + K + K•K + ... = Σ_{i=0..∞} K^i ∈ S, where K^i = K•K•...•K, i times
• 0 is the empty set of posets
• 1 is the empty poset
We define the behavior, B, of an iterated system to be an element of the star-shuffle semiring S, i.e., B ∈ S. Thus, the behavior of an iterated system is a set of series-parallel posets. For example, if we denote the event "gate i produces a valid output" by e_i, then the behavior of the system in Figure 5 is given by the expression B = ((e1 • (e2 ⊗ e3) • e4)* • e5)*. We can represent the verification properties as sets of series-parallel posets as well, i.e., P ∈ S. Unlike behaviors, however, properties are usually defined over a subset of the alphabet Σ, since we are most often interested in the mutual dependence or independence of a relatively small subset of system events. For example, the property "Gates 2 and 3 produce valid outputs independent of each other, but gate 4 depends on both gates 2 and 3" is given by the expression P = (e2 ⊗ e3) • e4.
The verification questions are specified as predicates over sets of series-parallel posets. These predicates are:
• SS(B, P) is a binary predicate, interpreted as "The property P is sometimes satisfied within the behavior B." The predicate takes a behavior and a property and verifies that P can sometimes be traced within the behavior B.
• AS(B, P) is a binary predicate, interpreted as "The property P is always satisfied within the behavior B." The predicate takes a behavior and a property and verifies that P can always be traced within the behavior B of the system.

There are four normal forms of behavior and property expressions:
• Concatenation: B = B1•B2•...•Bn and P = P1•P2•...•Pm
• Shuffle: B = B1⊗B2⊗...⊗Bn and P = P1⊗P2⊗...⊗Pm
• Plus: B = B1+B2+...+Bn and P = P1+P2+...+Pm
• Star: B = B1* and P = P1*

To simplify the reasoning about sets of events and the complexity of the verification algorithms, we introduce the notion of a reduction of the system behavior. It is prompted by the fact that, while the system behavior may involve hundreds of thousands of events, in most cases the verification property involves only a few events. The reduction is carried out by a recursively defined projection function Pr(B, set(P)), which takes a behavior B and a set of events, and returns a reduced behavior B' with respect to the events in the specified set. The effect of the projection function is to substitute 1 in place of all events not in set(P), without modifying the ordering of events in the behavior. Based on a number of theorems, corollaries, and lemmas, which examine the satisfaction of all forms of properties with respect to all forms of behaviors, we derive the formal definition of the two verification predicates SS(B, P) and AS(B, P) for iterated systems. In this outline, we present only AS(B, P). AS(B, P) holds iff one of the following cases applies:
• P = ε ∧ B = ε
• P = P1* ∧ B = B1 ⊗ B2 ⊗ ... ⊗ Bn ∧ ∀i∈[n] AS(Bi, P1) ∧ ∀i∈[n] Bi = Bi1*
• P = P1* ∧ B = B1* ∧ AS(B1, P1)
• P = P1 ⊗ P2 ⊗ ... ⊗ Pm ∧ B = B1* ∧ AS(B1, P)
• P = P1 ⊗ P2 ⊗ ... ⊗ Pm ∧ B = B1 ⊗ B2 ⊗ ... ⊗ Bn ∧ P ∈ SP(Σ*) ∧ ∀i∈[m] AS(B, Pi) ∧ ∀i∈[m-1] Independent(Pi, Pi+1) ∧ ∀i∈[m] (Pi = (Pi1)* → ∀e∈Pi L(set(lisc(B, e))) ⊆ L(set(Pi)))
• P = P1 • P2 • ... • Pm ∧ (B = B1 • B2 • ... • Bn ∨ B = B1*) ∧ P ∈ SP(Σ*) ∧ ∀i∈[m] AS(B, Pi) ∧ ∀i∈[m] (Pi = (Pi1)* → ∀e∈Pi L(set(lisc(B, e))) ⊆ L(set(Pi))) ∧ ∀i∈[m-1] (∀e∈Pi+1 L(predB({e})) ∩ L(set(Pi)) = L(set(Pi)))
• B = B1 + B2 + ... + Bn ∧ ∀i∈[n] AS(Bi, P)
• P = P1 + P2 + ... + Pn ∧ ∃i∈[n] AS(B, Pi)
In the above definitions we made use of a number of functions: the labeling functions l(s) and L({s1,...,sn}), the predecessor function pred(P), and the functions set(P), "Non-Iterated" NI(B), and "Least Iterated Sub-Component" lisc(B, e). The exact definition of these functions is presented in [11] and omitted here for lack of space. We also used the auxiliary predicate Independent(P, Q). The predicates serve as the basis of a verification algorithm. The analysis of its requirements shows that the worst-case time complexity is O(n + m³) and the average-case time complexity is O(n + m²), where n is the number of events in the behavior (before the reduction) and m is the number of property events. The space complexity is O(m).
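As a concrete illustration (ours, not from [10] or [11]; the term representation is an assumption), the behavior reduction performed by the projection function Pr(B, set(P)) can be sketched as follows: every event outside the property's event set is replaced by the empty poset 1, and the ordering structure is left untouched.

// Sketch of the projection Pr(B, set(P)): events not in the property's event set
// are replaced by the empty poset, written here as "1".
import java.util.*;

final class Term {
    final String op;          // "event", "•", "⊗", "*", or "1"
    final String label;       // event name when op == "event"
    final List<Term> args;    // sub-terms for •, ⊗, *
    Term(String op, String label, List<Term> args) { this.op = op; this.label = label; this.args = args; }
    static Term event(String e) { return new Term("event", e, List.of()); }
    static Term one()           { return new Term("1", null, List.of()); }
    static Term seq(Term... t)  { return new Term("•", null, List.of(t)); }
    static Term par(Term... t)  { return new Term("⊗", null, List.of(t)); }
    static Term star(Term t)    { return new Term("*", null, List.of(t)); }

    // Pr(this, keep): substitute 1 for events outside 'keep', preserving structure.
    Term project(Set<String> keep) {
        if (op.equals("event")) return keep.contains(label) ? this : one();
        if (op.equals("1")) return this;
        List<Term> projected = new ArrayList<>();
        for (Term t : args) projected.add(t.project(keep));
        return new Term(op, null, projected);
    }

    public String toString() {
        if (op.equals("event")) return label;
        if (op.equals("1")) return "1";
        if (op.equals("*")) return "(" + args.get(0) + ")*";
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < args.size(); i++) {
            if (i > 0) sb.append(" ").append(op).append(" ");
            sb.append(args.get(i));
        }
        return sb.append(")").toString();
    }
}

For instance, projecting B = ((e1 • (e2 ⊗ e3) • e4)* • e5)* onto set(P) = {e2, e3, e4} yields ((1 • (e2 ⊗ e3) • e4)* • 1)*, which preserves exactly the ordering information relevant to P.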
6
Contributions, Limitations, and Conclusions
In this paper, we presented the modeling and formal verification of the DLX processor control based on the recently developed series-parallel poset methodology. The technique is less expressive than some other formal verification methods, but has a low complexity. Thus, we can model complex real-world systems and protocols. Current work is on the verification of the INMOS Transputer microcode, and on modeling the behavior of dataflow computers. The issues of event sequencing and timing have been studied for a long time by many researchers: D. Dill, B. Moszkowski, Z. Manna, etc. In many respects, our approach is close to the study of language containment of behavior and property automata [2]. However, we approach the topic from a different point of view, avoiding the issue of exhaustive substring matching. Moreover, the use of the shuffle operator (⊗) significantly simplifies and speeds up the verification task by avoiding the study of all possible independent event interleavings. Closest to our work is that of V. Pratt [4]. However, the main stress in [4] is on modeling system behavior with the help of an extensive collection of operations. Our technique uses a far smaller collection of operations (•, ⊗, *), but models not only system behaviors but properties as well. The emphasis is on verification, and the reduced collection of operations simplifies analysis and improves the algorithms' efficiency. One important shortcoming of our technique is the inability to model "N"-type dependencies among the events occurring in a system. These are encountered quite often in real systems and significantly limit the general applicability of our algorithm. Consider the simple example below:
Figure 6 "N"-type event dependence in a simple system
If e_i represents the event "gate i produces a valid output," then the event dependency diagram has the "N" shape described on the right. This type of dependency cannot be modeled with the operations •, ⊗, and * alone. Current work is aimed at extending our verification methodology to deal with "N"-type event dependencies as well.

References
1. K. McMillan, "Symbolic Model Checking", Kluwer Academic Publishing, 1993
2. R. Kurshan, "Computer Aided Verification of Coordinating Processes: The Automata-Theoretic Approach", Princeton Series in CS, Princeton, 1994
3. M. Nielsen, G. Plotkin, and G. Winskel, "Petri nets, event structures, and domains", TCS, 1981
4. V. Pratt, "Modeling Concurrency with Partial Orders", Int. Journal of Parallel Prog., 1986
5. P. Godefroid, "Partial Order Methods for the Verification of Concurrent Systems: an Approach to the State Explosion Problem", Doctoral Dissertation, University of Liege, 1995
6. R. Nalumasu, G. Gopalakrishnan, "A New Partial Order Reduction Algorithm for Concurrent System Verification", Proceedings of IFIP, 1996
7. D. Peled, "Combining Partial Order Reductions with On-the-Fly Model Checking", Journal of Formal Methods in Systems Design, 8 (1), 1996
8. L. Ivanov, R. Nunna, S. Bloom, "Modeling and Analysis of Non-Iterated Systems: An Approach Based on Series-Parallel Posets", Proceedings of ISCAS'99, 1999
9. L. Ivanov, R. Nunna, "Formal Verification with Series-Parallel Posets of Globally-Iterated Locally-Non-Iterated Systems", Proceedings of MWSCAS'99, 1999
10. L. Ivanov, R. Nunna, "Formal Verification: A New Partial Order Approach", Proc. of ASIC/SOC'99, 1999
11. L. Ivanov, R. Nunna, "Modeling and Verification of Iterated Systems and Protocols", Proc. of MWSCAS'01, 2001
12. L. Ivanov, R. Nunna, "Modeling and Verification of an Interconnect Bus Protocol", Proc. of MWSCAS'00, 2000
13. L. Ivanov, R. Nunna, "Modeling and Verification of Cache Coherence Protocols", Proc. of ISCAS'01, Sydney, 2001
14. J. Hennessy, D. Patterson, "Computer Architecture: A Quantitative Approach", Morgan Kaufmann Publ. Inc., 1990
15. S. Bloom, Z. Esik, "Free Shuffle Algebras in Language Varieties", Theoretical Computer Science 163 (1996) 55-98, Elsevier
DYNAMIC BLOCK DATA DISTRIBUTION FOR PARALLEL SPARSE GAUSSIAN ELIMINATION*

E. M. DAOUDI⁺, P. MANNEBACK‡ AND M. ZBAKH⁺

⁺ Lab. Research in Computer Science, Faculty of Sciences, University of Mohamed First, 60 000 Oujda, Morocco, E-mail: {mdaoudi, zbakh}@sciences.univ-oujda.ac.ma
‡ Computer Science Lab., Polytechnic Faculty, 7000 Mons, Belgium, E-mail: [email protected]
This article describes a new dynamic block data distribution algorithm over a grid of processors for sparse Gaussian elimination, in order to improve the load balance compared to the classical static block-cyclic distribution. In order to ensure numerical stability and to separate the ordering and the symbolic factorizations, Demmel et al. [2, 3] presented a new method for sparse Gaussian elimination with static pivoting, called GESP, where the data structure and the communication graph are known before the numerical factorization. In this work, we assume that the ordering and the symbolic factorizations are already performed and we are interested in the numerical factorization of the final structure of the matrix to be computed. The experimental results show the advantages of our new approach.
1
Introduction
The efficiency of parallel algorithms on distributed-memory machines depends on how the data are distributed over the processors. The distribution should be chosen to minimize the execution time (balance the workload of the processors and/or minimize the communication cost). Problems dealing with dense matrices are classically solved by the block-cyclic distribution [1]. This distribution is also used for the sparse case in many tools and packages for the solution of linear systems, such as SuperLU (Supernodal LU [4]). However, for sparse Gaussian elimination this distribution can lead to a bad load balance and can increase the execution cost. In this paper we propose a new approach of data distribution that balances the workload of the processors and/or minimizes the total execution time of the sparse Gaussian elimination. The main idea is to redistribute the data before each step of factorization. It is important that the performance not be degraded by the communication overhead arising from the migration of data. The test matrices are of size n × n and the target

* This work is supported by the European program INCO-DC, DAPPI project.
distributed-memory machines have a p × q processor grid topology. The outline of this paper is as follows: in Sec. 2 we present one way to partition the data into blocks for sparse matrices; in Sec. 3 we present some motivating examples showing the inefficiency of the block-cyclic distribution; Sec. 4 describes our new distribution approach; the experimental results are given in Sec. 5.

2
Block data structure
The block partitioning method of the matrices is based on the notion of the unsymmetric supernode approach [4]. Let L be the lower triangular matrix in the LU factorization. A supernode is a range of columns of L with the triangular block just below the diagonal being full and with the same row structure below this block. This supernode partition is used in both the row and column dimensions. If there are N supernodes in an n × n matrix A, the matrix will be partitioned into N² blocks of nonuniform size [3]. The size of each block is matrix dependent. The largest block size is equal to the number of columns of the largest supernode. For large matrices, this can be a few thousand, especially towards the end of the matrix L [3]. Such a large granularity would lead to very poor parallelism and load balance. Therefore, when this occurs, the large supernode is broken into smaller chunks, so that the size of each chunk does not exceed a threshold representing the maximum block size [3]. For the present study, we assume that the blocks have the same size r, where n = N × r. In the sparse Gaussian elimination with dynamic pivoting, the computational graph does not unfold until run-time; in other words, the symbolic and numerical algorithms become inseparable. Demmel and Li [3] presented a new method for sparse Gaussian elimination with static pivoting, called GESP, where the data structure and the communication graph are known before the numerical factorization. The basic numerical factorization algorithm in GESP is given by Algorithm 1 [3].
for k := 1 to N do
  1. Compute the block diagonal factors L(k,k) and U(k,k);
  2. Compute the block column factors L(k+1 : N, k);
  3. Compute the block row factors U(k, k+1 : N);
  4. Update the sub-matrix A(k+1 : N, k+1 : N):
     for j := k+1 to N do
       for i := k+1 to N do
         if (L(i,k) ≠ 0 and U(k,j) ≠ 0)
           A(i,j) = A(i,j) - L(i,k) U(k,j);

Alg. 1: Sparse right-looking LU factorization
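As an illustration (ours, not the authors' code; the block storage convention is an assumption), the sparse update of step 4 can be written in Java as follows, skipping the multiplication whenever one of the two factor blocks is structurally zero:

// Hypothetical sketch of Algorithm 1, step 4. A, L and U are N x N arrays of
// r x r blocks (0-based indices); a null entry stands for a zero block.
class SparseBlockUpdate {
    static void updateTrailingSubmatrix(double[][][][] A, double[][][][] L,
                                        double[][][][] U, int k, int N, int r) {
        for (int j = k + 1; j < N; j++) {
            for (int i = k + 1; i < N; i++) {
                if (L[i][k] == null || U[k][j] == null) continue;  // sparsity test
                if (A[i][j] == null) A[i][j] = new double[r][r];   // fill-in block
                // A(i,j) = A(i,j) - L(i,k) * U(k,j)
                for (int a = 0; a < r; a++)
                    for (int b = 0; b < r; b++) {
                        double s = 0.0;
                        for (int c = 0; c < r; c++) s += L[i][k][a][c] * U[k][j][c][b];
                        A[i][j][a][b] -= s;
                    }
            }
        }
    }
}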
In this work, we suppose that the ordering and the symbolic factorizations are already performed. We are interested in the numerical factorization of the final structure of the matrix A to be computed.

3
Block-cyclic distribution
By block-cyclic distribution we mean that block A(i,j) (0 ≤ i,j < N) is mapped onto the processor at coordinates (i mod p, j mod q) of the processor grid. This distribution on a grid of processors is not efficient in terms of load balancing and communication costs. We present below some motivating examples (a code sketch of this mapping is given after the examples).

Example 1: load imbalance. In Figure 1(a) we illustrate the matrix to be distributed cyclically (Figure 1(b)) on a grid of 2 × 2 processors (Figure 1).
Figure 1. (a): block data structure for a matrix; (b): block-cyclic distribution on a 2 × 2 processor grid; (c): blocks to be updated in step 1 of elimination
As shown in Figure 1(c), step 1 of the elimination is executed by one processor (processor 0), and similarly for all other steps. Thus each step of the elimination is executed sequentially, which shows a bad load balance.

Example 2: bad communication management. Figure 2(b) shows that two consecutive nonzero (shaded) blocks in the same row/column are not mapped onto two neighboring processors of the grid during the first step. This can be improved, to decrease the communication cost, by replacing, in row 1, processor 2 by processor 1, 4 by 2, and 6 by 3.
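For reference, here is a minimal sketch (ours, not from the paper) of the block-cyclic mapping used in these examples; the row-major numbering of the grid is only an assumption made to match the processor numbers in the figures.

// Block-cyclic owner of block (i, j) on a p x q processor grid.
class BlockCyclic {
    static int ownerRow(int i, int p) { return i % p; }   // grid row of the owner
    static int ownerCol(int j, int q) { return j % q; }   // grid column of the owner

    // Linear processor rank, numbering the p x q grid row by row.
    static int ownerRank(int i, int j, int p, int q) {
        return (i % p) * q + (j % q);
    }
}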
Figure 2. (a): data structure and block-cyclic distribution for one matrix on a 7 x 7 processor grid, (b): the blocks to be updated in step 1 of elimination
4
Description of the new distribution approach
The proposed algorithm consists in redistributing the data efficiently over the processor grid at each step of elimination. The idea is to accumulate the blocks to be updated at each step of elimination in a dense matrix and to redistribute this matrix by the block-cyclic approach. The remainder of this section describes in detail the different steps of the algorithm. For each step k, 1 ≤ k ≤ N, we first determine the sub-matrix Mk formed by the blocks Aij that will be updated in step k. Then we determine how the sub-matrix Mk can be efficiently distributed on the grid.

Determination of Mk: A block Aij of the matrix A is an element of Mk if the blocks Aik and Akj are both nonzero (that is to say, Aik ≠ 0 and Akj ≠ 0). The size of Mk is determined by the number of nonzero blocks in row and column k of the initial matrix. Mk is determined by Algorithm 2, where the pair (is, js) indicates the new coordinates of Aij in the sub-matrix Mk.

js := 0;
for j := k to N do {
  if (A(k,j) ≠ 0) {
    is := 0;
    for i := k to N do {
      if (A(i,k) ≠ 0) {
        Mk(is, js) := A(i,j);
        is := is + 1;
      }
    }
    js := js + 1;
  }
}
Alg. 2: Determination of the sub-matrix Mk

Distribution of Mk:
• If any element of Mk was not an element of any preceding sub-matrix Mk', k' < k, then we distribute Mk cyclically on the grid;
• Otherwise, we analyze the latest assignment of each block in Mk and determine a sub-matrix of maximal size, called M~k, which is already distributed cyclically. We keep the distribution of M~k and complete the block-cyclic distribution for the rest of Mk. This choice minimizes the redistribution cost.

The outline of the new distribution approach is given in Algorithm 3.

for k := 1 to N do
  if Mk is not already distributed then
    distribute Mk cyclically;
  else
    determine M~k;
    keep the distribution of M~k and complete the cyclic distribution of Mk;

Alg. 3: The outline of the new distribution algorithm

To illustrate the steps of the new algorithm, we consider the matrix M of Figure 3(a) and a grid of 5 × 5 processors (Figure 3(b)). We determine the sub-matrix M1 to be updated in step 1 (Figure 4(a)) and distribute it by block-cyclic distribution (Figure 4(b)). Figure 5(a) illustrates the sub-matrix M2 to be updated in step 2. M~2 is formed by the blocks mapped to the processors of the sub-grid formed by processors 6, 7, 8, 11, 12, 13, 16, 17 and 18. We keep the distribution of M~2 and complete the block-cyclic distribution of M2 (Figure 5(b)). The blocks that were already assigned are marked by the symbol !. We proceed in the same way in the remaining steps.
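The following Java sketch (ours, not the authors' implementation; the exact keep-and-complete rule of Algorithm 3 may differ in detail) illustrates the redistribution idea: blocks of Mk that were placed in an earlier step keep their owner, and the remaining blocks are assigned block-cyclically over the p × q grid.

// Hypothetical sketch of one redistribution step.
class DynamicDistribution {
    // owner[i][j] is the processor rank of block A(i,j), or null if not yet placed;
    // nonzero[i][j] tells whether block A(i,j) is structurally nonzero.
    static void redistributeStep(Integer[][] owner, boolean[][] nonzero,
                                 int k, int N, int p, int q) {
        int js = 0;
        for (int j = k; j < N; j++) {
            if (!nonzero[k][j]) continue;            // A(k,j) = 0: column j not in M_k
            int is = 0;
            for (int i = k; i < N; i++) {
                if (!nonzero[i][k]) continue;        // A(i,k) = 0: row i not in M_k
                if (owner[i][j] == null)             // keep a previous placement if any
                    owner[i][j] = (is % p) * q + (js % q);  // cyclic slot inside M_k
                is++;
            }
            js++;
        }
    }
}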
5
Numerical results
The implementations were done, in the LaRIA laboratory of Amiens (France), under the MPI environment [5] for communication and with ScaLAPACK subroutines [1] for computation. The target machines are:
Figure 3. Matrix to be distributed by the new approach over a 5 × 5 processor grid

Figure 4. The sub-matrix M1 (blocks mapped to processors) and its distribution

Figure 5. The sub-matrix M2 (blocks mapped to processors) and its distribution
• Cluster of 19 Celeron Intel Pentium 400 MHz and 18 Celeron Intel Pentium 466 MHz machines connected by a Fast Ethernet network of 100 Mb/s.
• Cluster of eight Alpha processors of 533 MHz connected by Myrinet cards of 1 GB/s.

Table 1 presents the computation times for the first five elimination steps of sparse Gaussian elimination on the cluster of 2 × 2 Alphas. The test matrix is of size 1600 × 1600 and has the same structure as the matrix of Example 1. It
is structured by blocks of size 80 × 80. T1 and T2 represent the computation time at each step (measured at the end of each step) for each processor, for the block-cyclic distribution and for the new approach of distribution, respectively.
Steps →        1               2               3               4               5
Proc       T1      T2      T1      T2      T1      T2      T1      T2      T1      T2
0         6.26    1.61    .073    .1      5.88    1.61    .07     .1      4.05    1.03
1          -      1.57     -       -       -      1.26     -       -       -      1
2          -      1.66     -       -       -      1.33     -       -       -      1.05
3          -      1.69     -       -       -      1.08     -       -       -      1.08

Table 1. Computation time in seconds on the cluster of Alphas (the symbol - means that the corresponding processor is idle)
We remark that, for the block-cyclic distribution, steps 1, 3 and 5 are executed sequentially, whereas with the new distribution the treatment of these steps is distributed among all processors. This shows a good load balance. For steps 2 and 4, there is only one block to be treated by one processor for the two approaches of distribution (T1 and T2 are roughly equal). In Table 2, we present the execution times for sparse Gaussian elimination on the cluster of 4 × 4 Celerons. The test matrix is of size 320 × 320 and has the same structure as the matrix of Example 2. It is structured by blocks of size 80 × 80. T1 and T2 represent the execution time at each step for the block-cyclic distribution and for the new approach of distribution, respectively. This shows that the execution time is reduced by using the new distribution.
Steps →     1       2       3       4
T1         .45     .31     .09     .09
T2         .37     .24     .02     .02
Table 2. Execution time in seconds on the cluster of Celerons
6
Conclusion and perspectives
In this work, we are interested in the numerical factorization step of sparse Gaussian elimination. We assume that the ordering and the symbolic factorizations
are already performed. We have proposed a new dynamic distribution based on the block-cyclic approach. At each elimination step, only the dense blocks concerned by the corresponding elimination step are distributed. This new approach allows us to balance the workload compared to the block-cyclic approach. The experimental results show that the workload of the processors is well balanced and that the execution time can be improved compared to the block-cyclic distribution. Our objective in future work is to generalize the algorithm to non-uniform blocks and to extend the experimentation to other types of matrices.

References
1. Blackford L. S., Choi J., Cleary A., D'Azevedo E., Demmel J., Dhillon I. and Dongarra J., ScaLAPACK Users' Guide (Second Edition, SIAM, 1997).
2. Li X. and Demmel J., A Scalable Sparse Direct Solver Using Static Pivoting. In Proceedings of the 9th SIAM Conference on Parallel Processing and Scientific Computing, San Antonio, Texas, 1999.
3. Li X. and Demmel J., Making Sparse Gaussian Elimination Scalable by Static Pivoting. In Proceedings of the SC98 Conference, Orlando, Florida, 1998.
4. Li X., Demmel J. and Gilbert J. R., An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination, SIAM J. Matrix Anal. Appl., 20 (1999) pp. 915-952.
5. Pacheco P. S., Parallel Programming with MPI (Morgan Kaufmann Publishers, Inc., San Francisco, California, 1997).
ALL PAIRS SHORTEST PATHS COMPUTATION USING JAVA ON PCS CONNECTED ON LOCAL AREA NETWORK
JOHN JINGFU JENQ Computer Science Department, Montclair State University, Upper Montclair, NJ 07043 E-mail: [email protected]
WINGNING LI
Department of Computer Science and Computer Engineering, University of Arkansas, Fayetteville, AR 72701 E-mail: [email protected]
The computation of all pairs shortest paths using the Java programming language on PCs connected to a local area network has been developed. Comparisons with different numbers of processors and with different numbers of nodes of the problem graph have been performed experimentally. Speed-up factor and efficiency analyses have been conducted experimentally as well.
1
Introduction
Shortest path computation is a very important task required for efficient routing in transportation and communication networks. The problem shows up in applications such as robot path planning, circuit design, missile defense, and highway transportation design. How to compute shortest paths has been studied intensively for both sequential and parallel computing models. An annotated bibliography and taxonomy of related problems on a uniprocessor is given in [5]. Experimental evaluation of the efficiency of different shortest paths algorithms is reported in [3]. The shortest paths problem is commonly represented as a graph with vertices and edges. For a dense graph, a matrix representation is usually used. For a sparse graph, a linked list (and its variations) representation is more popular. For the all pairs shortest paths problem, one is required to compute the shortest path from node i to node j, for i ≠ j and 0 ≤ i, j < N, where N is the number of vertices in the graph under consideration and vertices are numbered from 0 to N-1. Many studies using parallel processing approaches to efficiently solve the shortest paths problem are reported in the literature. Examples include using VLSI systolic arrays [6], using distributed processing [4][11], using the CREW PRAM model [2], using the CRCW PRAM model [7][8], and using the EREW PRAM model [8]. Sometimes it is desirable to find rectilinear shortest paths for robotic applications. Algorithms dealing with rectilinear shortest paths are reported in [2][12]. Results concerning three-dimensional shortest paths that also find application in robotics are given in [1]. For results of parallel program development for commercially available
machines, the reader is referred to the paper of Jenq and Sahni where programs that run on NCUBE mini super computers with 64 nodes are developed and experiment is conducted [10]. The problem of finding a shortest path between two nodes that also has the smallest number of edges is investigated by Zwick [13]. In this paper, we develop programs, using Java programming language, that solve the all pairs shortest paths problem on PCs that are connected to a local area network and conduct experiment to study the effectiveness of the approach. The experimental results are very encouraging and demonstrate the potential of the approach. Java is a popular high level programming language and has gained its popularity in teaching client/server computation for its ease of usage and the shorten software development time. Our experience also confirms that and further suggests using Java as a tool to conduct research in parallel and distributed processing. The paper is organized as follows. Section 2 describes the fundamental sequential algorithm to be paralleled and the basic Java classes used in the program development. Section 3 presents the parallel program. Section 4 analyzes and discusses the experimental results, including comparisons using speed up factor and efficiency measure. Section 5 concludes the report. 2
Preliminary
The famous fast algorithm to solve the all pairs shortest path problem for dense graphs is Floyd's algorithm. Our development of the parallel programs is based on this algorithm. Let N be the number of nodes in the graph. The distance between node i and node j is denoted as Dist(i,j). Initially, Dist(i,j) is 0 if i = j; it is infinity if i and j are not adjacent; and it is the weight of edge (i,j) if i and j are adjacent. At the end of the computation, Dist(i,j) gives the true distance, or the length of the shortest path, between node i and node j. To use Floyd's algorithm, we shall assume that there are no negative-weight cycles in the graph, though negative-weight edges may be present. The following is Floyd's algorithm, consisting of three nested for loops.

for (k=0; k < N; k++)
  for (i=0; i < N; i++)
    for (j=0; j < N; j++)
      if (Dist(i,j) > (Dist(i,k) + Dist(k,j)))
        Dist(i,j) = Dist(i,k) + Dist(k,j);

On each iteration of k, a new matrix will be generated. Let Dist^0 be the initial matrix; then it can be verified [9] that the result will not be changed if we replace the if statement by

if (Dist(i,j) > (Dist^(k-1)(i,k) + Dist^(k-1)(k,j)))
  Dist(i,j) = Dist^(k-1)(i,k) + Dist^(k-1)(k,j).

This enables us to develop parallel algorithms that distribute the
187
computation of two inner most for loops in network computers and only exchange information (by communication) in the outeryor loop. In the current Java net package library there are streaming based socket and data gram socket. We choose data gram socket in our application due to its small communication overhead compared with streaming based sockets. In addition to that, we also take advantage of the multicast facility in the java.net library that further reduces the communication time a lot. 3
Parallel all pair shortest path algorithm on PC area network
We ran the program in a PC lab with Dell computers, using the Windows 98 operating system, connected to a local area network. The language used to develop the program is Java, which was developed by SUN. With the java.net library, it is easier for us to construct the program by concentrating more on the application problem. The whole application in fact contains two subprograms: one is the controller program and the other a worker. The controller controls the coordination among the workers. It gives control signals through broadcasting, and it also receives signals from the workers. The controller issues a start signal at the beginning of each iteration, for the computation of the matrix. It then waits for the workers to send in completion signals. When all the signals are received, it starts another iteration of computation by giving another start signal. The total number of broadcasts of the signals to start the computation is therefore O(N). The signal is just a one-byte signal sent using a datagram. Rather than use stream-based sockets, we used both ordinary and multicast datagram sockets. It is relatively cheaper to use datagram sockets than stream sockets in our application. As for the worker program, each worker gets a portion of the matrix based on its ID. Note that the ID was assigned by the controller when the program begins. In addition to the ID number received from the controller, the number of nodes of the graph is also received from the controller. The information about the number of PCs that will participate in the computation is also received from the controller at the beginning of the execution of the program through broadcasting. The pivot row, required by each processor in the computation, is sent at the beginning of each iteration by the appropriate PC. Broadcasting is used in this operation to reduce the overhead. As soon as the row of data has been received, the computation starts on the data partition of each PC. At the end of the computation in each iteration, the PC sends a completion signal back to the controller. It then waits for the controller to start another iteration. Upon receiving the signal, each PC computes and checks whether it is its turn to broadcast the data to all others.
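The per-iteration computation that each worker performs on its partition can be sketched as follows (our sketch, not the paper's code; the method and parameter names are assumptions):

// Worker-side update for iteration k of the distributed Floyd's algorithm.
// 'myRows' holds the row indices assigned to this PC by the controller, and
// 'pivotRow' is row k of the distance matrix, broadcast at the start of step k.
class WorkerStep {
    static void updatePartition(int[][] dist, int[] myRows, int k, int[] pivotRow) {
        int n = pivotRow.length;
        for (int i : myRows) {
            int dik = dist[i][k];
            if (dik == Integer.MAX_VALUE) continue;        // "infinity": nothing to relax
            for (int j = 0; j < n; j++) {
                if (pivotRow[j] == Integer.MAX_VALUE) continue;
                if (dist[i][j] > dik + pivotRow[j]) dist[i][j] = dik + pivotRow[j];
            }
        }
    }
}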
4
Experiment results and analysis
The experiment results are shown in Table 1. The numbers are in milliseconds. A run time of 0 simply means that the run time is less than 1 millisecond, as in the one PE case. It can be seen from the table that, as expected, the run time grows as O(N³)
for the one PE case when N increases.

Table 1. Running time using different numbers of PEs

Nodes         1 PE      2 PEs     4 PEs     6 PEs     8 PEs    10 PEs    12 PEs
64               0        170       110       120       160       180       170
192            330        660       610       610       660       710       770
384           3740       3620      2530      2140      2080      2140      2200
768          28830      21370     13070     10820      9060      8700      7910
1536        228270     148740     82560     65690     49370     41630     38170
3072       1852580    1183920    587870    411170    324170    267600    232780
As usual, we measure the effectiveness of the parallel programs experimentally using speed up and efficiency. The speed up factor is defined as

sp = (running time of sequential algorithm) / (running time of parallel algorithm).

The efficiency E is defined as the speed up factor divided by the number of processors, p, involved in the computation:

E = sp / p.

In the experiment, edge-weight matrices are generated randomly using the Random class of Java, which generates pseudo-random numbers. For simplicity, only integers are used for the edge weights. We believe that changing the integer type to float or double will not invalidate the experimental results, though it would be interesting to see how those changes may affect the actual run time. The measurement of run time reported in Table 1 does not include the time used to generate the edge-weight matrix. When the number of nodes of the graph is small, the parallel program does not achieve any speed up at all. Only when a certain threshold of the number of nodes is exceeded does the speed up become significant. The speed up versus the number of PEs used is plotted in Figure 4.1. Note that when the graph is large, the more PEs we have, the faster the computational process. For example, considering the case of the 3072-node graph, one PC takes about 30 minutes to finish, while 12 PCs take only 4 minutes.
Figure 4.1. Speedup vs. number of PEs

Figure 4.2 depicts the efficiency plot. Note that higher efficiency is always achieved by a smaller number of PEs than by a larger number of PEs. Nonetheless, when the graph size increases, the efficiency always increases, independent of the number of PEs involved.

5
Conclusions and remarks
Parallel programs, written in Java and run on a set of inexpensive PCs connected to a local area network in a typical student lab, have been developed to solve the all pairs shortest paths problem. A speed up factor of 8 can be achieved when 12 PCs are used to run a graph of around 3000 nodes. Greater speed up would be expected when the number of nodes in the graph increases or when the number of PCs increases. For graphs of small size, one PC alone outperforms the multiple-PC network approach due to the communication overhead among participating PCs. As the graph size increases, the computation time outweighs the communication time and the proposed approach becomes effective. One remark about the great advantage of using Java to write network applications is the shortened development time due to the rich classes in the java.net library that comes with the Java development kit. In our experiment, we take advantage of the multicast facility of Java, which results in a further reduction of the control coordination overhead and therefore improves the overall performance.
Figure 4.2. Efficiency vs. number of PEs 6
References
1. Agarwal, P. K., Har-Peled, S., Sharir, M., and Varadarajan, K., Approximating shortest paths on a convex polytope in three dimensions, Journal ACM, vol. 44, no. 4, pp 567-584, (1997). 2. Atallah, M., and Chen, D., Parallel rectangular shortest paths with rectangular obstacles, Proceedings of the Second! Annual ACM Symposium on Parallel Algorithms and Architectures, pp 270-279, (1990) 3. Cherkassky, B. V., Goldberg, A., and Radzik, T., Shortest paths algorithms: theory and experimental evaluation, Proceedings of the Fifth Annual ACMSIAM Symposium on Discrete Algorithms, pp 516-525, (1994). 4. Chen, C , A distributed algorithm for shortest paths, IEEE Transactions on Computers, vol. c-31, pp 898-899, (1982). 5. Deo, N and Pang, C. Shortest path algorithms: Taxonomy and Annotation, Networks, pp 275-323., (1984). 6. Dey, S., and Srimani, P. K., Parallel VLSI computation of all shortest paths in a graph, Proceedings of the ACM Sixteenth annual Symposium on Computer Science, pp 373-379., (1988). 7. Frieze, A., and Rudolph, L., A parallel algorithm for all pairs shortest in a random graph, Proc. 22nd Annual Allerton Conf on Communication, Control and Computing, pp 663-670., (1984). 8. Han, Y., Pan, V., and Reif, J., Efficient parallel algorithms for computing all pair shortest paths in directed graphs, Fourth Annual ACM Symposium on Parallel Algorithms and Architectures, 1992, pp353-362, (1982). 9. Horowitz, E., and Sahni, S., Fundamentals of Computer Algorithms, (Computer Science Press, 1978)
10. Jenq, J., and Sahni, S., All pairs shortest paths on a hypercube multiprocessors, Proceedings of the 1987 International Conference on Parallel Processing, pp. 713-716., (1987). 11. Lakhani, G., An improved distribution algorithm for shortest path problem, IEEE Transactions on Computers, vol. c-33, pp. 855-857, (1984). 12. Lee, D. T., Chen, T. H., and Yang, C. D., Shortest rectangular paths among weighted obstacles, Proceedings of the Sixth Symposium on Computational Geometry, pp. 301-310, (1990). 13. Zwick, U., All pairs lightest shortest paths, Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, pp. 61-69, (1999)
Computing Education
ENHANCING STUDENT LEARNING IN E-CLASSROOMS

JEROME ERIC LUCZAJ AND CHIA Y. HAN
Department of ECECS, University of Cincinnati, Cincinnati OH 45221-0030, USA
E-mail: [email protected], [email protected]

E-classrooms provide a unique opportunity to use technology to implement prompt feedback and thereby augment the current classroom experience. Coordinating various instructional streams with student assessment and feedback will provide the means for instructors to know when and if their intended message was communicated to their students, permitting instructors and students to react quickly when there is a gap between intent and understanding. Further, developing a flexible instructional infrastructure will create a bridge between course objectives and course assessment, classroom instruction and student feedback, and seat-time and study-time. By developing this framework within an e-classroom, information can be gathered to measure student, instructor and organizational achievement and to assist in improvement.
1
Introduction
Many different factors influence whether a student has successfully attained learning objectives. Student performance is rarely homogeneous, resulting in the familiar bell-shaped distribution. Causes for poor student performance are hard to pinpoint. A comprehensive assessment strategy needs to be defined and implemented. It must demonstrate that the content has been delivered to the student and that the student has received it. Further, it must give prompt feedback to students, instructors and programs. The resulting impact on student learning should be "A Significant Difference" [1]. With the recent advent of powerful computer technology for delivering mediarich content in classroom settings, many new possibilities are now available for augmenting classroom instruction and learning. Previous related works at several academic research centers, such as Georgia Tech, U. of Massachusetts, and Cornell University, have contributed significantly to the use of electronic notes with audio and video recording. Project Classroom 2000 [2] produced several solutions, such as Zen*, a client server system that allowed each electronic whiteboard to be a client tied into a threaded central server; DUMMBO (Dynamic Ubiquitous Mobile Meeting Board) system uses a SmartBoard electronic whiteboard and a Webinterface access method that was time-line based; Stupad is a customizable tool for personalized notes and playback of all captured streams. MANIC (Multimedia Asynchronous Networked Individualized Courseware), developed at U. of Mass. [3, 4], is an asynchronous system that uses HTML slides and GIF images synchronized with audio via RealAudio to a Web browser. The browser uses the media plug-in to
196 start the presentation. MANIC uses CGI scripting to create the slides from the GIF images and slides. CGI then sends the slides to the Web server when requested by the user, and then the server sends this request to the user's browser. Project Zeno distance learning tools [5] include a full spectrum of automatic editing and playback. The Lecture Browser uses a Real-video plug-in, to allow playback of MPEG videos. Although many new methods are available, the different formats of material being delivered during lectures may overwhelm students, hindering learning, causing student disengagement in the classroom. Research that focuses on getting more out of the classroom experience shows that user-interaction in selecting data keeps the students interested in material. [6] These studies indicate that students' needs and viewpoints have to be taken into consideration when course material is given in technology-based classrooms. Further, since a typical classroom experience involves multiple, simultaneous activities or streams of information, it is important that technology offer a system that will synchronize these streams, providing context and valuable insight during assessment, instructor feedback, and student review. Frequent, periodic assessment providing the basis for prompt feedback to students, instructors, and programs will enhance student learning. 2
Major Assessment Issues
Educational assessment has multiple, distinct uses in instructional improvement including: school and student accountability for academic achievement, feedback for teachers to revise teaching and administrators to allocate resources, and stimulation for students to receive deeper understanding [7]. In most cases, the main criterion for assessing student performance is the degree of subject matter understanding. Student performance is evaluated at irregular intervals, through graded homework, quizzes, tests, projects, and final exams. Typically, students who were confused during lectures would not find out how far behind they were until it was too late to catch up. It is during the contact time in a classroom where the instructor can exert the greatest impact on student learning. In fact, a study from the University of Tennessee found that teacher effectiveness was the dominating factor affecting student academic gain [8]. Thus, it is important to assess learning in the classroom and let instructors take timely measures and make any necessary remedial changes. In terms of teaching evaluation, especially at the collegiate level, the burden of teacher and course evaluation falls upon the student using either in-class or Webbased survey forms. Typically, these evaluations are done just once, if at all. Since the survey is normally completed toward the end of the academic term, it does not impact students in the current class, though it may be helpful to future students in the same course with the same instructor. Also, since students may not take the survey seriously, the validity of the student response is questionable.
197 Typically, there is no correlation between student learning assessment and instructor teaching evaluation. Without timely feedback connecting student learning to instructor evaluation, neither the instructor nor the students have the power to affect change or to correct problems. Frequent feedback to the students advances student learning. According to Brien and Eastmond in Cognitive Science and Instruction, "During instructional activities, the competencies taught must be reinforced each time they are adequately used by the learner." [9] Further, they describe the "ideal" situation as one where the learner knows the final goal as well as the sub-goals that support the final goal. Therefore, it is important that learning outcomes be explicit and feedback frequent.
3
Approach: CaSA System Design
Networked computers or ubiquitous computing/PDA-based terminals are becoming widely available, so they should be used in classrooms to enhance learning through active interaction between students, the instructor and the material. To make use of the new technology, a new generation of instructional software is needed. A new framework, CaSA (Classroom and Student Achievement assessment), is presented. CaSA is a flexible framework to augment the classroom experience by coordinating and synchronizing instructional streams, matching class plans to student class experience, and presenting instruction in a variety of media forms to promote self-directed learning. The emphasis is on facilitating timely feedback from students, offering alternatives to students with differing learning styles, and collecting assessment data.
Figure 1. CaSA Component Diagram
The CaSA framework will consist of and coordinate three major components: a Preparation component, a Real-Time Stream component and a Review component
198 (see Figure 1). These components are organized by whether their functionality supports e-classrooms prior to, during or after classroom instruction. CaSA will support assessment by providing: student topic marking, periodic and frequent electronic concept questions both during and after class sessions, and instructional stream review and evaluation. 3.1
Pre-Classroom Preparation
Each lecture supports learning objectives from the course syllabus. Prior to the class, the instructor defines the lecture outline, covered topics, and the lecture notes (ClassPlan) which are associated with the text and course syllabus. The lecture consists of a ClassPlan presentation. As it is delivered, the various streams of presentation will be parsed into lecture segments, each of which is associated with a media type and descriptive labels, such as 'introduction,' 'concept,' 'equation,' 'illustrative example,' etc. derived from the ClassPlan. CaSA will also permit streams to be added or removed interactively. As instructor presentation formats evolve, CaSA will accommodate these changes to electronically capture the class session. CaSA will capture the data required to support creation of concept maps from the ClassPlans, the covered topics representing core concepts. As instructors and students connect additional material to these core topics, the data available for the concept map grows. There are several ongoing efforts in the area of visually representing concept maps including work by Puntambekar, Stylianou and Jin at the University of Connecticut [10] as well as work by Aroyo, Dicheva and Velev [11]. CaSA will collect the data required for concept maps with the intent that CaSA will facilitate the use of concept map representational systems within an e-classroom. 3.2
In-Classroom Interaction
During the e-classroom experience, CaSA will coordinate instructional streams, assessments, and feedback and will electronically increase classroom interactivity. CaSA will electronically record the instructional content in its various forms, synchronizing the multiple instructional streams, as well as student feedback and assessment. It will provide context and valuable insight during assessment, instructor feedback, and student review. The instructor's ClassPlan will be made available to students on their computing device at the beginning of each class session. This will accomplish two things. First, it will let the students know what topics will be covered, setting their expectations. Second, it will provide information needed for students to mark topics that are covered. As the class presentation proceeds, students will be able to indicate when they believe a specific topic has been covered. These marks will individually index the
199 instructional streams. When they wish to review course material, the students can access the Web-delivered instructional streams based upon their individual marks. By allowing the students to use their individual marks to access course material, this system provides an incentive to mark topics as accurately as they can and serves to engage the students during the presentation. Each student will be able to create personal links from personal material to course material, facilitating self-directed learning. By providing tools that assist students in retrieving, evaluating, comprehending and memorizing information while performing learning tasks the system will provide clear benefits. [11] Student topic marking also serves another purpose. As the students mark topics, CaSA will determine how many students believe that the instructor has covered a particular topic. This information can be made available to the instructor on demand or via instant notifications based upon predefined thresholds. If the instructor feels that he/she has presented a topic, but student thresholds have not been met, the instructor will know immediately and can make appropriate changes. CaSA will use the installed network of computers to permit the instructor to ask frequent, periodic questions to assess student understanding of the main concepts from the ClassPlan. These questions will be integrated with an intelligent FAQ (iFAQ) data bank to promote self-directed learning. Answers will be collected and immediate feedback regarding the overall state of student understanding can be presented to the student and the instructor. This will permit both student and instructor to take corrective action in a timely matter. In addition, reaction to and general attitude toward the material and class can be surveyed through positive and constructive questions during and at the end of the class. 3.3
Post-Classroom Review and Study
In a post-classroom review and study the system should be entirely student-centric. An Adaptive Computer Assisted Instruction (ACAI) system such as Arthur, developed previously at UC [12] is an ideal platform for this phase of learning. Offering multiple presentation methods from the lecture and off-line instruction modules, Arthur allows a student to switch to the presentation method that will best suit the student's learning style. Arthur has been designed to achieve "A Significant Difference" in learning. CaSA will provide an interface to the existing Arthur system, incorporating it into the overall solution. 4
Conclusions
It is anticipated that the creation and implementation of CaSA will demonstrate the usefulness of technology in education, especially in higher education where significant technological resources are already invested. It will augment both
200
classroom and post-classroom learning, provide a bridge between seat-time and study-time, course objectives and course assessment and will provide a channel for student feedback so that timely and adaptive instruction can be realized. References 1.
Russell, T. L. (compiled by). The no significant difference phenomenon as reported in 355 research reports, summaries and papers: A comparative research annotated bibliography on technology for distance education. North Carolina State University: Office of Instructional Telecommunications. (1999). 2. Abowd, G.D, "Classroom 2000: An experiment with the instrumentation of a living educational environment, Pervasive Computing, Vol. 38, No. 4, 1999. 3. Padhye,, J. and Kurose, J, "An Empirical Study of Client Interactions with a Continuous-Media Courseware Server," Technical Report UM-CS-1997-056, University of Massachusetts, 1997. 4. Stern, M, Steinberg, J, Lee, H. I., Padhye, J., and Kurose, J.,"MANIC: Multimedia Asynchronous Networked Individualized Courseware," Proceedings of Educational Multimedia and Hypermedia, 1997. 5. Mukhopadhyay, S. and Smith, B., "Passive Capture and Structuring of Lectures," Project Zeno, Cornell University. 6. Hannafin, R. and Sullivan, H., "Learner Control in Full and Lean CAI Programs," Educational Technology Research and Development, Vol 43, No. 1, 1996, pp. 19-30. 7. Baker, E. L.; Mayer, R. E., "Computer-based assessment of problem solving," Computers in Human Behavior, 15, pp. 269-282 (1999). 8. Sanders, William L.; Wright, S. Paul; Horn, Sandra P., "Teacher and Classroom Context Effects on Student Achievement: Implications for Teacher Evaluation," Journal of Personnel Evaluation in Education Volume: 11, Issue: 1, April 1997, pp. 57-67. 9. Brien, Robert and Nick Eastmond, Cognitive Science and Instruction, , Educational Technology Publications, pp. 18-33 (1994). 10. Puntambeker, Sadhana; Agnes Stylianou and Qi Jin, "Visualization and External Representation in Educational Hypertext Systems " Artificial Intelligence in Education Volume 68, IOS Press, pp. 589-591 (2001). 11. Aroyo, Lora; Darina Dicheva and Ivan Velev, "A Concept-based Approach to Support Learning in a Web-based Course Environment" Artificial Intelligence in EducationVolume 68, IOS Press, pp. 1-13 (2001). 12. Gilbert, J. E.; Han, C. Y.,"Arthur: Adapting Instruction to Accommodate Learning Style," Proceedings of WebNet 99: World Conference on the WW and Internet, Honolulu, HI: Association for the Advancement of Computing in Education, pp. 433-439, (1999).
201
BUILDING UP A MINIMAL SUBSET OF JAVA FOR A FIRST PROGRAMMING COURSE ANGEL GUTIERREZ Department of Computer Science, Montclair State University, Upper Montclair, NJ 07043, USA E-mail: [email protected] ALFREDO SOMOLINOS Department of Mathematics and Computer Information Science, Mercy College, 555 Broadway, Dobbs Ferry, NY 10522, USA E-mail: [email protected]
There are problems associated with the usage of Java as a first programming language. The complexity of the language, for instance, makes it difficult to use the object-oriented approach from the beginning. We present in this paper the building blocks for a Reduced Instruction Set Java, which allows to write graphical programs and to teach a basic object-oriented programming course, without neglecting the fundamental constructs of the language.
1
Introduction
A common problem encountered in many Java books [1], [2], is that they try to cover too much. If these books are used to teach Java as a first programming language, the students are overwhelmed by the difficulty of the language. The language is too large, there are thousands of functions and therefore even simple programs require a lot of background. The usage of console applications as a starting point produces dull text output, and requires sophisticated techniques for input, usually hidden in ad hoc classes. Even if the books are more object-oriented, [3], [4], the great amount of classes used tends to jeopardize the acceptance of the language. In order to avoid these hurdles we start to develop a Reduced Instruction Set Java (RISJ). This set will be a minimum one that will allow writing graphical programs and teaching a basic object-oriented programming course. We use the strengths of the language, object orientation and graphics, from the very beginning. The approach to teach RISJ is a tutorial one, presenting the general ideas in the context of an application, without trying to be exhaustive or encyclopedic.
In the next sections we expand the first components of RISJ, enumerating the programming ideas introduced in each section and presenting sample programs that illustrate those ideas. We also keep track of the new keywords and functions used and, where opportune, show the sample output. On the other hand, we neither explain the programming concepts in detail nor show the complete syntax specifications.

2 The First Program: A Graphic "Hello World"

2.1 Concepts
This basic program develops the following concepts: writing, compiling and executing programs, and using libraries; executing an applet either from the IDE or from a browser, using HTML; classes of programs and inheritance; graphic objects; built-in functions; calling a function and using parameters.

2.2 The Program
// Program SayHi.java
import java.awt.*;    // Import classes. All the programs will need this and the next import;
import java.applet.*; // we will not write them explicitly again, although they should be there.
public class SayHi extends Applet {
  public void paint(Graphics screen) {
    screen.fillRect(140,40,20,130);  // Left side of H
    screen.fillRect(160,90,60,20);   // Center bar
    screen.fillRect(220,40,20,130);  // Right side
    screen.fillRect(260,90,20,80);   // Bottom of I
    screen.fillOval(260,60,20,20);   // I dot
  }
}
The output of this program can be seen in Figure 1.
Figure 1. Output of the SayHi program.

2.3 New keywords, constants and methods
Graphics, import, public, class, extends, void, fillRect(), fillOval().
3 Class inheritance

3.1 Concepts
Inheritance is a mechanism for improving existing working classes without changing the source code of the original class. The inheritance mechanisms that we consider are: adding variables, adding functions, and overriding functions.

3.2 Programs
// Program Square.java. It needs to include the import classes
public class Square extends Applet {
  public void paint(Graphics screen) {
    screen.fillRect(140,40,130,130);
  }
}
// Program SquareColor.java. It also needs the import classes. See the first program
public class SquareColor extends Square {
  public void paint(Graphics screen) {
    screen.setColor(Color.red);
    super.paint(screen);
  }
}
// Program SquareColorBG.java. It also needs the import classes
public class SquareColorBG extends SquareColor {
  public void init() {
    setBackground(Color.blue);
    repaint();
  }
}
A sample output can be seen in Figure 2.
Figure 2. Sample output of the class inheritance programs.

3.3 New keywords, constants and methods
init(), setColor(), Color.red, Color.blue, setBackground(), super, repaint().
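The programs in this section show overriding (paint) and adding a function (init); the third mechanism listed in the concepts, adding variables, is not illustrated by them. The following sketch is not part of the original paper - the class FramedSquare and its members are invented for the example - and assumes the same import classes as the other programs.

// Sketch (not from the paper): a subclass that adds a variable and a helper function.
public class FramedSquare extends SquareColor {
  int border = 10;                     // added variable: width of the frame
  void drawFrame(Graphics screen) {    // added function
    screen.setColor(Color.black);
    screen.drawRect(140 - border, 40 - border, 130 + 2 * border, 130 + 2 * border);
  }
  public void paint(Graphics screen) { // overriding, as in SquareColor
    super.paint(screen);
    drawFrame(screen);
  }
}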
4 Defining your own commands. Functions

4.1 Concepts
We use global and local variables (storage locations). We see functions as black boxes. First we use functions without parameters and then functions with parameters, since passing values to a function avoids using global variables and adds flexibility. We distinguish between input parameters and formal parameters, the latter being just placeholders. Finally, we deal with the overloading of functions.
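The programs that follow do not actually show overloading. Purely as an illustration, and not taken from the paper, circle() could be overloaded with a second version that also receives the color:

// Sketch: an overloaded circle() that takes the color as a parameter, so the
// global drawColor variable is not needed for this call.
void circle(int px, int py, int pdiam, Color pcolor, Graphics pscreen) {
  pscreen.setColor(pcolor);
  pscreen.fillOval(px, py, pdiam, pdiam);
}

Both versions can coexist in CirclePlain; the compiler selects one of them by the parameter list of each call.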
4.2 Programs

// Program CirclePlain.java. It draws a circle. Do not forget to include the import classes.
public class CirclePlain extends Applet {
  Color drawColor; // Global
  void circle(int px, int py, int pdiam, Graphics pscreen) {
    pscreen.setColor(drawColor);
    pscreen.setXORMode(Color.white);
    pscreen.fillOval(px,py,pdiam,pdiam);
  }
  public void paint(Graphics screen) {
    drawColor = Color.blue;
    circle(50,50,150,screen);
  }
} // end class

// Program Ring.java
public class Ring extends CirclePlain {
  public void paint(Graphics screen) {
    int x,y,diam;
    x = 50; y = 50; diam = 150;
    drawColor = Color.blue;              // variable defined in CirclePlain
    circle(x,y,diam,screen);             // function defined in CirclePlain
    circle(x+10,y+10,diam-20,screen);
  }
} // End Program Ring

// Program Piggy.java. It draws a pig's face
public class Piggy extends CirclePlain {
  public void paint(Graphics screen) {
    int x,y,diam;
    drawColor = Color.pink;              // global in CirclePlain
    x = 140; y = 70;  diam = 140; circle(x,y,diam,screen); // face
    x = 165; y = 105; diam = 20;  circle(x,y,diam,screen); // eyes
    x = 235; y = 105; diam = 20;  circle(x,y,diam,screen);
    x = 180; y = 130; diam = 60;  circle(x,y,diam,screen); // snout
    x = 190; y = 145; diam = 15;  circle(x,y,diam,screen); // nostrils
    x = 215; y = 145; diam = 15;  circle(x,y,diam,screen);
  }
}
A sample output can be seen in Figure 3.
Figure 3. Sample output of the Piggy program.
4.3 New keywords, constants and methods
Color, int, getGraphics(), setXORMode().

5 Using functions with parameters in loops

5.1 Concepts
We create animation by changing a function parameter inside a loop. We deal with conditions, relational operators, and selection using if. Finally, while and for loops are introduced.
5.2 Programs

// Program MoveBall.java. A blue ball moves from left to right. Import classes need to be included.
public class MoveBall extends CirclePlain {
  int maxX, maxY;
  public void init() {
    maxX = 400; maxY = 300;
    setSize(maxX,maxY);
  }
  void pause(int count) {
    Graphics screen = getGraphics();
    for (int i = 0; i < count; i++)
      circle(0,0,0,screen);    // waste time
  }
  public void paint(Graphics screen) {
    int x,y,diam;
    y = 70; diam = 30;
    drawColor = Color.blue;    // variable defined in CirclePlain
    x = 0;                     // Initialize
    while (x < maxX-diam) {    // Test
      circle(x,y,diam,screen);
      pause(5000);
      circle(x,y,diam,screen);
      x = x+1;                 // Update
    }
  } // End paint
} // End Program
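MoveBall drives the animation with a while loop; since this section also introduces for, the same movement can be written with it. The following variant of paint() is only a sketch and is not part of the original program; it relies on the same maxX, drawColor, circle() and pause() members as MoveBall.

// Sketch: the body of paint() rewritten with a for loop.
public void paint(Graphics screen) {
  int y = 70, diam = 30;
  drawColor = Color.blue;
  for (int x = 0; x < maxX - diam; x++) {  // initialize, test and update in one line
    circle(x, y, diam, screen);            // draw
    pause(5000);
    circle(x, y, diam, screen);            // erase (XOR mode)
  }
}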
5.3 New keywords, constants, methods and operators
setSize(), for, while, operator <, operator ++.

6 Discussion
• We have introduced 22 keywords, constants and methods used in RISJ: import, public, class, extends, void, Graphics, fillRect(), fillOval(), init(), setColor(), Color.red, super, Color.blue, setBackground(), repaint(), Color, int, getGraphics(), setXORMode(), setSize(), for, while.
• Object-oriented concepts were used from the beginning: the students learn how to use classes and objects before they are taught how to define their own. In fact, in the examples we used inheritance before we showed how to define our own methods.
References

1. Bell D. and Parr M., Java for Students, Second Edition (Prentice Hall, New Jersey, 2000).
2. Deitel H. and Deitel P., Java, How to Program, Third Edition (Prentice Hall, New Jersey, 2001).
3. Horstmann C., Computing Concepts with Java 2 Essentials, Second Edition (John Wiley and Sons, New York, 2000).
4. Wu T., Introduction to Object-Oriented Programming with Java, Second Edition (McGraw-Hill, New York, 2000).
COMPLETING A MINIMAL SUBSET OF JAVA FOR A FIRST PROGRAMMING COURSE

ANGEL GUTIERREZ
Department of Computer Science, Montclair State University, Upper Montclair, NJ 07043, USA
E-mail: [email protected]

ALFREDO SOMOLINOS
Department of Mathematics and Computer Information Science, Mercy College, 555 Broadway, Dobbs Ferry, NY 10522, USA
E-mail: [email protected]

There are problems associated with the use of Java as a first programming language. The language is object-oriented, but the need to learn the details of the syntax relegates the object-oriented concepts to the background. In this paper we complete a previously started Reduced Instruction Set Java, which allows us to write graphical programs and to teach a basic object-oriented programming course without neglecting the fundamental constructs of the language.
1 Introduction
A common problem encountered in many Java books [1], [2] is that they try to cover too much, or they deal with too many classes [4], [5], as we previously mentioned [3]. We now use graphics-style interaction with events to complete a Reduced Instruction Set Java (RISJ) [3]. The approach is the same as before, but here we try to avoid text input through the console.

2 Declaring and constructing GUI objects: Buttons and TextFields

2.1 Concepts
The Graphical User Interface, i.e., the predefined classes used to create a program's graphic interface. Creating objects with new. Using constructors. Adding objects to the applet. Sending messages to the objects (using the object member functions).
2.2 Programs

// Program TwoBttns.java. Constructing GUI objects
import java.applet.*; // Import classes. All the programs will need this and the next import;
import java.awt.*;    // we will not write them explicitly again, although they should be there.
public class TwoBttns extends Applet {
  Button oneButton, twoButton;
  public void init() {
    oneButton = new Button("one!");   // create buttons
    add(oneButton);
    twoButton = new Button("two!");
    twoButton.setBackground(Color.cyan);
    add(twoButton);
  } // end init
  public void paint(Graphics screen) {
    oneButton.setLabel("Red");
    oneButton.setBackground(Color.red);
  }
}

// Program LoopsTexts.java. It uses TextFields and nested for and while loops. It needs the import classes.
public class LoopsTexts extends CirclePlain {
  public int maxX, maxY;
  TextField redText, blueText;
  public void init() {
    redText = new TextField("Red", 30);
    add(redText);
    blueText = new TextField("Blue", 30);
    add(blueText);
    maxX = 400; maxY = 300;
    setSize(maxX, maxY);
  }
  void pause(int count) {
    for (int i = 0; i < count; i++)
      showStatus("Paused");   // Waste some time
  }
  public void paint(Graphics screen) {
    int x = 0, y = 70, diam = 20;
    for (int k = 0; k < 2; k++) {
      drawColor = Color.blue;
      blueText.setText("Blue going right!");
      while (x < maxX-diam) {
        circle(x,y,diam,screen);
        pause(5000);
        circle(x,y,diam,screen);
        x = x+1;
      }
      blueText.setText("Last x = " + x);
      drawColor = Color.red;
      redText.setText("Red going left!");
      while (x > 0) {
        circle(x,y,diam,screen);
        pause(5000);
        circle(x,y,diam,screen);
        x = x-1;
      }
      redText.setText("First x = " + x);
    } // end for
  } // end paint
} // End Program
The output can be seen in Figure 1.
Figure 1. Constructing GUI objects.
2.3 New keywords, constants and methods

Button, new, add(), setLabel(), TextField, setText(), showStatus().

3 Interactive Programs. Events

3.1 Concepts
We deal with events and event handlers, i.e., interrupt-driven programming, interfaces, and the methods whose implementation is required by the interface. In particular we show the ActionListener interface, the implementation of the method actionPerformed, and how to obtain the source of the interrupt.
3.2 Programs

// Program ColorOval.java. ActionEvents are generated by clicking on buttons and by changing the contents
// of text fields. We need only to implement one function, actionPerformed. The ball turns green when the
// button is clicked. We need to import the customary classes.
public class ColorOval extends Applet implements ActionListener {
  int x, y, width, height;
  Color drawColor;
  Button bgreen;
  public void init() {
    bgreen = new Button("Green");
    add(bgreen);
    bgreen.addActionListener(this);
  }
  public void paint(Graphics screen) {
    x = 100; y = 100; width = 150; height = 150;
    drawColor = Color.red;
    screen.setColor(drawColor);
    screen.setXORMode(Color.white);
    screen.fillOval(x,y,width,height);
  }
  public void actionPerformed(ActionEvent buttonEvent) {
    String bLabel = buttonEvent.getActionCommand();
    if (bLabel.equals("Green"))
      drawColor = Color.green;
    repaint();
  } // end actionPerformed
} // End Program
The output can be seen in Figure 2.
Figure 2. ActionEvents are generated, for instance, by clicking on buttons.
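ColorOval identifies the button through getActionCommand(), i.e., by its label. The concepts also mention obtaining the source of the interrupt; the variant of actionPerformed below, which is only a sketch and is not taken from the paper, compares the event source object instead:

// Sketch: identifying the source object rather than its label.
public void actionPerformed(ActionEvent buttonEvent) {
  if (buttonEvent.getSource() == bgreen)   // bgreen is the Button declared in ColorOval
    drawColor = Color.green;
  repaint();
}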
3.3 New keywords, constants and methods
String, implements ActionListener, addActionListener(), this, actionPerformed(), ActionEvent, getActionCommand(), equals.

4 A Text Input Program

4.1 Concepts
We use TextFields for input and for generating interrupts. The actionPerformed event handler is revisited. We also convert from strings to numbers and from decimal numbers (float) to integers, and we use the Math library functions.
4.2 The program

// Program TextEvent.java. The program asks the user to guess a number between 1 and 100.
// It gives hints, "too high", "too low", to help in the search. It needs to include the usual import classes.
public class TextEvent extends Applet implements ActionListener {
  TextField outputBox, promptBox, inputBox;
  int targetNumber;
  public void init() {
    outputBox = new TextField("Guess a number from 1 to 100", 40);
    add(outputBox);
    promptBox = new TextField("Move below with the mouse. Type it. Press Enter", 40);
    add(promptBox);
    inputBox = new TextField("", 20);
    add(inputBox);
    inputBox.addActionListener(this);
    targetNumber = (int)(1 + 100 * Math.random()); // random() returns a number between 0 and 1
  }
  public void actionPerformed(ActionEvent inputBoxEvent) {
    int number;
    String StringOfDigits;
    StringOfDigits = inputBoxEvent.getActionCommand();
    number = Integer.parseInt(StringOfDigits);
    if (number == targetNumber)
      outputBox.setText("Congratulations! The number " + number + " is the winner");
    else if (number < targetNumber)
      outputBox.setText("The number you entered " + number + " is too low");
    else
      outputBox.setText("The number you entered " + number + " is too high");
  }
}
The output can be seen in Figure 3.
Figure 3. Asking the user to guess a number, and giving hints about the answer.
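Integer.parseInt() assumes that the input box really contains digits; typing anything else makes it throw a NumberFormatException. A minimal guard, not present in the original program and shown only as a sketch, could be added inside actionPerformed:

// Sketch: protecting the conversion from strings to numbers.
int number;
try {
  number = Integer.parseInt(StringOfDigits);
} catch (NumberFormatException e) {
  outputBox.setText("Please type a whole number between 1 and 100");
  return;
}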
4.3 New keywords, constants, methods and operators
Math.random(), float, (int), Integer.parseInt(), operator + for strings.

5 Discussion
• There are in total, [3], 41 keywords, constants and methods used in RISJ: import, public, class, extends, void, Graphics, fillRect(), fillOval(), init(), setColor(), Color.red, super, Color.blue, setBackground(), repaint(), Color, int, getGraphics(), setXORMode(), setSize(), for, while, Button, new, add(), setLabel(), TextField, setText(), showStatus(), implements ActionListener, addActionListener(), this, actionPerformed(), (int), float, ActionEvent, String, getActionCommand(), equals, Math.random(), Integer.parseInt(). Using these few keywords, we think that one can illustrate most of the standard techniques introduced in a first programming course.
• Object-oriented concepts were used from the beginning and user interaction is handled using the GUI: TextFields and buttons are all we need to input data from the user. They generate ActionEvents, which can be easily handled.
• Using the program source as input: when the student is working with the IDE, the simplest way of changing the behavior of the program is to change the values of the variables. This can be done from the Watch or Inspector windows in the debugger, or by just modifying the source and recompiling. This is much faster than having to answer several questions of the type "Please enter the value". Later, when they are more comfortable with the language, they can be taught how to change the values of variables using more traditional methods.
References

1. Bell D. and Parr M., Java for Students, Second Edition (Prentice Hall, New Jersey, 2000).
2. Deitel H. and Deitel P., Java, How to Program, Third Edition (Prentice Hall, New Jersey, 2001).
3. Gutierrez A. and Somolinos A., Building up a Minimal Subset of Java for a First Programming Course (to appear).
4. Horstmann C., Computing Concepts with Java 2 Essentials, Second Edition (John Wiley and Sons, New York, 2000).
5. Wu T., Introduction to Object-Oriented Programming with Java, Second Edition (McGraw-Hill, New York, 2000).
DEWDROP: EDUCATING STUDENTS FOR THE FUTURE OF WEB DEVELOPMENT

JOHN BEIDLER
Computing Sciences, University of Scranton, Scranton, PA 18510, USA
E-mail: [email protected]

There are many references supporting the Web's client side (web browsers) and a few references describing the Web's server side, but there is little in the way of comprehensive support material on all aspects of website development. This paper describes the modifications being made to the Web Development course at the University of Scranton. The changes are based on the premise that the Web is an object-oriented client-server system for the dissemination and gathering of information. If the Web supports the dissemination and gathering of information, then a database is an appropriate repository for that information. The Web Development course is a junior-senior level course. It is being reorganized into three levels of presentation: (1) Introductory Part - presents the fundamentals of client-side development, the Common Gateway Interface (CGI), and server-side programming. (2) Intermediate Part - develops the material required to support server-side development, through the construction and delivery of virtual web pages using object-based reusable components and reducing defects by paying attention to process patterns. (3) Advanced Part - presents the fundamentals of the web server to database interface for the delivery of virtual web pages. The approach is called DEWDROP, Database Enhanced Web Development with Reusable Objects and Patterns.
1 Introduction
The University of Scranton has offered a Web Development course since the 1996-1997 academic year. Initially, it was offered as a Special Topics course. From its inception the course covered the essentials of web development - client-side development, the Common Gateway Interface (CGI), and server-side software development. For the first three years I experimented with additional topics on various aspects of web programming. Because of my particular interest in software reuse, the construction of reusable resources has always been an integral part of the course. After the first three years we noticed that the course was having an impact on several other upper division courses, because it introduces students to approaches to programming and software development not normally covered in other courses, including such topics as regular expressions for tokenizing strings, using hash tables to handle information, event-driven programming, and practical software reuse experience. For example, the discussion of the Web as a set of Internet protocols and the introduction to security issues in the web development course introduce students to topics covered in depth in the Network
Communications Course. The Web Development course became extremely popular; almost all computing majors take this course, with the vast majority doing so in their junior year. During 1999-2000 the Web Development course made the transition from a Special Topics offering to a regular upper level course. As part of that process the department faculty discussed the positioning of this course relative to other courses. The course has a single sophomore level course as a prerequisite. The course was approved by the department, passed its review by the College of Arts and Sciences, and was approved by the faculty senate. The department specifically positioned the Web course before the Network Communications Course and the Database Course so that these three courses, along with the senior capstone course, could be the basis for a set of sequenced assignments leading to a comprehensive capstone assignment.

2 The PNA Project
In the 1999-2000 academic year, I received funding for a project to develop a prototype of a health sciences website in collaboration with a professor of Dietetics and Nutrition, Dr. Marianne Borja, at Marywood University. The project supported the development of a website called the Personal Nutrition Assistant Project (PNAP). The project helps diabetics and other individuals with a need to control their nutritional intake. The website used the USDA Nutrient Database for Standard Reference, nutritional information on over 6000 food items. Nonparticipants may access the system using the URLs, www.scranton.edu/pnap or www.marywood.edu/pnap. The website is currently utilized by eight local medical centers and is actively being developed. After putting the USDA database on our department's database machine and constructing a web interface using Perl 5's DBI module, I discovered that I had underestimated the ease with which a database could be used as the backend for a website. Following consultation with Dr. Yaodong Bi, the faculty member in our department who teaches the database courses, I realized that, with some effort, it was feasible to redesign the Web Development course in a way that gives the students a database driven website design experience. After further consultation with other members of our department we discussed methods of including database material in the Web Development course.
3 DEWDROP
The course will take students with little or no web development experience to where they are prepared to participate in the future of the Web utilizing database enhanced web development. At first it may appear that the amount of material I plan to cover is too large for a typical three-credit course. Based on my experience with the web development course, I believe that the proposed collection of material can be covered by (1) keeping the course focused on its eventual goal, and by (2) making extensive application of software reuse. The course is delivered through a three-part presentation of the material described in the subsections below. A key element in the strategy of presenting this course is software reuse, which is not presented simply as a sound strategy for software development, but also as a means for delivering course material. Reuse is applied to both design patterns and software process patterns.

3.1 Introductory Part
The introductory material is a refinement of material developed over the last few years - an introduction to the client side, the CGI interface, and server-side programming. This material is covered in about three weeks. On the client side, time spent presenting HTML is kept to a minimum. Most students have previous HTML experience; in any case, all students have access to several on-line HTML tutorials. Web browsers are presented as containers for a pair of object models: the web browser's document object model and the object model of the Javascript interpreters that reside in web browsers. The course emphasizes how these two object models interact in different ways during the pre-load, onLoad, and post-load stages of a web page. The terms pre-load, onLoad, and post-load refer to the time frames surrounding the web page's onLoad() event. I expect students to learn the basics of Javascript on their own. They have access to many on-line references, like the Javascript Tip of the Week website. The "Tip of the Week" site contains many useful Javascript examples, but the examples are not well packaged. This site is typical of many web sites in that they do not make good use of Javascript's object model. I use this opportunity to emphasize Javascript's object model and demonstrate how it may be employed to encapsulate resources in reusable .js files. Normally, students complete two laboratory assignments and one regular assignment involving the encapsulation of Javascript resources. The construction of web forms leads to the CGI interface. The CGI interface is an excellent example of the need to follow standards and to recognize patterns. This topic is approached at three levels. The first two levels are discussed during the introductory part of the course; the third level is described in the intermediate part. As part of the low-level description of CGI, the web browser's encoding of the CGI interface is described and the resources required on the server side to decode the information are introduced. Several artifacts to investigate the CGI interface are provided. One is an artifact, called formecho, that echoes back to the web browser a copy of the encoded string sent by the browser to the web server. A modified version of Perl's cgi-lib.pl, a standard CGI interface, is presented as
the second-level CGI interface. The modified version of cgi-lib.pl includes an extension that supports off-line testing of server-side software, using as input the encoded strings echoed back by the formecho artifact. This begins an important multi-step process pattern, described in detail in the intermediate part of the course. The introductory part ends with an introduction to the resources essential for server-side software development:
1. String processing features.
2. File and directory processing.
3. Access to environment variables.
4. The ability to use other system resources (call programs).
5. Appropriate data structures, in particular hash tables (associative memory).
Although Perl is used because it provides convenient access to these resources, other programming languages may be used as well. I have seen examples written in Tcl, Java, Ada, COBOL, C, and C++.
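The course itself uses Perl on the server side; purely to illustrate the five resources listed above (environment variables, string processing and hash tables in particular), and written in the Java used elsewhere in these proceedings rather than in Perl, a minimal decoder of the CGI query string might look like the following sketch. The class and variable names are invented for the example.

// Sketch: decoding name=value pairs from the CGI QUERY_STRING into a hash table.
import java.util.HashMap;
import java.net.URLDecoder;

public class CgiDecode {
  public static void main(String[] args) throws Exception {
    String query = System.getenv("QUERY_STRING");        // environment variable set by the web server
    HashMap<String, String> form = new HashMap<String, String>();
    if (query != null) {
      for (String pair : query.split("&")) {             // string processing
        String[] nv = pair.split("=", 2);
        String name  = URLDecoder.decode(nv[0], "UTF-8");
        String value = nv.length > 1 ? URLDecoder.decode(nv[1], "UTF-8") : "";
        form.put(name, value);                           // hash table (associative memory)
      }
    }
    System.out.println("Content-type: text/plain\n");    // CGI header followed by a blank line
    System.out.println(form);                            // echo the decoded form, formecho style
  }
}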
3.2 Intermediate Part
The intermediate part plays an essential role in the successful delivery of the material presented in this course. This part consumes the middle half of the course, about seven weeks. A key element in this part is the emphasis on software process: paying attention to how tasks are accomplished in order to avoid defects, or to remove them as early as possible. Emphasis on the client side is on the web page document object model, Javascript's object model, and the interactions between them. This is developed by using a technique I developed to address some of the differences between the Javascript object models in Microsoft's Internet Explorer (IE) and Netscape's Navigator. Many of the conflicts between IE and Netscape are addressed by employing a technique that is not well documented; namely, in both models, practically any item that can be addressed using the typical dotted notation, ABC.xyz, can also be accessed as a hash, ABC["xyz"]. It is amazing how frequently Javascript that works on one browser and not on the other can be made to work on both browsers by replacing the problematic dotted notation with hash-like access. Here we present the third approach to the CGI interface, using Perl 5's CGI module. This approach uses Perl 5's object model, presenting the interface as an object. In addition, CGI supports multipart forms, which include a file upload capability. The CGI interface is also used as the focal point for addressing cross-platform development. One problematic software development scenario is one where software is developed on a Microsoft platform and the production website is on a UNIX platform. This scenario provides an opportunity to address process issues. Information on the CGI interface in the introductory part is extended from
off-line testing of CGI scripts, to testing a CGI script with a web server on the development machine, and finally to moving the script to a production website on a UNIX platform. Students are required to maintain defect logs as they move through this three-stage process. They learn to recognize and correct defects in both their software and their development processes. This leads to the creation of support scripts to automate the process and further reduce defects. A key in the middle part of this course is selecting the right types of assignments that will prepare students for the advanced part of the course. It is relatively easy to develop interesting assignments that do this by combining the use of regular expressions, hash tables, and tab-delimited files while giving the students more experience with both sides of web development. One good example is a concordance listing assignment, which uploads a file in a specific programming language and constructs a framed web page that allows a person to browse a formatted and colorized version of the program, shown in one frame, by clicking on the links in the concordance listing in the second frame. PHP is introduced at the end of the first half of the course. PHP is a scripting language that is placed in an HTML file. With PHP the developer can describe both the client-side actions and the server-side actions in a single location, the web page. Server-side actions are described within process tags that are performed on the server side using a simple method of executing server-side software called Server-Side Includes. As a result, the software developer has both the client-side processes and the server-side processes described in one document, an HTML file. One of PHP's advantages is that it helps the software developer to distinguish between the roles of objects, their attributes, and the representations of objects and attributes on both the client side and the server side. The result is the potential for reduced software development time and improved packaging of reusable software.
Advanced Part
This part covers the last three to four weeks of the course. Since the assumption is that students do not have previous database experience, the course appears to be limited as to what it can accomplish. I have discussed this issue with Dr. Yaodong Bi, the faculty member who teaches our database course, and we agree that the approach described here is feasible. The emphasis placed on tab-delimited files and hash tables in the intermediate part of the course leads naturally to database access. Several pre-defined databases are being considered, and a small set of SQL commands will be presented. The students' previous experience with hash tables will be used as a basis for explaining the SQL commands. A danger in this part of the course is to attempt to do too much. Remember, database experience is not a prerequisite for this course. However, the hash-tables-to-simple-databases analogy along with a small collection of SQL commands is sufficient to set the stage for using databases as the back end to a website.
Great care must be taken in developing this material, including the selection of the right tools and a good process. At least two choices are available, Perl 5's DBI module and PHP's database interface. Both support SQL commands. Since I have experience with Perl 5's DBI module, I want to parallel that experience with PHP, construct several laboratory assignments around both Perl and PHP, use Perl's DBI interface one year and PHP's the following year, and perform a formal assessment to determine the relative merits of each approach.
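The course itself plans to reach the database through Perl's DBI or PHP. Only to make the hash-tables-to-databases analogy concrete, and expressed in the Java used elsewhere in this volume, the "small set of SQL commands" idea looks roughly like the sketch below. The JDBC URL, the credentials and the table and column names are assumptions for the example, not the actual PNAP or USDA schema, and a suitable JDBC driver is assumed to be on the classpath.

// Sketch: issuing one SQL query and reading the rows, in the spirit of Perl's DBI.
import java.sql.*;

public class FoodQuery {
  public static void main(String[] args) throws SQLException {
    Connection con = DriverManager.getConnection(
        "jdbc:mysql://localhost/nutrients", "user", "password"); // hypothetical database
    Statement st = con.createStatement();
    ResultSet rs = st.executeQuery("SELECT description, kcal FROM food_items WHERE kcal < 100");
    while (rs.next()) {
      System.out.println(rs.getString("description") + " -> " + rs.getInt("kcal"));
    }
    rs.close(); st.close(); con.close();
  }
}

Each row behaves much like the hash tables the students already know: a set of named fields retrieved by key.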
4 Curricular Impact
By offering the Web Development course in the junior year, we expect that a very large majority of students will take the Network Communication course and the Database course before taking the Senior Projects course. We plan to develop several assignments that will build on the web course and lead to possible senior projects. The result will be an opportunity to give students piece-meal assignments that could span up to three semesters. Another course that has been impacted by the web course is the Programming Languages course. The extensive use made of hash tables and of tokenizing with Perl's grep-like regular expression capabilities has forced the instructor in the Programming Languages course to rethink several assignments. As a result, that course now covers a richer collection of languages, including the scripting language Tcl.

5 Conclusions
Too often when I tell people that I teach a course in Web Programming for majors I encounter skepticism about offering such a course to majors. Usually it is given in the context of a statement like, "You teach a course in HTML!?" Needless to say, this is not a course in HTML. Web programming offers a unique opportunity to present multi-platform, multi-language software development. However, the future of web technology lies not in HTML; it lies in the delivery of virtual web pages, pages constructed on demand to meet the needs of the client. That construction involves selecting information from a database and delivering the desired results in a useful format. This paper describes one approach, DEWDROP, to teaching web programming as a junior level course that does not have a database prerequisite. To compress the course materials so that database web development may be taught, the course makes extensive use of reusable software component packages built using the object features of the various programming languages. Another alternative would be to teach the web course as a senior level course with the database course as a prerequisite. We considered that possibility, but
it did not appear as attractive as the approach we are taking, because it would not allow for projects that could run as long as three semesters. In any case, the web course clearly demonstrates the power of two features that need more development, the use of a regular expression capability and the use of hash tables. Both give students a unique experience demonstrating the importance of having the right tools for the job. Finally, if you are teaching databases, the USDA Nutrient Database for Standard Reference, Release 13, http://www.nal.usda.gov/Jhic/foodcomp/Data/, is an example of a well constructed database with well designed tables. It is a real, non-contrived database that is ready for your use. Try it out.

6 Acknowledgements
I'd like to thank Yaodong Bi, Paul Jackowitz, Bob McCloskey, and Richard Plishka for their advice and suggestions as the Web Development course evolved, and as it continues to evolve.

References

1. Borja, Marianne, and John Beidler, "The Personal Nutrition Assistant Project", Proceedings of the American Dietetics Association Conference, Denver, CO, October 17-20, 2000.
2. Beidler, John, and Marianne Borja, "The PNA Project", Proceedings of CCSCNE-01, Middlebury, VT, April 2001.
3. Goodman, Danny and Brendan Eich, The Javascript Bible (4th Edition), Hungry Minds, Inc., April 2001.
4. Guelich, Scott, et al., CGI Programming, O'Reilly & Associates, July 1997.
5. Hamilton, Jacqueline D., CGI Programming 101, CGI101.com, February 2000.
6. Heinle, Nick, and David Siegel, Designing With JavaScript: Creating Dynamic Web Pages (Web Review Studio Series), O'Reilly & Associates, September 1997.
7. Kabir, Mohammed J., Apache Server Bible, Hungry Minds, Inc., July 1998.
8. Kingsley-Hughes, Adrian and Kathie Kingsley-Hughes, Javascript 1.5 by Example, Que, January 11, 2001.
9. Laurie, Ben, Peter Laurie, and Robert Denn, Apache: The Definitive Guide, O'Reilly & Associates, February 1999.
10. Lea, Chris, et al., Beginning PHP4, Wrox Press Inc., October 2000.
11. Medinets, David, Perl 5 by Example, Que, October 1996.
12. Musciano, Chuck and Bill Kennedy, HTML & XHTML: The Definitive Guide, O'Reilly & Associates, August 2000.
13. Ray, Erik T., Learning XML, O'Reilly & Associates, February 2001.
14. Schwartz, Randal L., et al., Learning Perl (2nd Edition), O'Reilly & Associates, July 1997.
15. Thomson, Laura, PHP and MySQL Web Development, Sams, March 2001.
16. Wall, Larry, et al., Programming Perl (3rd Edition), O'Reilly & Associates, July 2000.
BOOLEAN FUNCTION SIMPLIFICATION ON A PALM-BASED ENVIRONMENT

LEDION BITINCKA AND GEORGE ANTONIOU
Department of Computer Science, Montclair State University, Upper Montclair, New Jersey 07043, USA
E-mail: [email protected], [email protected]

In this paper the problem of minimizing Boolean expressions is studied and an optimal implementation is provided. The algorithm follows the Karnaugh map looping approach. For the implementation C++ coding was used on the CodeWarrior for Palm Operating System environment. In order to make the overall implementation efficient, the object oriented approach was used. Two examples are presented to illustrate the efficiency of the proposed algorithm.
1 Introduction
It is well known that the Karnaugh-map (K-map) technique is an elegant teaching resource for academics and a systematic and powerful tool for a digital designer in minimizing low-order Boolean functions. Why is the minimization of the Boolean expression needed? By simplifying the logic function we can reduce the number of digital components (gates) required to implement digital circuits. Therefore, by reducing the number of gates, the chip size and the cost are reduced and the computing speed is increased. The K-map technique was proposed by M. Karnaugh [1]. Later Quine and McCluskey reported tabular algorithmic techniques for optimal Boolean function minimization [2,3]. Almost all these techniques have been embedded into many computer-aided design packages and appear in all the logic design university textbooks [4]-[11]. The K-map is a graphical representation of a truth table using Gray code order. It is suitable for the elimination of redundant terms in a Boolean expression by grouping. By optimizing the algorithm it is possible to simplify a given Boolean expression entirely. Unfortunately, almost all the techniques, along with the Espresso technique [12], do not always guarantee optimal solutions. In this paper a personal digital assistant (PDA) based implementation is proposed for simplifying four-variable Boolean functions, using the K-map looping technique. The implementation is found to have excellent results.
The proposed PDA application is a useful tool for students and professors in the fields of computer science and electrical and computer engineering. It provides a fast and portable way to check and solve problems in digital logic, discrete mathematics and computer architecture courses. The proposed algorithm can also be a valuable utility for the computer chip design industry, due to the fact that it can be expanded to cover Boolean functions with more than four variables.

2 Algorithm
The proposed algorithm is based on the looping of redundant terms. In order to take a closer look at how to loop two, four or eight 1's to get the smallest possible number of groups in a K-map table setting, consider the following simple example (a lower case letter represents the complemented value, e.g. a means the complement of A):

F = abcd + aBcd + ABcd + ABcD + AbcD + AbCD
      cd   cD   CD   Cd
ab     1    0    0    0
aB     1    0    0    0
AB     1    1    0    0
Ab     0    1    1    0
Analyzing the above K-map table, the following looping observations can be made for each 1 present in the table:
• Cell abcd has one possibility to be paired, with aBcd
• Cell aBcd has two possibilities to be paired, with abcd and ABcd
• Cell ABcd has two possibilities to be paired, with aBcd and ABcD
• Cell ABcD has two possibilities to be paired, with ABcd and AbcD
• Cell AbcD has two possibilities to be paired, with ABcD and AbCD
• Cell AbCD has one possibility to be paired, with AbcD
It is obvious that there are two cells that have only one possibility to be paired, namely abcd and AbCD, so they get the highest priority. These two cells get paired first: abcd gets paired with aBcd and AbCD gets paired with AbcD. After these pairings the pairing possibilities of ABcd and ABcD are decremented by one, leaving both of them
with one possibility to be paired. They get paired together, finishing this way the optimization of the Boolean function and resulting in three pairs. The observation reveals the presence of a consistency or rule that lies beneath this logic. Extending the described idea, the following algorithm is derived for the optimal looping of 1's in a K-map table.

Step-1: Find and loop a possible octet.
Step-2: Find and loop cells that have one possibility to pair.
Step-3: Find and loop cells that have one possibility to quad.
Step-3a: Repeat Step-2 for new cells with one possibility to get paired.
Step-3b: Repeat Step-3 for new cells with one possibility to get quaded.
Step-4: Find and loop cells that have two possibilities to get quaded without sharing.
Step-4a: If Step-4 fails because of sharing, choose one quad out of the two, with less sharing.
Step-4b: Repeat Step-4 until no quads are found.
Step-5: Find and loop cells that have two possibilities to get quaded with sharing.
Step-6: Find and loop cells that have two possibilities to be paired without sharing.
Step-7: Find and loop cells that have two possibilities to be paired with sharing.
Step-8: Repeat Step-2.
Step-9: If there are cells that have a value of one and are not quaded or paired, then those cells either a) cannot be paired or quaded, or b) have more than 2 possibilities to get paired or quaded.
Step-9a: If a) is the case, do not consider this cell in Step-7.
Step-9b: If b) is the case, change the possibilities of one of these cells to 2 and go to Step-3 to repeat the procedure.
Step-9c: Repeat Step-7 until no cells that qualify for this step are found.

It is noted that the pairing and quading possibilities of a cell are reduced by one when a pair or quad is looped and this cell can be paired or quaded with any of the cells of the pair or quad.

2.1 Design

The program was developed using CodeWarrior for Palm OS 7.0, which supports C, C++ and Java. The implementation of the program was done in C++ using its object-oriented features. The program is divided into two major classes: a parent class (Kmap) and a child class (KmapElement). Each of these classes represents a logical division of the K-map table. The Kmap object controls everything related to the K-map table as a
whole, such as initializing and simplifying. In order to apply these functions the Kmap object creates 16 smaller objects representing each box. Each smaller object (KmapElement) learns its own properties and can only perform functions that involve the object itself. Breaking the program down in this way is advantageous because the amount of complete, detailed, organized and correct information about the K-map is maximized. The parent object holds 16 children of the class KmapElement and administers the way the simplification methods are called. The child class KmapElement represents one element of the K-map. This object has all the properties, such as the pairing and quading possibilities, the value of the box and the status of the box. These objects can learn about other objects of this class through their parent, because they have a reference to the parent.

2.2 Program Flow

As soon as an input is presented, a Kmap object is created. As a result, 16 children of class KmapElement are created and initialized. During the initialization phase each KmapElement gathers data about itself and its possibilities to be paired, quaded or octeted. The instance variables of this object are updated as soon as a pair, quad or octet is formed. After initialization, the Kmap object determines the way the simplification is going to take place, according to the presented algorithm. The actual pairing, quading, octeting and updating of the instance variables is done by the functions of the KmapElement class. The order in which these functions are executed is determined by the Kmap class, because it knows the algorithm. The simplification happens only once, which means that there are no trials or secondary simplifications. All the functions in KmapElement have options for choosing sharing and priority, in any combination of the two. Each of the functions is executed only on objects that meet its requirements. When a function completes its main task, it updates the instance variables of the neighboring cells that need to know about the looping that occurred.
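The authors' C++ classes are not listed in the paper. To make the "pairing possibilities" bookkeeping concrete, the following sketch, written in Java (the language used elsewhere in this volume) with invented names, counts for every 1-cell of the four-variable map of Section 2 how many adjacent 1-cells it could be paired with; the printed counts reproduce the possibilities listed above. Gray-code adjacency with wrap-around is assumed, as in a standard K-map.

// Sketch: pairing possibilities of each cell in a 4x4 Karnaugh map.
public class PairCount {
  // rows: ab aB AB Ab   columns: cd cD CD Cd  (Gray order, as in the table above)
  static int[][] map = {
    {1, 0, 0, 0},
    {1, 0, 0, 0},
    {1, 1, 0, 0},
    {0, 1, 1, 0}
  };

  public static void main(String[] args) {
    for (int r = 0; r < 4; r++)
      for (int c = 0; c < 4; c++)
        if (map[r][c] == 1)
          System.out.println("cell (" + r + "," + c + ") can be paired "
                             + possibilities(r, c) + " way(s)");
  }

  // number of adjacent 1-cells; rows and columns wrap around
  static int possibilities(int r, int c) {
    return map[(r + 3) % 4][c] + map[(r + 1) % 4][c]
         + map[r][(c + 3) % 4] + map[r][(c + 1) % 4];
  }
}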
3 Examples
Two salient examples, simple yet illustrative of the theoretical concepts presented in this work, follow below:
3.1 Example 1

Consider the following Boolean expression:

F = abcd + aBcd + aBcD + aBCD + ABcD + ABCD + AbCD + AbCd
(1)
The following K-map table is generated.
      cd   cD   CD   Cd
ab     1    0    0    0
aB     1    1    1    0
AB     0    1    1    0
Ab     0    0    1    1
For each 1 in the table the following data can be collected:
• abcd has one possibility to be paired
• aBcd has two possibilities to be paired
• aBcD has three possibilities to be paired, and one to quad
• aBCD has two possibilities to be paired and one to quad
• ABcD has two possibilities to be paired and one to quad
• ABCD has three possibilities to be paired and one to quad
• AbCD has two possibilities to be paired
• AbCd has one possibility to be paired
Following the presented algorithm yields:
1. There are no octets in the K-map table.
2. abcd and AbCd are paired with aBcd and AbCD respectively, and the latter's pairing possibilities are decremented by one. abcd, AbCd, aBcd and AbCD are marked as done.
3. aBcD is looped as a quad. It is quaded with ABcD, aBCD and ABCD. All these cells are marked as done and their quad possibilities are decremented by one.
4. No further cells with a value of one that are not marked as done are left, so all the cells have been included in pairs and quads.

Therefore

F = acd + AbC + BD
(2)
The above simplified Boolean expression is the optimal solution for the given Boolean expression (1), having three terms.
Example 2
Consider the following Boolean expression: F = abcD + aBcD + aBCD + aBCd + ABcd + ABcD + ABCD + AbCD
(3)
Using a Palm PDA the following boxes are selected according to each term of (3).
Four VorioMes
cd cD CD Cd
abDEfnn aBDEfEfEf flB SfEf EfD (Simplify] flbDDEfn
.. In this case a quad is possible to be looped but according to the algorithm any quad of any type will not be looped before all the 1 's that have one possibility to be paired are looped. Therefore, • •
abcD has one possibility to be paired aBcD has three possibilities to be paired and one possibility to be quaded.
227
aBCD aBCd ABed ABcD ABCD AbCD
has three possibilities to be paired and one possibility to be quaded has one possibility to be paired. 1ms one possibility to be paired. has three possibilities to be paired and one possibility to be quaded. has three possibilities to be paired and one possibility to be quaded has one possibility to be paired.
According to the algorithm the following looping combinations can be obtained: ® abcD • aBCd • AbCD • ABcd
is paired with aBcD since abcD has one possibility to be paired is paired with aBCD since aBCd has one possibility to be paired is paired with ABCD since AbCD has one possibility to be paired is paired with ABcD since ABcd has one possibility to be paired F = acD + aBC + ACD +Abc
(4)
Using a Palm PDA aud pressing the "simplify" button the above derived result (4) is displayed in the following Palm screen.
228
The simplified Boolean expression (4) is the optimal solution for the given Boolean expression (3). 4 Conclusion In this paper an algorithm was presented to minimize a Boolean expression on a PDA. For the implementation C++ coding was used on the CodeWarrior for Palm environment. The .pre file, which is executable on a Palm PDA, is 54K and is available for download at: http://csam.monrclair.edu/~antoniou/bfs References 1. Karnaugh M., The map method for synthesis of combinatorial logic circuits, Trans. AIEE, Communications and Electronics, Vol. 72, pp. 593-598, (1953). 2. Quine W.V., The problem of simplifying truth tables, Am. Math. Monthly, Vol. 59, No. 8, pp. 521-531,(1952). 3. McCluskey E.J., Minimization of Boolean functions, Bell System Tech. Journal, Vol. 35, No. 5, pp. 1417-1444, (1956). 4. Gajski D. D., Principles of digital design, (Prentice-Hall, 1997). 5. Wakerly J.F., Digital design, Prentice-Hall, New York, 2000. 6. Hill F. J. and Peterson G.R., Computer aided logical design with emphasis on VLSI, )Wiley and Sons, New York, 1993). 7. Katz R.H., Contemporary logic design, (Benjamin/Cummings Publ, Redwood City, CA, 1994). 8. Mano M. and. Kime C. R, Logic computer design fundamentala, (Prentice Hall, New York, 2000). 9. Brown S and Z. Vranesic, Fundamentals of digital logic with VHDL, (McGrawHill, New York, 2000). 10. Hayes, J.P., Digital logic design, (Addison Wesley Publ., New York, 1993). 11. Chirlian P.M., Digital Circuits with microprocessor applications, (Matrix Publishers, Oregon, 1982) 12. Brayton, R.K., G.D. Hachetel, C.T. McMullen, and A.L. Sangiovanni- Vincentelli, Logic minimization algorithms for VLSI synthesis, (Kluwer Publ., Boston, 1984).
229
INTERNET-BASED BOOLEAN FUNCTION MINIMIZATION USING A MODIFIED QUINE-MCCLUSKEY METHOD SEBASTIAN P. TOMASZEWSKI, ILGAZ U. CELIK AND GEORGE E. ANTONIOU Image Processing and Systems Laboratory, Department of Computer Science, Montclair State University, Upper Montclair NJ 07043, USA E-mail: [email protected], [email protected] In this paper a four variable Boolean minimization algorithm is considered and implemented as an applet in JAVA. The application can be accessed on line since it is posted on the World Wide Web at the URL http://www.csam.montclair.edu/~antoniou/bs. After extensive testing, the performance of the algorithm is found to be excellent.
1
Introduction
The modified Quine-McCluskey (M Q-M) method is a very simple and systematic technique for minimizing Boolean functions. Why do we want to minimize a Boolean expression? By simplifying the logic function we can reduce the original number of digital components (gates) required to implement digital circuits. Therefore by reducing the number of gates, the chip size and the cost will be reduced and the speed will be increased. Logic minimization uses a variety of techniques to obtain the simplest gate-level implementation of a logic function. Initially Karnaugh proposed a technique for simplifying Boolean expressions using an elegant visual technique, which is actually a modified truth table intended to allow minimal SOP and POS expressions to be obtained [1]. The Karnaugh or K-Map based technique breaks down beyond six variables. Quine and McCluskey proposed an algorithmic-based technique for simplifying Boolean logic functions [2,3]. The Quine-McCluskey (Q-M) method is a computer-based technique for simplification and has mainly two advantages over the K-Map method. Firstly it is systematic for producing a minimal function that is less dependent on visual patterns. Secondly it is a viable scheme for handling a large number of variables. A number of methods have been developed that can generate optimal solutions directly at the expense of additional computation time. Another algorithm was reported by Petrick [4], This algorithm uses an algebraic approach to generate all possible covers of a function. A popular tool for simplifying Boolean expressions is the Espresso, but it is not guaranteed to find the best two-level expression [6]. In this paper an Internet based implementation is proposed for simplifying two to four-variable Boolean functions, using a Modified Quine-McCluskey (M Q-M) method. The M Q-M technique is implemented as an applet in Java, and can be accessed on line since it is posted on the World Wide Web. Due to the algorithmic
230
nature of the technique the proposed method and its implementation easily can be expanded to cover more than four variables. The main difference between the proposed algorithm and Q-M method starts when Q-M method groups the elements according to the number of one's in each element, but in the proposed algorithm grouping is not required. In the following steps the M Q-M follow Q-M up to the first step of the prime implicant table, which is identifying the essential prime implicants. For the next step Q-M uses several different techniques to eliminate the implicants efficiently. The M Q-M method simulates the elimination process of minterms and finally when the most efficient combination is reached it is taken out from the table. In the following section the algorithm is presented. 2
Algorithm
The M Q-M algorithm is presented using the following step-by-step approach. I. Input: 1.1 Enter the input of the Boolean expression either into the K-map, Truth Table, or as a Boolean expression. 1.2 Obtain the binary representation of each term from the inputted data. II. Calculations: 2.1 Compare each of the terms among themselves in order to find the terms that are logically adjacent. The following rules have to be followed when combining the terms: a. Combine the two terms only if they differ by only one bit. b. Once there are two terms that differ by one bit, create the new term with the same exact bits or characters, except replace the bit that is different in both of those terms to "-" symbol. c. Once done creating the new term mark both the old terms, indicating that both of the terms are combined. 2.2 Swap all of the combined terms (new terms) and terms that weren't combined at all. 2.3 Repeat steps 2.1 and 2.2 until it is impossible to combine the terms. III. Table: 3.1 Make sure that there is only one term alike. That is get rid of a term if it is a duplicate of another term in the content. 3.2 Create a prime implicant chart. 3.3 Identify the essential prime implicants and consider them as the first terms, which will make up the result. After each implicant is put into the result term area, the implicant chart should be updated.
3.4 If there are any more minterms left over, proceed as the following: a. Look into the prime implicant chart for the implicants, which have the exact same minterms and eliminate the one that is less efficient. b. Try selecting out one of the terms and see if the term will cancel out all of the implicants or not. c. If it cancels out all of the implicants, put the term back into the result term area. d. If it doesn't cancel out all of the implicants repeat step b, with the higher combination of the terms to be taken out. IV. Display: Display the values out of the result term area. In the following section a step-by-step example is given illustrating the proposed technique. 3
Example
Simplify the following Boolean function F = abed + abcD + aBcd + AbCd Applying the algorithm we have, I. Input: 1.1 Input the expression in either way as shown in Fig 1. 1.2 In order to obtain the binary representation of the terms, you will have to know that, lower case letters such as "a"," b", "c", and "
//First term //Second term //Thirdterm //Fourthterm
232
1/f-
Input Vr.ur Bmarji Fxpres;ion And Pre-.t Enter
abca
1*
\sbcc + abcD + aEkd + Ab£d
abcD
t?
abCD
r
abCd
f"
aBcd 15 aBcD
T
iBCD
T
aBCc
r
ABcd
T
ABeD
F"
A3CD
f'
ABCd
r
Abed
f"
AbcD
f'
AbCD
f~
AbCd
fi?
ecl
cD
ab
1
1
0
0
aB
1
0
0
0
AB
0
0
0
0
Ab
0
0
0
1
Fig. 1: Boolean function input

II. Calculations:
2.1 and 2.2 According to the rules for combining the terms, the first term can be combined with the second and the third term. The result of combining is as follows:

Old Terms    New Terms    Swapped Terms
X0000        000-         1010
X0001        0-00         000-
X0100                     0-00
1010
The result of combining the terms creates only two new terms, but after swapping them we have three terms. The reason for this is because there was one term among the old ones that was not combined with any of the other terms. That is why X does not mark the term 1010.
2.3 Since the swapped terms cannot be combined any further, steps 2.1 and 2.2 are not repeated.

III. Table:
3.1 Looking at the result of combining the terms and swapping, the following terms are present: 1010, 000-, 0-00. Since no term is repeated, we can skip this part of the method.
3.2 Prime Implicant Chart:

         0000   0001   0100   1010
1010                            X
000-      X      X
0-00      X             X
Table 1: Prime implicant chart

The prime implicant chart is created to indicate which of the given terms were combined to create the resulting terms, the prime implicants. For example, the term 1010 wasn't combined with any of the other terms, which is why there is an X only under 1010. On the other hand, the last two terms, 000- and 0-00, were combined from the other terms. The term 000- was combined from the terms 0000 and 0001, which is why there are X's under those terms in the corresponding columns.

3.3 The next step is to find the essential prime implicants. These are prime implicants that cover minterms which no other prime implicant covers. In our example all the prime implicants are essential.
3.4 After the previous step there are no minterms left uncovered.

IV. Display:
Initially the given Boolean expression had the form 0000 + 0001 + 0100 + 1010, which is equivalent to (1). After the application of the procedure the following simplified expression is derived: 1010 + 000- + 0-00, or

F' = AbCd + abc + acd
(2)
The above expression (2) is the optimal solution for the given Boolean function (1).
Fig. 2: The Boolean function result
It is noted that the same result (2) is given in Fig. 2, using our on-line implementation.

4 Conclusion
In this paper, a new modified Quine-McCluskey algorithm for minimizing Boolean expressions is proposed and implemented. The application was implemented in Java and can be accessed on the Internet. The results of this paper can easily be extended to cover more than four variables. This application can be an aid for students and professors in digital logic design courses and a valuable tool for digital logic designers.

5 Acknowledgements
The authors would like to thank Prof. Carl Bredlau of the Department of Computer Science, Montclair State University, for his valuable advice on Java programming.

References

1. Karnaugh M., The map method for synthesis of combinatorial logic circuits, Trans. AIEE, Communications and Electronics, Vol. 72, pp. 593-598, (1953).
2. Quine W.V., The problem of simplifying truth tables, Am. Math. Monthly, Vol. 59, No. 8, pp. 521-531, (1952).
3. McCluskey E.J., Minimization of Boolean functions, Bell System Tech. Journal, Vol. 35, No. 5, pp. 1417-1444, (1956).
4. Petrick S.K., On the minimization of Boolean functions, Proceedings of the Western Joint Computer Conference, pp. 103-107, (1959).
5. Katz R.H., Contemporary Logic Design, (Benjamin/Cummings Publishing Company, Redwood City, CA, 1994).
Learning Algorithms
AUTOASSOCIATIVE NEURAL NETWORKS AND TIME SERIES FILTERING

JOSE R. DORRONSORO, VICENTE LOPEZ, CARLOS SANTA CRUZ, JUAN A. SIGUENZA*
Depto. de Ingeniería Informática e Instituto de Ingeniería del Conocimiento, Universidad Autónoma de Madrid, 28049 Madrid, Spain
E-mail: [email protected]

Autoassociative neural networks have been used for data compression and filtering. As with any other application, the optimal network structure has to be decided. In this work we show how to select the optimal architecture and output parameters when linear autoassociative networks are used to filter white noise added to univariate time series. We also give a numerical illustration of the resulting procedures.
1 Introduction
Autoassociative neural networks (AAN), that is, networks where input patterns and targets coincide, are widely used for tasks such as data compression and dimensionality reduction [1]. They can also be used for filtering purposes, with the network outputs taken as filtered versions of the corresponding inputs. This is a natural approach for multidimensional input patterns, but it implies that unidimensional inputs have to be vectorialised somehow before they can be taken as AAN inputs. For univariate time series this is easily and naturally done by using time delays, and suggests AANs as a natural tool for series filtering. More precisely, suppose we have a noisy series x_t = z_t + n_t derived from the addition of stochastic noise N_t to a clean series, which we will assume to be given by an integer-indexed stationary stochastic process Z_t. Using for convenience an odd number 2M+1 of delays, we can define a (2M+1)-dimensional vector X_t = (x_{t-M}, ..., x_t, ..., x_{t+M})^T, i.e., (X_t)_j = x_{t+j}, -M ≤ j ≤ M (A^T denotes the transpose of A). When X_t is fed into an AAN, its output Y_t can then be considered as a filtered version of X_t. However, although the original series x_t was unidimensional, the Y_t now provide not a single filtered series but actually 2M+1 of them, (y_t^{-M}), ..., (y_t^{0}), ..., (y_t^{M}), where we denote by y_t^k the value at position k, -M ≤ k ≤ M, of Y_{t-k}. Considering the number 2M+1 of delays to be fixed, two problems are to be solved: the optimal choice of the network architecture and the selection of the "best" filtered series y^k or, equivalently, the best filtering output.

*With partial support from Spain's CICYT, grant TIC 98-247.
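As a small illustration of this delay-coordinate construction (a sketch of ours, not the authors' code; the AR(1) parameter and the noise levels are made up):

```python
import numpy as np

def delay_vectors(x, M):
    """Stack a univariate series into (2M+1)-dimensional delay vectors
    X_t = (x_{t-M}, ..., x_t, ..., x_{t+M})^T."""
    x = np.asarray(x, dtype=float)
    N = len(x) - 2 * M                      # number of delay vectors
    return np.stack([x[j:j + N] for j in range(2 * M + 1)], axis=1)

rng = np.random.default_rng(0)
z = np.zeros(500)
for t in range(1, 500):                     # a clean AR(1)-like signal z_t
    z[t] = 0.8 * z[t - 1] + rng.normal(scale=0.5)
x = z + rng.normal(scale=0.3, size=500)     # noisy series x_t = z_t + n_t
X = delay_vectors(x, M=3)                   # shape (494, 7): inputs (and targets) of the AAN
```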
In this work we will give theoretical answers to these questions in the case of linear AANs. Their architecture is given by a (2M+1) × L × (2M+1) network with 2M+1 inputs and outputs and a single hidden layer with L units. The optimal parameters L and k will be characterized in the next section in terms of the square error between the processes Y^k (see below) and that of the clean series Z. In turn, this error can be expressed in terms of the square error between Y^k and X and the value σ_N^2 of the noise variance. The practical use of these facts thus requires an estimate of σ_N^2. We shall provide one in the third section and also give its error with respect to the true noise variance. The results of these two sections are given without proofs, which will appear elsewhere. Finally, the resulting procedures will be illustrated on a numerical example.
2 Optimal network parameter selection
We will assume throughout this work that E[Z], and hence E[X], are 0. If we have N' samples x_t of the process X_t = Z_t + N_t, we can define N = N' - (2M+1) + 1 delay coordinate vectors X_t as before. Once network training is finished, the network transfer function is given [3] by Y_t = Σ_{l=1}^{L} (X_t · U^l) U^l, where the U^l are the eigenvectors associated to the L largest eigenvalues λ_X^l of the (2M+1) × (2M+1) matrix Σ_M = X^T X / N, with X = (X_1, X_2, ..., X_N)^T. Σ_M is approximately equal to the sample autocovariance matrix Γ̂_X = (γ̂_X)_{ij} = (γ̂_{|i-j|}), with γ̂_k the k-th sample autocovariance of X. Although the above eigenvalues and eigenvectors do depend on the concrete M value used, we will assume it fixed and drop the M index accordingly. Notice that the eigenanalysis of Γ̂_X has many applications to filtering problems [4]-[6]. Each series y^k is thus given by
y_t^k = (Y_{t-k})_k = \sum_{l=1}^{L} (X_{t-k} \cdot U^l)\, u_k^l = \sum_{l=1}^{L} \Big( \sum_{j=-M}^{M} x_{t-k+j}\, u_j^l \Big) u_k^l        (1)
where u_k^l denotes the k-th component of the l-th eigenvector U^l. We will work for simplicity with the underlying processes Z, X and N instead of the sample values. A natural tool to characterize the optimal parameters L, k is the error estimate ε_L(Z, Y^k) = E[|Z - Y^k|^2] between the clean process Z and each of the processes Y^k derived from X_t through the time invariant filters given by the process version of (1), that is,
y_t^k = (Y_{t-k})_k = \sum_{l=1}^{L} \Big( \sum_{j=-M}^{M} x_{t-k+j}\, u_j^l \Big) u_k^l = \sum_{s=k-M}^{k+M} \Big( \sum_{l=1}^{L} u_{k-s}^l\, u_k^l \Big) x_{t-s}        (2)
Here u_k^l denotes the k-th component of the l-th eigenvector of the true autocovariance matrices Γ_X or Γ_Z. Notice that by our assumptions, Γ_X = Γ_Z + σ_N^2 I_{2M+1}, with I_K being the K×K identity matrix. Thus, λ_X^l = λ_Z^l + σ_N^2, and Γ_X and Γ_Z also have the same eigenvectors. The following result is then true.

Proposition 1  In the above conditions,

ε_L(Z, Y^k) = σ_Z^2 - \sum_{l=1}^{L} (λ_Z^l - σ_N^2)\,(u_k^l)^2.        (3)
As an easy consequence, we have the following.

Corollary 1  The optimal L is the largest value such that for all l, 1 ≤ l ≤ L, λ_Z^l > σ_N^2 or, equivalently, λ_X^l > 2σ_N^2. Notice that from (3), this ensures the minimization of ε_L(Z, Y^k) for all k.

For practical purposes, however, the error estimate ε_L(X, Y^k) = E[|X - Y^k|^2] is more convenient than (3). It can be shown that

Proposition 2  With Z, X and Y^k as before,

ε_L(X, Y^k) = σ_X^2 - \sum_{l=1}^{L} λ_X^l\,(u_k^l)^2.        (4)

Moreover, ε_L(X, Y^k) and ε_L(Z, Y^k) are related as

ε_L(Z, Y^k) = ε_L(X, Y^k) + σ_N^2 \Big( 2 \sum_{l=1}^{L} (u_k^l)^2 - 1 \Big).        (5)
Now, once L has been chosen, equation (5) tells us to select k = k(M, L) as

k(M, L) = arg min { ε_L(Z, Y^k) : -M ≤ k ≤ M }
        = arg min { ε_L(X, Y^k) + σ_N^2 ( 2 \sum_{l=1}^{L} (u_k^l)^2 - 1 ) : -M ≤ k ≤ M }.
In practice, we can replace the theoretical U^l and λ_X^l by their sample based approximations. Moreover, if we have an estimate σ̂_N^2 of the noise variance we can then estimate the optimal L and k values as follows (a small code sketch of this procedure is given after the list):

1. We choose as L the largest L' value such that for all l, 1 ≤ l ≤ L', λ̂_X^l > 2σ̂_N^2.

2. We then select k as k = arg min { ê_L(X, Y^k) + σ̂_N^2 ( 2 Σ_{l=1}^{L} (û_k^l)^2 - 1 ) : -M ≤ k ≤ M }, now with ê_L(X, Y^k) = σ̂_X^2 - Σ_{l=1}^{L} λ̂_X^l (û_k^l)^2.
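A sample-based sketch of this selection rule (ours, not the authors'; it assumes an estimate of the noise variance is already available, as discussed in the next section) could look as follows:

```python
import numpy as np

def select_L_and_k(X, noise_var):
    """X: (N, 2M+1) zero-mean delay vectors; noise_var: estimate of sigma_N^2."""
    N, dim = X.shape
    S = X.T @ X / N                                   # ~ sample autocovariance matrix
    lam, U = np.linalg.eigh(S)
    lam, U = lam[::-1], U[:, ::-1]                    # eigenpairs in decreasing order
    L = int(np.sum(lam > 2 * noise_var))              # step 1: lambda_l > 2 * sigma_N^2
    sigma_x2 = np.trace(S) / dim                      # estimate of sigma_X^2
    UL = U[:, :L]                                     # rows indexed by k = -M..M
    err_x = sigma_x2 - (lam[:L] * UL**2).sum(axis=1)              # eps_L(X, Y^k)
    err_z = err_x + noise_var * (2 * (UL**2).sum(axis=1) - 1)     # criterion of step 2
    k = int(np.argmin(err_z)) - (dim - 1) // 2        # map row index to k in [-M, M]
    return L, k
```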
We will denote the resulting optimal estimate of E[|Z - Y^k|^2] as ê(M). In the preceding we can use any noise estimate. However, the same ideas leading to the previous results can be used to get such an estimate. We show how next.
3 Noise variance estimates
Notice that a natural choice for approximating the eigenvectors U^l is given by the vectors

C(ω*) = \sqrt{\frac{2}{2M+1+D_M(2ω*)}}\, (\cos(-Mω*), \cos((-M+1)ω*), \ldots, 1, \ldots, \cos(Mω*))^T,

S(ω*) = \sqrt{\frac{2}{2M+1-D_M(2ω*)}}\, (\sin(-Mω*), \sin((-M+1)ω*), \ldots, 0, \ldots, \sin(Mω*))^T,

where D_M(ω) = \sin((M+1/2)ω)/\sin(ω/2) denotes the well known Dirichlet kernel and ω* is an appropriately chosen frequency (the coefficients appearing in front of these vectors normalize their length to one). It thus follows that, if U^l ≃ C(ω^{l,M}) for an appropriately chosen frequency ω^{l,M}, that is, if u_k^l ≃ cos(k ω^{l,M}) up to a normalizing constant, the frequency response component H_k^l of H_k can be approximated by
H_k^l(ω) ≈ \frac{2\cos(k ω^{l,M})\, e^{-ikω}}{2M+1+D_M(2ω^{l,M})} \cdot \frac{D_M(ω - ω^{l,M}) + D_M(ω + ω^{l,M})}{2},
and a similar formula holds when we have instead U^l ≃ S(ω^{l,M}). Given the behavior of the Dirichlet kernel, |H_k^l|^2 acts as a narrow band filter, letting pass only those frequencies near ±ω^{l,M}. This behavior extends to that of the full eigenfilter frequency response H_k^L(ω), which in practice verifies that |H_k^L(ω)|^2 ≃ 1 near ±ω^{l,M}, 1 ≤ l ≤ L, while being close to zero away from them. In other words, |H_k^L|^2 shows a near 0-1 response, and can thus be assumed to be essentially concentrated in a frequency range of measure ∫_{-π}^{π} |H_k^L(ω)|^2 dω. This integral can be shown to be equal to 2π Σ_l (u_k^l)^2. Therefore, the measure of the region of [-π, π] outside the "support" of H_k^L can be taken to be 2π(1 - Σ_l (u_k^l)^2). These ideas suggest the following noise variance estimates
σ̂_N^2(M, L, k) = \frac{1}{2π \big( 1 - \sum_{l=1}^{L} (u_k^l)^2 \big)} \int_{-π}^{π} P_X(ω)\, \big( 1 - |H_k^L(ω)|^2 \big)\, dω.        (6)
These σ̂_N^2(M, L, k) actually overshoot the true noise variance, for we have the following.

Proposition 3  The estimate σ̂_N^2(M, L, k) can be written as

σ̂_N^2(M, L, k) = \frac{σ_X^2 - \sum_{l=1}^{L} λ_X^l (u_k^l)^2}{1 - \sum_{l=1}^{L} (u_k^l)^2} = σ_N^2 + \frac{σ_Z^2 - \sum_{l=1}^{L} λ_Z^l (u_k^l)^2}{1 - \sum_{l=1}^{L} (u_k^l)^2}.
Notice that for fixed M, these σ̂_N^2 estimates depend again on L and k while, in turn, we want to use them to obtain the optimal L, k. To avoid this circularity, and because of the overshooting observed, in our next section's illustration we will first select an M dependent estimate σ̂_N^2(M) as

σ̂_N^2(M) = \min_{L', k'} σ̂_N^2(M, L', k'),

which will be the value actually closer to the true noise variance σ_N^2.
noise %   opt. M   opt. L   est. L   opt. k   est. k   % noise removed
  10         4        4        2        1        1          19
  20         3        3        2        2        2          44
  40         2        2        1        0        0          58
  70         3        2        1        0        0          70
Table 1. Estimated M values for an AR 1 process at different noise levels (col. 2), its associated optimal and estimated L values (cols. 3, 4) and optimal and estimated k output indices (cols. 5, 6). Column 7 shows the percentage of noise removed.
4 A filtering example
We shall illustrate the above procedures on a sample series derived from an autoregressive (AR) process of order 1. No comparisons will be made with other methods: despite their simplicity, the filters obtained have reasonably good noise reduction capabilities, but there are several ways in which they can be enhanced. The general form of an AR(1) process is z_{t+1} = φ z_t + a_t, where A = (a_t) is independent white noise of a certain variance σ_a^2, and its power spectrum is p(ω) = σ_a^2 / (1 - 2φ cos ω + φ^2). Since this spectrum has a minimum value of σ_a^2/(1 + φ)^2
e(M) = min_k ε_L(Z, Y^k). Taking for instance figure 1, it suggests an optimal filtering width of 7 = 2 × 3 + 1, corresponding to M = 3. The 20% line in table 1 suggests an optimal L of 3, while its estimation by the above procedures is 2. In other words, the optimal filter (i.e., the one derived from the theoretical eigenvalues and eigenvectors and the true noise variance) should use the 3 first eigenvectors while our sample based procedures suggest 2 eigenvectors. On the other hand, the optimal and estimated outputs have a common value of 2, which corresponds to a one step delay. The noise reduction achieved by this filter is 2.5 dB, about 44% of the initial noise variance.
5 Conclusions
In this paper we have theoretically characterized the optimal architecture of a linear autoassociative filter for one dimensional time series, and shown how to choose the best filtering output component of such a network. Although simple, the resulting filters have good noise reduction capabilities, which will be further improved in future work along two distinct directions. The first one is concerned with non linear extensions of the autoassociative networks discussed here. The second will retain the present linear structure, but combining simpler, one hidden unit networks with a previous multiresolution decomposition of the signal to be filtered.

References
1. Diamantaras, K.I., Kung, S.Y., Principal Component Neural Networks: Theory and Applications, John Wiley Publ., (1996).
2. Brockwell, P.J., Davis, R.A., Time Series: Theory and Methods, Springer Verlag, (1991).
3. Baldi, P., Hornik, K., Learning in Linear Neural Networks: A Survey. IEEE Transactions on Neural Networks, 6, 837-858, (1995).
4. Haykin, S., Adaptive Filtering Theory, Prentice Hall, (1996).
5. Pisarenko, V.F., The retrieval of harmonics from a covariance function, Geophysics Journal Royal Astronomical Society, 33, 347-366, (1973).
6. Tufts, D.W., Kumaresan, R., Estimation of frequencies of multiple sinusoids, Proceedings of the IEEE, 70, 975-989, (1982).
Figure 1. Evolution for M between 1 and 20 of e(M) (continuous line) and ê(M) (dotted line) for an AR(1) signal to which 20% noise has been added. The top curve shows noise variance estimates, and the middle straight line the true noise variance.
NEURAL NETWORK ARCHITECTURES: NEW STRATEGIES FOR REAL TIME PROBLEMS
A. UGENA, F. DE ARRIAGA, M. EL ALAMI
Universidad Politecnica de Madrid, Escuela T.S. de Ingenieros de Telecomunicacion, Ciudad Universitaria s/n, 28040 Madrid
E-mail: farriaga@mat.upm.es
There are a few rules for designing neural networks, most of them coming from experience. In the case of functional-link neural networks the situation is even worse, because they are not well known in spite of the advantages which make them suitable for real time problems. To avoid intuition-inspired design, some strategies related to very well known mathematical techniques are proposed. They have been used for the solution of a real time problem: speech recognition.
The results obtained, of which only a sample has been included, show drastic reductions in the number of iterations needed to bring the error under a certain bound, and an increase of the learning rate.
1 Introduction
Artificial neural networks (ANN), which can aid the decision process by learning from experience, are a suitable procedure to solve the function approximation problem in many well known applications. But several others, such as image processing, speech recognition and real time control, demand efficient and rapid function approximation even in cases where the analytic expression of the function is unknown, although some function values could be known. But even when we decide to use ANNs, some further decisions have to be made concerning the neural network type and the strategy of use or the specific network model. As far as the neural network type is concerned, if we concentrate on supervised learning, the multi-layer perceptron has been widely used in the literature in problems which do not have critical restrictions on time. In many real-time applications the multi-layer perceptron solutions are not appropriate due to the low learning rate and the large number of input-output pairs needed for training. In order to get rid of those drawbacks we have introduced functional links among the nodes of the neural network according to Pao [1]. The functional-link technique allows the incorporation of a set of functions {f0, f1, ..., fn} to each node under the name of functional expansion. That way, when node k is activated producing the output Ok, we also get
{ f0(Ok), f1(Ok), ..., fn(Ok) }
as additional node outputs. The set of functions, if they are linearly independent, has the mission of increasing the output space dimension, so that the pattern separation hyperplanes are obtained faster. The set of chosen functions can be applied to the node output, as described in the previous paragraph, and/or to the node input. The difference matters in the case of the input (first layer) or output (last layer). As we will show, the advantages of the different functional expansions will be decisive in choosing the appropriate network model in connection with real time applications. As far as the functional link method is concerned, it has to be emphasized that, according to Sobajic [3], it is always possible to solve supervised learning problems with ANNs without hidden layers by means of this method.
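A minimal sketch of a functional-link net of this kind (a flat net whose inputs are expanded by a chosen set of functions and fitted by least squares; the toy data and the particular expansion are our own assumptions, not the authors' experiments):

```python
import numpy as np

def expand(X, funcs):
    """Functional expansion: augment the raw inputs with f_1(x), ..., f_n(x)
    applied componentwise, plus the constant function f_0 = 1."""
    return np.hstack([np.ones((len(X), 1)), X] + [f(X) for f in funcs])

# A Fourier-type expansion, as in option (a) of Section 4: sin(pi x), cos(pi x).
funcs = [lambda X: np.sin(np.pi * X), lambda X: np.cos(np.pi * X)]

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 3))            # toy patterns with 3 raw features
y = np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1]      # toy target values
Phi = expand(X, funcs)                           # no hidden layer: expanded inputs only
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)      # weights of the single output node
y_hat = Phi @ w
```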
2 Theoretical background
It can be shown [4] that for continuous and piecewise continuous functions, functional link neural networks are universal approximators, that is to say: any piecewise continuous function can be approximated, with error less than a chosen bound, by means of a functional link neural network without hidden layers, on any real interval.
3 Main Strategies for Using Functional-Link Neural Networks
Among the strategies we have set up for using functional-link neural networks in real-time problems we would like to mention those related to known mathematical techniques [2]:

3.1 Lagrange's Neural Network

This model follows Lagrange's philosophy of the interpolating polynomial and the elementary polynomials. Let

f_i = c_i (x - x_1) ... (x - x_{i-1})(x - x_{i+1}) ... (x - x_n)

be the set of elementary polynomials, such that f_i(x_i) = 1 and f_i(x_j) = 0 if i ≠ j. Lagrange's interpolating polynomial will be

f_n*(x) = Σ p(x_i) f_i(x)
where p(x_i) are the known values of the unknown function, and x_i, i = 1,..,n, are chosen points of the independent variable. The set of elementary polynomials plus the constant function f_0 = 1 will be chosen as the functional expansion applied to the input layer nodes. There will be no hidden layer and only one single node in the output layer. In consequence, the full set of increased inputs will be:

{ x_1, x_2, x_3, ..., x_n, f_1(x), f_2(x), ..., f_n(x) }
The net output will be expressed by O = F(Σ x_i · w_i + θ), where F is the activation function, w_i are the weights or network coefficients, x_i are the real inputs and θ is the threshold. If the weights related to the x_i are chosen equal to zero and F is the identity function, we get Lagrange's polynomial, and the weights, after the net training, will coincide with the polynomial coefficients.

3.2 Other strategies: Taylor's, Newton's, Mc.Laurin's, Fourier's, ..., Neural Network

Following a similar approach we can devise many other strategies. In the case of the Taylor Neural Model, and supposing that the specific point for the development is 0, we will use the following set of functions:
; fn = (x-0)"/ n i
The net can be trained according to the explained procedure. With this method we get not only the function approximation but also the derivatives at a certain point, because the final network weights are the derivatives at the point defined by the first pattern. Therefore, if the result can be expressed as

P(x) = f(0) + f'(0)(x - 0) + f''(0)(x - 0)^2/2! + ... + f^(n)(0)(x - 0)^n/n!,

then

f(0) = w_0; f'(0) = w_1; ... ; f^(n)(0) = w_n,
w_0, w_1, ..., w_n being the weights associated to f_0, f_1, ..., f_n. Similarly for the remaining models.

4. Phoneme recognition

Data for these phoneme recognition experiments were obtained from 100 continuous voice utterances (ordinary conversation) of different speakers, digitised at the rate
of 16 kHz. Because of that, two sources of noise were introduced with the phonemes: the consonant joined to the vowel, and the influence of adjacent phonemes. The patterns were extracted from the spoken sentences, parameters were obtained in Matlab format, and a total of 350 patterns for each vowel was available for the experiments.

First of all we have considered sine and cosine expansions (Fourier's model) with the following options:
a) sin(πx), cos(πx); 24 expansions
b) sin(πx), cos(πx), sin(2πx), cos(2πx); 48 expansions
c) sin(πx), cos(πx), sin(2πx), cos(2πx), sin(3πx), cos(3πx); 72 expansions
d) up to 120 expansions.
The results are as follows: recognition rate: (a) 85.1, (b) 88.7, (c) 89.9, (d) 91.2; error: (a) 10^-2, (b) 10^-5, (c) 10^-6, (d) 10^-8.

As the second possibility we have used a finite set of the Taylor expansion. In our particular problem we have used the following expansions:
a) (xi - x) and (xi - x)^2 related to the first 12 coefficients; 24 expansions
b) (xi - x) and (xi - x)^2 related to the 25 coefficients; 50 expansions
c) (xi - x) and (xi - x)^2 related to the 25 coefficients, and (xi - x)^3 related to the first 12 coefficients; 62 expansions
d) using terms up to order four, with a total of 74 expansions; in this case the network cannot recognise.
The results are the following: rate of recognition: (a) 90.6, (b) 91.2, (c) 92.2; error: (a) 10, (b) 1, (c) 10^-1.

The third possibility we have contemplated has been the Mc.Laurin development with the following options:
a) x^2 and x^3 of the first 12 coefficients; 24 expansions
b) x^2, x^3 and x^4 of the first 12 coefficients; 36 expansions
c) x^2, x^3, x^4 and x^5 of the first 12 coefficients; 48 expansions
d) x^2 and x^3 of the first 25 coefficients, x^3 and x^4 of the first 12 coefficients; 74 expansions.
The rate of recognition reaches 93.7, the highest value so far obtained, corresponding to option d).

Fig. 1 and Table 1 show the variation of error with training and the rate of recognition for Newton's model; Fig. 2 and Table 2 do so for Lagrange's model. Table 3 gives the comparison among the different models and, finally, Table 4 gives details of the rate of recognition for the multilayer perceptron.

5 Related work and comparison

Waibel [8] uses feed-forward neural networks for the approximation of functions. Sadaoki Furui [7] deals with the problem of speaker recognition, which is
Figure 1. Error of the network versus training epochs (Epocas, 0-2000) for Newton's model.
Table 1: Rate of recognition (%) for Newton's model (rows and columns are the five vowels).

        a      e      i      o      u
a      94     2      1.5    0.5    2
e      2.5    87.5   3.5    4.5    2
i      0.5    2      96.5   1      0
o      1.5    0.5    2      91.5   4.5
u      0.5    1.5    0      3      95

Figure 2. Error of the network (Error de la Red) versus training epochs (Epocas), and rate of recognition versus number of expansions (Ampliaciones), for Lagrange's model.
Table 3: Comparison among the different models.

Model      Enhanc.   Error   Rate %   Epochs   Operations
Trig.      120       0.02    91.2     2000     13.15×10^8
Newton     75        5.5     93       2000     9.52×10^8
Lagrange   84        4.82    92.52    2000     8.36×10^8

Table 4: Rate of recognition (%) for the multilayer perceptron (rows and columns are the five vowels).

        a      e      i      o      u
a      90     1      1      5.5    2.5
e      0      95.5   2.5    1.5    0.5
i      0      4.5    93.5   0.5    1.5
o      3.5    2      3      85     6.5
u      1      1.5    4      7      86.5
different from ours; he also uses text-independent recognition methods, obtaining a lower rate of recognition. D. Charlet and D. Jouvet [6] have also studied the speaker recognition problem; they have used a text-dependent speaker verification system and the best results they obtained were with a genetic algorithm; their error levels are higher than those we have obtained.

6 Conclusions

From the results so far obtained it can be stated that functional-link neural networks are most suitable for the sort of problems needing a reduction of the training period, of the error level or of the computing time for solution. Among those problems, phoneme recognition is one which appears adequate for this technique. The results obtained with polynomial expansions, such as the Fourier, Taylor and Mc.Laurin developments, show important improvements in relation to those obtained with the multilayer perceptron, especially in the value of the rate of recognition and error levels.

References
1. Pao, Y., Adaptive Pattern Recognition and Neural Networks. Addison-Wesley, 1989.
2. Amillo, J., Arriaga, F., Análisis Matemático con Aplicaciones a la Computación, McGraw-Hill, 1987.
3. Sobajic, D., Neural Nets for Control of Power Systems. Ph.D. Thesis. Computer Science Dept., Case Western Reserve University, Cleveland, OH, 1988.
4. Ugena, A., Arquitectura de Redes Neuronales con Ligadura Funcional. Ph.D. Thesis. Departamento de Matemática Aplicada, Universidad Politécnica de Madrid, 1997.
6. Charlet, D. and Jouvet, D., Optimizing Feature Set for Speaker Verification. Pattern Recognition Letters 18, Elsevier Science B.V., 1997.
7. Furui, S., Recent Advances in Speaker Recognition. Pattern Recognition Letters 18, Elsevier Science B.V., 1997.
8. Waibel, A., Neural Networks Approaches for Speech Recognition. Prentice-Hall, 1991.
EVOLVING SCORING FUNCTIONS WHICH SATISFY PREDETERMINED USER CONSTRAINTS
MICHAEL L. GARGANO AND YING HE
School of Computer Science and Information Systems, Pace University, New York, NY 10038, USA
E-mail: [email protected], [email protected]

WILLIAM EDELSON
Department of Computer Science, Long Island University, Brooklyn, NY 11201, USA
E-mail: edelson@hornet.liunet.edu

A scoring function assigns non-negative values (i.e., scores) which help evaluate various items, situations, or people. For example, a professor would like to assign point values to each problem on an exam that was recently administered to the students in his/her class. The professor demands a minimum point value (which may be different for each problem) while the remaining points can arbitrarily be apportioned and added to each problem. After grading each problem for each student on a scale from 0.00 to 1.00, the professor would like the remaining points apportioned so that a specified grade distribution is attained. We propose a GA (i.e., genetic algorithmic) solution to this problem (and other related problems, e.g., loan scoring and personnel hiring).
1 Introduction
A scoring function assigns non-negative values (i.e., scores) which help evaluate various items, situations, or people. For example, a professor would like to assign point values to each problem on an exam that was recently administered to the students in class. The professor demands a minimum point value (which may be different for each problem) while the remaining points can arbitrarily be apportioned and added to each problem. After grading each problem for each student on a scale from 0.00 to 1.00, the professor would like the remaining points apportioned so that a specified grade distribution is attained. We propose a GA (i.e., genetic algorithmic) [1-8] solution to this problem (and other related problems, e.g., loan scoring and personnel hiring).
2 The Genetic Algorithm Paradigm
The genetic algorithm paradigm is an adaptive method based on Darwinian natural selection. It applies the operations of selection (based on survival of the fittest), reproduction using crossover (i.e., mating), and mutation to the current generation of a population of potential solutions to generate a new, typically more fit population in
the next generation. This process is repeated over a number of generations until an optimal or near optimal solution is obtained. A genetic algorithm offers the following advantages:
a) it will usually obtain an optimal (or near optimal) solution(s)
b) it can obtain a satisficing solution(s)
c) it has polynomial computational complexity
d) it easily handles constraints
e) it easily incorporates heuristics
f) it is easy to understand
g) it is easy to implement
h) it is robust

3 Mathematical Model
After administering an exam to a class of students, a professor would like to assign points to each question on the test so that a predefined grade distribution is obtained. The professor would like a method that is academically sound and curves the exams fairly and objectively. If the exam consists of n questions q_1, q_2, ..., q_i, ..., q_{n-1}, q_n, we would like to assign to each question a nonnegative value or score s(q_i) ≥ 0 so that the sum of the scores is 1 (i.e., 100%). The professor would like to assign lower bounds b_i ≥ 0 for the scores of each question so that s(q_i) ≥ b_i for 1 ≤ i ≤ n. This will guarantee that each question is assigned a minimum point value which is in the professor's control. In general, 1 ≥ Σ b_i ≥ 0, so the remaining B = 1 - Σ b_i points must be distributed amongst the n questions, assigning some proportion p_i (1 ≤ i ≤ n) of this excess to each question. Therefore, score s(q_i) = b_i + p_i · B = b_i + p_i · (1 - Σ b_j) (with 1 ≤ i ≤ n). The professor wants this excess to be distributed so that a predefined grade distribution D is obtained for the class (for example, a normal distribution Nor(μ, σ^2) estimated by a frequency histogram with mean (average) μ and variance σ^2). To accomplish this, the professor first grades each question on every student's exam and assigns the proportion T_ij of each question the student got correct. Then student S_j would get a grade of G_j = Σ_i T_ij · s(q_i), and we would like G_j ~ D.
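For concreteness, the scoring model and the resulting grades can be computed as below (a sketch with made-up bounds, proportions and grading matrix, not data from the paper):

```python
import numpy as np

def scores(b, p):
    """s(q_i) = b_i + p_i * B with B = 1 - sum(b): minimum points plus a share
    of the remaining points."""
    b, p = np.asarray(b, float), np.asarray(p, float)
    return b + p * (1.0 - b.sum())

def grades(T, s):
    """G_j = sum_i T[j, i] * s(q_i), with T[j, i] the proportion of question i
    that student j answered correctly."""
    return np.asarray(T, float) @ s

b = [0.05, 0.10, 0.10, 0.05]          # professor's minimum point values
p = [0.4, 0.2, 0.1, 0.3]              # apportionment of the remaining B = 0.7 points
T = [[1.0, 0.8, 0.5, 0.9],            # two students, four questions
     [0.6, 0.4, 1.0, 0.7]]
G = grades(T, scores(b, p))
```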
4 Encodings
Each member of the population is an apportion array (p_1, p_2, ..., p_i, ..., p_{n-1}, p_n) of length n where p_1 + p_2 + ... + p_i + ... + p_{n-1} + p_n = 1 = Σ p_i and with p_i ≥ 0 (for 1 ≤ i ≤ n). It is quite easy to generate an initial population of random members for generation 0 using a normalization method. First, simply generate an array (x_1, x_2, ..., x_i, ..., x_{n-1}, x_n) consisting of n independent identically distributed random variables x_i ~ uniform[0,1]. Then, by calculating the normalized array (x_1/Σx_i, x_2/Σx_i, ..., x_i/Σx_i, ..., x_{n-1}/Σx_i, x_n/Σx_i), we create a random apportion array (of course we must observe the caveat that Σx_i > 0).
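A sketch of this normalization-based initialization (the names and the population size are illustrative, not from the paper):

```python
import numpy as np

def random_apportion(n, rng):
    """Generate one apportion array: n iid uniform[0,1] values, normalized to sum 1."""
    x = rng.uniform(0.0, 1.0, n)
    while x.sum() == 0.0:          # the caveat: the sum must be positive
        x = rng.uniform(0.0, 1.0, n)
    return x / x.sum()

rng = np.random.default_rng(7)
population = [random_apportion(n=10, rng=rng) for _ in range(50)]   # generation 0
```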
5 Mating (Crossover) and Mutating
Selection of parents for mating involves choosing one member of the population by a weighted roulette wheel method favoring more fit members and the other member randomly. The reproduction process is a simple crossover operation whereby the two selected parent members swap randomly chosen positions to create new offspring members. The crossover operation produces an encoding for offspring members having element values which satisfy the apportion constraints. Two parents P1 = (p_1, p_2, ..., p_i, ..., p_{n-1}, p_n) and P2 = (π_1, π_2, ..., π_i, ..., π_{n-1}, π_n) can be mated to produce two offspring children C1 and C2 where

C1 = (p_1/s, π_2/s, ..., p_i/s, ..., π_{n-1}/s, p_n/s)  with  p_1 + π_2 + ... + p_i + ... + π_{n-1} + p_n = s

and

C2 = (π_1/t, p_2/t, ..., π_i/t, ..., p_{n-1}/t, π_n/t)  with  π_1 + p_2 + ... + π_i + ... + p_{n-1} + π_n = t.

Similarly, a random population member can be mutated. Mutation is carried out by randomly choosing a member of the population and then randomly changing the value(s) of its encoding (genotype) at randomly chosen positions, subject to the apportion constraints. For example, M = (p_1/s, π_2/s, ..., p_i/s, ..., π_{n-1}/s, p_n/s) with p_1 + π_2 + ... + p_i + ... + π_{n-1} + p_n = s, where positions 2, ..., n-1 have been randomly mutated on the cloned member P = (p_1, p_2, ..., p_i, ..., p_{n-1}, p_n).
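Both operators can be sketched as follows; renormalizing after the swap or the change keeps every offspring a valid apportion array (the chosen positions are illustrative, not prescribed by the paper):

```python
import numpy as np

def crossover(p1, p2, positions):
    """Swap the chosen positions between two parents and renormalize, so both
    children are again apportion arrays (entries >= 0 summing to 1)."""
    c1, c2 = p1.copy(), p2.copy()
    c1[positions], c2[positions] = p2[positions], p1[positions]
    return c1 / c1.sum(), c2 / c2.sum()

def mutate(p, positions, rng):
    """Clone a member, replace the chosen positions with fresh random values,
    then renormalize to restore the apportion constraint."""
    m = p.copy()
    m[positions] = rng.uniform(0.0, 1.0, len(positions))
    return m / m.sum()

rng = np.random.default_rng(3)
p1, p2 = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
c1, c2 = crossover(p1, p2, positions=[1, 3])
m = mutate(p1, positions=[2], rng=rng)
```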
6 Fitness
After we have found the phenotype (s_1, s_2, ..., s_i, ..., s_{n-1}, s_n) for a population member P = (p_1, p_2, ..., p_i, ..., p_{n-1}, p_n) by applying s(q_i) = b_i + p_i · B, we can find all the grades G_j for that scoring function and we can then find a frequency histogram H (i.e., a distribution). As a simple fitness measure we can sum the absolute values of the differences in each of the pre-selected frequency intervals I to obtain:

fitness of population member P = Σ_I | #D_I - #H_I |.

The smaller the fitness value, the better the approximation to the predefined distribution.
7 Genetic Algorithm Methodology
We are implementing a genetic algorithm (GA) for the scoring problem using feasible encoding schemes for apportion arrays (described earlier). Our GAs create and evolve an encoded population of potential solutions (i.e., apportion arrays) so as to facilitate the creation of new feasible members by standard mating and mutation operations. (A feasible search space contains only members that satisfy the problem constraints for an apportion array. When feasibility is not guaranteed, numerous methods for maintaining a feasible search space have been addressed [7], but most are elaborate, complex, and inefficient. They include the use of problem-dependent genetic operators and specialized data structures, repairing or penalizing infeasible solutions, and the use of heuristics.) By making use of a problem-specific encoding and normalization, we ensure a feasible search space during the classical operations of crossover and mutation and, in addition, eliminate the need to screen during the generation of the initial population. We adapted many of the standard GA techniques found in [1, 8] to these specific problems. A brief description of these techniques follows. The initial population of encoded potential solutions (genotypes) is randomly generated. Each encoded population member is mapped to its equivalent scoring function (phenotype). Selection of parents for mating involves randomly choosing one very fit member of the population while the other member is chosen randomly. The reproductive process is a simple crossover operation whereby two randomly selected parents are cut into three sections at some randomly chosen positions and then have the middle parts of their encodings swapped and normalized to create two offspring (children). In our application the crossover operation produces an encoding for the offspring whose element values always satisfy the proportion constraints. Mutation is performed by randomly choosing a member of the population, cloning it, and then changing values in its encoding at randomly chosen positions and normalizing so as to satisfy the proportion constraints. A grim reaper mechanism replaces low performing members in the population with newly created, more fit offspring and/or mutants. The GA is terminated when either no improvement in the best fitness value is observed for a number of generations, a certain number of generations have been examined, and/or a satisficing solution is attained (i.e., the predefined distribution is not precisely the same, but is satisfactorily close).
We now state a generic form of the genetic algorithm paradigm:
1) randomly initialize a population of encoded potential solutions (members)
2) map each new member (genotype) to its scoring function (phenotype)
3) calculate the fitness of any member which has not yet been evaluated (that is, how close its distribution is to the target distribution)
4) sort all members of the population by fitness
5) select one parent for mating using the roulette wheel method and the other randomly
6) generate offspring using simple crossover
7) mutate randomly selected members of the population
8) replace the lower half of the current generation with new offspring and mutated members
9) if a termination criterion is met then return the best member(s), else go to 2
8 Related Problems
Two related problems are loan scoring by lending institutions and personnel selection by human resource functions. In the loan scoring problem, there is a record containing facts concerning the person who is requesting the loan and points are assigned based on an expert loan specialist's subjective judgement. A genetic algorithmic approach could lower the lender's risk, provide better investment returns, and be less biased by providing loans to a more diverse population. In the personnel selection problem, we can give an assessment instrument to measure what differentiates successful employees from non-successful employees. We can then assign the point values constrained by the fact we wish to give higher grades to the successful employees and lower grades to the non-successful ones. In this way we can create instruments which can better predict successful potential candidates from less successful candidates for a position in the future.
9 Conclusion
This research is a nice application of GAs to a real world problem. In the future we would like to get more data and perform more experiments on the related problems discussed in section 8.
10 Acknowledgement

We wish to thank Pace University's School of Computer Science and Information Systems (SCSIS) and Long Island University's Computer Science Department for partially supporting this research.

References
1. Davis, L., Handbook of Genetic Algorithms, Van Nostrand Reinhold, (1991).
2. Dewdney, A.K., The Armchair Universe - An Exploration of Computer Worlds, W. H. Freeman & Co., (1988).
3. Edelson, W. and M. L. Gargano, Minimal Edge-Ordered Spanning Trees Solved By a Genetic Algorithm with Feasible Search Space, Congressus Numerantium 135, (1998) pp. 37-45.
4. Gargano, M.L. and W. Edelson, A Genetic Algorithm Approach to Solving the Archaeology Seriation Problem, Congressus Numerantium 119, (1996) pp. 193-203.
5. Gargano, M.L. and W. Edelson, A Fibonacci Survival Indicator for Efficient Calculation of Fitness in Genetic Paradigms, Congressus Numerantium 136, (1997) pp. 7-18.
6. Gargano, M.L. and Rajpal, N., Using Genetic Algorithm Optimization to Evolve Popular Modern Abstract Art, Proceedings of the Long Island Conference on Artificial Intelligence and Computer Graphics, Old Westbury, N.Y., (1994), pp. 38-52.
7. Michalewicz, Z., Heuristics for Evolutionary Computational Techniques, Journal of Heuristics, vol. 1, no. 2, (1996) pp. 596-597.
8. Goldberg, D.E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison Wesley, (1989).
9. Rosen, K.H., Discrete Mathematics and Its Applications, Fourth Edition, Random House (1998).
GENETIC ALGORITHMS FOR MINING MULTIPLE-LEVEL ASSOCIATION RULES

NORHANA BT. ABDUL RAHMAN ARABY AND Y.P. SINGH
Faculty of Information Technology, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia
E-mail: [email protected]

This paper presents a genetic algorithms formulation and generalization for mining multiple-level association rules from large transaction databases, with each transaction consisting of a set of items and a taxonomy (is-a hierarchy) on the items. The necessity for mining such rules is of great interest to many researchers [1]-[3]. Some criteria are investigated for pruning redundant rules. Example rules found in database transactions are presented.
1. Introduction
Genetic algorithms have been used in concept learning in machine learning and in mining association rules [4]. The proposed study investigates genetic algorithms for mining association rules and multiple-level rules for large database applications. The algorithm randomly generates an initial population of itemsets. The fitness of those itemsets is rated by their frequency of occurrence as subsets of the given transactions. Itemsets that fit the user specified support and confidence thresholds will survive. They will then replicate according to their fitness, mutate randomly, and crossover by exchanging parts of their subsets (substructures). The algorithm again evaluates the new itemsets for their fitness, and the process repeats. During each generation, the genetic algorithm improves the itemsets (individuals) in its current population. The genetic algorithm stops when there is little change in itemset fitness or after some fixed number of iterations. The algorithm will finally return a set of frequent itemsets (hidden transactions) from different generations. Multiple-level association rule mining requires the following:

1. A set of transactions and a taxonomy (is-a hierarchy) on the items in the transactions.
2. Efficient methods for multiple-level rule mining.
The genetic algorithm is proposed here for mining multiple-level association rules considering extended transactions, i.e. transactions having items combined with taxonomies. We consider the database consisting of a set of transactions and items' taxonomies as shown in Figure 1. Finding associations between items at any level of the taxonomy is known as mining multiple-level association rules.
Phase-1 : Find Extended transactions Given a set of transactions and the taxonomy, add all ancestors of each item in a transaction to the transaction as given below: Items in Transactions + taxonomies (items' ancestors) = Extended Transactions Phase-2 : Run genetic algorithm designed for finding frequent itemsets [4] and later find the multiple-level association rules.
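Phase-1 can be sketched as follows (a helper of ours, assuming the taxonomy is given as a child-to-parent map; it is not the authors' implementation):

```python
def extend(transaction, parent):
    """Add all ancestors of every item in a transaction (Phase-1).
    `parent` maps an item to its immediate ancestor in the taxonomy."""
    extended = set(transaction)
    for item in transaction:
        while item in parent:
            item = parent[item]
            extended.add(item)
    return extended

# Taxonomy of Figure 1 (is-a hierarchy)
parent = {"Skimmed Milk": "Milk", "Pasteurized Milk": "Milk",
          "Milk": "Drink", "Mineral Water": "Drink",
          "Apple": "Fruit", "Orange": "Fruit",
          "Fruit": "Food", "Bread": "Food"}

print(sorted(extend({"Skimmed Milk", "Bread"}, parent)))
# ['Bread', 'Drink', 'Food', 'Milk', 'Skimmed Milk']
```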
[Processing pipeline: transactions + taxonomies → (preprocess algorithm) → extended transactions → (genetic algorithm) → frequent itemsets → multiple-level association rules]

2. Genetic Algorithms
The currently most important and widely known representatives of evolutionary computing techniques are: genetic algorithms (GAs), evolution strategies (ESs), and evolutionary programming (EP). These techniques are applied in problem solving by applying an evolutionary mechanism. In the following we present a brief review of genetic algorithms, the evolutionary computing techniques and their use for machine learning problems. The basic evolutionary algorithm can be represented as given below:

t := 0;
initialize P(t);            (generate initial population)
evaluate P(t);
while not terminate(P(t)) do
    Select:    P'(t)   := select from P(t);
    Recombine: P''(t)  := r(P'(t));
    Mutate:    P'''(t) := m(P''(t));
    Evaluate:  P'''(t);
    t := t + 1;
od
Return (best individual in P(t));
In this algorithm, P(t) denotes a population of individuals at generation t. Q(t) is a special set of individuals that has to be considered for selection, and P'' denotes the offspring individuals. At the present time, genetic algorithms are considered to be among the most successful machine-learning techniques and are also used as general-purpose search techniques for solving complex problems. Based upon genetic and evolutionary principles, GAs work by repeatedly modifying a population of individuals through the application of selection, crossover, and mutation operators. The choice of a representation of the individual (encoding) for a particular problem is a major factor determining a GA's success. GAs have been used for optimization as well as for classification and prediction problems with different kinds of encoding. A GA's fitness function measures the quality of a particular solution. The traditional GA begins with a population of n randomly generated individuals (binary strings of fixed length l), where each individual encodes a solution to the task at hand. The GA proceeds for a number of generations until the fitness of the individuals generated is satisfactory. During each generation, the GA improves the individuals in its current population by performing selection, followed by crossover and mutation. Selection is the population improvement or "survival of the fittest" operator. According to Darwin's evolution theory, the best individuals should survive and create new offspring for the next generation. Basically, the selection process duplicates structures with higher fitness and deletes structures with lower fitness. There are a few methods which can be used for selection, such as proportional selection, tournament selection, roulette wheel selection, Boltzmann selection, rank selection, steady state selection and some others. Crossover, when combined with selection, results in good components of good individuals yielding better individuals. The offspring are the results of cutting and splicing the parent individuals at various crossover points. Mutation creates new individuals that are similar to current individuals. With a small, prespecified probability (p_m ∈ [0.005, 0.01], or p_m = 1/l where l is the length of the string representing an individual), mutation randomly alters each component of each individual. The main issues in applying GAs to data mining tasks are selecting an appropriate representation and an adequate evaluation function.
3. Simulation Result
Illustrative Example. Given a sample taxonomy saying that Skimmed Milk is-a Milk is-a Drink and Bread is-a Food, we can infer a rule saying that "people who buy milk tend to buy bread". This rule may hold even though the rules saying that "people who buy skimmed milk tend to buy bread" and "people who buy drink tend to buy bread" do not hold.

Drink (1)
  Milk (1)
    Skimmed Milk (1)
    Pasteurized Milk (2)
  Mineral Water (2)
Food (2)
  Fruit (1)
    Apple (1)
    Orange (2)
  Bread (2)
Figure 1: Example of taxonomy

Let I = {Skimmed Milk, Pasteurized Milk, Mineral Water, Apple, Orange, Bread} - the set of items.
Let T = {{Skimmed Milk, Bread}, {Mineral Water, Apple}, {Pasteurized Milk, Bread}, {Pasteurized Milk, Bread}} = {T1, T2, T3, T4} - the set of transactions.
Let the set of ancestors be {Milk, Drink, Fruit, Food}.

Item, I (leaf of the taxonomy tree)   Hierarchy-info code   Normal individual bits
Skimmed Milk                          111                   100000
Pasteurized Milk                      112                   010000
Mineral Water                         12                    001000
Apple                                 211                   000100
Orange                                212                   000010
Bread                                 22                    000001

Table 1: Encoded items

Ancestor   Hierarchy-info code
Milk       11
Drink      1
Fruit      21
Food       2

Table 2: Encoded ancestors

The hierarchy-info code represents the position (level) of an item or ancestor in the hierarchy. For example, the item 'Pasteurized Milk' is encoded as '112', in which the first digit '1' represents 'Drink' at level 1, the second digit '1' represents 'Milk' at level 2, and the third digit '2' represents the type 'Pasteurized Milk' at level 3. Hence, the more digits an item is encoded with, the deeper its level in the hierarchy, and vice versa.

Transaction   Extended transaction   Normal transaction
T1            {111,0,0,0,0,22}       {1,0,0,0,0,1}
T2            {0,0,12,211,0,0}       {0,0,1,1,0,0}
T3            {0,112,0,0,0,22}       {0,1,0,0,0,1}
T4            {0,112,0,0,0,22}       {0,1,0,0,0,1}

Table 3: Transactions

In an extended transaction, the bit position reflects the item involved in the transaction while the bit content reflects its hierarchy.
A. Finding Frequent Sets using Genetic Algorithms
GA parameters: population size pop_size = 10; individual size ind_size = 6; probability of crossover p_c = 0.6; probability of mutation p_m = 0.01.

An initial population is randomly generated where each individual consists of six '0' or '1' bits. These bits only represent the items at the lowest level (the leaves of the taxonomy tree) and do not include the ancestors.
All the individuals in the initial population are first evaluated to determine their fitness value before they are selected for the further process of crossover and mutation. Each individual is compared with the normal transactions of itemsets in the database. The more frequently the individual occurs in the normal transactions, the higher its fitness value. Roulette-wheel selection is chosen to select the best individuals in the population, based on their fitness value. The fitter the individuals are, the more chances they have to be selected. Crossover and mutation are the two basic operators of a GA and they may affect its performance. Individuals are randomly chosen and switched at a randomly chosen crossing point, between 1 and ind_size. In this experiment, single point crossover is used, where only one crossover point is selected. Other crossover methods for binary encoding are two point crossover, uniform crossover and arithmetic crossover. Mutation is then performed at a very low mutation probability: bits are inverted randomly. The population is then evaluated again. These individuals are then passed to the next generation for selection, crossover, mutation and evaluation again. The process repeats for several generations, each time improving the fitness of the population. Finally, the GA process will generate a final population, consisting of the most frequent individuals or itemsets.
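The fitness evaluation described above amounts to counting support; a minimal sketch of ours, with individuals and transactions written as bit-tuples over the six leaf items:

```python
def support(individual, transactions):
    """Fitness of a candidate itemset: how many transactions contain it."""
    return sum(all(not bit or t_bit for bit, t_bit in zip(individual, t))
               for t in transactions)

# Normal (leaf-level) transactions T1..T4 of Table 3
transactions = [(1, 0, 0, 0, 0, 1),
                (0, 0, 1, 1, 0, 0),
                (0, 1, 0, 0, 0, 1),
                (0, 1, 0, 0, 0, 1)]

print(support((0, 0, 0, 0, 0, 1), transactions))   # Bread alone: 3
print(support((0, 1, 0, 0, 0, 1), transactions))   # {Pasteurized Milk, Bread}: 2
```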
B. Construction of Multiple-Level Association Rules
From all the frequent sets which were generated by the GA, we only choose the single frequent sets. They are then expanded and converted into hierarchy-info code. For example, the single frequent itemsets generated are {000001} and {010000}.

Item               Normal individual   F1               Fitness/Support
Bread              000001              0,0,0,0,0,22     3
                                       0,0,0,0,0,2*     4
Pasteurized Milk   010000              0,112,0,0,0,0    2
                                       0,11*,0,0,0,0    3
                                       0,1**,0,0,0,0    4
These single hierarchy-info encoded individuals, F1, are paired and evaluated. However, note that an item is not paired with its own ancestors, in order to avoid uninteresting rules such as A→ancestor(A) or ancestor(A)→A. As for evaluation, each paired hierarchy-info encoded individual, F2, is compared with the extended transactions. The number of occurrences of F2 in the extended transactions determines the fitness or support value.
F2                 Fitness/Support
0,112,0,0,0,22     2
0,11*,0,0,0,22     3
0,1**,0,0,0,22     3
0,112,0,0,0,2*     2
0,11*,0,0,0,2*     3
0,1**,0,0,0,2*     4
The bit digits which are substituted with '*' are not taken into consideration when making the comparison with the extended transactions. For example, 0,1**,0,0,0,2* is scanned through the extended transactions; if 1** and 2* (regardless of the bit position) are found in an extended transaction, then the fitness/support value is incremented. From F1 and F2 we derive the multiple-level association rules. Let the confidence threshold be γ = 0.8.

Pasteurized Milk, Bread (0,112,0,0,0,22):
  PM→B holds since support(PM∪B)/support(PM) = 2/2 ≥ 0.8;
  B→PM does not hold since support(PM∪B)/support(B) = 2/3 < 0.8.

Milk, Bread (0,11*,0,0,0,22):
  Milk→Bread holds since support(M∪B)/support(M) = 3/3 ≥ 0.8;
  Bread→Milk holds since support(M∪B)/support(B) = 3/3 ≥ 0.8.

Drink, Bread (0,1**,0,0,0,22):
  Drink→Bread does not hold since support(D∪B)/support(D) = 3/4 < 0.8;
  Bread→Drink holds since support(D∪B)/support(B) = 3/3 ≥ 0.8.

Pasteurized Milk, Food (0,112,0,0,0,2*):
  PM→Food holds since support(PM∪F)/support(PM) = 2/2 ≥ 0.8;
  Food→PM does not hold since support(PM∪F)/support(F) = 2/4 < 0.8.

Milk, Food (0,11*,0,0,0,2*):
  Milk→Food holds since support(M∪F)/support(M) = 3/3 ≥ 0.8;
  Food→Milk does not hold since support(M∪F)/support(F) = 3/4 < 0.8.

Drink, Food (0,1**,0,0,0,2*):
  Drink→Food holds since support(D∪F)/support(D) = 4/4 ≥ 0.8;
  Food→Drink holds since support(D∪F)/support(F) = 4/4 ≥ 0.8.
From the above computation, the multiple-level association rules derived are: Pasteurized Milk→Bread, Milk→Bread, Bread→Milk, Bread→Drink, Pasteurized Milk→Food, Milk→Food, Drink→Food, Food→Drink. The result shows that an item from any level can be associated with another item (from any level too), regardless of whether its ancestors or descendants are also associated or not. For example, Milk→Bread holds although Drink→Bread does not.
4. Conclusion
In this study, we have extended the scope of mining association rules from a single level to multiple levels, using genetic algorithms. The major issue taken into account is the conversion of the given transactions and taxonomies into extended transactions, so that the encoded bits reflect both the items and their hierarchy in the taxonomy. Mining multiple-level association rules may result in the discovery of refined knowledge from a given set of transactions.

References
1. Agrawal, A., T. Imielinski, and A. Swami, Mining Association Rules Between Sets of Items in Large Databases, Proc. 1993 ACM SIGMOD Int'l Conf. Management of Data, Washington, D.C., May (1993), pp. 207-216.
2. Agrawal, Rakesh and Ramakrishnan Srikant, Mining Generalized Association Rules, Proc. 21st VLDB Conference, Zurich, Switzerland, (1995).
3. Han, J. and Y. Fu, Mining Multiple-Level Association Rules in Large Databases, technical report, (University of Missouri-Rolla, 1997).
4. Singh, Y.P. and Norhana Abdul Rahman Araby, Evolutionary Approach to Data Mining, Proc. IEEE ICIT, Goa, India, (2000).
A CLUSTERING ALGORITHM FOR SELECTING STARTING CENTERS FOR ITERATIVE CLUSTERING
ANGEL GUTIERREZ
Department of Computer Science, Montclair State University, Upper Montclair, NJ 07043, USA
E-mail: [email protected]

ALFREDO SOMOLINOS
Department of Mathematics and Computer Information Science, Mercy College, 555 Broadway, Dobbs Ferry, NY 10522, USA
E-mail: [email protected]

Iterative clustering algorithms are strongly dependent on the number and location of the starting centers. We present some examples of this dependence for two classes of algorithms: fuzzy clustering and competitive learning. In order to select an optimal location for the starting centers, we propose a non-iterative clustering algorithm which creates groups of points based on the average distance of each point to its closest neighbor, and merges the groups so obtained into clusters. The radius of attraction of each point is defined as the average of the distances from every point to its closest point plus a factor times the standard deviation. Adjusting this factor we can vary the number of groups generated. We merge those groups that are close in the sense of the Hausdorff distance. The algorithm allows declaring the minimum number of points that can constitute a group. The user can then drop those points that do not constitute a group, merge them with the closest group if they fall inside the radius of attraction of that group, or allow them to stand as an independent group.
1 Introduction
Clustering algorithms are used in a variety of fields: data mining, statistical data analysis, pattern recognition (for example, using radial basis function neural networks) and, in general, in preprocessing data for classification algorithms. Most of the algorithms used in clustering are iterative. Starting with a partition in classes or, equivalently, with the centers of the classes, the algorithm moves the centers, or redefines the classes, for a fixed number of iterations or until a fitness function reaches a certain level. The efficiency of these iterative algorithms depends strongly on the selection of the initial groups. The main problem lies in guessing the correct number of groups. But the location of the starting centers can have a huge impact on the algorithm performance.
In our experience, not all the groups in a data set have the same number of points, and the average distance between the points in one group is not the same as the average distance in another group of the same set. Thus we have created two-dimensional data samples with these properties. Some of the methods we use to illustrate clustering problems work better with uniformly distributed groups of the same number of elements. So we have also used uniform samples. In Figure 1 we present the sample we will use in most of the examples. It contains 4 groups of 6, 12, 18 and 24 points, generated with random amplitudes of 0.2, 0.3, 0.4, and 0.5. Clearly, a person would discern four groups. But there are points that could be considered a subgroup, and this could have importance in the classification of signals for medical diagnosis. A distinct subgroup could be the telltale feature that would help diagnose a clinical condition. Or it could be simply an outlier. We want to be able to choose what to do with such subgroups. Drop them, merge them with the larger one, or give them independent meaning.
Figure 1. Sample data. Four groups with different number of points and different point spread.
Figure 2. Effect of choosing three centers and six centers using fuzzy C-means
2 Importance of the initial center selection. Some examples

2.1 Choosing the wrong number of centers
We first use fuzzy C-means [2]. Starting with three centers, one group is completely unaccounted for. On the other hand, starting with more centers provides us with two subgroups, clearly differentiated in the left group; not so clearly in the bottom one (Fig. 2).
Figure 3. Competitive learning. Four and five centers.
In Figure 3 we show the results of competitive learning [4]. The initial centers are attracted by the points in the groups. We present the trace of the motion of the centers. The starting centers' location was selected to obtain a reasonable result. At the left, four centers start at the four corners and then move to the groups. At the right, we have an extra center at the bottom, which moves straight up.
2.2 Influence of the starting location
We use Frequency Sensitive Competitive Learning [1]. We start with six centers at the top of the screen. Three of them are captured by the top group and one ends up between two groups. We show the final position of the centers at the right (Fig. 4). Rival Penalized Competitive Learning [3] expels the centers that are not needed. It works very well when all groups have the same number of points: two centers are sent out of the figure and four are placed at the center of each cluster. But if we choose to place all six centers at the top left corner, things can go very wrong: five centers are sent out of the picture and only one center occupies the middle of the screen (Fig. 5).
}"^:: s •. .'.-JIJU.-.".•;• >: ":;:.:'.: iV.'": 'g. *!!•: :'..• ": : lg centers
• •*.*..
^
\>
°\
Figure S. Rival Penalized Competitive Learning. Six ini.iui coitcra.
Figure 6. Rival Penalized Competitive Learning. All centers start at top left corner.
It should be clear by now that choosing the number and locations of the starting centers is an important task and that it deserves the effort of preprocessing the data.
3 Description of the pre-selection deterministic algorithm
The algorithm works like the creation of clouds. Water particles cling to their closest neighbors to form droplets, and then the droplets bunch together to create the clouds. Thus, we first create grouplets by finding the closest neighbors of each point; then, we merge the grouplets into clusters. A small code sketch of both steps is given after the following lists.

3.1 Creating the grouplets

• Find the radius of attraction of the points: Create a matrix of the distances from each point to all the others. Sort the matrix by rows. Find the average and standard deviation of the first non-zero column, the distances to the closest point. Define the radius of attraction as the average closest distance plus a factor times the standard deviation. Taking this factor small creates a lot of small grouplets; making it big increases the radius of attraction and the grouplets are bigger.

• Start with any point. Find all other points inside its radius of attraction. Recursively find the points inside the radius of attraction of the points just added. Stop when there are no more points in the radius of attraction of all the points in the grouplet.

3.2 Creating clusters by merging the grouplets

• Find the radius of attraction of the grouplets: Using the Hausdorff distance, find the average and standard deviation of the distances from each grouplet to its closest neighbor. Define the radius of attraction as the average of the closest distances plus a factor times the standard deviation. Taking this factor small we will merge few grouplets and the clusters will be just the grouplets; taking the factor large we will have clusters made out of several grouplets.

• Merging the grouplets: Find the number of points in the grouplet - if it is a singleton or a doublet we may want to drop it. If the number of points is less than the minimum number of points, apply the chosen strategy, drop or merge. If the grouplet has more than the minimum number of points, find all the grouplets inside its radius of attraction. They would form the cluster. Merge them together.
grouplets inside its radius of attraction. They would form the cluster. Merge them together. 3.3
Example. Clustering the above sample
Figure 7. Ten grouplets, two of them singletons, are merged into five clusters. By adjusting the radii of attraction we could create fewer grouplets and clusters.
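The following is a minimal sketch (not the authors' code) of the grouplet creation and merging steps of Sections 3.1 and 3.2; the parameter names, the minimum-points rule and the symmetric Hausdorff distance used here are simplifying assumptions.

```python
# Grouplet creation and merging, following the two-stage recipe above (sketch only).
import numpy as np
from scipy.spatial.distance import cdist

def radius_of_attraction(dist_to_nearest, factor):
    # average closest distance plus a factor times its standard deviation
    return dist_to_nearest.mean() + factor * dist_to_nearest.std()

def create_grouplets(points, factor=1.0):
    points = np.asarray(points, dtype=float)
    d = cdist(points, points)
    nearest = np.sort(d, axis=1)[:, 1]          # distance to the closest point
    r = radius_of_attraction(nearest, factor)
    unassigned, grouplets = set(range(len(points))), []
    while unassigned:
        seed = unassigned.pop()
        members, frontier = {seed}, [seed]
        while frontier:                          # recursively absorb neighbours
            p = frontier.pop()
            near = {q for q in unassigned if d[p, q] <= r}
            unassigned -= near
            members |= near
            frontier.extend(near)
        grouplets.append(sorted(members))
    return grouplets

def merge_grouplets(points, grouplets, factor=1.0, min_points=3):
    points = np.asarray(points, dtype=float)
    big = [g for g in grouplets if len(g) >= min_points]   # drop singletons/doublets
    def hausdorff(a, b):
        d = cdist(points[a], points[b])
        return max(d.min(axis=1).max(), d.min(axis=0).max())
    h = np.array([[hausdorff(a, b) for b in big] for a in big])
    nearest = np.sort(h, axis=1)[:, 1] if len(big) > 1 else np.zeros(1)
    r = radius_of_attraction(nearest, factor)
    clusters, used = [], set()
    for i in range(len(big)):
        if i in used:
            continue
        group = [j for j in range(len(big)) if h[i, j] <= r and j not in used]
        used |= set(group)
        clusters.append(sorted(set().union(*[big[j] for j in group])))
    return clusters
```

Both stages use the same radius-of-attraction rule, so the two "factor" parameters are the only knobs that trade off the number of grouplets against the number of clusters, as noted in the example above.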
4 Discussion

We have presented a non-iterative method for selecting starting centers for iterative clustering. The method is very flexible and avoids the problems related to choosing the wrong number of centers, or to placing them in the wrong starting locations.

References
1. Ahalt S.C., Krishnamurthy A.K., Chen P. and Melton D.E., Competitive Learning Algorithms for Vector Quantization, Neural Networks 3 (1990) pp. 277-291.
2. Jang J.R., Sun C. and Mizutani E., Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence (Prentice Hall, New Jersey, 1997).
3. Krzyzak A. and Xu L., Rival Penalized Competitive Learning for Clustering Analysis, RBF Net and Curve Detection, IEEE Transactions on Neural Networks 4 (1993) pp. 636-641.
4. Rumelhart D.E. and Zipser D., Feature Discovery by Competitive Learning, Cognitive Science 9 (1985) pp. 75-112.
DIMENSION REDUCTION IN DATAMINING

H.M. HUBEY, I. SIGURA*, K. KANEKO*, P. ZHANG*
Department of Computer Science, Montclair State University, Upper Montclair NJ 07043
E-mail: [email protected]; * E-mail: [email protected]

We present a complete, scalable, parallelizable, and unified method combining Boolean algebra, fuzzy logic, modified Karnaugh maps, neural-network-type training and nonlinear transformation to create a mathematical system which can be thought of as a multiplicative (logical-AND) neural network that can be customized to recognize various types of data clustering. The method can thus be used for: (1) displaying high-dimensional data, especially very large datasets; (2) recognizing patterns and clusters, with the level of approximation controllable by the user; (3) approximating the patterns in data to various degrees; (4) preliminary analysis for determining the number of outputs of the novel neural network shown in this manuscript; (5) creating an unsupervised learning network (of the multiplicative or AND kind) that can specialize itself to clustering large amounts of high-dimensional data; and finally (6) reducing high-dimensional data to essentially three dimensions for intuitive comprehension by wrapping the data on a torus [1]. The method can easily be extended to include vector time series. The natural space for high-dimensional data using the natural Hamming metric is a torus. The specially constructed novel neural network can then be trained or fine-tuned using machine-learning procedures on the original data or the approximated/normalized data. Furthermore, we can determine approximately the minimal dimensionality of the phenomena that the data represent.
1 Introduction
There is a set of related problems in the fields of datamining, knowledge discovery, and pattern recognition. We do not know how many neurons should be in the hidden layer or the output layer. Thus, if we attempt to use ANNs for clustering as a preliminary method for finding patterns, we must use heuristic methods to determine how many clusters the ANN should recognize (i.e., what the rank/dimension of the output vector is). This is just another view of the problem in datamining of knowing how many patterns there are in the data and how we would go about discerning these patterns. There is a related problem in k-nearest-neighbors clustering, in which we need an appropriate data structure to be able to efficiently find the neighbors of a given input vector. Indeed, before the k-neighbors method can be used to classify an input vector, we need to be able to cluster the training input vectors, and an ANN might have been used for this process. The problem of knowing how many patterns (categories or classes/clusters) there are is an overriding concern in datamining and in unsupervised artificial neural network training. Typically, the basis of all datamining is some kind of clustering technique, which may serve as a preprocessing and data reduction technique and which may be followed by other algorithms for rule extraction, so that the data can be interpreted for and comprehended by humans. Prediction and classification may also be goals of the process. The major clustering methods can be categorized as follows [2]: (i) Partitioning Methods; (ii) Hierarchical Methods; (iii) Density-based Methods; (iv) Grid-based Methods; (v) Model-based Methods.

2 Boolean algebra, K-maps & Digital Logic
A Karnaugh map (K-map) is a 2D array of size at most 4 x 4 which represents a Boolean function. The arrangement of cell addresses (nodes) is such that the numbering scheme follows the Gray code. An r-bit Gray code is an ordering of all r-bit numbers/strings so that consecutive numbers differ in precisely one bit position. Thus a Gray code is a sequence of r-bit strings such that successive numbers are at distance one under the Hamming distance. The specifics of the K-map make it possible to perform Boolean algebraic simplifications and reductions graphically. For low-dimensional spaces (i.e. n ≤ 4), there is a natural distance (Hamming) metric defined on the K-map. It is reminiscent of the city-block metric used in data mining procedures. The simplest case, the 2-variable K-map, is a 2 x 2 array.
Figure 1: Hypercubes, Gray Code, and Data Clustering. The graphic shows (i) the Gray coding, (ii) the 'growing' of an n-dimensional hypercube from an (n-1)-dimensional hypercube, and (iii) the 'shrinking' of the center.

For n > 4 the K-map needs to be modified (into the KH-map) so that it is a metric space [1]. The ideal visualization tool for high dimensional data is the KH-map; the Karnaugh map is used for small dimensions. "For more than four variables, the Karnaugh map method becomes increasingly cumbersome. With five variables, two 16x16 maps are needed, with one map considered to be on top of the other in three dimensions to achieve adjacency. Six variables requires the use of four 16x16 tables in four dimensions! An alternative approach is a tabular technique, referred to as the Quine-McCluskey method." [3]. The 4-variable K-map corresponds to a 4-dimensional hypercube. For more on how this can be used in clustering, please see below. In this space, each node is adjacent to 4 other nodes. This notion of 'neighborliness' can be visually realized on the 4-variable K-map. The K-map can be wrapped on a cylinder, and the cylinder is then bent into the shape of a torus so that the corner cells become neighbors [4]. We can also create maps similar to K-maps, use them in ways similar to grid-based methods, and also wrap them on a torus. In finding clusters (as in classification/categorization) for an input vector of rank n there are easily 2^n possible outputs (clusters). This simplification results from reducing the inputs to binary vectors and allowing only the corners of the hypercube to represent clusters. This is nothing but the decoder problem. The decoder is a Boolean circuit (a classifier!) that identifies the 'coded' input vector (binary string). If we use fuzzy gates instead of crisp Boolean gates, we will have our classifier, with one caveat: the n-dimensional hypercube is the natural space of this phenomenon, yet the phenomena-space size is 2^n. We need to be able to cluster the data, which might be spread out over this space. Such a classifier can easily be constructed using fuzzy logic; however, with some thought the procedure can be given some desirable properties.
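As a small illustration of the cell ordering discussed above, the sketch below (a standard construction rather than code from the paper) generates the reflected Gray code and checks the Hamming-distance-one adjacency that the K-map layout relies on.

```python
# Reflected Gray code and Hamming adjacency check (illustrative sketch).
def gray_code(r):
    """Return the standard reflected Gray code as a list of r-bit strings."""
    codes = [0]
    for bit in range(r):
        # Reflect the current list and set the new highest bit in the reflection.
        codes += [c | (1 << bit) for c in reversed(codes)]
    return [format(c, f"0{r}b") for c in codes]

def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

if __name__ == "__main__":
    codes = gray_code(4)                 # 16 cell addresses of a 4-variable K-map
    print(codes[:6])                     # ['0000', '0001', '0011', '0010', '0110', '0111']
    # Every consecutive pair, including the wrap-around, differs in exactly one bit,
    # which is why the cells can be wrapped onto a torus with all neighbors adjacent.
    assert all(hamming(codes[i], codes[(i + 1) % len(codes)]) == 1
               for i in range(len(codes)))
```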
3 Hypercube, Datacube, and the Karnaugh map
The n-dimensional hypercube has N = 2^n nodes and n·2^(n-1) edges. Each node corresponds to an n-bit binary string, and two nodes are linked with an edge if and only if their binary strings differ in precisely one bit. Each node is incident to n = lg(N) [where lg(x) = log2(x)] other nodes, one for each bit position. An edge is called a dimension-k edge if it links two nodes that differ in the kth bit position. The notation u^k is used to denote the neighbor of u across dimension k in the hypercube [5]. Given any string u = u_1 u_2 ... u_{lg N}, the string u^k is the same as u except that the kth bit is complemented. The string u may be treated as a vector. Using d(u, v) to denote the Hamming distance, d(u, u^k) = 1 for every u and k. The hypercube is node and edge symmetric; by just relabelling the nodes, we can map any node onto any other node, and any edge onto any other edge. That is, for any pair of edges (u, v) and (u', v') in an N-node hypercube H_N, there is an automorphism σ of H_N such that σ(u) = u' and σ(v) = v'. An automorphism of a graph is a one-to-one mapping of nodes to nodes such that edges are mapped to edges. If u = u_1 u_2 ... u_{lg N} and u' = u'_1 u'_2 ... u'_{lg N}, then for any permutation π on {1, 2, ..., lg N} we can define an automorphism σ with the desired property by [5]

σ(x_1 x_2 ... x_{lg N}) = (x_{π(1)} ⊕ u_{π(1)} ⊕ u'_1) * (x_{π(2)} ⊕ u_{π(2)} ⊕ u'_2) * ... * (x_{π(lg N)} ⊕ u_{π(lg N)} ⊕ u'_{lg N}),    (1)

where μ*ν denotes the concatenation of string μ with string ν.
Figure 2: Graph Automorphism.

The automorphism on an input-vector hypercube is equivalent to a permutation of the components of the input vector. As a simple example (Fig. 2), the automorphism

σ(x_1 x_2 x_3) = x_2 * (x_3 ⊕ 1) * x_1    (2)

maps the edge (010, 011) to the edge (110, 100), where ⊕ indicates an XOR. Other examples can be seen in Leighton [5]. Any nD (n-dimensional) data can be thought of as a series of (n-1)D hypercubes. This process can be used iteratively to reduce high-dimensional spaces to visualizable 2D or 3D slices. Properties of high-dimensional hypercubes are not intuitively straightforward. The datacube that is used in datamining [2] is a lattice of cuboids, 2D or 3D slices of which can be displayed by common datamining programs. There is no need to look all over the hypervolume for high-dimensional data sets; we need to look only along the surface (or even near the corners (nodes)). Vectors in {0, 1}^n are called binary vectors, and the vectors in {-1, 1}^n are bipolar vectors. These vectors are the sets of corners or vertices of [0, 1]^n and [-1, 1]^n, respectively. The points in {0, 1}^n are located at distances 0 to √n from the origin, but the vectors in {-1, 1}^n are all of length √n. Therefore {-1, 1}^n is a subset of the hypersphere of radius √n in ℝ^n. The domains of such vectors are the hypercubes [6]. In high-dimensional Euclidean spaces the volumes and areas of hypercubes and hyperspheres are counterintuitive. Hypercubes in n-dimensional spaces are highly anisotropic, something like spherical porcupines [5]. Thus, most of the data in a high-dimensional space will be found at the corners of the hypercube of the normalized datacube. Now, all we need to do is normalize the input data vectors to {0, 1}. Then, obviously, the hypercube is the natural data structure for datamining. Furthermore, for visualization and manipulation, we need another kind of data structure, and this data structure is the KH-map [1].
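A minimal sketch of the automorphism of Eq. (1) is given below (the helper names are ours, not the paper's); with u = 010, u' = 110 and the permutation π = (2, 3, 1) it reproduces Eq. (2) and maps the edge (010, 011) onto (110, 100).

```python
# Hypercube automorphism of Eq. (1): permute bit positions and XOR-mask them
# so that sigma(u) = u'; adjacency is preserved (illustrative sketch).
from itertools import combinations

def sigma(x, u, u_prime, pi):
    """sigma(x)_i = x_{pi(i)} XOR u_{pi(i)} XOR u'_i, with 0-based index list pi."""
    return "".join(str(int(x[pi[i]]) ^ int(u[pi[i]]) ^ int(u_prime[i]))
                   for i in range(len(x)))

def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))

if __name__ == "__main__":
    u, u_prime, pi = "010", "110", [1, 2, 0]   # pi = (2, 3, 1) in 1-based notation
    assert sigma("010", u, u_prime, pi) == "110"
    assert sigma("011", u, u_prime, pi) == "100"
    # Automorphism check: every edge of the 3-cube is mapped onto an edge.
    nodes = [format(k, "03b") for k in range(8)]
    for a, b in combinations(nodes, 2):
        if hamming(a, b) == 1:
            assert hamming(sigma(a, u, u_prime, pi), sigma(b, u, u_prime, pi)) == 1
```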
4 Fuzzy Logic, Neural Networks and Dimensional Analysis
The other important part of the method used to effect a dimension reduction is the use of a novel multiplicative neural network, which can be interpreted in terms of a special kind of fuzzy logic in Hubey [7]. Therefore the final result is not merely an ad hoc multiplicative network but rather the result of rigorous mathematico-scientific reasoning. Furthermore, the results can be transformed back into the arithmetic domain (where interval- and ratio-scaled values may be used) and still be interpreted using fuzzy logic so as to create association rules. The axioms of fuzzy logic can be found in many books [7,8,9]. Also in Hubey [7] is the special logic that is useful for training arithmetic (interval-scaled or ratio-scaled) multiplicative neural networks. Aside from the standard axioms that should be satisfied by the fuzzy norm and conorm, if we want to take some guesses as to what kinds of laws of logic are impeccably true and should be preserved, the three that are commonly put forward as candidates are:

The Law of the Excluded Middle (LEM):         x + C(x) = 1        (3a)
The Law of Noncontradiction (LNC):            C(x · C(x)) = 1     (3b)
The Law of Involution or Self-Inverse (LSI):  C(C(x)) = x         (3c)
where C(x) is the negation or complement function. LNC is usually written as x·C(x) = 0 and called the Law of Contradiction. Adding another level of indirection to continuous-valued logic, by separating the truth value assigned to the logical variable from the value of the logical variable, allows the satisfaction of some of the constraints above by functions other than t(x) = x. For example, the inputs to datamining problems are normally positive finite numbers in some interval x ∈ [0, L] where L > 1. If we define the truth valuation as t(x) = x and the complement as C(x) = 1/t(x) = 1/x, then we can treat these real numbers akin to logical values. For example, all values of x > 1 would be interpreted as more true than false and their complements would always be (1/x) < 1. In the IEEE floating-point standard, an overflow or a division by zero returns "infinity", and invalid operations return a NaN (not a number). Therefore 1/x can easily be used as a fuzzy complement by trapping these special values and thinking of the two values {0, ∞} as the two truth values of crisp logic [10]. In general, the outputs (using the suppressed summation notation of Einstein) for this ANN are of the form

ln(y_i) = w_ik ln(x_k)    or    y_i = ∏_{k=1}^{n} x_k^{w_ik},    (4)
where the repeated index denotes summation over that index. This network is obviously a [nonlinear] polynomial network, and thus does not have to "approximate" polynomial functions as the standard neural networks do. The clustering is naturally explicable in terms of logic, so that association rules follow easily. Since we can interpret multiplication as akin to a logical-AND (conjunction) and addition as a logical-OR (disjunction), we can then convert Eq. (4) to the logical form of a neural network and train it using the actual data values instead of the normalized values. A special fuzzy logic developed in Hubey [7] is especially well-suited to interpret such multiplicative neural networks. The resulting neural networks are not created ad hoc but rather follow directly and logically from the results developed in Hubey [1]. Therefore, using the Quine-McCluskey method or a related method, we can cluster the inputs, and thus classify them. Using this minimization (or clustering) we can create a specially tuned multiplicative neural network which can be interpreted using this special fuzzy logic. This is the essential basis of the method. Terms [or fuzzy-minterms] such as x_1^{w21} x_2^{-w22} x_3^{-w23} serve functions similar to the dimensionless groups of fluid dynamics [1], and the exact relationships amongst the input variables should be sought in terms of these groups. Hence, the method has also achieved a dimension reduction akin to PCA. For more examples of the use of dimensional analysis, books such as White [11] and Olson [12] may be consulted. An example of the use of dimensional analysis to solve a problem in speech can be found in Hubey [13]. At the same time, we have achieved the solution to one of the problems associated with neural networks; that is, we now know how many output neurons an ANN should have for a specific problem at hand. We can now modify digital circuits which are custom-made for the problem at hand to create a [multiplicative] neural network which can be interpreted using the specific fuzzy logic shown above. This multiplicative neural network is customized for the problem and also does nonlinear regression. In addition, (i) we know how many output neurons we should have, (ii) it can perform nonlinear separation, and (iii) it does not need a second stage (for classification). As a simple example, a simple single-layer multiplicative network can solve the XOR problem of Minsky [1]. It is known from empirical evidence, and from the Buckingham Pi Theorem [11,12], that nonlinear dimensional reduction is effected when dimensionless groups of variables are regressed against each other. It is based on Rayleigh's "method of dimensions" (Theory of Sound, 1887). The Pi Theorem was first stated by Vaschy, and proved in increasing generality by Buckingham, Riabouchinsky, Martinot-Lagarde, and Birkhoff [12]. Without dimensionless numbers, experimental progress in fluid dynamics and heat transfer would have been almost nil; it would have been swamped by masses of accumulated data. Indeed, the Navier-Stokes equations for fluid dynamics have never been solved in generality, and all progress is due to dimensional analysis. It may be said that dimensional analysis was the first datamining technique used in the sciences. Clusters of variables (which are multiplicative, and may even be ratios after training) can thus be used as dimensionless groups and thereby determine the 'size or dimension of the problem'. It is analogous to embedding a high-dimensional problem into a smaller dimensional space.
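The following is a minimal sketch (not the authors' implementation) of the multiplicative network of Eq. (4): computed in the log domain it is an ordinary linear layer, and its exponents play the role of the powers in a dimensionless group. The weights and inputs below are illustrative placeholders.

```python
# Multiplicative ("logical-AND") layer of Eq. (4): y_i = prod_k x_k ** W[i, k].
import numpy as np

def multiplicative_forward(W, x):
    """Outputs of a single multiplicative layer for positive inputs x."""
    x = np.asarray(x, dtype=float)
    return np.exp(W @ np.log(x))          # ln(y) = W ln(x), elementwise exponentiated

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(2, 3))           # 2 outputs (fuzzy-minterms), 3 inputs
    x = np.array([1.5, 0.8, 2.0])         # positive, e.g. ratio-scaled features
    # Negative weights correspond to dividing by a variable, i.e. forming a ratio,
    # which is exactly how dimensionless groups are built.
    print(multiplicative_forward(W, x))
```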
7 Conclusion
The method of KH-mapping combined with Quine-McCluskey-like algorithms, together with the special fuzzy logic in Hubey [7], is the analogue of the PCA methods of statistics and of the Dimensional Analysis of physics. It is an embedding of a high-dimensional problem into a smaller dimensional space, and it is also a datamining/clustering technique. With minor modifications the method can handle: (i) Supervised or Goal-directed Clustering; (ii) Multiple Stages (Product-of-Sums and Sum-of-Products); (iii) Nonspherical Clusters; (iv) Spectral Analysis in the Time Domain (correlation matrices can be used). In addition, the method is highly scalable and parallelizable. Furthermore, the KH-map is ideal for visualization of high dimensional data. It can easily handle extremely large data sets or sparse data sets with minor modifications and can be used along with statistical
sampling techniques. It works together with specialized fuzzy logics [7,14,15] to create interpretations of the complex phenomena in high dimensional spaces.

References
1. Hubey, H.M., A Complete Unified Method for Taming the Curse of Dimensionality in Datamining and Allowing Logical-ANDs in ANNs, submitted to the Journal of Datamining and Knowledge Discovery, June 2001.
2. Han, J. and Kamber, M., Data Mining (Morgan Kaufmann, New York, 2001).
3. Stallings, W., Computer Organization and Architecture (Macmillan, New York, 1993).
4. Hubey, H.M., Mathematical and Computational Linguistics (Mir Domu Tvoemu, Moscow, Russia, 1994).
5. Leighton, T., Introduction to Parallel Algorithms and Architectures (Morgan Kaufmann, San Mateo, California, 1992).
6. Hecht-Nielsen, R., Neurocomputing (Addison-Wesley, Reading, MA, 1990).
7. Hubey, H.M., The Diagonal Infinity: Problems of Multiple Scales (World Scientific, Singapore, 1999).
8. Klir, G. and Yuan, B., Fuzzy Sets and Fuzzy Logic (Prentice-Hall, Englewood Cliffs, NJ, 1995).
9. Jang, J., Sun, C. and Mizutani, E., Neuro-Fuzzy and Soft Computing (Prentice-Hall, Upper Saddle River, NJ, 2000).
10. Hubey, H.M., Mathematical Foundations of Linguistics (Lincom Europa, Muenchen, Germany, 1999).
11. White, F., Fluid Mechanics (McGraw-Hill, New York, 1979).
12. Olson, R., Essentials of Engineering Fluid Mechanics (Intext Educational Publishers, NY, 1973).
13. Hubey, H.M., Vector Phase Space for Speech Analysis via Dimensional Analysis, Journal of the International Quantitative Linguistics Association, Vol. 6, No. 2, August 1999, pp. 117-148.
14. Hubey, H.M., Fuzzy Logic and Calculus of Beauty, Moderation and Triage, Proceedings of the 2000 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS 2000), June 26-29, Las Vegas.
15. Hubey, H.M., Fuzzy Operators, Proceedings of the 4th World Multiconference on Systemics, Cybernetics, and Informatics (SCI 2000), July 23-26, 2000, Orlando, FL.
PROCESS CONTROL OF A LABORATORY COMBUSTOR USING NEURAL NETWORKS

T. SLANVETPAN AND R. B. BARAT
Department of Chemical Engineering, New Jersey Institute of Technology, Newark, NJ 07102
E-mail: [email protected], [email protected]

JOHN G. STEVENS
Department of Mathematical Sciences, Montclair State University, Upper Montclair, NJ 07043
E-mail: [email protected]

Active process control of nitric oxide (NO) emissions from a two-stage combustor burning ethylene (doped with ammonia) in air is demonstrated using two multi-layer-perceptron neural networks in series. Steady-state experimental data are used for network training. A Visual Basic interface controller program accepts incoming concentration and flow rate data signals, accesses the networks, and outputs feedback control signals to selected electronic valves. The first network identifies the amount of ammonia in the feed. Based on that value and the NO set point, the second network adjusts the first-stage fuel equivalence ratio.
1 Introduction
Minimization of both transient and steady state emissions through active process control of combustion systems like waste incinerators, furnaces, turbines, automobile engines or power plants has become an important research area. Classical controllers such as the Proportional-Integral-Derivative (PID) are widely used. However, they can be challenged by process oscillations when large disturbances or set point changes are encountered. Another drawback is that tuning the PID is time consuming and requires a combination of operational experience and trial-and-error. The task becomes even more difficult for highly nonlinear processes especially in the presence of significant time delay. The use of controller models based on artificial neural networks has become an important research area. The inherent parallel structure and its ability to learn non-linear relationships from a known input-output data set have made the neural network a strong candidate for an alternative to PID. In this study, the emissions from a laboratory combustor are controlled via an active feedback control loop operating with trained neural networks.
2 Experimental Setup
2.1 The two-stage combustion facility

A two-stage reactor served as the combustion facility. It has been well characterized elsewhere [3, 4, 5]. The first stage is a well-mixed zone that can be modeled as a perfectly stirred reactor (PSR). The hot effluent from the first zone passes into a linear flow zone that can be modeled as a plug flow reactor (PFR). Gas residence times are ~25 milliseconds. Thermocouples measure temperatures at various locations. Water-cooled extractive probes withdraw gas samples for analyses. Figure 1 shows the overall system.

2.2 Controller Design and Setup
The application of feedback process control for the combustor involved several components and tasks. Two electronic control valves, one for primary air and the other for secondary air, served as the final control elements. Analog signals from continuous emission monitors (O2, NO, and CO2) and the PSR zone thermocouple were fed continuously into a Fluke data logger. All signals from the Fluke data logger were digitized and sent simultaneously to the controlling computer's COM port via RS-232, on command from the controlling computer. A Visual Basic program, with control software and hardware, enabled the computer to process, display, and transmit signals from a thermocouple or gas analyzer simultaneously. The computer also provided feedback control by detecting deviations from assigned set points and generating correction signals (4-20 milliamps) to the electronic valves through a Keithley 12-bit 8-channel analog output board (DDA-08/16). All experimental data are transferred to and recorded in an Excel spreadsheet.

2.3 Neural Network Architecture
Network architectures vary depending on the complexity of each individual process and the objectives for using them. Multi-layer-perceptron (MLP) networks were constructed and used in each phase of the experiments. This architecture has been successfully used in several neural-network-based control applications [1, 2, 6, 7]. The back-propagation learning algorithm was applied because of its simplicity and ease of use. It is an iterative process that involves changing the weights, by means of a gradient descent method, to minimize the learning error. In the learning process, the training data were partitioned into two groups, one for training the network and the other for testing the network. The training data were presented to the networks in random order to break up any serial correlations. Doing so led to significant improvements in convergence speed and performance of the trained network. Network weights were updated after the presentation of a set of known input-output pairs during the training. The NeuroSolutions software package from NeuroDimension Inc. was used to construct all neural networks in this study. The software combines a modular, icon-based network design interface with a built-in custom wizard. These allowed us to build our own networks, generate and compile executable dynamic link library (DLL) files, and embed them into the existing Visual Basic controller interface. The DLL file is an executable module that performs the input-output mapping function (recall process). All measurable combustor parameters were displayed and preprocessed in the Visual Basic controlling interface before being fed into the neural networks.
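As a rough sketch of the two-networks-in-series arrangement described in the abstract, the code below trains an identifier and a controller MLP on synthetic placeholder data; the variable names, network sizes and data are illustrative assumptions, not the NeuroSolutions models used in the experiments.

```python
# Two MLPs in series (sketch): identify the dopant level, then map
# (estimated dopant, NO set point) to a corrective first-stage equivalence ratio.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Placeholder steady-state data; the real networks were trained on measured combustor data.
emissions = rng.uniform(size=(200, 3))             # e.g. [O2, NO, CO2] readings
dopant = emissions @ np.array([0.2, 0.7, 0.1])     # synthetic "NH3/fuel" target
phi1 = 1.0 + 0.5 * dopant                          # synthetic control target

identifier = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000).fit(emissions, dopant)
no_setpoint = np.full((200, 1), 460.0)             # ppm, as in the experiment below
controller = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000).fit(
    np.hstack([dopant.reshape(-1, 1), no_setpoint]), phi1)

# Recall step of the control loop: identify the dopant, then compute the phi1 command.
d_hat = identifier.predict(emissions[:1])
phi1_cmd = controller.predict(np.column_stack([d_hat, [460.0]]))
print(d_hat, phi1_cmd)
```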
3 Experimental Results
The main objective of this experiment was to develop a neural-network-based nonlinear process identification and model-predictive controller for minimizing the NO level from the two-staged combustor. Ethylene (C2H4) served as the fuel. Ammonia (NH3) was used as a model waste dopant containing fuel-bound nitrogen to produce NO. The purpose of the controller was to maintain the first-stage fuel equivalence ratio (φ1) and the overall fuel equivalence ratio (φ0).
Figure 3 shows the results from the experiment. The initial equivalence ratios were set at φ1 = 1.14 and φ0 = 0.89. The initial dopant/fuel ratio in the feed was set at 0.027. The measured NO concentration from the PFR was 460 ppm. This number was set as the process set point. A step disturbance was then applied to the NH3 flow rate to raise the dopant ratio to 0.057. Figure 3 shows the open-loop NO level rose to 620 ppm. In closed loop, the controller increased φ1 to 1.3 by reducing the air flow rate A1 in order to bring the NO back to the set point (460 ppm). Flow rate A2 was increased in order to maintain φ0. The control action was set to execute every 240 seconds to accommodate the long time delay in the gas sampling and NO analysis process. This allowed the controller to receive the true value of the feedback signal from the NO analyzer. Although the controller response tended to be somewhat sluggish, the NO level was brought back to the set point after a reasonable period of time. Several strategies have been used to deal with process time delays. Here, recurrent methods [3, 4] such as the Smith predictor and the internal model controller (IMC) are under consideration. To extend the range of the neural network operation and improve the network accuracy, steady-state PSR+PFR simulation results with detailed chemical mechanisms will be used to expand the training set to cover a wider range of combustor operating conditions.
4 Conclusions
This paper has demonstrated the usefulness and effectiveness of applying a neural-network-based controller to a combustion process. The second experimental part provided a detailed case study in which neural networks were applied to the nonlinear NO control process with significant time delay in the sampling process. The use of a controller based on two neural networks connected in series was demonstrated. With the neural-network-based identifier and controller, the process was successfully brought back to the set point after a step disturbance in the feed stream.
Figure 1. Overall experimental system
Figure 2. NO control experimental setup
Figure 3. Open-loop and closed-loop results of NO experimental control
References
1. Allen, M.G., Butler, C.T., Johnson, S.A., Lo, E.Y., and Russo, F., "An Imaging Neural Network Combustion Control System for Utility Boiler Applications." Combustion and Flame, Vol. 94 (1993): 205-214.
2. Bhat, N. and McAvoy, T.J., "Use of Neural Nets for Dynamic Modeling and Control of Chemical Process Systems." Computers & Chemical Engineering, Vol. 14, No. 4/5 (1990): 573-583.
3. Cheng, Y., Karjala, T.W., and Himmelblau, D.M., "Identification of Nonlinear Dynamic Processes with Unknown and Variable Dead Time Using an Internal Recurrent Neural Network." Industrial & Engineering Chemistry Research, Vol. 34 (1995): 1735.
4. Chovan, T., Catfolis, T., and Meert, K., "Neural Network Architecture for Process Control Based on the RTRL Algorithm." AIChE Journal, Vol. 42, No. 2 (1996): 493-502.
5. Mao, F., "Combustion of Methyl Chloride, Monomethyl Amine, and Their Mixtures in a Two Stage Turbulent Flow Reactor." Ph.D. Dissertation, New Jersey Institute of Technology (1995).
6. Palancar, M.C., Aragon, J.M., and Torrecilla, J.S., "pH-Control System Based on Artificial Neural Networks." Industrial & Engineering Chemistry Research, Vol. 37 (1998): 2729-2740.
7. Syu, M.-J. and Chen, B.-C., "Back-propagation Neural Network Adaptive Control of a Continuous Wastewater Treatment Process." Industrial & Engineering Chemistry Research, Vol. 37 (1998): 3625-3630.
8. Mao, F. and Barat, R.B., "Minimization of NO During Staged Combustion of CH3NH2." Combustion and Flame, Vol. 105 (1996): 557-568.
Communications Systems/Networks
INVESTIGATION OF SELF-SIMILARITY OF INTERNET ROUND TRIP DELAY

JUN LI AND CONSTANTINE MANIKOPOULOS
Electrical and Computer Engineering Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
E-mail: [email protected] and [email protected]

JAY JORGENSON
Department of Mathematics, CCNY, Convent Ave. at 138th St., New York, NY 10031, USA
E-mail: [email protected]

Measurements of local-area and wide-area network traffic have shown that network traffic magnitude exhibits variability at a wide range of scales, i.e., self-similarity. Interestingly, recent research work on network packet delays demonstrates that the distribution of the round trip delay of packets that traverse the Internet also obeys self-similarity. Understanding the reason for self-similarity in packet round trip delay is critical in order to improve quality of service (QoS) and the efficiency of network bandwidth utilization. In this paper, we show that the phenomenon of self-similarity in the packet round trip delay process is mainly caused by self-similarity in the background traffic. Moreover, the Hurst parameter of the packet round-trip delay process is proportional to the Hurst parameter of the process of the background traffic magnitude.
1 Introduction
Recent measurements of local-area and wide-area network traffic have shown that it exhibits variability at a wide range of scales [1][2]. Such scale-invariant variability is in strong contrast to the traditional models of network traffic, which show variability at short time scales but are exponentially smooth at large scales. This effect is described statistically as long-range dependence (LRD), and a time series showing this effect is said to be self-similar. Since the traffic pattern is significantly changed, the traditional analyses of network resource utilization and performance parameters are also challenged. In [3][4], experimental and simulation studies show that long-range dependence has significant impacts on all metrics of network performance, such as packet loss rate and queue occupancy. The main reason why LRD affects these metrics is that LRD traffic seriously degrades queuing performance. As shown in the experiments in [3], the tail of the queue length distribution decays much more slowly with LRD packet traffic than with short-range dependent (SRD) packet traffic, which is widely used in traditional network performance analysis. Recent studies based on empirical measurements using User Datagram Protocol (UDP) probe packets transmitted over the Internet showed that the round-trip delay of the packets also exhibits self-similarity [5][6][7]. In these experiments, each packet
generated by the source is routed to the destination via a sequence of Internet backbone switches or routers, and is then sent back by the destination to the source. The round-trip delay is thus the sum of the delays experienced at each hop as the packet gets transferred. Since these packets are of fixed packet length, each such delay, in turn, consists of two components: a fixed component, which includes the transmission delay at a hop and the propagation delay on the link to the next node, and a variable component, which includes the processing and queuing delays at each node. By statistically analyzing these data traces, the researchers found that although packet interdeparture times are deterministic, arrivals at the source exhibit LRD in most cases. It is noteworthy that the degree of LRD of these data traces, as indicated by the Hurst parameter, varies significantly. Understanding the reason why the round-trip delay exhibits LRD, and especially why the degree of LRD varies significantly from case to case, is very important for the proper design of network algorithms, such as routing and flow control algorithms, and for the dimensioning of buffer and link capacity. The empirical measurement data for the Internet packet delay can be better modeled by a self-similar process with long-range dependence than by a traditional Poisson process with short-range dependence. The reason why this packet delay is self-similar has so far been obscure, but it will be revealed in this paper. In order to figure out why the Internet round trip packet delay is self-similar, we should consider the following two factors. The effect of the background traffic: When the probe packets transfer through the Internet, they do so in the company of other background Internet traffic. Now, let us assume that these probe packets as well as the background traffic are fed into a network router with a fixed service rate. If the background traffic is self-similar, then the queue length of the router, when viewed as a time series, will also be self-similar. As a consequence, the queuing delay of the probe packet at the router should also exhibit self-similarity. Thus, the background traffic has a significant effect at least on the router delay component of the Internet round trip packet delay. This is borne out by our simulation results, shown in Section 3. The correlation of the queues of the routers: As indicated by recent empirical measurements, the Internet round trip delay is self-similar. As the UDP probe packets traverse the Internet, sometimes from one continent to another, they will often go through a series of routers or switches, some of which will be part of the backbone. As discussed earlier, the distribution in time of the queue length of a specific router is self-similar. Traffic through these routers subsequently converges into the inputs of other routers, thus greatly influencing the distributions in their queues. Thus, we may reasonably expect that the lengths of the queues of the routers that the probe packet passes should have some degree of correlation. This issue is also indicated by the work in [7], where the author derives a formula for Internet packet delays.
In this paper, we focus on the effect of the first factor, i.e. the effect of the background traffic magnitude. We carry out a simulation study on the packet round trip delay process, by analyzing the behavior of the round trip delay of probe UDP packets, when the degree of LRD of the background traffic is changed. Our simulation results show that the degree of LRD of the round trip delay process of the probe UDP packets is strongly related to the degree of LRD of the background traffic. The rest of the paper is organized as follows: In section 2, we will present our approach to generate the experimental data sets. And in section 3, we will present the results of statistical analyses on the simulation data. Finally, in section 4 we conclude this paper.
Figure 1. Diagram of the simulation network
2 Simulation Experiments

In our simulation experiments, we model two classes of traffic: the UDP probe traffic and the background Internet traffic. In this section, we present our approach to generating the experimental data sets. First, the simulated network configuration is introduced in subsection 2.1. In subsection 2.2, we present our approach to modeling the background Internet traffic. In subsection 2.3, we introduce our technique for generating background traffic with different degrees of self-similarity. In subsection 2.4, we discuss the method used to generate the UDP probe packets and calculate their round-trip delay.

2.1 Network Model

Our simulated network, shown in Figure 1, consists of four subnets: Subnet1, Subnet2, Subnet3, and Subnet_Server. The clients are located in Subnet1 (Ethernet), Subnet2 (FDDI), and Subnet3 (Token Ring), which are all connected with routers to the server, located in
Subnet_Server (Ethernet). The simulations were carried out using OPNET, a network simulation facility. During the simulations, the clients in Subnet1, 2 and 3 establish conversations with the server in Subnet_Server. The links connecting the four subnets and routers are T1.

2.2 Background Internet Traffic Configurations

In our simulation experiments, we model the four most popular TCP/IP services: Http, Telnet, Ftp and Smtp. The application-layer workload characteristics of these four Internet services used in our simulations derive from the reported literature, as in [1][8]. From that work, we find that these workload characteristics closely resemble network measurements. The work in [9] identified these four services as being responsible for 86% of Internet traffic in bytes. We note that, due to the workload characteristics of the four Internet services used in our simulations, the network traffic should be self-similar, i.e., the traffic will exhibit variability that appears as "burst" phenomena at a wide range of time scales.

2.3 Achieving Various Degrees of Self-Similarity

In this paper, we want to observe the behavior of the packet round trip delay when the degree of self-similarity of the background traffic varies. In order to achieve this goal, we need to generate background traffic with various degrees of self-similarity. In our simulation experiments, we use the method suggested by the work in [10], that is, changing the shape parameter of the Pareto distribution, which is the statistical model of the file size transferred from the server to the client for Http traffic, from 1.05, 1.3, 1.55, 1.8 to 2.0. As indicated by the Hurst parameters shown in Tables 2 and 3, this is a practical way to achieve data traces with various degrees of self-similarity.

2.4 Probe UDP Packets

In our simulation experiments, we send the UDP probe packets from the source, which is a client located in Subnet1, to the destination, which is a client located in Subnet_Server. After the destination receives a probe packet, it sends it back immediately. When a probe reaches the source, the source calculates the round trip delay in a way similar to that adopted by [7].
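The following is a minimal sketch (not the OPNET configuration) of the heavy-tailed Http file-size model of subsection 2.3: file sizes are drawn from a Pareto distribution whose shape parameter alpha is the knob that controls the degree of self-similarity of the generated traffic. The minimum file size used here is an arbitrary assumption.

```python
# Pareto-distributed Http file sizes; smaller alpha gives a heavier tail (sketch).
import numpy as np

def pareto_file_sizes(alpha, n, minimum=1000):
    """Heavy-tailed file sizes in bytes with the given Pareto shape parameter."""
    rng = np.random.default_rng(42)
    return minimum * (1.0 + rng.pareto(alpha, size=n))

if __name__ == "__main__":
    for alpha in (1.05, 1.3, 1.55, 1.8, 2.0):
        sizes = pareto_file_sizes(alpha, 100_000)
        print(alpha, int(sizes.mean()), int(sizes.max()))   # mean and tail shrink with alpha
```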
2.4.1 Choice of Parameters
Packet Length: In order not to affect the network workload, we use packets with small lengths for probing. We want to use these packets as probes in the network and obtain their round trip delay. In our simulation experiments, we fix the length of the probe packets at 700 bytes. In this way, we can ensure that the propagation and transmission delays are fixed, and that the variability of the round trip delay is due mainly to the variable queuing delay.

Inter-packet Departure: We send a UDP probe packet from the source every 100 milliseconds. We want to send as many probes as possible to get an accurate characterization of the round trip delay. However, sending the probes at a faster rate would affect the background network workload and thus our observations.

3 Simulation Results and Discussion

3.1 Simulation Results

In our simulation experiments, we gather data on three network performance parameters. The first parameter we collected is the aggregated packet rate, which is measured on the link connecting Router 1 with Subnet_Server. The traffic going through this link consists of two components, the background traffic and the UDP probe traffic. The second parameter is the length of the queue in Router 1. The third parameter we collected is the round trip delay of the UDP probe packets. In Figure 4, we show the data traces collected in our first group of simulation experiments. The results of our first group of simulation experiments are listed in Table 2. In this group of simulations, 50 clients located in Subnets 1, 2 and 3 communicate with the server in Subnet_Server. The first column of Table 2 shows the shape parameters of the Pareto distribution, which is the statistical model of the file size transferred from the server to the client for Http traffic. In column 2, we list the estimated values of the Hurst parameter of the background traffic magnitude, and in column 3 we list the estimated values of the Hurst parameter of the round trip delay of the UDP probe packets. Comparing the results in columns 2 and 3, it is interesting to find that when the Hurst parameter of the background traffic magnitude decreases, the Hurst parameter of the round trip delay also decreases; in other words, the degree of self-similarity of the round trip delay decreases when the degree of self-similarity of the background traffic magnitude decreases. This is depicted in Figure 2. In order to check whether the network topology affects this result, we carried out a second group of simulation experiments. In this group of simulations, we reduced the number of clients communicating with the server from 50 to 25. Correspondingly, the background traffic magnitude is reduced by one half. As shown in Table 3 and Figure 3, the degree of self-similarity of the background traffic magnitude and the round
trip delay of the UDP probe packets measured across both topologies changed in a very similar manner.

Table 2. Estimates of the Hurst parameter for data traces collected from the network topology with 50 clients.

Shape parameter*   BTM (R/S)   BTM (Variance-Time)   RTD (R/S)   RTD (Variance-Time)
1.05               0.9171      0.9233                0.9419      0.9512
1.30               0.8456      0.8584                0.8803      0.8814
1.55               0.8011      0.8076                0.7412      0.7567
1.80               0.7154      0.7293                0.6834      0.6712
2.00               0.6331      0.6214                0.6167      0.6101

Table 3. Estimates of the Hurst parameter for data traces collected from the network topology with 25 clients.

Shape parameter*   BTM (R/S)   BTM (Variance-Time)   RTD (R/S)   RTD (Variance-Time)
1.05               0.8310      0.8312                0.8710      0.8862
1.30               0.7038      0.7153                0.7629      0.7613
1.55               0.6986      0.7012                0.6489      0.6645
1.80               0.5623      0.5616                0.6246      0.6348
2.00               0.5325      0.5530                0.5417      0.5219

* The shape parameter of the Pareto distribution, which is the statistical model of the file size transferred from the server to the client for Http traffic. (BTM = background traffic magnitude; RTD = round trip delay.)

3.2 Discussion

As indicated by our simulation results shown in Tables 2 and 3, the background traffic has a significant effect on the packet round trip delay. Below we identify the mechanisms behind the correlation of the degree of self-similarity of the packet round trip delay with the degree of self-similarity of the background traffic magnitude. As noted in Section 1, packet delay in the Internet consists of four components. Considering packets of fixed length going through a fixed route (the routing of Internet packets is stationary, as found in [11]), the variability of the packet delay is due only to the queuing delay. Since the background traffic and the UDP probe traffic go through the same route, we can consider the situation in which the two classes of traffic are fed into the queue of a specific router. When a probe packet experiences the minimum queuing delay, it means that when it enters the queue there are no packets in it and it is served immediately. However, this is a special case and rarely happens. More commonly, when the probe packet enters the queue, there are a number of packets waiting there, and the probe
packet can be served only after all the packets ahead of it are served. Since the background traffic is self-similar, that is, it exhibits variability at a wide range of scales, we expect that the queue length, in number of packets waiting for service, also exhibits variability at a wide range of scales. Accordingly, the queuing delay should also exhibit some degree of self-similarity. From Figure 4, we can see this phenomenon visually. This figure shows the data traces collected in our first group of simulation experiments. The first column is the data traces of background traffic magnitude in packets/second. The second column shows the data traces of the queue length of Router 1 of Figure 1, in number of packets waiting for service. The third column is the round trip delay of the UDP probe packets in seconds. Comparing the figures in the same row, we find a consistent pattern of correlation: there are more bursts in the figures of queue length and round trip delay when there are more bursts in the figure of background traffic magnitude. From this figure, we can understand why the degree of self-similarity of the packet round trip delay increases when the degree of self-similarity of the background traffic magnitude increases.
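For reference, the following is a minimal sketch of a variance-time Hurst estimator of the kind used to produce Tables 2 and 3 (the block sizes are illustrative assumptions): the series is aggregated over blocks of size m, and the slope of log Var(X^(m)) versus log m is 2H - 2.

```python
# Variance-time Hurst estimator (sketch).
import numpy as np

def hurst_variance_time(x, block_sizes=(1, 2, 4, 8, 16, 32, 64, 128)):
    x = np.asarray(x, dtype=float)
    logs_m, logs_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            break
        blocks = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)   # aggregated series
        logs_m.append(np.log(m))
        logs_var.append(np.log(blocks.var()))
    slope = np.polyfit(logs_m, logs_var, 1)[0]
    return 1.0 + slope / 2.0          # H = 1 + slope/2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(hurst_variance_time(rng.normal(size=100_000)))   # ~0.5 for white noise
```

White noise gives H near 0.5, while an LRD trace such as the background traffic magnitude above yields H well above 0.5.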
4 Conclusions
As indicated above, two factors may cause the self-similarity of the Internet round trip delay. In this paper, we focused on the effect of the first factor, the background traffic. Based on our simulation results, we conclude that the background traffic has a significant effect on the Internet packet delay, and that the degree of self-similarity of the Internet packet delay is strongly and proportionately related to the degree of self-similarity of the background traffic magnitude.

Acknowledgement

We thank OPNET Technologies Inc. for partially supporting the OPNET simulation software that we used.

References
1. Vern Paxson and Sally Floyd, Wide-area traffic: the failure of Poisson modeling, IEEE/ACM Transactions on Networking, 3(3), pp. 226-244, June 1995.
2. W. E. Leland, M. S. Taqqu, W. Willinger and D. V. Wilson, On the self-similar nature of Ethernet traffic (extended version), IEEE/ACM Transactions on Networking, Vol. 2, pp. 1-15, Feb. 1994.
3. Ashok Erramilli, Onuttom Narayan and Walter Willinger, Experimental queuing analysis with long-range dependent packet traffic, IEEE/ACM Transactions on Networking, pp. 209-223, April 1996.
4. Jean-Chrysostome Bolot, Characterizing end-to-end packet delay and loss in the Internet, Journal of High-Speed Networks, Vol. 2, No. 3, pp. 305-323, Dec. 1993.
5. M. S. Borella and G. B. Brewster, Measurement and analysis of long-range dependent behavior of Internet packet delay, Proceedings, IEEE Infocom '98, pp. 497-504, Apr. 1998.
6. O. Gudmundson, D. Sanghi, and K. Agrawala, Experimental assessment of end-to-end behavior on Internet, Proc. InfoComm '93, March 1993.
7. Qiong Li and D. L. Mills, On the long-range dependence of packet round-trip delays in Internet, Proc. IEEE International Conference on Communications (Atlanta, GA, June 1998), pp. 1185-1191.
8. Paul Barford and Mark Crovella, Generating Representative Web Workloads for Network and Server Performance Evaluation, http://cspub.bu.edu/techreports/1997-006-surge.ps.Z.
9. Kevin Thompson, Gregory J. Miller and Rick Wilder, Wide-Area Internet Traffic Patterns and Characteristics (Extended Version), IEEE Network, Nov/Dec 1997, pp. 10-23.
10. Kihong Park, Gitae Kim and Mark Crovella, On the Relationship between File Sizes, Transport Protocols, and Self-Similar Traffic, Technical Report TR-96-016, Boston University.
11. Y. Zhang, V. Paxson, S. Shenker, and L. Breslau, The stationarity of Internet path properties: routing, loss, and throughput, in submission, Feb. 2000.
Figure 2. Estimates of Hurst parameter of our first group of simulations.
Figure 3. Estimates of Hurst parameter of our second group of simulation experiments.
Figure 4. Data Traces collected in our first group of simulation experiments with packet rate of background traffic (left), queue length (middle) and round trip delay (right). The figures in row 1, 2, 3,4 and 5 correspond to the data generated with shape parameter of Pareto distribution equal to 1.05, 1.3, 1.55, 1.8 and 2.0.
MODIFIED HIGH-EFFICIENCY CARRIER ESTIMATOR FOR OFDM COMMUNICATIONS WITH ANTENNA DIVERSITY

UFUK TURELI AND PATRICK J. HONAN
Department of Electrical Engineering and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030
E-mail: [email protected]

Orthogonal frequency division multiplexing (OFDM) based wireless communication systems combined with coding and antenna diversity techniques operate at very low signal-to-noise ratio (SNR) levels. Receivers are generally coherent demodulators implemented around a fast Fourier transform (FFT), which is the efficient implementation of the Discrete Fourier Transform (DFT). Carrier synchronization is critical to the performance of an OFDM system: tone or sub-carrier orthogonality is lost due to frequency offset error. OFDM carrier frequency offset estimation and subsequent compensation can be performed from the received signal without periodic pilots or preambles. The OFDM algebraic structure can be exploited in a blind fashion to estimate the carrier offset. However, the performance degrades at low SNR. The algorithm presented here allows highly accurate synchronization by exploiting maximum diversity gains to increase the effective SNR without reference symbols, pilot carriers or excess cyclic prefix. Furthermore, diversity gains overcome the lack of identifiability in the case of channel zeros on the DFT grid.
1 Introduction
Next generation wireless communication systems will handle broadband applications, and OFDM coupled with antenna diversity has been proposed [1]. Carrier frequency synchronization is critical to the performance of an OFDM system. Tone or sub-carrier orthogonality is lost due to frequency offset error. This results in higher inter-channel-interference (ICI) levels, thus lowering the signal-to-interference-and-noise ratio (SINR). For OFDM, frequency offsets of as little as 1% begin to result in noticeable penalties in SINR [2]. These effects can be countered by correcting for the frequency offset prior to the FFT. This requires accurate estimation of the frequency offset. OFDM is the standard modulation scheme used in Europe for Digital Audio Broadcasting (DAB) and Digital Video Broadcasting (DVB) [9,10]. In addition, local area networks (LANs) such as IEEE 802.11a are OFDM based. These systems are based on earlier developed synchronization methods using known preambles and/or periodic pilots. Pilot-based carrier synchronization systems consume bandwidth and power and result in significant intra-cell interference. These methods spend valuable bandwidth and power resources and require channel estimation. Since the performance of OFDM is severely degraded in the presence of carrier frequency offset, reliable channel estimation is difficult to perform before carrier frequency offset compensation. Blind synchronization that exploits the structure of
304
the signaling can be applied directly to the received signal without periodic pilots or preambles [3,4]. Next generation systems will benefit from extensive coding and diversity techniques able to operate at extremely low to negative SNRs. These techniques in particular multi-antenna diversity are quite effective in combating multi-path channel fading effects. Synchronization methods will not directly benefit from coding and channel estimation gains. These prospects make the task of synchronization that much more difficult. The algorithm presented here will allow highly accurate frequency offset estimation, even at low SNR, by exploiting maximum antenna diversity gain. This paper will present an algorithm for maximizing multi-antenna diversity gain for the purposes of improved blind frequency offset estimation. The paper will proceed by first formulating the algorithm around the high efficiency blind frequency offset estimator proposed in [3]. Then observations of the estimators improved performance, as modified by this algorithm, are discussed in terms of identifiability and increased effective SNR. Finally, numerical simulation results are presented and discussed. 2
Problem Formulation
Carrier offset estimation on a multi-path frequency selective fading channel at low SNR results in high variance. Antenna diversity has been touted as one of the solutions to mitigate channel fading. The probability that all the signal components will fade is reduced when replicas of the same signal are received over independently fading channels [5]. Denote s(k) = [s_1(k), s_2(k), ..., s_P(k)]^T as the kth block of data to be transmitted. The transmitted signal is OFDM modulated by applying the inverse DFT to the data block s(k). Using matrix representation, the resulting N-point time-domain signal is given by:

b(k) = [b_1(k), b_2(k), ..., b_N(k)]^T = W_P s(k),    (1)

where W_P consists of the first P columns of the N x N IDFT matrix W. In a practical OFDM system, some of the sub-carriers are not modulated in order to allow for transmit filtering. In other words, the number of sub-channels that carry the information is generally smaller than the size of the DFT block, i.e., P < N, because of the virtual carriers [2]. Without loss of generality, we assume carriers no. 1 to P are used for data transmission. For a system with antenna diversity, i.e. a receiver with m antennas, the receiver input for the kth block consists of:

y_k = [y_1(k), y_2(k), ..., y_m(k)],    (2)

where y_i(k) = W_P H_i s(k) is the input to the ith antenna, and H_i = diag(H_i(1), H_i(2), ..., H_i(P)), where H_i(p) denotes the channel frequency response at the pth sub-carrier. In the presence of a carrier offset e^{jφ}, the receiver inputs are modulated by E(φ) = diag(1, e^{jφ}, ..., e^{j(N-1)φ}) and become y_i(k) = E(φ) W_P H_i s(k) e^{jφ(k-1)(N+N_g)}, where N_g is the length of the cyclic prefix. Since W_P^H E(φ) W_P ≠ I, the matrix E(φ) destroys the orthogonality among the sub-channels and thus introduces ICI. To recover {s(k)}, the carrier offset φ needs to be estimated before performing the DFT. This paper presents an extension of the estimation method developed in [3] to take advantage of antenna diversity. This extension compensates for deep fading of modulated carriers and enables unique identification [4]. Frequency selective fading is to be expected for OFDM, which is used for broadband applications over multi-path frequency selective fading channels. The method developed in [3] minimizes the following cost function:
P(z) = Σ_{k=1}^{K} Σ_{i=1}^{N-P} w_{P+i}^H Z^{-1}(z) y(k) y^H(k) Z(z) w_{P+i},    (3)

where Z(z) = diag(1, z, z^2, ..., z^{N-1}). The y(k) in (3) is equivalent to y_i(k) as defined in (2), i.e. the single-antenna case. An estimate of the covariance is formed as follows:
"y,W"T
•
y*y? =• \si(k)
••
ym(k)\ ym(k)H\
K=j i>*y? •
(4)
k=\
The estimate ROT is averaged over k=l,2..K sample blocks and used in the modified cost function as follows: P( Z ) = Xw;,Z-'(z)R,Z(z)w,
(5)
This form of the cost function is quite effective at taking advantage of multi-antenna diversity. The covariance calculation removes the phase dependency so that received signals are added constructively while preserving the algebraic structure due to the modulation matrix and carrier offset. Figure 1 depicts a multi-antenna receiver implementation of the proposed algorithm.
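The following is a minimal simulation sketch (not the authors' implementation) of evaluating the diversity-combined cost function of Eq. (5) on a grid of trial offsets; the FFT size, number of used carriers, antenna count, channel length and noise level are illustrative assumptions.

```python
# Blind carrier-offset estimation sketch: accumulate R_yy over antennas and
# blocks (Eq. 4) and minimize the virtual-carrier cost of Eq. (5) over a grid.
import numpy as np

N, P, m, K = 32, 24, 2, 50                # FFT size, used carriers, antennas, blocks
phi_true = 0.05                           # carrier offset in radians per sample
rng = np.random.default_rng(0)

W = np.fft.ifft(np.eye(N), axis=0) * np.sqrt(N)   # unitary IDFT matrix
Wp, Wv = W[:, :P], W[:, P:]                       # data carriers / virtual carriers

R = np.zeros((N, N), dtype=complex)
for k in range(K):
    s = (2 * rng.integers(0, 2, P) - 1).astype(complex)           # BPSK symbols
    for i in range(m):
        h = np.fft.fft(rng.normal(size=4) + 1j * rng.normal(size=4), N)[:P]
        y = np.exp(1j * phi_true * np.arange(N)) * (Wp @ (h * s))  # offset E(phi)
        y += 0.1 * (rng.normal(size=N) + 1j * rng.normal(size=N))  # additive noise
        R += np.outer(y, y.conj())                                 # Eq. (4)
R /= K

def cost(phi):
    zinv = np.exp(-1j * phi * np.arange(N))            # diagonal of Z^{-1}(e^{j phi})
    Rz = (zinv[:, None] * R) * zinv.conj()[None, :]    # Z^{-1}(z) R_yy Z(z)
    return np.trace(Wv.conj().T @ Rz @ Wv).real        # sum over virtual carriers

grid = np.linspace(-np.pi / N, np.pi / N, 401)
phi_hat = grid[np.argmin([cost(p) for p in grid])]
print(phi_true, phi_hat)
```

Note that the block-dependent phase factor cancels in the outer products, so only the sample covariance is needed, and the antenna contributions add on the diagonal regardless of their channel phases, which is the diversity-combining effect described above.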
Figure 1. Multi-antenna receiver implementation.

The algorithm is computationally efficient, and a further improvement is achieved through an adaptive implementation [6]:

R_yy(k) = α R_yy(k-1) + (1 - α) y(k) y^H(k).    (6)

3 Observations
Severe multi-path complicates synchronization, especially in the presence of deep fades. The diversity combining technique proposed above is shown in the numerical results section to be remarkably robust even in the presence of deep fading. Identifiability (uniqueness) conditions for the frequency estimation are developed here. The effect of the novel combining algorithm on the combined channel transfer function is best demonstrated by first creating the new matrix representation Γ_m = [H_1, H_2, ..., H_m]. The received signal vector and the combined-signal covariance can then be expressed, respectively, as:

y_k = E(φ) W_P Γ_m S_k + V_k,    R_yy = E(φ) W_P Γ_m S_k S_k^H Γ_m^H W_P^H E^H(φ) + V_k V_k^H,    (7)

where S_k is an m-antenna generalization of s(k) and V_k is the additive noise; both are of matrix form, defined similarly to y_k. The covariance expression (7) isolates the multi-antenna signal and additive noise V_k components. The cost function introduced in (5) can now be modified as:

P(z) = Σ_{i=1}^{N-P} w_{P+i}^H Z^{-1}(z) E(φ) W_P Γ_m R_ss Γ_m^H W_P^H E^H(φ) Z(z) w_{P+i} + σ_v^2 Σ_{i=1}^{N-P} w_{P+i}^H Z^{-1}(z) I_N Z(z) w_{P+i}.    (8)
This new cost function results in effective SNR gain, directly by maximizing the diversity gain and indirectly by improving the estimator identifiability. Diversity gain is maximized by exploiting the time domain correlation of the signal. This is evident by considering that the OFDM signal will have correlation in time whose contribution is summed in (8) where noise is not correlated and will not benefit in the same way. The resulting gain over noise is equivalent to coherent combing of the received antenna signals. In the absence of additive noise, P(z) will have a zero at the offset frequency due to null-space orthogonality as developed in the previous section. Again, at the correct frequency offset estimate, that is z= e'*, the condition Z~ (z)E(0) = IN is satisfied, thus restoring the orthogonality of the received signal to the virtual carriers Wp+, and as a result a P(z) equal to zero. Limiting (8) to the no-noise case, the cost function is the sum of non-negative quadratic forms that are zero if and only if: w^,.z-'(z)E
(0)w^r„r^=o,
v; e [i, #-/>].
(9)
This expression assumes the input signals are independently distributed. Rss is a constant multiplied by the identity matrix due to persistence of excitation. The matrix T^r" is a non-negative diagonal matrix given by: T J t = diagiA,]; V/e[l,/>]. (10) We'll show that a lesser stringent condition than assuming full rank of r ^ T " will suffice. Lemma: For a unique zero of P(z) it is necessary and sufficient that only two of the diagonal elements of T T" namely Ax and AP be non-zero, as expressed below in terms of the combined channel responses:
A, =£#,(<))#;«», ••-I
AP =£//,(;>) W ) .
(i i)
i-i
Proof: The virtual carriers wp+, for V/e [1,N — P], consist of fej2""'*')""}""'', span the null-space of W p . The minimization P(z) seeks to find z such that w"+/ Z"'(z)E(0), a function of (2ic(P+i)/N + 0 - 0 ) ,
is in the left hand null-
308
space of the argument \V,r m r". Thus if TmF" is full rank and Q = (j), such that Z _1 (z)E(^)= I N , the arguments null-space is spanned by only the set of virtual carriers. If either A{ or A ? is equal to zero, which is an event of zero probability, P(z) is also minimized at > = (p ± 2TC/N respectively, thus non-unique zero minima's. It should be understood that the channel zeros ^j2""1}_,, that fall on the FFT grid, other than at 1 or P, do not result in an ambiguity because (8) is not simultaneously minimized by contribution from (1:N-P> virtual carriers in (9), and the lemma follows. 1
1
1
. I
Figure 2(a). MSE vs SNR with channel zero at i=P.
10 fc .
.
.
.
.
.
.
.
.
.
.
.
.
Figure 2(b). MSE vs. SNR
The non-realizable (zero-probability) conditions proposed in the above are helpful to understand the effects do to added noise. At low SNR, channel zero's located even near sub-carriers result in estimator ambiguity (uniqueness) issues. As equation (8) shows, minimums near or below the noise level will result in ambiguity. Simulations discussed in the next section, resulting in Figure 3 (b), show this for an SNR of 15 dB which begins to cause an ambiguity for the m=2 case. The algorithm is able to mitigate this by effectively increasing SNR on all sub-carriers by exploiting maximum diversity gains.
309
rtTfftKln'aftBc) taqlflKtvttfaoc)
Figure 3(a). P(z) spectrum for multi-path and no-noise.
4
Figure 3(b). P(z) spectrum for multi-path and 15 dB SNR.
Numerical Results and Discussion
Computer simulations were run over 100 Monte-Carlo realizations of random FIR channels with five sample spaced taps with random uniformly distributed phase ~U(0,2 pi), normal distributed imaginary and real components ~N(0,1). For each Monte-Carlo run, K=5 symbols were used. The OFDM signal consisted of N=32 sub-carriers and P=20 data streams. Offset frequency used <j> =1.11 AD3 where A05= 27t/N is the channel spacing. The performance enhancements of the proposed algorithm are illustrated in Figure 2. The results shown in Figure 3 (a and b) respectively show the improved identifiability for multi-antenna cases m= 1,2,4 & 8, where the sample covariance obtained in (4) was used in the cost function (5). The performance enhancement in the presence of deep fading is most evident when comparing Figure 2(a) and 2(b), each includes multi-path but former has a channel zero at sub-carrier P. As expected, this has a severe impact on the performance of estimator with m=l, and to a lesser degree for the higher diversity configurations. For m=8 estimator shows little performance degradation and an approximately 9 dB relative improvement in SNR. Improved estimator identifiability is demonstrated by Figure 3 (a and b), which show the cost function, without noise, and corresponding to Figure 2(a) for an SNR of 15 dB. Evident in Figure 3(b) are the ambiguous minimums, which are beginning to become an issue for the m=2 estimator. As the SNR is further lowered, the m=4, and eventually m=8 estimator is impacted. But what's most impressive here, even at very low SNRs of minus 3-4 dB, for m=8 estimator meets the design target of
310
OFDM systems operating at much higher SNRs [8]. This combined with the proposed adaptive implementations, will allow high estimator resolution even at lower SNRs. The above simulation results demonstrate the utility of the proposed algorithm for the synchronization of OFDM wireless systems operating at extremely low SNR. 5
Conclusions
The modified high-efficiency carrier estimator for OFDM communications with antenna diversity proposed here will meet the requirements of the next generation wireless communication systems. The proposed by exploiting both the algebraic structure of OFDM and the gains inherent in multi-antenna diversity is able to attain efficiencies in terms of bandwidth, power, and computation beyond other proposed methods. References 1. Y. Li, N. Seshadri, and S. Ariyavisitakul, Channel Estimation for OFDM systems with transmitter diversity in mobile wireless channels. IEEE Journal on Selected Areas in Communications (1999) 17(3): 461^*71. 2. T. Pollet and M. Moeneclaey. Synchronizability of OFDM signals. In Proc. Globecom (1995) pp. 2054-2058. 3. H. Liu and U. Tureli, A high efficiency carrier estimator for OFDM communications. IEEE Communication Letters (1998) 2(4):104-106. 4. U. Tureli, D. Kivanc and H. Liu, Experimental and analytical studies on a highresolution OFDM carrier frequency offset estimator. IEEE Transactions on Vehicular Technology (2001) 50(2):629-643. 5. John. G Proakis, Digital Communications , 3"* edition, (McGraw-Hill, 1995) p. 777. 6. Simon Haykin, Adaptive Filter Theory , 3rd edition (Prentice-Hall, 1996) Chapter 8. 7. R.V. Nee and Ramjee Prasad, OFDM For Wireless Multimedia Communications , (Artech, 2000) Chapter 9. 8. T. Pollet, M. Van Blabel, and M. Moeneclaey, BER Sensitivity Of OFDM Systems To Carrier Frequency Offset And Wiener Phase Noise, IEEE Trans. Communications 43 (1995) pp.191-193. 9. European Telecommunications Standard, Radio Broadcast Systems; Digital Audio Broadcasting to mobile, portable, and fixed receivers, preETS 300 401, (1994) 10. U. Reimers, "DVB-T: The COFDM Based System For Terrestrial Television, Electron Commun., Eng. J. 9 (1995) pp. 28-30
311
A COMPARISON BETWEEN TWO ERROR DETECTION TECHNIQUES USING ARITHMETIC CODING BIN HE AND CONSTANTINE N. MANIKOPOULOS Department of Electrical and Computer Engineering New Jersey Institute of Technology Newark, NJ 07102 Email: [email protected] [email protected] This paper compares the error detection capability of two joint source and channel coding approaches using arithmetic coding, i.e., the approach with redundancy in form of forbidden coding space and the approach with redundancy in form of periodically inserted markers. On the one hand the comparison shows two approaches basically have the same detection capability, on the other hand the forbidden symbol approach is more efficient in error correction while the marker symbol approach is simple and suitable for packet switching networks.
1
Introduction
In recent years, the interest of joint source and channel coding has considerably increased. In many of wireless communication systems and the Internet services, Shannon's separation theorem does not hold [1]. A joint design is needed for these systems where source coding and channel coding function dependency to maximize system performance. Moreover, the joint source and channel coding with variable length codes is receiving increasing attention because of the efficient utilization of limited channel resources, e.g., bandwidth, for variable length codes. Of the variable length codes in joint source and channel coding, arithmetic coding [2] is widely used. Boyd et al. [3] proposed a detecting approach which introduces redundancy by adjusting the coding space such that some parts are never used by encoder. If decoding process enters the forbidden region, an error must have occurred. Kozintsev et al. [4] analyzed the performance of this approach in communication systems by introducing "continuous" error detection. The redundancy versus error detection time was studied. Pettijohn et al. [5] extended this work in sequential decoding to provide error correction capability. Another idea of using arithmetic codes for error detecting was proposed by Elmasry [6], where the redundancy needed for error detection is introduced to the source data before compression in the form of periodically inserted markers. The decoder examines the reconstructed data for the existence of the inserted markers. An error is indicated if a marker does not appear in its proper location. Figure 1 shows the diagram of this idea.
312 Input
(T\ J
Xt>
Data
Block Size Counter
•—
Marker Generator
SoaK
?
1 0
3 2 1
Q
Reconstructed Data
Source
Figure 1. Diagram of the marker symbol approach
In next section of this paper, a comparison between above two detection approaches using arithmetic coding is discussed. Two approaches are referred to as forbidden symbol approach and marker symbol approach, respectively. The comparison is carried out in terms of redundancy and error propagation distance (i.e., error detecting time [4]), which mainly determine the performance of system based on the approach. The comparison shows while two approaches basically have the same error detection capability, they should be applied to different kind of systems. 2
Comparison of Redundancy and Error Propagation Distance
For forbidden symbol approach, the redundancy and error propagation distance are analyzed by Kozintsev et al. [4], where the forbidden symbol is assigned probability e (0 < e < 1). Following geometric distribution with random variable F l is modeled to represent the number of symbols it takes to detect an error after it occurs, i.e., error propagation distance, Pyl{k) = (1 - e)*- 1 e
k = 1,2,.. .,co,
(1)
and the probability that error propagates more than n symbols decreases with n t h order of (1 — e),
P1[Yl>n]
=
(l-e)n.
(2)
The redundancy is Ri = — log 2 (l — e)
bits per symbol.
(3)
For the marker symbol approach, three kinds of marker strategies were introduced by Elmasry [6]. Because of its small amount of redundancy, the previous marker strategy is discussed in this paper where a block of size m
313
source symbols is turned into a block of (m +1) symbols by repeating the m t h symbol at the (m + l ) t h location. The amount of redundancy is R2 =
~.
(4)
m
The probability density function (PDF) of error propagation distance is plotted in Figure 2. m = 10, 20, and 30 are used to show the PDF with different amount of redundancy.
IS 20 25 30 35 Position of the detected error (characters)
40
45
SO
Figure 2. Error propagation distance P D F of marker symbol approach
We can see the PDF remains almost constant in each marker region. But between these marker regions, PDF drops quickly. The probability that error is checked within I markers can be statistically determined by simulation. Here is a simple explanation. Assuming one symbol contains c bits, we have 2° symbols in the alphabet. Since even a single error will cause the decoding process losing synchronization and the decoded data being garbled, when comparing the marker and the previous symbol, the probability that they are the same roughly equals to the probability that two independent numbers selected from 1 to 2C are same, which is 2?y.2c = ^~°- Letting random variable Y2 represent number of markers error will propagate, we have the probability that error propagates more than I markers, i.e., the misdetection probability, P2[Y2 >l]=
(2~ c )' = 2-
lc
(5)
Now we compare the redundancy and error propagation distance between two approaches. If the redundancy of marker symbol approach is counted in
314
number of bits per symbol, we have R2 — — bits per symbol. For the same amount of redundancy, i.e., Ri = R2, we have -log2(l-e) = - > m
(6)
(1 - e) m = 2~ c .
(7)
or, This means the probability that error propagates more than m symbols in forbidden symbol approach is equal to the probability that error propagates more than 1 marker in marker symbol approach. Similarly for I markers, we have (1 — e) / m = 2~lc and it shows two approach have basically the same error detection capability. Figure 3 compares two approaches with same redundancy. The forbidden space e = 0.12 and marked block size m = 30 are selected. We see two approaches have different error propagation distance PDF, but for error that propagates beyond a marked block size, the detection probabilities are same (the areas of two curves in a marked block size are same). 0.1
— —
ma/kef «ymbol approach (m-30) forttddgn symbol approach (e-0.12) |
0.09 0.08 0.07 0.06 u. 0 0.05 O. 0.04 0.03 0.02 0.01 0 0
10
20 30 40 Position of the detected error
50
60
Figure 3. Comparison of error propagation distances of two approaches
However, with the forbidden symbol approach, the error propagation distance distribution is non-uniform. This makes the approach useful for error correction. The reason is that we can estimate the error location based on the geometrically distributed error propagation distance. While in marker symbol approach, the error location PDF within a marked block is approximately uniformly distributed. We can only guess the error may be in a marked block, but do not know the error position within the block.
315
Although the marker symbol approach is less efficient in error correction, it is simple and does not change the entropy code in encoder and decoder. The approach can be applied to existing systems without modification of encoder and decoder. Moreover, though forbidden symbol approach provides continuous error detection, in current packet switching networks there is no need for high frequency of error checking. By introducing several markers in a packet, the marker symbol approach is capable of detecting errors with less computation complexity. 3
Conclusion
This paper compares the error detection capability between forbidden symbol approach and marker symbol approach using arithmetic coding. The comparison shows two approach have basically the same error detection capability. With its geometric distribution of error propagation distance, the forbidden symbol approach is useful for error correction. The marker symbol approach is simple and does not change the source encoding and decoding design. Though forbidden symbol approach provides continuous error detection, its high frequency of error checking is generally not needed in packet switching networks. The marker symbol approach with less computation complexity may be a better choice in this case. References 1. S. Vembu, S. Verdii, and Y. Steinberg. The source-channel separation theorem revisited. IEEE Trans. Inform. Theory, 41(l):44-54, Jan. 1995. 2. G. G. Langdon, Jr. An introduction to arithmetic coding. IBM J. Res. Develop., 28(2):135-149, Mar. 1984. 3. C. Boyd, J. G. Cleary, S. A. Irvine, I. Rinsma-Melchert, and I. H. Witten. Integrating error detection into arithmetic coding. IEEE Trans. Commun., 45(l):l-3, Jan. 1997. 4. I. Kozintsev, J. Chou, and K. Ramchandran. Image transmission using arithmetic coding based continuous error detection. In Proceedings of the Data Compression Conference, pages 339-348, Snowbird, UT, Mar.-Apr. 1998. 5. B. D. Pettijohn, K. Sayood, and M. W. Hoffman. Joint source/channel coding using arithmetic codes. In Proceedings of the Data Compression Conference, pages 73-82, Snowbird, UT, Mar. 2000. 6. G. F. Elmasry. Joint lossless-source and channel coding using automatic repeat request. IEEE Trans. Commun., 47(7):953-955, July 1999.
317 A N OPTIMAL INVALIDATION M E T H O D FOR MOBILE DATABASES WEN-CHI HOU, HONGYAN ZHANG Department of Computer Science, Southern Illinois University at Carbondale, IL 62901, USA E-mail: [email protected]. edu MENG SU1 Department of Computer Science, Venn State Erie, The Behrend College, Erie, PA 16509, USA E-mail: [email protected] HONG WANG CoManage Corporation, 8500 Brooktree Rd, Wexford, PA 15090, USA E-mail: michellew @ comanage. net Mobile computing is characterized by frequent disconnection, narrow communication bandwidth, limited communication capability, etc. Caching can play a vital role in mobile computing by reducing the amount of data transferred. In order to reuse caches after short disconnections, invalidation reports are broadcasted to clients to help update/invalidate their caches. Detailed reports may not be desirable because they can be very long and consume large bandwidth. On the other hand, false invalidations may set in if detailed timing information of updates is not provided in the report. In this research, we aim to reduce the false invalidation rates of the reports. From our analysis, it is found that false invalidation rates are closely related to clients' reconnection patterns (i.e., the distribution of the time spans between disconnections and reconnections). By using Newton's method, we show how a report with a minimal false invalidation rate can be constructed for any given disconnection pattern.
1
Introduction
Mobility and portability of wireless .communication create an entirely new class of applications and new massive markets combining personal computing and consumer electronics. Information retrieval is probably one of the most important mobile applications. In a mobile computing environment, a set of database servers disseminates information via wireless channels to mobile clients. Clients are often disconnected due to some battery power saving measures [16], unpredictable failures, etc., and they also often relocate and connect to different database servers at different times. Due to the narrow bandwidth of wireless channels, clients should minimize communication to reduce contention for bandwidth. Caching of frequently accessed data at mobile clients has been shown to be a very useful and effective mechanism in handling these problems. Many caching algorithms have been proposed for conventional client-server architectures. However, due to the unique features of the mobile environment, such as narrow 1
Correspondent author.
318 bandwidth, frequent disconnections, weak communication capability (of clients), etc., conventional algorithms are not directly applicable to mobile computing. Research and development in cache management for the mobile computing environment has been discussed, for example, in [2, 3, 4, 6, 7, 10, 11, 16, etc]. In order to reuse the caches after frequent short disconnections, invalidation reports are broadcasted to clients to help update/invalidate their caches [3, 11] in mobile databases. Detailed reports can be long, consuming large bandwidth, and thus may not be desirable. On the other hand, cached items could be falsely invalidated (called false invalidations) if detailed timing information of updates is not provided in the report. In this paper, we discuss how to construct a report with a minimal false invalidation rate. We have found that false invalidation rates have to do with clients' reconnection patterns (i.e., the distribution of the time spans between disconnection and reconnection). By applying Newton's method [1] to the clients' reconnection pattern, a design of the report with a minimal false invalidation rate can be obtained. The rest of the paper is organized as follows. In section 2, we describe and review caching management model in the mobile computing architecture and. In section 3, we take clients' reconnection patterns into account in the design of invalidation reports. By using Newton's method, a design with the minimal false invalidation rate is obtained. Section 4 is the simulation results and the conclusions. 2
Cache Management in Mobile Computing
2.1 Cache Management Problems in Mobile Computing Environment Caching can reduce client-server interaction, lessening the network traffic and messageprocessing overhead for both the servers and clients. Various cache coherence schemes [5, 14, 15, etc.] have been developed for the conventional client-server architecture. Since mobile client hosts often disconnect to conserve battery power and are frequently on the move, it is very difficult for a server to keep track of the status and locations of the clients and the validity of cached data items. As a result, the Callback approach is not easily implemented in the mobile environment. On the other hand, due to the limited power of batteries, mobile clients generally have weak or little transmission capability. Moreover, the narrow bandwidth of the wireless network could be clogged up if a massive number of clients attempt to query the server to validate their cached data. As a result, both the Callback and Detection approaches employed in the traditional client/server architecture are not readily applicable to the mobile environment and new methods for maintaining cache consistency have to be designed. Updates are generally made by the server and broadcasted to its clients immediately. Thus, as long as a client stays connected, its cache will be current. Discarding entire caches after short and, moreover, frequent
319 disconnections could be wasteful, as many of the cached items may still be valid. Thus, research in cache consistency has aimed to reuse the caches after short disconnections. An approach of broadcasting invalidation messages to clients to help update their cached items has attracted a lot of attention [8, 9, 10, etc]. It is generally assumed that there is a dedicated channel for broadcasting invalidation messages, which is different from the channel for data broadcast. Based on the timing of the invalidation messages being broadcasted by the servers, cache invalidation methods can be either asynchronous or synchronous. In the synchronous approach, the server gathers updates for a period of time and broadcasts these updates with the time when they were updated in the report. Note that some latency could be introduced between the actual updates and the notification of the updates to the mobile clients. Once invalid items are found in the cache, the client has to submit an uplink request for updated values. The broadcast of the invalidation report divides the time into intervals. A mobile client after reconnection has to wait until the next invalidation report has arrived before answering a query. That is, a mobile client keeps a list of items queried during an interval and answers them after receiving the next report. 2.2
Broadcasting Timestamp (BT) Strategy
Broadcasting timestamp (BT) strategy [3] was developed based on the synchronous invalidation approach. The report is composed of a set of (ID, timestamp) pairs, in which ID specifies an item that has been updated and the timestamp indicates when a change was made to that item. The longer the update activities are recorded in a report, the larger the invalidation report is, which can lead to a longer latency in dissemination of reports. 2.3
Bit-Sequence (BS) Approach
In the above approach, updated items are indicated by IDs and their respective timestamps in the report. When the number of items updated is large, the size of an invalidation report can become very large too. In order to save the bandwidth of wireless channels, the bit-sequence (BS) approach is proposed [11]. The bit-sequence mapping aims to reduce the naming space of the items. Since our approach is based on the BS approach, we will elaborate a little more here on this approach. In the BS approach, each data item is mapped to one bit of an N-bit sequence, where N is the total number of data items in the database. That is, the n* bit in the sequence corresponds to the n"1 data item in the database. A value "1" in a bit indicates the corresponding data item has been changed; and "0" indicates otherwise. This technique reduces the naming space for N items from Nlog(N) bits as needed in BT
320 approach to N bits here. It is noted that at least log(N) bits are needed to store an item's ID in the BT approach. The bit-sequence mapping is illustrated in Figure 2.1. Database
/
/ 1
y
/
3
4
N-l
k
it
i i
a
1
0
2
N /
1 i.
1
A
1
1
ii
0
Figure 2.1 Bit Sequence Mapping
In order to reduce false invalidations, a hierarchically structured and more detailed report is proposed [11]. Instead of using just one bit-sequence (and a timestamp), n bit-sequences (n > 1) (each is associated with a timestamp) are used in the report to show the update activities of n overlapping subintervals. Specifically, the i"1 sequence (0 < i < n-l), denoted Bj, has a length of N/21 bits, where N is the number of data items in the database; it records the latest N/2 ,+1 update activities. Each bit-sequence is associated with a timestamp T(B;) indicating since when there have been such N/21+1 updates. As shown in Figure 2.2, the first bit-sequence B 0 , has a length of N and has N/2 " 1 ' bits, showing that N/2 items have been updated since T(B 0 ). The second sequence Bi has N/2 bits, each corresponding to a data item that has been updated since T(B 0 ) (i.e., the " 1 " bits in B 0 ). Again, half of the bits in B] (i.e., N/4 bits) have the value " 1 " , indicating half of the N/2 items that have been updated since T(Bo) were actually updated after T(B]). In general, the j " 1 bit of the Bj represents the j * " 1 " bit in the Bn, and half of each bitsequence are l's. It can be observed that the total number of bit-sequences n is log(N). The modified scheme is called the dynamic BS. Instead of mapping an item to a bit in B 0 , each updated item is represented explicitly by its ID in the dynamic BS. Thus, B 0 is now made of the IDs of those items that have been updated since T(B 0 ). The rest of bit-sequences (B], ... Bn.]) are constructed in the same way as in the original BS. That is, sequence Bj (0 < i < n-l) has k/21"1 bits, with half of them being "l"s, where k is the number of items updated since T(Bo). If both an ID and a timestamp are implemented as a 32-bit integer, the total size of a report can be calculated approximately as 32k + 2k + 32 log(k), where 32k is the size of k IDs (i.e., Bo), 2k is the size of the rest of the bitsequences (i.e., Bi, ... Bn_i), and log(k) is the number of timestamps (or the number of bit sequences) [11].
321
Jata base
/
/
/
/
1 I k
. 1
3 |
\ N
1
0
\
9
I'l'
• • • •
Client 1 dlient2 •
1/ N/2
1
• •
1
0
0
0
"<J
tt
\
1
Clients' disconnectio n time
1
1
0
T(B,)
T(B„.2) Client 3
_*_
Cuiren time
2 bits:
1
0
T(B».,)
Figure 2.2 An Invalidation Report with Client Disconnection Time
3
An Optimal Construction of Hierarchical Bit-Sequences
Although Jing's hierarchically structured bit-sequences discussed above reduced the naming space of items and number of timestamps, there is no justification for why the bit-sequence hierarchy should be constructed based on half of the updates, i.e., N/2' or k/21 updates. In fact, this "half-update-partition" scheme could favor shorter reconnections than longer ones. Here, we use Figure 2.2 as an example. After reconnection at the current time, clients 1 and 2 will rely on Bo to invalidate their data items, and client 3 will use Bn.2. All cached items updated between T(B0) and T(Bi) in clients 1 and 2's caches will have to be invalidated after reconnection, even though some of them might have been already updated in the caches before disconnections, recalling that updates are immediately reflected on the clients' caches while connected. Notice that the time span between T(Bo) and T(Bi) is much longer than the time between T(B„.2) and T(Bn.!). Tlius, if there are a large number of clients like clients 1 and 2, who disconnected
322 during the earlier period of the window (which is quite likely), there could be a lot of items falsely invalidated. Clearly, this hierarchical structure with "half-update- partition" cannot achieve minimal false invalidations. We redesign Jing's hierarchically structured bit-sequences to minimize the false invalidation rate. Specifically, we will investigate the division of the n overlapping subintervals in the report such that the false invalidations can be minimized. As in any approach, a report can only cover the update activities of a limited period. The window size of an invalidation report W refers to the time period of updates covered in a report. The larger the window size, the longer the clients can stay disconnected without discarding the caches. However, the larger window size also gives rise to larger reports, which may cause longer latency between two consecutive reports, recalling that a reconnecting client has to receive a report before it can answer any query. Here, we assume that both W and L are fixed and predetermined by the system. 3.1 Reconnection Patterns It is observed from Figure 2.2 that false invalidation rates are closely related to the reconnection patterns of mobile clients (i.e., how long clients are likely to reconnect after disconnection). Therefore, to reduce the false invalidation rates, a reorganization of the bit-sequences that takes into account clients' reconnection patterns needs to be devised. Assume that the reconnection pattern can be represented by a certain probability Frequency
CT: Current Time DT: Disconnection time
1
1
1
2
1 3
I 4
1 5
Figure 3.1 The Reconnection Time Distribution
1 6
*• Reconnection time: CT - DT
323 distribution, such as the one shown in Figure 3.1, where the X-axis is the difference between the reconnection time and the last disconnection time of the mobile clients and the Y-axis represents the number of clients. Let us now analyze the relation between the false invalidations and the reconnection distributions. Assume that a mobile client disconnected at time x. After it reconnects and receives the first report, it looks for a sequence, say B i; in the report, with the largest timestamp that is less than or equal to its disconnection time x (i.e., T(Bj) < x ). If the client did not disconnect exactly at T(Bj), there might be a chance for false invalidation, because the client may have already updated some of the items in its cache between T(Bj) and x when it was still connected. The larger the difference between x and T(Bj), the more items might have been updated by the clients before disconnection, and those items would be falsely invalidated when reconnected. Now, let us derive the relationship between false invalidation and division of the window for a given reconnection pattern. Assume updates arrive at a rate C. Since the server receives update requests from users of all kinds, we may assume that updates are independent. Then, the expected number of items falsely invalidated for a client disconnected at x , denoted FI(x), is C*(x-T(Bt)). Letj{x) be the reconnection pattern. Then, the expected total number of falsely invalidated items, denoted by TFI, is „_1 7-(Bl+
1
)
m = c i ( i(x-T(B,y)* f(x)dx) 1=0
T(Bi)
= CX( \x*f(x)dx)-CJX Jr(B ; )*/W^) ;=o nsj) >=o us) 7W
„-i r
T(Ba)
i=0
= C \x*/(x)dc-CX(
J7X3)*f(x)dx)
T(Bil
where n is the number of bit-sequences in the report, T(B„) = CT (i.e., the current time). To minimize TFI is equivalent to maximum "y, <'=»
'{T(B ) * f(x)dx) ' a s r(fl,)
\x * f(x)dx T(B 0 )
constant. We will see how to find a partition of the window into n subintervals to maximum £ ( JT(B,) * f{x)dx) fr°m '=0
T{B,)
me
following theorem and its proof.
IS a
324
Theorem 1. Let £ be an arbitrary positive real number, fix) be a continuous real function. Then, there exists a vector X = [xx, • • •. xn_x ] r such that n-i
X
.
M
g(X) — S ( I x-* f(x)dx) i'=o
.
.
.
.
,
obtains its maximum, where n, Xo, xn are three constants,
Xi
x <•••< x < x-^, <•••< x , and the vector X can be approximated with error £ 0
I
,+1
n
Proof: Let x0 = a, Xn = b, then afaf(x)dx < g (X) < bfaf(x)dx. There exists a maximum value of continuous function g (X) for a-xQ<xx <--<xn_x <xn=b. When a = x0 = xx =••• = xn_ltxn =b, we have gXX)=aIf(.x)dx. a
The solution xx, • • •, JC„_J of dg(*) = 0 ,- _ i 2 ... n -1
must
^e m e
vames sucn
dxj
that g (X) attains maximum atX=[a, xh x^ ..., JC„_I , b]T. Consider the following equation ?4^- = & + if(x)dx + f(xi)(xi_l-xi) = 0,i = l,2,-,n-l. (D dXj
i
g (X) is smooth and it can attain its maximum. Therefore, the equation (1) has x =[xit,-,x„_ , — ,x l]T, solutions. Let X
F^)
=
MX),and
F(X) = [f,(X), F2(X),
•,/v 1 (X)] T
dX:
then (1) is equivalent to F(X) = 0, where 0 is a n-1 dimension vector [0, 0
(2) 0] . By using Newton's iterative method [1], T
we can find the approximation of the solution Xt as following: F(Xk) + F(XkXXM-Xk)
= 0,
(3)
XM-Xk=-F\Xk)-lF{Xk) = -DF(Xky1F(Xk),
*=0,1,2, •••.
We know that Xk -> X,, X, is the solution of (2) such that g ^ ) attains the maximum at X,
=(a,Xt,bf. We choose initial value
^o=[ Jc o,i. Jc o,2'---' Ji: o, n -i] T ' a<x0i
for i = l,2,---,n-l.
325 In (3), DF(X)
is the Jacobian matrix and Af,
f(x2) M, /(*,)
/(*z)
DF(X) --
0
/(* 3 ) M3
/U4)
0 0
/(*_,)
M„_,
whereM, =-2/(jt f ) + (*,_, -JC,-)/'(*,•) fori = l , 2 , — , n - l , Z)F(X)is asymmetric tridiagonal matrix and (3) is equivalent to the linear system equations DF(Xk)(X-Xk) = -F(Xk) (4) By using LU decomposition method [1], we can solve for the linear equations (4) to obtain Xk+i.
We know that the complexity of this method is O(n), n is the order of the
tridiagonal matrix. Equation (4) can be solved for sequence {jft}k = l,2,---. After Nsteps of iteration, the complexity is N • 0{ri) = 0(n). The arbitrary approximation of the solution in polynomial time can be obtained by using I*™ ~ ** | ! = "S <**+!..- - **, ) 2 ^ e. E > 0. II
112
,-_,
to find the number of steps N. Hence the result satisfies the precision requirement. Example 1. Assume that the reconnection pattern has a uniform distribution within the window, that is, fix) = c, where c is a constant. From the above theorem, we have g(X) = 1\™xif(x)dx=ctxi{xM
-x,)
Ft ( f ) = £f+1 cdx+ C(JC,._, -*,•) = c(*,+i - xi + XM ~ xi) fori = 1,2,- • •, n -1. If F ; ( X ) = 0 , we have xi+l — xt — jcg-_j — xt, then
+—. This indicates evenly n spaced intervals gives the optimal solution when the reconnection distribution is uniform within the window. x
= XQ
Example 2. Assume we are to divide a window of size 10 into 3 subintervals, and the reconnection distribution follows the formula
326 x,
0<x <5
y= 10-x, 5<x<10; Then, from the above theorem, we know that there exists a maximum value of g(X), where g(X) = xAXl
ydx + x,|"°ydx. a n d O < X l < x 2 < 1 0 .
If0<x,<5, 5<x2<10, g(X) = ^x2(x2
- 1 0 ) 2 - i ( * 2 -10) 2 x, +25*, - | V -
IfO<x, < x 2 < 5 , g(X) = iAT1(JC22-V) + ^ 2 ( 5 0 - X 2 2 ) . If 5 < JCJ < x 2 < 1 0 , * ( ^ ) = | ( * i -x2)(x2 -10) 2 - ^ U 2 -10) 2 ^,. With the help of MATLAB, we obtained that g(X) has the maximum value of 86.9385 when Xi = 3.1008 and x2 = 5.4005. Thus, we will divide the window at 3.1008 and 5.4005. In Section 4, we will use these two distributions to perform simulations and measure the performance. 3.2 Algorithms for Clients and Servers The algorithm has two parts: one runs on the server side and the other on the client side. The server maintains a sorted linked list, which we call an updatelist. The list contains data items updated during the last window period in chronological order. Each node in the linked list has the following data structure. typedef struct list { int index; int updatetime; struct list *next; int oneposition; } updatelist;
//the index number of the data item in the database // the time of the update // a pointer to the next node // the position among the " 1 " bits, i.e., ith 1-bit in the bit sequence
The server constructs and broadcasts the reports periodically using a dedicated channel, while the clients interpret the reports after reconnection. The construction of the
327 bit-sequences at the server side and the interpretation of the report at client sides are described as follows. • Server side algorithm 1. for (i = 0; i < n; i ++) // calculate timestamps T(Bj) for each Bj T(Bj) = CurrentTime - time[i]; // time[] stores the interval dividing values derived from Theorem 1. 2. for each node in the list // constructing B0; onecount = 1, initially { set the j l h bit of B 0 to " 1", where j is the index of the data item represented by the current node; oneposition (of the current node) = onecount; onecount = onecount +1; } 3. for (k=l; k S n-1; k++) // constructing Bk, 0< k < n-1 { allocate space for Bk (of length onecount bits) and intialize it to all 0's; onecount =1; while (updatetime of the current node > T(Bk)) do // for all nodes in the updatelist { set the j t h bit of Bk to "1", where j is the value of oneposition of the current node; oneposition (of current node) = onecount; onecount = onecount +1; } } • Client side algorithm: An input to the algorithm is the variable "Last" that indicates the last time when the client received a report. 1. if T(Bn.!) < Last, no data cache needs to be invalidated, Stop; //cache is up to date 2. if Last < T(B0), the entire cache is invalidated, Stop; // outside the window 3. Locate the bit sequence Bj such that T(Bj) < Last and Last < TCBj+i) for all j (0 < j < n ); 4. Invalidate all the data items represented by " 1 " bits in Bj. To determine the data items corresponding to the " 1 " bits in Bj in the step 4 above, the following algorithm can be ued. A: if j = 0, then use the positions of those " 1 " bits in Bn to identify the data items and stop; for each "0" bit in Bj, reset the ith " 1 " bit in Bj_i, where i is the position of a '0' bit in BJ; j = j - 1 and go back to step A. 3.3 A Dynamic Scheme Like the dynamic BS scheme [11], we can modify our method a little bit to further reduce the size of the report when then number of items updated is small. That is, instead of using an N-bit sequence for B 0 , we use explicitly the IDs of items that have been update since T(B 0 ). Other bit sequences (Bj, ..., B„_i) are constructed as before. If both IDs and timestamps are implemented as 32-bit integers, the overall size of the report is 32k +
328 y"! " I Bj I + 32n, where the first term is the size of k IDs (or B 0 ), the second term is the size of the rest of the bit-sequences (i.e., B], ... Bn.!), and the last term is the size of n timestamps. Note that the number of timestamps (or bit sequences) in our approach is in general different from that of Jing's. We will pick up this issue in the next section. 4
Preliminary Simulation Results
In this section, we report simulation results on length of the reports and false invalidations of our and Jing's approaches. We have chosen to experiment with the dynamic schemes of these two approaches because of their flexibility in accommodating a variable number of updates in the report (especially for Jing's approach). The results ought to apply to the original schemes without any difference. Due to space limitation, we present only some of the important results here. Interested readers are referred to [17] for more comprehensive simulation results. A report has two parts - bit-sequences and timestamps. Since item IDs are used in the first bit-sequence B 0 in both approaches, we shall exclude B 0 from the "bitsequences" in the following discussion, unless otherwise stated. We will also discuss the effect of timestamps on the overall size of a report later. The purpose of the simulations is mainly to compare the size of the reports and the effectiveness of the bit-sequences in reducing false invalidation rate, denoted FIR, which is defined as p T R _ number— of — items — falsely—invalidated number—of — items—invalidated To the best of our knowledge, there has been no study on the distributions of potential reconnection patterns. Therefore, we have chosen to use the two patterns, a uniform distribution and a non-uniform distribution with a peak in the middle of the window as described in the Examples 1 and 2 of Section 4.1 for our simulations. Hopefully, these distributions are good approximations to some of the potential reconnection patterns. It can be observed that cache size has no effect on FIR, which is due to the inaccuracy of the report. Therefore, we shall not mention cache size in the following discussion. We have also tested with various database update rates, 10%, 20%, 30%, 40%, and 50%, which are the percentages of items updated during the last window period, to see their effects on the false invalidation rate. The lengths of the bit-sequences in two approaches are usually different. As mentioned earlier, the expected size of Jing's bit-sequences (excluding B0) can be calculated beforehand as 2k, while in our approach it depends on the number of subintervals in the window. In order to compare the effectiveness of bit-sequences, we
329 have chosen the number of subintervals to be 3 in our approach so that the lengths of our bit-sequences can be as close to Jing's as possible. As discussed earlier in Section 3, for a uniform reconnection pattern, an evenly divided window gives the optimal performance. For the convenience of calculation, the window size has been set to be 10 time units in all simulations. In the following tables, we report the length of bit-sequences in bits in the "Length" column. The row "Ratio" shows the ratios of the length and FIR of our approach to Jing's.
Update Rate 10% Length FIR Optimal Jing's Ratio
20% Length FIR
30% FIR Length
40% Length FIR
50% Length FIR
1731 0.1883 2281 0.2008
3398 0.1878 4313 0.2004
5065 0.1884 6344 0.2001
6732 0.1881 8345 0.2006
8397 0.1880 10378 0.2005
0.7589 0.9379
0.7879 0.9371
0.7984 0.9412
0.8067 0.9377
0.8091 0.9378
Table 4.1. Uniform Distribution
As observed from Table 4.1, not only our bit-sequences are shorter than Jing's, but also achieve a slightly better (or lower) false invalidation rates (approximately 94% of Jing's). This implies our bit-sequences are more effective in lowering the false invalidation rate. It can be further observed that our bit-sequence size is around 80% of Jing's (i.e., 0.7589, 0.7879, 0.7984, 0.8067, 0.8091 for 10%, 20%, 30%, 40%, 50% update rates, respectively). This result is consistent with our analysis on the estimations of bit-sequence sizes. Recall that the length of Jing's bit-sequences is 2k (excluding B 0 ), while ours is k + (2/3)k = (5/3)k, where k is the size of Bi (and also the number of items updated), and (2/3)k is the expected size for B2. That is, ours is only 83% ((5/3)k / 2k ~ 0.83) of Jing's. The FIRs basically remain the same for different database update rates in each approach, that is, around 18.8% in our approach and 20.0% in Jing's approach. It indicates that FIR has to do with the ways of constructing bit-sequences, but has nothing to do with the rates of updates. Now let us consider the size of timestamps. The total size of timestamps in Jing's report is 321og(k), while it is 32n in ours, where log(k) and n are the numbers of bit-sequences in respective reports. In our report, there are 3 (i.e., n = 3) timestamps, while in Jing's report, it has log(k) timestamps (log( 1,000)= 10, log(2,000)=ll log(5,000)=13 for 1,000, 2,000, ..., 5,000 updates, respectively, during the last window). Clearly, we use much less timestamps and consume less space than Jing's report.
330 Update Rate 10% Length FIR
20% Length FIR
30% Length FIR
40% Length FIR
50% Length FIR
Optimal
1754 0.1755
3444 0.1759
5134 0.1755
6826 0.1754
8513 0.1757
Jing's
2281 0.2315
4313 0.2307
6344 0.2310
8345 0.2313
10378 0.2309
Ratio
0.7690
0.7581
0.7985
0.7626
0.8093
0.7598
0.8180
0.7585
0.8203
0.7610
Table 4.2 Non-uniform Distribution
In Table 4.2, we show the results for the non-uniform distribution described in Example 2 of Section 3.1. According to Theorem 1, we divided the window at 3.1008 and 5.4008. Again, our bit-sequences are shorter (about 80% of Jing's), and yet achieve much better (or lower) false invalidation rates, that is, 76% of Jing's. As in the uniform case, the FIRs remain the same in each approach for different item update rates. The FIRs are around 17.6% in our approach, compared to 23.1% in Jing's. It is worth mentioning that if there is enough bandwidth for longer reports, in our approach we can easily divide the window into more subintervals (i.e., more bitsequences) to achieve lower false invalidation rates. However, this may not be possible for Jing's approach because the number of bit-sequences (i.e., log(k)) is completely determined by the number of updates k during the window period (assumed fixed), and the number of updates and thus the update rate have nothing to do with FIR, as shown in the tables. That is, even though there is still bandwidth left for use, Jing's approach simply cannot use it to reduce the false invalidate rates. (Excess bandwidth may be used to cover longer periods though). In summary, our approach clearly outperforms Jing's approach in terms of the length of bit-sequences, number of timestamps, effectiveness of reducing FIR, and flexibility in using excess bandwidth to reduce false invalidation rates. References 1. 2. 3.
4.
Axelsson. "Iterative Solution Methods", Cambridge University Press, 1994. D. Barbara, "Mobile Computing and Databases-A Survey", IEEE Transactions on Knowledge and Data Owe Engineering, pp. 108-117, Vol. 11, No. 1, Jan/Feb, 1999 D. Barbara and T. Imielinski. "Sleepers and workaholics: Caching strategies for mobile environments". Proc. of the ACM SIGMOD Conference on Management of Data, pp. 1-12, May, 1994. J. Cai, K, Tan, and B. Ooi, "On Incremental Cache Coherency Schemes in Mobile Computing Environments," Proc. of IEEE Data Engineering, Pg. 114-123, April, 1997
331 5.
6.
7.
8. 9. 10.
11.
12. 13.
14. 15. 16.
17.
M. J. Carey, M. J. Franklin, M. Livny & E. J. Shekita, "Data Caching Tradeoffs in Client-Server DBMS Architectures", Proc. of ACM 1991 SIGMOD, pp. 357-366, May, 1991. H. Chung, H. Cho, "Data Caching with Incremental Update Propagation in Mobile Computing Environments", Proc. Australian Workshop on Mobile Computing and Databases and Applications, pp. 120-134, Feb. 1996. A. K. Elmargarmid, J. Jing, and T. Furukawa, " Wireless Client-Server Computing for Personal Information Services and Applications," ACM SIGMOD Record, pp. 4349, Dec. 1995. M. Franklin, M. Carey, and M. Livny. "Global Memory Management in ClientServer DBMS Architectures". Proc. ofVLDD, pp. 596-609, August 1992. C. G. Gray and D. R. Cheriton. "Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency". Proc. ofSOSP, pp. 202-210, Feb. 1989. T. Imielinski, S. Vishwanath, and B. R. Badrinath. "Energy efficient indexing on air", Proceedings of the ACM SIGMOD Conference on Management of Data, Minneapolis, Minessota, 1994. J. Jing, A. K. Elmagarmid, A. Helal, and R. Alonso. "Bit-Sequences: An Adaptive Cache Invalidation Method in Mobile Client/Server Environments". ACM/Baltzer Journal of Mobile Network and Applications, 2(2), pp.115-127, 1997. M., Stonebaker, et al, "Third -Generation Data Base System Manifesto," SIGMOD Record 19, 3, pp. 241- 234, Sept. 1990. J. Strain, R. Acuff, T. Rindfleisch & L. Fagan, "A Pen-Driven, Mobile Surgical Database: Design and Implementations," http://www.smi.stanford.edu/projects/mobile/amia94-2.html, 1994. Y. Wang, "Cache Consistency and Concurrency Control in a Client/Server DBMS Architecture", Proc. of ACM SIGMOD 1991, pp. 367-376, May, 1991. K. Wilkinson & M. Neimat, "Maintaining Consistency of Client-Cached Data", Proc. of 16th VLDB, pp. 122-133, Aug.1990. K.L. Wu, P.S. Yu and M.S. Chen "Energy-efficient Caching for Wireless Mobile Computing", Proc. 12th International Conference on Data Engineering, pp. 34-50, Feb. 1996. H. Zhong, "A New Invalidation Method for Cache Management in Mobile Databases", Master Thesis, CS Department, SIU, May, 1999.
333 COMPARISON OF WAVELET COMPRESSION ALGORITHMS IN NETWORK INTRUSION DETECTION
ZHENG ZHANG, CONSTANTINE MANIKOPOULOS, Electrical and Computer Engineering Department, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA, E-mail: [email protected], [email protected]
Department
JAY JORGENSON CCNY, Convent Ave. at 138 ST., New York, NY 10031, USA E-mail: jjorgenson@mindspring. com
of Mathematics,
JOSE UCLES Network Security Solutions, 15 Independence Blvd. 3rd FL., Warren, NJ 07059, USA E-mail: [email protected] In this paper we report on the experimental results of the effectiveness of using wavelet compression on the monitored data collected in HIDE, a hierarchical intrusion detection system. HIDE measures the network traffic parameters and abstracts them into probability density functions (PDFs), utilizing sixty-four bins. For the sake of resource optimization, we compressed these PDF representations using various wavelet families and then compared their effectiveness. The four families we studied are: hoar, sym2, coi/3, db4. The results showed that all four wavelet bases can reliably compress the PDFs from 64 bin values to 16 to 28 wavelet coefficients (compression range 3), without major performance deterioration. In comparing the four wavelet algorithms, we found that the sym2 wavelet family performed the best; its performance at compression range 5, employing only six wavelet coefficients, resulting in compression ratio of 10.67, is still very satisfactory.
1
Introduction
Network intrusion detection is increasingly needed to protect networks and computers from malicious network-based attacks. Intrusion detection techniques can be partitioned into two complementary trends: misuse detection, and anomaly detection. Misuse detection systems, such as [1][2], model the known attacks and scan the system for the occurrences of these patterns. Anomaly detection systems, such as [3] [4], flag intrusions by observing significant deviations from typical or expected behavior of the systems or users. In [4], we proposed the prototype of a Hierarchical Intrusion DEtection system (HIDE) that uses statistical preprocessing and neural network classifications to detect network-based attacks. HIDE is a hierarchical anomaly intrusion detection
334
system. It gathers data from network traffic, the system log and hardware reports; it statistically processes and analyzes the information, detects abnormal activity patterns based on the reference models, which correspond to the expected activities of typical users, for each parameter individually, or in combined groups using neural network classification; it generates the system alarms and event reports; finally, HIDE updates the system profiles based on the newly observed network patterns and the system outputs (see Fig. 1). Subsequent simulation experiment results have shown that the system could identify attacks accurately and efficiently [5] [6] [7]. Inputs
Network Data • Network Traffic Information (IP, UDP.TCP,...) • Higher Layer Log Information • Hardware reports and information from some other sources
HIDE
Outputs
Event Probe and Analysis
Intrusion Detection
Anomaly Alarm Network Reports Event Log
2££ System Update
Fig. 1 Inputs and Outputs of HIDE
In HIDE, the network traffic parameters are measured and built into probability density functions (PDF). Each PDF is represented by sixty-four bins, with each bin corresponding to the probability of a certain event. The PDF representation is capable of displaying the nimble differences between the activity patterns of legitimate users and those of intruders, thus leads to a better performance than traditional threshold-based IDS. However, building and handling PDFs also consume more system processing power and need more memory and storage resources. Data compression of the PDFs has the potential to enhance the efficiency and the performance of our system. Some preliminary results using haar-based wavelet compression in HIDE are reported in [7]. In this paper, we present our experiments on four different wavelet algorithms with various compression ranges, from range 1 to 5, applied to the PDFs of HIDE. The rest of this paper is organized as follows. Section 2 describes the usage of probability density functions (PDFs) for intrusion detection. Section 3 introduces the wavelet algorithms and the experiment approaches we made. In Section 4, we report the test bed and the attack schemes we simulated. Some experimental results are also in section 5. Section 6 draws some conclusions and outlines future work.
335
2
PDFs for Intrusion Detection
HIDE uses statistical models and neural network classifiers to detect anomalous network conditions. The statistical analysis bases its calculations on PDF algebra, in departure from the commonly used isolated sample values or perhaps their averages, thus resulting in higher classification accuracy. Our system generalizes further by combining the information of the PDFs of the monitored performance parameters, either all of them or subgroups of them, in one integrated and unified decision result. This combining is powerful in that it achieves much higher discrimination capability that enables the monitoring of individual service classes in the midst of general traffic consisting of all other classes. It is also capable of discriminating against intrusion attacks, known or novel. 3
Wavelet compression
For each wavelet family we computed a multi-level one dimensional wavelet analysis and computed approximation coefficients, which are obtained by convolving with a low pass filter. Further levels of approximation coefficients were obtained by repeated convolutions with the given low pass filters. The approximation PDF was then obtained by direct reconstruction via the approximation coefficients. The analysis we undertook is available using commands within the Wavelet Toolbox of MatLab [11]. • Original PDF - PDF after haar wavelet
Puik_ 10
20
30
40
A. M 50
60
Original PDF _ _ PDF after coif3 wavelet
, I
_ ^ ^ _ ^ 10
20
30
40
SO
60
10
20
30
40
50
60
Original PDF PDF after db4 wavelet
f
•
TA 10
A^ 20
30
^ w * ! - , 40
50
Fig. 2 A Sampled PDF with Different Wavelet Compressions
Among the many wavelet families that exist [8], [9], [10], we selected a wavelet from each of the following families: the classical Haar basis (called haar afterward), the Symlets basis (called sym2 afterward), Coiflets (called coifi afterward), and Daubechies (called db4 afterward) wavelets. The Haar wavelet is
336
the oldest and simplest wavelet, and it has the shortest support among all orthogonal wavelets. The Haar basis is a multiresolution of piecewise constant functions and, in practice, is known to not be well adapted to approximating smooth functions because it has only one vanishing moment. Symlets wavelets are characterized as being compactly supported, orthogonal wavelets with least asymmetry and highest number of vanishing moments for a given support width. Graphs of the scaling functions can be found on page 254 of [9] (see also [10]). Coiflet wavelets are characterized has being compactly supported, orthogonal wavelets with minimum support width while requiring the highest number of vanishing moments for both the scaling function and the wavelets. Coiflets first appeared in applications to numerical analysis (see page 254 of [9]). Daubechies wavelets have the minimum support width for a given number of vanishing moments. Daubechies wavelets are orthogonal and very asymmetric. Graphs of the scaling functions can be found on page 253 of [9] (see also [10]). Sampled PDFs of these four wavelet compression algorithms are shown in Fig. 5. In HIDE, we are using wavelet algorithms to compress the network parameters, which were measured as PDFs with 64 bins, into sets of wavelet coefficients. To compare the system performances under various wavelet compression ranges, the compressed PDFs are decompressed and then processed by the statistical modules and the neural network classifiers. The outputs with wavelet compressions are compared with those without compression. The process is illustrated in Fig. 3.
Fig. 3 Wavelet Compression and Decompression
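To make the compression and decompression step concrete, the following is a minimal sketch in Python using PyWavelets; this is our own illustration under stated assumptions (the authors worked with the MATLAB Wavelet Toolbox), not the actual HIDE code. A 64-bin PDF is compressed to its approximation coefficients at a chosen level and decompressed by reconstructing with the detail coefficients set to zero.

import numpy as np
import pywt

def compress_pdf(pdf, wavelet="sym2", level=2):
    # Keep only the approximation coefficients of a multi-level DWT.
    coeffs = pywt.wavedec(pdf, wavelet, level=level)
    return coeffs[0]

def decompress_pdf(approx, pdf_len, wavelet="sym2", level=2):
    # Rebuild a coefficient list with zeroed detail bands, then invert.
    template = pywt.wavedec(np.zeros(pdf_len), wavelet, level=level)
    coeffs = [approx] + [np.zeros_like(d) for d in template[1:]]
    rec = pywt.waverec(coeffs, wavelet)[:pdf_len]
    rec = np.clip(rec, 0.0, None)
    return rec / rec.sum() if rec.sum() > 0 else rec   # renormalize to a PDF

# Example with a sampled 64-bin PDF: haar at level 3 keeps 8 coefficients.
pdf = np.random.dirichlet(np.ones(64))
cA = compress_pdf(pdf, "haar", level=3)
approx_pdf = decompress_pdf(cA, 64, "haar", level=3)

In this sketch the compression range of the paper corresponds roughly to the decomposition level: each additional level roughly halves the number of approximation coefficients kept.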
The numbers of wavelet coefficients obtained for each range for each of the utilized wavelets are tabulated in Table 1.

Table 1 Wavelet Coefficients for Different Wavelet Compressions

    Wavelet   Range 1   Range 2   Range 3   Range 4   Range 5
    haar        64        32        16         8         4
    sym2        64        33        18        10         6
    coif3       64        35        21        14        10
    db4         64        40        28        22        19
In the table, PDFs at compression range 1 correspond to uncompressed PDFs. Note that the wavelets coif3 and db4 require the largest number of coefficients at each range, while the wavelets haar and sym2 provide the greatest amount of compression.
4 Testbed
We constructed a virtual network using simulation tools to generate attack scenarios. The experimental testbed that we built using OPNET, a network simulation facility, is shown in Fig. 4. The testbed is a 10-BaseX LAN that consists of 11 workstations and 1 server.
Fig. 4 Simulation Testbed (the 10-BaseX LAN of workstations and a server subjected to a UDP flooding attack)
We simulated a UDP flooding attack with 2 Mbps of background traffic and 100 kbps of attack traffic using the testbed.
5 Experimental Results
We collected 6000 records of network traffic. These data were divided into two separate sets: 4000 records for training and 2000 for testing. In each scenario, the system was trained for 100 epochs. In subsection 5.1, we study the PDF figures of the different wavelet compression algorithms. Subsection 5.2 describes the mean squared root errors and the misclassification rates of the outputs. The Receiver Operating Characteristic (ROC) curves of the wavelet results are shown in subsection 5.3.
5.1 PDF Figures
Some pictures of the decompressed PDFs with various compression ranges are plotted in Fig. 5.
Fig. 5 Some HIDE PDFs at Various Wavelet Compression Ranges (panels show the decompressed PDFs for each wavelet at compression ranges 1 through 5)
From the figures, we can see that, as the compression ranges (and ratios) get higher, the PDF shapes look more and more different from the original uncompressed PDFs. This is, of course, expected, since we are losing more information by representing a PDF with fewer coefficients. For all wavelet bases, the reconstructed PDF curves of compression range 2 are very close to the original PDF, which hints that there might be little or no performance difference at range 2. In fact, compression range 3 appears promising as well. However, visually significant PDF shape details seem to be lost at ranges 4 and 5.
5.2 MSR Errors and Misclassification Rates
We evaluated the mean squared root errors and the misclassification rates of the system using different wavelet compression ranges (Fig. 6). The misclassification rate is defined as the percentage of the inputs that are misclassified by the neural networks during one epoch, which includes both false positive and false negative misclassifications. In Fig. 6, the x-axis values represent the five compression ranges we tested.
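As a point of reference, the two quantities plotted in Fig. 6 could be computed as in the following Python sketch; the label encoding and the precise definition of the mean squared root error are our assumptions, since the paper does not spell them out.

import numpy as np

def msr_error(outputs, targets):
    # Mean squared root error between classifier outputs and target values.
    outputs, targets = np.asarray(outputs, float), np.asarray(targets, float)
    return np.sqrt(np.mean((outputs - targets) ** 2))

def misclassification_rate(predicted, actual):
    # Fraction of records misclassified in one epoch: false positives plus
    # false negatives over all records (True = intrusion, False = typical).
    predicted, actual = np.asarray(predicted, bool), np.asarray(actual, bool)
    fp = np.sum(predicted & ~actual)
    fn = np.sum(~predicted & actual)
    return (fp + fn) / len(actual)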
Fig. 6 MSR Errors and Misclassification Rates (x-axis: compression range)
We can see that the curve of the wavelet compression algorithm sym2 rises slowly and, even for compression range 5, the algorithm still shows strong performance, with a misclassification rate of about 2%. For the other three wavelet bases, the system performance is satisfactory for ranges 1 to 3, but then deteriorates for ranges 4 and 5. Therefore, the wavelet compression algorithm sym2 is a more appropriate choice for PDF compression. In practice, we found that sym2 wavelet compression with a compression range of 3 is a safe choice for HIDE to maintain satisfactory performance while improving system resource efficiency four-fold. From the PDF figures in the previous subsection, the decompressed PDFs of range 3 or higher are noticeably different from the original PDF, but the results in this subsection show little difference from range 1. One explanation is that wavelet compression at range 3 still keeps the information necessary to identify typical and abnormal traffic patterns.
5.3 ROC Curves
The Receiver Operating Characteristic (ROC) curves are illustrated in Fig. 7. The x-axis of the figure is the false alarm rate, which is the rate at which typical traffic events are classified as intrusions; the y-axis of the figure is the detection rate, which is calculated as the ratio between the number of correctly detected intrusions and the total number of intrusions. For each curve, the point at the upper left corner
represents optimal detection, with a high detection rate and a low false alarm rate. Each curve corresponds to the system characteristic under a particular wavelet compression range.
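A sketch of how such detection-rate and false-alarm-rate pairs could be generated by sweeping a decision threshold over the classifier outputs is given below; the thresholding scheme is our assumption, as the paper does not describe how the curves were produced.

import numpy as np

def roc_points(scores, labels):
    # scores: classifier outputs, higher means more likely to be an intrusion
    # labels: 1 for intrusion records, 0 for typical traffic
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    points = []
    for t in np.unique(scores)[::-1]:          # sweep threshold high to low
        flagged = scores >= t
        detection = flagged[labels == 1].mean() if (labels == 1).any() else 0.0
        false_alarm = flagged[labels == 0].mean() if (labels == 0).any() else 0.0
        points.append((false_alarm, detection))
    return points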
Fig. 7 ROC Curves (panels: haar, sym2, coif3, and db4; x-axis: false alarm rate, y-axis: detection rate; one curve per compression range 1 to 5)
From the figure, we can observe that the ROC curves of the sym2 wavelet compressions are very close to those without wavelet compression. For the others, the deviations become noticeable quickly as the ranges increase. This observation further confirms the conclusion of the previous subsection.
6 Conclusions
We applied wavelet compression to network intrusion detection monitoring data. Our results showed that wavelet compression improves the efficiency of the representation of PDFs for use in statistical network intrusion detection systems. All four wavelets we tested on HIDE maintain stable performance at compression range 3, thus improving system efficiency by two- to three-fold. The sym2 wavelet algorithm performed best, effectively compressing the HIDE PDFs from 64 bins
to only 6 wavelet coefficients, thus resulting in a compression ratio of 10.67, without major performance deterioration.
Acknowledgements. This research acknowledges support by Phase I and II SBIR contracts with the US Army, and OPNET Technologies, Inc., for partially supporting the OPNET simulation software.
References
1. G. Vigna, R. A. Kemmerer, NetSTAT: A Network-Based Intrusion Detection Approach, Proceedings of the 14th Annual Computer Security Applications Conference, 1998, pp. 25-34.
2. W. Lee, S. J. Stolfo, K. Mok, A Data Mining Framework for Building Intrusion Detection Models, Proceedings of the 1999 IEEE Symposium on Security and Privacy, pp. 120-132.
3. J. B. D. Cabrera, B. Ravichandran, R. K. Mehra, Statistical Traffic Modeling for Network Intrusion Detection, Proceedings of the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Aug. 2000, pp. 466-473.
4. Z. Zhang, J. Li, C. Manikopoulos, J. Jorgenson, J. Ucles, A Hierarchical Anomaly Network Intrusion Detection System Using Neural Network Classification, CD-ROM Proceedings of the 2001 WSES International Conference on Neural Networks and Applications (NNA '01), Feb. 2001.
5. Z. Zhang, J. Li, C. Manikopoulos, J. Jorgenson, J. Ucles, Neural Networks in Statistical Intrusion Detection, accepted by the 5th World Multiconference on Circuits, Systems, Communications & Computers (CSCC 2001), July 2001.
6. Z. Zhang, J. Li, C. Manikopoulos, J. Jorgenson, J. Ucles, HIDE: A Hierarchical Network Intrusion Detection System Using Statistical Preprocessing and Neural Network Classification, accepted by the 2nd Annual IEEE Systems, Man, and Cybernetics Information Assurance Workshop, June 2001.
7. Z. Zhang, C. Manikopoulos, J. Jorgenson, J. Ucles, HIDE, A Network Intrusion Detection System Utilizing Wavelet Compression, submitted to the Eighth ACM Conference on Computer and Communication Security, Nov. 2001.
8. R. Todd Ogden, Essential Wavelets for Statistical Applications and Data Analysis, Birkhäuser, Boston (1997).
9. Stéphane Mallat, A Wavelet Tour of Signal Processing, second edition, Academic Press, New York (1999).
10. Ingrid Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, PA (1992).
11. David Donoho, Mark R. Duncan, et al., WAVELAB 802 for Matlab 5.x, http://www-stat.stanford.edu/~wavelab/.
Information Technology/Linguistics
THE EMERGING CHALLENGE OF RETAINING INFORMATION TECHNOLOGY HUMAN RESOURCES
RICK GIBSON, Ph.D.
American University, 4400 Massachusetts Ave., Washington DC 20016, USA
E-mail: [email protected]
The responsibilities and functions of the Information Technology (IT) department are spreading rapidly and becoming more involved in all aspects of business. As a consequence, the process of attracting and retaining skilled IT workers has become increasingly important. Because the demand for skilled IT workers exceeds supply, the shortage and turnover problems that occur when skilled IT workers cannot be retained place companies at a competitive disadvantage. This report explores retention methods that have been effective in retaining IT workers.
1 Introduction
For some time now the demand for Information Technology (IT) workers has outpaced the supply. Bort [1] estimated that for the next seven years there will be 95,000 new IT job positions yearly, but only 45,000 graduates with IT-related degrees. As a result, most companies are currently faced with a shortage of qualified IT professionals. Additionally, companies must resolve operational and personnel issues resulting from the uniquely high turnover among IT employees. A survey of IT managers revealed that a 15% to 20% annual turnover is now considered average in IT shops [2]. Clearly, organizations with a high reliance on IT professionals need to examine the process of attracting and retaining skilled IT professionals. The situation in the public sector is even worse. According to Hasson [3], in the next few years 50 percent of the Federal IT workforce will reach retirement age. Annual salary surveys reveal further problems. In a survey of almost 17,000 IT professionals, respondents reported being satisfied with their jobs overall, including 45% of staff and 54% of managers who are satisfied with their pay; other concerns include finding interesting work, getting away from the current company's management culture, flexible work schedules and job stability. Of the 17,000 IT professionals surveyed, 45% say their company is good or excellent at attracting talent, only 38% say it knows how to retain employees, and 2% say their companies are poor or unsatisfactory at retaining employees. Thus, the purpose of this research is to investigate ways for organizations to retain a critical mass of IT employees by addressing several questions. What techniques are being used by the private sector for recruiting IT professionals? What are IT professionals looking for in an IT job? What changes have been
implemented to retain IT professionals? Can some of the private sector's techniques for recruiting IT professionals be implemented for the federal government?
2 Methods
In order to answer the questions posed, a descriptive study was chosen for this research. To keep the question focused on finding various solutions, a few investigative questions were developed: (1) What are IT professionals looking for in an IT job? (2) What techniques are being used by the private sector for recruiting IT professionals? (3) What changes has the federal government implemented to recruit IT professionals? and (4) Can some of the private sector's techniques for recruiting IT professionals be implemented for the Federal government? The gathering of the data began with creating a list of IT websites, searching online periodicals, computer publication websites and newspaper articles, and reviewing journal articles.
3 Overview
Organizations have always been concerned about losing their most skilled managerial employees. Technological advances have created a work environment where the skills needed to operate a business require the science of IT to accompany the art of management. These recent technological advances have created an environment in which management-level employees are no longer the sole holders of the skills that are vital to the daily operations of a business. Moreover, in contrast to the more mature and experienced members of management, younger workers are dominating the IT workforce. In a TechRepublic Salary Survey [4] report on the average ages of IT professionals, the largest group (32%) of IT professionals is in the 18-25 age range, with 25% in the range of 26-35 years of age. Most IT executives and consultants (25%) fall in the 36-55 age range. Today's youthful IT worker has a different attitude regarding worker-employee relationships than previous generations of workers. They are more independent and confident in themselves, their training and their skills. They are not afraid to seek new employment as often as every fifteen to eighteen months. A recent survey by Yankelovich Partners reports that 73 percent of all young IT employees said that they could easily find a new job if they had to [5]. A recent study [6] concluded that the top perceived or real factors that increase the turnover rate of IT personnel are various combinations of the following: boredom or lack of challenge, limited opportunities for growth, low expectations or standards, inferior or ineffective
co-workers, lack of leadership, poor supervision, inflexible work hours, a noncompetitive compensation package, and commute distance and time.
4 Discussion
In a survey of the retention practices at over 500 high technology companies, the American Electronics Association lists the top 10 retention techniques by degree of effectiveness as follows [7]:
1. Challenging work assignments
2. Favorable work environment
3. Flextime
4. Stock options
5. Additional vacation time
6. Support for career/family values
7. Everyday casual dress code
8. High-quality supervision and leadership
9. Visionary technical leadership
10. Cross-functional assignments; tuition and training reimbursement; 401(k) matching.
For discussion purposes, these effective retention strategies can be categorized into four main components: compensation, career advancement, participation relationship with management, and positive work environment.
4.1 Compensation
Employees should be able to negotiate on issues such as salary, bonus and stock options. For IT professionals, the trend for salaries has been an increase in all positions from the previous year. Despite having salaries that are usually higher than the national average, some IT professionals are not satisfied with their salaries. When various IT workers were surveyed, only 41% reported that they are fairly compensated. This leaves 59% of IT professionals who feel that they are underpaid. The dissatisfaction rate for network administrators was 66.3%, for help desk professionals 61.7%, and for IT trainers 60.3% [8]. Stock options are among the largest motivators as they relate to compensation [9]. Because stock offerings can amount to more than two thirds of a company's total compensation package, the potential value of the stock is very important. Each company must determine the appropriate amount of stock options to grant. Stock offerings can vary according to the industry and company size. Hence, compensation packages can vary. Executive search firms specializing in competitive compensation analysis assist companies in determining the most appropriate and effective compensation package for each position.
Some of the top incentives that businesses are using to reward employees for exceptional work include annual bonuses, competency-based pay, quality-based pay, retention bonuses, profit sharing and tuition reimbursement. However, the components of a compensation package are usually not the highest concern for many IT professionals. This is especially true in regard to today's young IT professional, for whom money is not the best retention tool. They need incentives to which they respond positively, e.g., a visible level of independence, a flexible schedule or a closer peer-to-peer relationship with their managers.
4.2 Career Advancement
Another effective retention technique involves the career advancement opportunities offered to employees. Employees expect the ability to move laterally, e.g., between departments or assignments, within a company. Moreover, they hope to get the opportunity for vertical advancement, i.e., to be promoted to a higher position. Training programs led by company executives, such as mentoring and leadership seminars, are perceived as positive incentives for retaining employees. Employees often look for value-added programs that will improve their career. Companies that provide these programs are perceived as committed to the professional advancement of their employees. Training can be provided through university, community college or self-study courses, seminars, computer-based training or various other interactive methods [10]. Although providing training is one way to keep employees on the job and increase their value to companies, the fear exists that newly trained employees might leave for new jobs after obtaining valuable skills or certifications [11]. Therefore, the training sessions must be associated with a sense of company trust and familiarity that will develop a strong company loyalty. Training reimbursement agreements are usually employed to protect a company's investment.
4.3 Participation Relationship with Management
Communication between management and employees is also important in retention strategies. Employees are more loyal if they feel "connected" with the company. They need to know that their opinions matter and that management is interested in their input. This includes making employees part of the decision-making process. The leadership team of a company, including the CIO and other senior executives, should be directly responsible for applying retention strategies and acting as cultural role models. They need to know how to use various retention methods and how those methods can be combined to solve particular problems. Determinations should be made regarding the various types and levels of training, benefits and compensation packages for employees [12].
In addition, basic human sociological factors need to be considered when designing retention strategies. It should also be recognized that sometimes employees leave a company because of poor leadership within the company [13]. Also, personal or professional conflict between employees and supervisors can cause IT professionals to leave their jobs. Finally, Sohmer [14] believes that smart leadership is often the key to success. He states that management should vary but be consistent in retention offerings and look to use the right strategy to retain non-management level IT professionals.
4.4 Positive Work Environment
A compatible corporate culture and environment is key to attracting and retaining employees. It is important for all employees to believe in the vision and goals of the company while also feeling comfortable and even passionate about the company for which they work. Examples of corporate culture include allowing casual dress, flexible working hours or free entertainment provided for the staff. Employees often say that they have insufficient time to meet all work and family responsibilities. This time constraint causes a great deal of stress for employees. Some companies address this concern by combining various non-traditional methods, such as flextime, telecommuting and compressed workweeks, with traditional work schedules. Empowering employees to productively manage their schedule is a powerful tool in an effective retention strategy. Location, as an environmental factor, is also a way to attract and retain employees. Companies should carefully choose their sites for various groups of employees. For example, a research and development operation in Silicon Valley might be useful in order to tap into cutting-edge thinking. But a research and development project with a long lead time would there have a high turnover rate, because the skills of the development team would be in high demand; a better location would be in a rural community. Trying to get people to relocate to remote regions, however, poses its own challenges. Another way private industry recruits IT professionals is by petitioning for H-1B visas. An H-1B nonimmigrant visa may be used to bring a temporary foreign worker or professional into the United States for a specialty occupation or a professional position. To qualify for an H-1B visa the applicant must hold a bachelor's degree or its equivalent or higher. Private industry was given new hope by legislation that significantly increased the number of new visas available to foreign workers sought by the U.S. high-tech industry. Along with the increased number of new visas, the filing fee for each H-1B visa application will increase, with the proceeds to be used for education and training programs for U.S. citizens.
5 Conclusions
This paper examined several methods that have proven to be effective in IT employee retention. It appears that, to be effective, retention strategies must rely on a combination of methods. Companies adopt and use a variety of methods to retain employees. It is important to have a balance of programs, processes and cultural standards that are attractive to as many employees as possible in all positions. Fitzharris [15] reports on General Electric's appliance group's use of a combination of three methods for retention: salary, career opportunities and recognition. Implementing these three methods reduced the turnover rate from 11 percent to 3.1 percent. Another approach comes from the Hay Group, a Washington, D.C.-based human resource consultancy, which uses three types of rewards and incentives to retain its people: money, career advancement and a positive work environment. These three methods produced a reduction in the turnover rate of at least 30 percent. Further, the following guidelines should be emphasized:
• A single approach cannot be employed for every situation.
• Compensation and benefit packages are necessary, but not sufficient, elements in the retention of employees.
• Competitiveness and fairness of compensation and benefit packages compared with the labor market can create a feeling of equity among employees.
• The career path is one of the factors that serves to maintain the challenge in daily work.
• Overlooked factors such as leadership and management style, the corporate culture, and flexible hours are effective in ensuring loyalty to a company.
A key factor to be considered as an effective retention tool is the direct involvement of an IT management team. This team, composed of the IT manager and the individual in charge of retention efforts, must collaborate with the Human Resources department to develop effective retention strategies. In constructing these strategies, attention must be focused on both performance management of the employees and relationship strategies between management and non-management.
References
1. Bort, J. (2000). Mining for high-tech help. Coloradobiz, Englewood, 27, 48-56.
2. Diversity best practices. (2001). Retention. Wow!fact2001 [Online]. Available: http://www.ewowfacts.com/wowfacts/chap39.html
3. Hasson, J. (2000, April). Aging work force alarms CIO [Online]. Available: http://www.fcw.com (January 20, 2001).
4. TechRepublic, The Salary Survey 2000, The IT world: secure, diverse, and comfortable [Online]. Available: http://www.TechRepublic.com
5. Sohmer, S. (2000). Retention getter. Sales and Marketing Management, New York, 152, 78-82.
6. Christian & Timbers (2000). Retention starts before recruitment [Online]. Available: http://www.ctnet.com/ctnet/university/client/retention/retention3.html
7. Sparks, R. (2000, May). Ideas for recruiting and retaining quality employees. Creating Quality, 9 [Online]. Available: http://outreach.missouri.edu/c.../cq_may00_ideas%20for%20recruiting%20and%20retaining.htm
8. Thibodeau, P. (2001, January 15). Survey: Above all else, IT workers need challenge. Computerworld [Online]. Available: http://www.computerworld.com/cwi/story/0,1199,NAV47_STO56335,00.html
9. Fitzharris, A.M. (1999, June 24). Balancing employees' professional and personal lives. TechRepublic [Online]. Available: http://techrepublic.com/article.jhtml?src=search&id=r00619990624maf01.htm
10. Meyer, B. (2000, October 9). Providing training is key to retaining IT employees [Online]. Available: http://houston.bcentral.com/houston/stories/2000/10/09/focus10.html
11. Tseng, W. (2000). [Interview with Mr. Eric Schemer]. Report of Information Governance.
12. Frankel, A. (1998, January 1). Retaining: "playing for keeps". CIO Magazine [Online]. Available: http://www.cio.com/archive/010198_loyalty.htm
13. Christian & Timbers (2000). Retention starts before recruitment [Online]. Available: http://www.ctnet.com/ctnet/university/client/retention/retention3.html
14. Sohmer, S. (2000). Retention getter. Sales and Marketing Management, New York, 152, 78-82.
15. Fitzharris, A.M. (1999, June 24). Balancing employees' professional and personal lives. TechRepublic [Online]. Available: http://techrepublic.com/article.jhtml?src=search&id=r00619990624maf01.htm
HARD-SCIENCE LINGUISTICS AS A FORMALISM TO COMPUTERIZE MODELS OF COMMUNICATIVE BEHAVIOR
BERNARD PAUL SYPNIEWSKI
Rowan University - Camden, Broadway and Cooper Street, Camden, NJ 08102 USA
E-mail: [email protected]
Hard Science Linguistics (HSL) is a new linguistic theory, first worked out in Yngve [1], developed from the insights into human language gained during the history of machine translation and similar efforts. Unlike most linguistic theories, HSL concerns itself with the details of how people communicate rather than with how sentences are parsed. From the start, HSL developed with an eye toward making its results scientifically valid, not in the sense of other linguistic theories but in the sense of physics or chemistry. Both the historical basis in machine translation for HSL and the attention paid to a scientifically valid formalism make HSL an attractive candidate for the development of large-scale computer models of communicative behavior. In this paper, I will use some "mixed domain" terminology in order to more quickly explain HSL in the space available.
1 Introduction
The Cold War need to translate large volumes of scientific and military publications into English spurred the earliest attempts at developing machine translation systems. Initial attempts concentrated on the grammar of sentences and began with word-for-word translations. Problems with this and similar approaches appeared in short order (Yngve [2]). Most approaches to machine translation have assumed that translation is exclusively a problem of language to be addressed grammatically (Barr, Cohen and Feigenbaum [3]). While researchers acknowledged that context was a difficult problem that needed to be solved, context was mostly seen as grammatical context. Though some researchers and philosophers, such as Austin [4], recognized the behavioral elements in the problem of context, they often overlooked the consequences of these elements. Most linguistic theories tacitly assume that language can be studied by itself, without reference to the societal matrix in which it exists. Linguistics generally treats language understanding as equivalent to the understanding of grammar. Artificial intelligence has adopted this outlook. While there is much to be said for natural language processing systems that understand the construction of sentences, we should not confuse these systems with systems that try to understand how language is used by human beings in their everyday lives. Our systems must understand more than grammar. Most linguistic theories do not provide us with an understanding of anything other than grammar. Despite the success of Generative Transformational Grammar (GTG), linguistics does not have a sound scientific basis. Linguistic discourse is
philosophical discourse with roots in the ancient Aristotelian and Stoic grammatical and logical traditions. Most linguists do not produce scientifically testable results because general linguistics does not provide a scientifically sound formalism; indeed, while many linguists pay lip service to the need to make linguistics scientific, there is no general expectation that linguists will produce scientifically acceptable results. HSL consciously developed a formalism that could produce scientifically sound linguistic results. One of the more controversial themes of HSL is the de-emphasis of the importance of grammar, what HSL refers to as the "linguistics of language". HSL models "communicative behavior", i.e., language in a social context. Language, for HSL, is purposeful rather than merely grammatical. HSL provides a method for unifying traditional linguistic and extra-linguistic issues in a scientifically acceptable way. In its brief history, HSL has been used to model complex social phenomena such as business negotiations (Brezar [5]), ethnic stereotypes (Cislo [6]), the analysis of textbooks (Coleman [7], Czajka [8]), and criminal litigation (Sypniewski [9]), as well as more traditional linguistic concerns such as historical linguistic change (Mills [10], Malak [11]) and fillers (Rieger [12]).
2 Some Methods and Tools Provided by Hard-Science Linguistics
HSL provides the researcher with a number of tools to describe the interaction between individuals and their environment. Briefly, some of those tools are:
1. An individual may interact with other individuals by playing a role part in a linkage. A linkage is a theoretical framework in which communicative behavior takes place. A role part is a description of the linguistically relevant behavior that an individual performs in a particular linkage. An individual may play a role part in several different linkages, which may or may not overlap in space or time. For example, an individual may be both a father and a little league coach at the same time. The role parts exist in different linkages that interact while a little league game is in progress.
2. Every linkage has a setting. A setting is a description of the linguistically relevant environment in which a linkage exists. Settings may have props, linguistically relevant objects. For example, the amount of feedback in an auditorium's sound system may affect the communicative behavior of speakers on stage.
3. Linkages have tasks, which, in turn, may have subtasks. Tasks and subtasks are descriptions of linguistically relevant behavior, somewhat analogous to functions in computer programming.
4. Individuals have properties that may be affected by communicative behavior or that may have an effect on the communicative behavior of others. The loudness of a speaker's voice or the speaker's command of the language of the audience may be reflected in properties of the speaker or listener's role part.
5. HSL uses its own notation (procedures, properties, and other elements) to construct plex structures. Plex structures describe the building blocks of an HSL model.
The researcher models people or groups communicating among themselves along with their relevant setting(s) by creating a linkage, enumerating its participants and describing their role parts, describing the sequence of tasks and subtasks that must take place, describing the setting in which the linkage exists, and the relevant properties of the role parts and setting. HSL insists that all models be based on observable communicative behavior stated so that the results of the model accurately predict behavior in the real world in a reproducible way.
3 Implications for Computer Science
HSL allows the modeler to develop a model of arbitrary complexity. Furthermore, the modeler is not restricted to describing language. HSL is based on the scientific principle that the world has structure. An HSL model of communicative behavior is more complex than any model based on any other linguistic theory. The payback from this complexity is substantial. Communicative behavior becomes more manageable, the findings more justifiable, and the model more reflective of the real world. Since HSL sees the world in terms of properties, structures, and function-like tasks, a thoroughly developed model may be easily ported to an appropriate computer language. A structured model of communicative behavior resembles familiar paradigms in computer science. Linkages may be modeled by interacting classes, with each class representing a task or subtask. It may even be possible to use the Unified Modeling Language to move a model from HSL to the computer. The event-driven programming paradigm may be able to express some of the dynamism inherent in HSL. This is still controversial among HSL workers because of the type of model HSL creates. Professor Yngve believes that it will be difficult to adequately model the parallelism of complex HSL models on a serial computer. Because HSL is in its infancy, this remains an experimental question.
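As a purely illustrative sketch of the class-based mapping suggested above, the fragment below renders linkages, role parts, settings and tasks as Python classes; the names and attributes are our own and are not part of HSL notation or of SIMPLEX, which is written in FORTH.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class RolePart:
    name: str
    properties: Dict[str, object] = field(default_factory=dict)

@dataclass
class Setting:
    description: str
    props: Dict[str, object] = field(default_factory=dict)  # linguistically relevant objects

@dataclass
class Task:
    name: str
    subtasks: List["Task"] = field(default_factory=list)
    action: Optional[Callable[["Linkage"], None]] = None

@dataclass
class Linkage:
    name: str
    setting: Setting
    role_parts: List[RolePart]
    tasks: List[Task]

    def run(self):
        # Walk tasks depth-first; a real simulator would also need the
        # handling of parallel and asynchronous tasks discussed above.
        def walk(task):
            if task.action is not None:
                task.action(self)
            for sub in task.subtasks:
                walk(sub)
        for task in self.tasks:
            walk(task)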
4 Discussion of the Current Attempts to Build SIMPLEX
In the mid-1980s, Victor Yngve, then at the University of Chicago, began to develop a simulator called SIMPLEX for his linguistic theories, later to become HSL. Because of the size and capabilities of contemporary machines, Professor Yngve decided to write SIMPLEX in FORTH. SIMPLEX remained incomplete, partially because the underlying linguistic theory needed further development.
Presently, Professor Yngve, I, and others have resurrected SIMPLEX and intend to develop it beyond its mid-1980s incarnation. We intend to continue using FORTH for three reasons. First, we will be able to use the code already written, since the basic FORTH language has remained stable. Second, the American National Standards Institute (ANSI) standardized FORTH after SIMPLEX was written. ANSI FORTH is now file-based, rather than block-based, as were the original FORTHs. Cross-platform development will thus be simpler using ANSI FORTH. Third, and most important, HSL is now a fully developed theory. We now have a goal for SIMPLEX. SIMPLEX will be both a program and a programming language that will process plex structures. Because HSL models the real world and not just the grammar of sentences, HSL provides a methodology for representing parallel tasks. One of the reasons that FORTH proves useful is that FORTH originated as a computer language to handle multiple synchronous and asynchronous tasks. Accurately representing parallel tasks is one of the biggest challenges for SIMPLEX and one of its biggest potentials. At the time this paper was written, SIMPLEX is still in its infancy. However, the development of SIMPLEX is substantially advanced even though the computer code is, roughly, where it was in the mid-80s. We now have a complete formalism and methodology to model; this was not the case at the time that the original SIMPLEX was written. We are currently porting code from the old block structure to a file structure. In the process, we are testing various FORTHs on different platforms to identify compatibility problems. Because FORTH is a very efficient language, it is likely that SIMPLEX will run on machines that are significantly less powerful than today's desktop standard. We are testing different FORTHs on different platforms in order to determine the minimal configuration needed for SIMPLEX. There is significant interest in HSL in countries where state-of-the-art computing equipment might not always be available. Some preliminary tests with the original version of SIMPLEX show that SIMPLEX may prove useful on Palm Pilots and similar devices. Our goal is to create a cross-platform computer program that will accept files of plex structures, analyze them, and simulate them on a desktop computer. A researcher will then be able to see the model in action and modify it whenever necessary. Once we finish porting SIMPLEX from block-oriented to file-oriented FORTH, we will begin developing sections of the simulator that will process specific HSL structures. SIMPLEX, when fully developed, will become a major tool for HSL researchers who wish to verify the plex structures and findings that they have developed.
5 Acknowledgements
I wish to thank Victor Yngve for his critical review of the manuscript.
References
1. Yngve, Victor H., From Grammar to Science (John Benjamins, Philadelphia, PA, 1996).
2. Yngve, Victor H., Early MT Research at M.I.T. - The search of adequate theory. In Early Years in Machine Translation, Amsterdam Studies in the Theory and History of Linguistic Science, vol. 97, ed. by W. John Hutchins (John Benjamins, Amsterdam/Philadelphia, 2000) pp. 38-72.
3. Barr, Avron, Cohen, Paul R. and Feigenbaum, Edward A., The Handbook of Artificial Intelligence, vol. 4 (Addison-Wesley, Reading, MA, 1989) pp. 223-237.
4. Austin, J. L., How to Do Things with Words, 2nd ed. (Harvard U. P., Cambridge, MA, 1975).
5. Brezar, Mojca Schlamberger, A Business Negotiation Analysis in the Scope of Hard-Science Linguistics. In Yngve and Wąsik [13].
6. Cislo, Anna, The Victorian Stereotype of an Irishman in the Light of Human Linguistics. In Yngve and Wąsik [13].
7. Coleman, Douglas W., Data and Science in Introductory Linguistics Textbooks. Paper presented at the LACUS Forum XXVII, Houston, 2000.
8. Czajka, Piotr, Human Needs as Expressed in Educational Discourse on the Basis of Textbooks in Linguistics. In Yngve and Wąsik [13].
9. Sypniewski, Bernard Paul, A Hard Science Linguistic Look at Some Aspects of Criminal Litigation in Contemporary New Jersey. Rowan University-Camden Campus, ms.
10. Mills, Carl, Linguistic Change as Changes in Linkages: Fifteenth-Century English Pronouns. Paper presented at the LACUS Forum XXVII, Houston, 2000.
11. Malak, Janusz, Mayday or M'aider: A Call for Help in Understanding Linguistic Change. In Yngve and Wąsik [13].
12. Rieger, Caroline L., Exploring Hard Science Linguistics: Fillers in English and German Conversations. Paper presented at the LACUS Forum XXVII, Houston, 2000.
13. Yngve, Victor H. and Wąsik, Zdzisław (eds.), Exploring the Domain of Human-Centered Linguistics from a Hard-Science Perspective (Poznań, Poland: School of English, Adam Mickiewicz University, 2000).
B-NODES: A PROPOSED NEW METHOD FOR MODELING INFORMATION SYSTEMS TECHNOLOGY
STANISLAW PAUL MAJ AND DAVID VEAL
Department of Computer Science, Edith Cowan University, Western Australia, 6050.
E-mail: [email protected], [email protected]
There are many rapid developments in the technologies upon which information systems are based. In order to help describe and define these technologies there exists a wide variety of modeling techniques. However, this wide range of techniques is in itself problematic, and it is recognized that a new, higher level of abstraction is needed. A new high-level modeling technique is proposed that can be used to control the technical complexity of information systems technologies. This new method, called B-Nodes, is a simple, diagrammatic, and easy-to-use method. The model employs abstraction and hence is independent of underlying technologies. It is therefore applicable not only to current and older technologies but is also likely to be valid for future technological developments. It is a scalable modeling method that can potentially be used both for small systems (e.g. a PC) and for a global information structure. It allows recursive decomposition, which allows detail to be controlled. The use of fundamental units allows other, more meaningful units to be derived. Significantly, therefore, the derived units may be used to more accurately specify hardware performance specifications. The model has been successfully used as the pedagogical framework for teaching computer and network technology. Results to date indicate it can be used to model the modules within a PC (microprocessor, hard disc drive etc), a PC, a Local Area Network and an e-commerce web site.
1 Introduction
Computer and network technologies underpin the IT industry. Furthermore many information systems, such as e-commerce web sites, are global in nature. In this type of application there is, in effect, a contiguous link between a client accessing a web page and all the technologies that link that client with data that may be stored on a hard disc drive on another part of the globe. The quality of service of a global IT system depends therefore on the performance of a wide range of heterogeneous devices at both the micro and macro level. At the micro level the performance of a PC depends upon the technical specification of its component modules (microprocessor, electronic memory, network interface card etc). At a higher level of abstraction the PC may be functioning as a server on a Local Area Network (LAN). In this case the performance of the LAN depends upon the operational characteristics of the PC (as a complete device) and the associated networking devices such as hubs and switches. At a macro level a collection of different servers (web-server, application-server, payment-server etc) may be located in a LAN and connected to the Internet. In order to control this complexity a wide range of
modeling techniques are used, which is in keeping with the ACM/IEEE Computing Curricula 1991, in which abstraction is a recurring concept fundamental to computer science [1]. Semiconductor switching techniques and modelling provide an abstraction that is independent of the underlying details of quantum mechanics. Similarly, digital techniques and modelling provide a higher-level abstraction that is independent of the underlying details of semiconductor switching. Such combinational or sequential digital circuits can be described without the complexity of their implementation in different switching technologies, e.g. TTL, CMOS, BiCMOS etc. Computer and network technology can therefore be described using a progressive range of models based on different levels of detail (e.g. semiconductors, transistors, digital circuits), each with its own different performance metric. However, there appears to be no simple modeling technique that can be used to describe and define the different heterogeneous technologies within a PC. The use of benchmarks to evaluate performance at this level is subject to considerable debate. Similarly, from an IT perspective, a range of different models is used. A business model is used to define the purpose of an e-business. The functional model defines the e-commerce web navigational structure and functions. Customer models are used to define the navigational patterns of a group of customers, which may be used to quantify the number and type of customers and the associated request patterns - all of which may be used to define an e-commerce workload. Again a wide range of performance metrics is used, including: hits/second, unique visitors, revenue, page views/day etc. All these different models are designed to progressively hide, and hence control, detail and yet provide sufficient information to be useful for communication, design and documentation. But this wide range of different modeling techniques (from digital systems to customer models) and associated metrics is in itself problematic. Ultimately a global e-commerce business is a contiguous system and should if possible be modeled as such. The use of a single modeling technique may help to control the technical complexity and also allow the use of a single performance metric, from which other metrics may be derived.
2 Modeling
The principles of modeling were reviewed in order to obtain the required characteristics of models. Models are used as a means of communication and of controlling detail. Diagrammatic models should have the qualities of being complete, clear and consistent. Consistency is ensured by the use of formal rules, and clarity by the use of only a few abstract symbols. Leveling, in which complex systems can be progressively decomposed, provides completeness. According to Cooling [2], there are two main types of diagram: high level and low level. High-level diagrams are task oriented and show the overall system structure with its major sub-units. Such diagrams describe the overall function of the design and interactions between both the sub-systems and the environment. The main emphasis is 'what
does the system do', and the resultant design is therefore task oriented. According to Cooling, 'Good high-level diagrams are simple and clear, bringing out the essential major features of a system'. By contrast, low-level diagrams are solution oriented and must be able to handle considerable detail. The main emphasis is 'how does the system work'. However, all models should have the following characteristics: diagrammatic, self-documenting, easy to use, control detail and allow hierarchical top-down decomposition. For example, computer technology can be modeled using symbolic Boolean algebra (NOR, NAND gates). At an even higher level of abstraction computer technology can be modeled as a collection of programmable registers. Dasgupta suggested computer architecture has three hierarchical levels of abstraction [3]. A model for describing software architectures was introduced by Perry and Wolf that consists of three basic elements - processing, data and connecting [4]. On this basis various architectural styles exist, including: Dataflow, Call & Return, Independent Process, Virtual Machine, Repository and Domain Specific. Each model is valid. According to Amdahl, 'The architecture of a computer system can be defined as its functional appearance to its immediate users' [5]. However, computer design and manufacture has changed significantly. The PC is now a low-cost, consumer item with a standard architecture and modular construction. Two studies by Maj in Australia [6] and Europe [7] found that in both cases the computer and network technology curriculum failed to provide the basic skills and knowledge expected by both students and potential employers. Furthermore, there is considerable unmet potential demand from students of other disciplines (e.g. multimedia) for instruction in computer technology [8], due to the perceived lack of relevance of the current computer technology curriculum. According to the 1991 ACM/IEEE-CS report, 'The outcome expected for students should drive the curriculum planning' [1]. Significantly, the current modeling methods used for computer and network technology may no longer be appropriate. Clements comments, 'Consequently, academics must continually examine and update the curriculum, raising the level of abstraction' [9].
3 Bandwidth Nodes
A new high-level modeling technique called Bandwidth Nodes (B-Nodes) has been proposed [10]. Each B-Node (microprocessor, hard disc drive etc) can now be treated as a quantifiable data source/sink with an associated transfer characteristic (Mbytes/s). This approach allows the performance of every node and data path to be assessed by a simple, common measurement - bandwidth - where Bandwidth = Clock Speed x Data Path Width, with the common units of Mbytes/s. This is a simple, diagrammatic, and easy-to-use method that can be used to model different technologies. The heterogeneous nature of the nodes of a PC is clearly illustrated by the range of measurement units used, varying from MHz to seek times in milliseconds. Evaluation of these different nodes is therefore difficult. However, it is
possible to compare the performance of different nodes using the common measurement of bandwidth in Mbytes/s. The Pentium processor has an external data path of 8 bytes with maximum rated clock speeds in excess of 400 MHz, giving a bandwidth of more than 3200 Mbytes/s. Dual In-line Memory Modules (DIMMs) rated at 60 ns (16 MHz) with a data path width of 8 bytes have a bandwidth of 128 Mbytes/s. The data transfer rate for a hard disc drive can be calculated from the sector capacity and rotational speed (data transfer rate = sector capacity x sectors per track x rps). Typical figures are in the range of 5 Mbytes/s. Modem performance is typically measured in Kbits/s, which can be converted to Mbytes/s or Frames/s. CDROM performance is quoted in speeds, e.g. x32 speed, where single speed is 150 Kbytes/s. CDROM speeds can easily be converted to Mbytes/s or Frames/s. According to Mueller [11], the maximum transfer rate of a bus in MBytes/s can be calculated from the clock speed and data width. Significantly, a common performance metric (Mbytes/s) is used, thereby allowing the relative performance of the different heterogeneous technologies to be easily evaluated (Table 1).

Table 1: Bandwidth (Mbytes/s)

    Device      Clock Speed (MHz)   Data Width (Bytes)   Bandwidth (Mbytes/s) B = C x D
    Processor   400                 8                    3200
    DRAM        16                  8                    128
    Hard Disc   60 rps              90 Kbytes (track)    5.2
    CDROM       x32 speed           -                    4.6
    ISA Bus     8                   2                    16
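A minimal Python sketch of the B = C x D calculation behind Table 1 follows (our own illustration): clock in MHz and data path width in bytes give bandwidth in Mbytes/s. The hard disc and CDROM figures instead follow the sector-capacity and speed-rating rules quoted above.

def bandwidth_mbytes_per_s(clock_mhz, width_bytes):
    return clock_mhz * width_bytes

devices = {
    "Processor (Pentium)": bandwidth_mbytes_per_s(400, 8),   # 3200 Mbytes/s
    "DRAM (60 ns DIMM)":   bandwidth_mbytes_per_s(16, 8),    # 128 Mbytes/s
    "ISA bus":             bandwidth_mbytes_per_s(8, 2),     # 16 Mbytes/s
    # Hard disc: 90 Kbytes per track x 60 rps = 5400 Kbytes/s, roughly the
    # 5.2 Mbytes/s quoted in Table 1.
    "Hard disc":           90 * 60 / 1024,
    # x32 CDROM: 32 x 150 Kbytes/s, roughly the 4.6 Mbytes/s quoted.
    "CDROM (x32)":         32 * 150 / 1024,
}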
B-Nodes typically operate sub-optimally due to their operational limitations and also the interaction with other B-Nodes. The simple bandwidth equation can be modified to take this into account, i.e. Bandwidth = Clock x Data Path Width x Efficiency (B = C x D x E), with the units MBytes/s [10]. The Pentium requires a memory cycle time of 2 clock cycles, i.e. the 2-2 mode (Efficiency = 1/2), for external DRAM [12]. However, if the memory cannot conclude a read/write request within this clock period, additional clock cycles may be needed, i.e. wait states. Each wait state reduces the efficiency factor accordingly. For efficient data access burst mode is possible, during which transfers can be effected with an initial 2 clock cycles and subsequent transfers needing only 1 clock cycle. The restrictions are an upper limit of 2-1-1-1 for the READ operation. The efficiency is therefore 4/5, i.e. 4 transfers in 5 clock cycles (Table 2).
Table 2: Pentium

    Mode     C (MHz)   D (Bytes)   E     Bandwidth (MBytes/s) = C x D x E
    2-2      100       8           1/2   400
    1 Wait   100       8           1/3   266
    Burst    100       8           4/5   640

The ISA bus operates at 8 MHz with a data width of 2 bytes. However, at least 2 clock cycles are needed, i.e. E = 1/2. Each wait state reduces the efficiency accordingly (Table 3).

Table 3: ISA Bus

    Mode     C (MHz)   D (Bytes)   E     B (Mbytes/s) = C x D x E
    2-2      8         2           1/2   8
    1 Wait   8         2           1/3   5

The Peripheral Component Interconnect (PCI) bus is a 32-bit bus but operates at a frequency of 33 MHz. The PCI bus uses a multiplexing scheme in which the lines are alternately used as address and data lines. This reduces the number of lines but results in an increased number of clock cycles needed for a single data transfer. Each wait state reduces the efficiency accordingly. However, the PCI bus is capable of operating in unrestricted burst mode. In this mode, after the initial 2 clock cycles, data may be transferred on each clock pulse. In this case E tends to unity (Table 4).

Table 4: PCI Bus

    Mode     C (MHz)   D (Bytes)   E     B (MBytes/s) = C x D x E
    Write    33        4           1/2   66
    1 Wait   33        4           1/3   44
    Burst    33        4           1     133
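The efficiency-adjusted figures in Tables 2-4 follow from B = C x D x E; the short Python sketch below (our own illustration, not from the paper) reproduces a few of them.

from fractions import Fraction

def bandwidth(clock_mhz, width_bytes, efficiency):
    return clock_mhz * width_bytes * float(efficiency)

modes = [
    # (device, mode, C in MHz, D in bytes, E)
    ("Pentium", "2-2 read",      100, 8, Fraction(1, 2)),   # 400 Mbytes/s
    ("Pentium", "1 wait state",  100, 8, Fraction(1, 3)),   # ~266 Mbytes/s
    ("Pentium", "burst 2-1-1-1", 100, 8, Fraction(4, 5)),   # 640 Mbytes/s
    ("ISA bus", "2-2",             8, 2, Fraction(1, 2)),   # 8 Mbytes/s
    ("PCI bus", "burst",          33, 4, 1),                # ~133 Mbytes/s
]
for device, mode, c, d, e in modes:
    print(f"{device:8s} {mode:14s} {bandwidth(c, d, e):6.1f} Mbytes/s")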
Using B-Nodes it is possible to model a spectrum of PCs ranging from those based on the first generation processor (8088, 8-bit ISA, floppy disc drive etc) through to those based on the latest fifth generation processors (Pentium, PCI, AGP etc). The use of the fundamental units of Mbytes/s allows other, more user-oriented units to be derived. Bandwidth Nodes (B-Nodes) have been used as the pedagogical framework for the computer and network technology curriculum and evaluated. According to Maj [10], advantages to using this pedagogical model include:
• Students can perceive the PC as a unified collection of devices
• Node performance, measured in bandwidth (Frames/s), is a user-based, easily understood measurement
• The units Mbytes/s and Frames/s use a decimal scaling system
• Students are able to evaluate different nodes of a PC by means of a common unit of measurement
• Students can easily determine the anticipated performance of a PC given its technical specification
• Students are able to critically analyze technical literature using this integrating concept
• The model is suitable for students from a wide range of disciplines (Computer Science, Multimedia, IT, Business IT)
• The model is valid for increasing levels of technical complexity
• Nodes are independent of architectural detail
The model employs abstraction and hence is independent of underlying technologies. It is therefore applicable not only to current and older technologies but is also likely to be valid for future technological developments. It is a scalable modeling method that can be used for digital systems, PC modules and a small LAN [13].
4 A B-Node Model of an E-Commerce Web-Site
A range of different models is used for e-business web sites. The business model is used to define the business directions and objectives for a given level of resources. The business model itemizes the trading processes, which can then be used as the basis of a functional model to specify e-commerce web navigational structures and functions. Customer models, such as the Customer Behavior Model Graph (CBMG), are server-based characterizations of the navigational patterns of a group of customers that may be used to quantify the number and type of customers and the associated request patterns - all of which may be used to define an e-commerce workload [14]. A wide range of performance metrics is used, including: hits/second, unique visitors, revenue, page views/day etc. The workload, in conjunction with the resource model of hardware and software, ultimately must be able to clearly define the site performance, which is used to specify a Service Level Agreement (SLA). Assume an e-commerce web site consists of a collection of servers (web server, application server, payment server etc) on an Ethernet LAN. This configuration can be modeled using CBMG and Client Server Interaction Diagrams (CSIDs) in order to obtain the probability of message traffic between the different servers. Given the size of the messages, an approximation can then be made about the performance of the LAN. Furthermore, if the servers are located on two different LANs it is possible to calculate the message delays and again the expected performance of this architecture. However, the functional and customer models use a range of different metrics, which in turn differ from those used to specify server architecture. It is therefore difficult to directly translate a performance specification measured in page views/day into the required specification of, for example, a hard disc drive in the server. However, if a web server is modeled as a B-Node then the performance metric is bandwidth, with units of Mbytes/s. The sub-modules of a server (microprocessor, hard disc, electronic memory etc) can also be modeled as B-Nodes, again using the same performance metric. The use of fundamental units (Mbytes/s) allows other units to be derived and used, e.g. transactions per second (tps). Assuming the messages in a client/server interaction are 10 kbytes each, the performance of each B-Node can be evaluated using the units of transactions/s (Table 5).
Table 5: Bandwidth (Utilization)

    Device      Bandwidth (MBytes/s)   Bandwidth (Tps)   Load (Tps)   Utilization
    Processor   1600                   160k              250          <1%
    DRAM        64                     6.4k              250          4%
    Hard Disc   2.7                    270               250          93%
    CDROM       2.3                    230               250          >100%
    ISA Bus     4                      400               250          63%
    Ethernet    11.25                  1.1k              250          23%
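The transactions-per-second and utilization columns of Table 5 follow directly from the bandwidth figures; a Python sketch of the conversion (our own, assuming the 10-kbyte message size stated above) is:

MESSAGE_KBYTES = 10
LOAD_TPS = 250

def tps(bandwidth_mbytes_per_s):
    # transactions per second supported at this bandwidth
    return bandwidth_mbytes_per_s * 1000.0 / MESSAGE_KBYTES

def utilization(bandwidth_mbytes_per_s, load_tps=LOAD_TPS):
    return load_tps / tps(bandwidth_mbytes_per_s)

for device, bw in [("Processor", 1600), ("DRAM", 64), ("Hard disc", 2.7),
                   ("CDROM", 2.3), ("ISA bus", 4), ("Ethernet", 11.25)]:
    print(f"{device:10s} {tps(bw):9.0f} tps  {utilization(bw):6.1%} utilized")

A utilization above 100% (the CDROM row) simply means the device cannot sustain the offered load of 250 transactions/s.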
Capacity planning is the process of predicting future workloads and determining the most cost-effective way of postponing system overload and saturation. If the demand on this server is 250 transactions/s, it is a simple matter to determine both the performance bottlenecks and the expected performance of equipment upgrades. From Table 5 it is possible to determine that, for this web server, the hard disc drive, CDROM and ISA bus are inadequate. The metric of transactions/s can easily be converted to the fundamental unit of Mbytes/s, which can then be used to determine the required performance specification of alternative bus structures, CDROM devices and hard discs. A PCI (32-bit) bus structure is capable of 44 Mbytes/s. A 40-speed CDROM device has a bandwidth of approximately 6 Mbytes/s. Similarly, replacing the single hard disc drive by one with a higher performance specification (rpm and higher track capacity) results in a new server capable of meeting the required workload (Table 6).

Table 6: Upgraded server

    Device      Bandwidth (MBytes/s)   Bandwidth (Tps)   Load (Tps)   Utilization
    Processor   1600                   160k              250          <1%
    DRAM        64                     6.4k              250          4%
    Hard Disc   12.5                   1.25k             250          20%
    CDROM       6                      0.6k              250          42%
    PCI Bus     66                     6.6k              250          4%
    Ethernet    11.25                  1.1k              250          23%
5 Secure Electronic Transactions
Security is an essential aspect of e-commerce transactions. There are two main classes of cryptographic algorithms: Symmetric and Public Key (PK). It is possible to estimate the overheads of employing different cryptographic algorithms using the simple bandwidth model. The most common PK algorithm is RSA, which has been evaluated as a system load for different key sizes measured in milliseconds [15]. Cryptographic algorithms are CPU intensive operations that require considerable microprocessor time. The B = C x D x E equation is still applicable; however, instead of C (clock frequency, MHz) the reciprocal of the operation time, 1/time (seconds), is used, and D is the key length. It is then possible to calculate the effective bandwidth of a microprocessor in Mbytes/s. For a Pentium II, 266 MHz, the input/output bandwidth is approximately 2128 Mbytes/s. However, because of the computational overhead, for a 256 byte key size the public key performance is 5470 Bytes/s and the private key performance is 2128 Bytes/s. These figures clearly demonstrate that PK encryption cannot be used for transferring large data volumes.
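As a rough illustration, the sketch below applies the same B = C x D x E reasoning in code; the per-operation timings are hypothetical placeholders chosen only to give figures of the order quoted above, not measured values.

```python
# Sketch: effective cryptographic bandwidth of a CPU, following the
# B = C x D x E idea with C replaced by the reciprocal of the time per operation.
# The timings below are hypothetical placeholders, not measured values.
def crypto_bandwidth(key_bytes: float, seconds_per_op: float) -> float:
    """Bytes processed per second = key length / time per operation."""
    return key_bytes / seconds_per_op

KEY_BYTES = 256                 # 256 byte (2048 bit) key, as in the text
public_op_s = 0.047             # assumed public-key operation time in seconds
private_op_s = 0.120            # assumed private-key operation time in seconds

print(f"public-key  throughput: {crypto_bandwidth(KEY_BYTES, public_op_s):,.0f} Bytes/s")
print(f"private-key throughput: {crypto_bandwidth(KEY_BYTES, private_op_s):,.0f} Bytes/s")
```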
6 Conclusions
Large IT systems are a complex collection of heterogeneous technologies described by a wide variety of different models and associated performance metrics. This wide range of models and metrics is problematic as it is difficult to compare the relative performance of the different technologies. The performance of any system is ultimately dependent on the speed of the slowest device. B-Nodes have been successfully used to model computer technology on a micro level (digital systems, microprocessor, electronic memory, hard disc drive, etc.). The use of fundamental units allows other, more meaningful units to be derived. Using B-Nodes it is simple to convert the performance of an e-commerce web site (transactions/s) to Mbytes/s and hence determine the load on a server architecture. B-Node modeling is a simple, diagrammatic, and easy to use method. The model employs abstraction and hence is independent of underlying technologies. It is therefore applicable not only to current and older technologies but is also likely to be valid for future technological developments. It is a scalable modeling method that can be used at the micro level but also on a macro level for a global information structure, and its recursive decomposition allows the level of detail to be controlled.
References

1. Tucker, A.B., et al., A Summary of the ACM/IEEE-CS Joint Curriculum Task Force Report, Computing Curricula 1991. Communications of the ACM, 1991. 34(6).
2. Cooling, J.E., Software Design for Real-Time Systems. 1991, Padstow, Cornwall: Chapman and Hall.
3. Dasgupta, S., Computer Architecture - A Modern Synthesis. 1989, New York: John Wiley & Sons.
4. Perry, D.E. and A.L. Wolf, Foundations for the study of software engineering. ACM SIGSOFT, Software Engineering Notes, 1992. 17(4): p. 40-52.
5. Amdahl, G.M., Architecture of the IBM/360. IBM Journal of Research and Development, 1964. 8(2): p. 87-101.
6. Maj, S.P., et al., Computer and Network Installation, Maintenance and Management - A Proposed New Curriculum for Undergraduates and Postgraduates. The Australian Computer Journal, 1998. 30(3): p. 111-119.
7. Maj, S.P., D. Veal, and P. Charlesworth. Is Computer Technology Taught Upside Down? In 5th Annual SIGCSE/SIGCUE Conference on Innovation and Technology in Computer Science Education. 2000. Helsinki, Finland: ACM.
8. Maj, S.P., G. Kohli, and D. Veal. Teaching Computer and Network Technology to Multi-Media students - a novel approach. In 3rd Baltic Region Seminar on Engineering Education. 1999. Goteborg, Sweden: UNESCO International Centre for Engineering Education (UICEE), Faculty of Engineering, University of Melbourne.
9. Clements, A., Computer Architecture Education, in IEEE Micro. 2000. p. 10-22.
10. Maj, S.P. and D. Veal, Computer Technology Curriculum - A New Paradigm for a New Century. Journal of Research and Practice in Information Technology, 2000. 32(August/September): p. 200-214.
11. Mueller, S., Scott Mueller's Upgrading and Repairing PCs. 1999, QUE, Indianapolis, Indiana. p. 891-898.
12. Mazidi, M.A. and J.G. Mazidi, The 80x86 IBM PC & Compatible Computers, Volumes I & II, Assembly Language Design and Interfacing. 1995, New Jersey: Prentice Hall.
13. Maj, S.P., D. Veal, and A. Boyanich. A New Abstraction Model for Engineering Students. In 4th UICEE Annual Conference on Engineering Education. 2001. Bangkok, Thailand: UNESCO International Centre for Engineering Education (UICEE), Faculty of Engineering, University of Melbourne.
14. Menasce, D.A., et al. A Methodology for Workload Characterization for E-Commerce Servers. In 1999 ACM Conference in Electronic Commerce. 1999. Denver, CO: ACM.
15. Freeman, W. and E. Miller. An Experimental Analysis of Cryptographic Overhead in Performance-Critical Systems. In Seventh International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunications Systems. 1999. College Park, MD.
THE MONTCLAIR ELECTRONIC LANGUAGE LEARNER DATABASE
EILEEN FITZPATRICK AND STEVE SEEGMILLER
Department of Linguistics, Montclair State University, Upper Montclair, NJ 07043, USA
E-mail: {fitzpatr/seegmill}@sapir.montclair.edu
The work described here aims to enable more efficient research and application design in the field of second language performance. We are doing this by expanding a corpus of error-annotated written English that we have built as a feasibility study [2]. The goal is to make the resulting corpus publicly available for applications in second language pedagogy, research in second language acquisition, and the design of online writing aids for second language learners.
1 Introduction
Research and development in the field of natural language engineering proceeds by building models of human language performance in an effort to duplicate that performance on a machine. Over the past 10 years, the paradigm for language modeling in natural language engineering has shifted. Models based on introspectively obtained rules have given way to models based on empirically observed patterns in archived data, or corpora. This shift followed the success of the empirical approach in speech recognition [4,5] and the increase in machine storage capacity that enables large amounts of data to be maintained and manipulated. Since most language engineering applications serve the general user, most corpora are designed to model the language of the native speaker (NS). However, more recently, corpora that model the performance of non-native speakers (NNSs) of a language have begun to appear [3]. These corpora are designed primarily to enable the study of differences between NS usage and NNS usage, with the aim of understanding more about second language acquisition, and to enable tool development (spell checkers, grammar checkers, and other writing aids) for NNSs. This paper describes a particular type of NNS corpus of formal written English being developed at Montclair State. For some applications, primary language data is sufficient for model building, but the value of the data is greatly increased when it is annotated with linguistic information like the part of speech of the words in a sentence or the syntactic structure of the sentence. After careful hand annotation of a representative subset of the language, subsequent annotation may be done automatically [POS & parsing refs]. However, since the language of NNSs often varies greatly from the standard language, automatic annotation designed for the standard language performs poorly on NNS text. In this paper we describe a project at Montclair State that is hand
annotating the errors in NNS text in such a way that the text can subsequently be submitted for conventional automatic annotation for part of speech and syntactic structure. Montclair is particularly well-suited to carry out this project. Its Center for Language Acquisition, Instruction, and Research (CLAIR) teaches a set of languages - though primarily English - to speakers of unusually diverse native language backgrounds. CLAIR also houses master teachers of English as a Second Language and linguists to annotate the text. 2
The Raw Corpus
The raw corpus currently consists of formal essays written by upper level students of English as a Second Language preparing for college work in the United States. A portion of the essays are timed essays written in class; the rest are untimed drafts written at home. The corpus is small at 25,000 words, but we have recently begun collecting the data systematically, which will increase its size quickly. Essays are either submitted electronically or transcribed from hand-written submissions. A record is kept as to how each essay was submitted. Interested student authors sign a release form that entitles us to enter their written work into the corpus throughout the semester. These students also complete a background form on native language, other languages, schooling, and extent and type of schooling in the target language, currently only English.
3 The Annotation
Other corpora that are annotated for error, including the Hong Kong corpus [7] and the PELCRA corpus at the University of Lodz, Poland, use a predetermined tagset to mark the errors. While this approach guarantees a high degree of tagging consistency among the annotators, it limits the errors recognized to those in the tagset. Our concern in using a tagset was that we would skew the construction of a model of L2 writing by using a list that is essentially already a model of L2 errors. The use of a tagset also introduces the possibility that annotators will misclassify. Finally, we are concerned that the 'one size fits all' approach of a tagset would force us to apply the same standards to different written genres, e.g., email or postings to listservs.
In place of a tagset, we ask annotators to minimally reconstruct the error to yield an acceptable English sentence. Each error is followed by a slash and a minimal reconstruction of the error is written within curly brackets. Missing items and items to be deleted are represented by "0". Tags and reconstructions look like this:

school systems {is/are}
since children {0/are} usually inspired
becoming {a/0} good citizens

Reconstruction is faster than classification, there is no chance of misclassifying, and even less common errors are captured. Additionally, syntactic parsers and part-of-speech taggers often fail with ungrammatical input. A reconstructed text can be more easily parsed and tagged for part-of-speech information.
Reconstruction, however, has its own difficulties. Without a tagset, annotators can vary greatly in what they consider an error. One recurring example of this involves the use of articles in English. For instance, the sentence The learning process may be slower for {the/0} students as well is correct with or without the article before students. However, the use of the indicates that a particular group of students had been identified earlier in the essay, whereas the absence of the indicates that students refers to students in general. An additional difficulty is that different annotators may reconstruct an error differently. For example, the student need help can be reconstructed as the {student/students} need help or the student {need/needs} help.
We are performing several experiments to determine how much accuracy and efficiency we could achieve in tagging errors [2]. The first experiment was a baseline test to determine if it is possible to get any sort of tagging agreement without a predetermined tagset. In a set of 1549 words, we identified 152 errors. The annotators achieved an average precision rate of .85 and a recall rate of .81.¹ Encouraged by these results, we tested whether we could develop annotation guidelines that would improve tagging agreement. The authors independently tagged a set of essays and compared annotations. This comparison is shown as Test One in Table 1. We then discussed our annotations, agreed on guidelines, annotated a second set of essays and compared. The comparison after discussion and guidelines is shown as Test Two.
Test   Words   Errors   Recall   Precision
One    2476    241      .73      .84
Two    2418    193      .76      .90

Table 1. Experiment with annotator guidelines after Test One.

Given these results, we are now replicating this experiment with master teachers to develop careful guidelines and a model tagged data set for graduate student annotators to follow. We anticipate that we will not achieve a higher level of agreement between the annotators tagging independently and that we will continue to need two annotators to produce reliably tagged data.
4 Annotation Tools
Currently, annotators are using a simple Linux text processor of their choice to annotate the text. We anticipate that we will be able to annotate common errors like subject-verb disagreement automatically and present 'cleaner' text to the annotators who will be left to deal with the more idiosyncratic errors. We intend to automate soon for a few high frequency errors and test whether this improves inter-annotator accuracy and/or efficiency in tagging. Other annotation projects increase efficiency by using interactive annotation tools [1], which also help accuracy by reducing some of the tedium of the task. These tools are better suited to part-of-speech and syntactic tagging where either small windows of text or partial syntactic trees are shown to the annotator. Since our annotation sometimes requires a global judgment at the paragraph level (for instance, in the case of the referent of students in the example given in section 3), we have not used interactive tools. Annotators compare their tags word by word with a Linux shell script using sdiff that lines up the text as shown below and also counts the number of shared tags and the number of tagging discrepancies. This enables the annotators to concentrate on the discrepancies efficiently:

{this/it} {will/would} not be surprising    |    {this/it} will not be surprising
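A minimal sketch of this kind of comparison (a hypothetical helper, not the project's sdiff script): it extracts the {error/reconstruction} tags from two annotators' versions of the same essay and reports precision and recall as defined in footnote 1.

```python
import re
from collections import Counter

TAG = re.compile(r"\{[^{}]*\}")              # annotations of the form {error/reconstruction}

def tags(annotated_text: str) -> Counter:
    """Multiset of annotation tags found in an annotated essay."""
    return Counter(TAG.findall(annotated_text))

def agreement(expert_text: str, other_text: str):
    """Precision and recall of one annotator's tags against the expert's,
    following the definitions in footnote 1 (positions ignored for simplicity)."""
    expert, other = tags(expert_text), tags(other_text)
    shared = sum((expert & other).values())  # tags produced by both annotators
    precision = shared / sum(other.values()) if other else 1.0
    recall = shared / sum(expert.values()) if expert else 1.0
    return precision, recall

p, r = agreement("school systems {is/are} since children {0/are} usually inspired",
                 "school systems {is/are} since children are usually inspired")
print(f"precision={p:.2f} recall={r:.2f}")   # precision=1.00 recall=0.50
```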
5 Accessing the Data
Each essay is stored in a separate file keyed by a unique number. Each file contains the essay including the annotations. Where the two annotators disagreed about a tag, both annotations are saved. A Linux sed script enables a researcher examining the essays to see either the original, unannotated essay or the text with annotations. Corpus subdirectories divide the text data by course level and particular class and further subdivide it by essay type (timed or untimed). Background information on the author of each essay is kept in a single data file linked to the essay by the key. A menu driven by a Perl script gives the researcher access to the background information for a particular essay. We are currently writing a script that will enable the researcher to accumulate background information for a particular kind of error.
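A minimal sketch of the two views (shown here in Python rather than sed; the {error/reconstruction} format is taken from section 3, everything else - names, sample essay - is illustrative):

```python
import re

# An annotation has the form {error/reconstruction}; "0" marks a missing item
# or an item to be deleted (i.e., an empty side).
ANNOTATION = re.compile(r"\{([^{}/]*)/([^{}/]*)\}")

def _view(annotated: str, side: int) -> str:
    def pick(m):
        text = m.group(side)
        return "" if text == "0" else text
    # collapse any double spaces left by empty replacements
    return re.sub(r"\s{2,}", " ", ANNOTATION.sub(pick, annotated)).strip()

def original_view(annotated: str) -> str:
    """The learner's text as written (left side of each annotation)."""
    return _view(annotated, 1)

def reconstructed_view(annotated: str) -> str:
    """The minimally corrected text (right side of each annotation)."""
    return _view(annotated, 2)

essay = "school systems {is/are} good since children {0/are} usually inspired"
print(original_view(essay))       # school systems is good since children usually inspired
print(reconstructed_view(essay))  # school systems are good since children are usually inspired
```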
6 Data Processing Tools
We are currently using only Linux tools to look for patterns in the data while we build the corpus. These include searching for particular kinds of error and calculating error frequency given corpus and essay size. An issue we have yet to address regarding the corpus user is the idiosyncrasy of the tags. Currently a user cannot search for a particular error type, for example number disagreement, since the tags do not indicate the error type. For high frequency errors, we plan to convert the tags automatically to a named error type for easy search. However, for less frequently occurring errors, the user will still have to peruse a list of tagged errors. Tests of how the user searches the corpus will inform our design of a tool to display less frequently occurring error types. We are also building a tool that will give the corpus user statistics on error occurrence, including error type plotted against background information. This is particularly useful in second language acquisition research, which seeks to discriminate second language errors attributable to the native language background from errors attributable to the learning process in general, or to some universal features of language.
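A small sketch of the kind of frequency calculation described here (a hypothetical Python helper, not one of the project's Linux tools):

```python
import re
from collections import Counter

ANNOTATION = re.compile(r"\{[^{}]*\}")

def error_statistics(essay_text: str):
    """Number of annotations, errors per 100 words, and tag frequencies."""
    tags = ANNOTATION.findall(essay_text)
    words = len(ANNOTATION.sub(" ", essay_text).split())
    rate = 100 * len(tags) / words if words else 0.0
    return len(tags), rate, Counter(tags)

n, rate, freq = error_statistics(
    "school systems {is/are} good since children {0/are} usually inspired")
print(n, f"errors; {rate:.1f} per 100 words;", freq.most_common(3))
```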
7 Applications
We plan to make the MELD corpus publicly available. The design of the corpus allows it to be used for several applications in second language pedagogy and research, as well as in the building of editing tools for second language learners. Here we give examples of possible applications of the corpus.
7.1 Second Language Pedagogy
Frequency of error by level or native language background gives a teacher information as to what writing problems s/he should concentrate on. By comparing word usage and syntactic usage against a comparable NS corpus, the teacher or textbook writer can discover gaps in the NNS use of the language and develop materials accordingly. The corpus can also be used for testing purposes since it allows testing to be targeted to specific levels and language backgrounds. Several types of corpus-based exercises for students have been developed [6] though they are not widely available. A publicly available corpus will enable more exercises of this type. In addition, MELD's reconstruction of the text enables students to use portions of the corpus for proofreading exercise. Certain types of error can be 'turned off so that the student sees only the type of usage s/he needs to master. The student can then compare corrections with those of the annotator.
7.2 Second Language Acquisition Research
As mentioned above, research in second language acquisition is heavily oriented to investigating the origin of errors either in the NNS's transfer of first language attributes or in use of an interlanguage that the NNS creates as ever closer approximations to the second language. The corpus will enable the researcher to statistically analyze the distribution of errors by native language background, level of study, gender, age, mastery of other languages, and spoken and written exposure to the target language. The corpus can also be used by lexicographers to study how the NNS word usage diverges from that of native speakers. 7.3
Editing Tools
Spell checkers and grammar checkers are typically based on frequency of error distribution. They do not work well for the writing of NNSs because their errors show statistically different distributions. For example, errors of complement type (I need {of/0} somebody) are rare in NS writing, but very common in the MELD corpus. The corpus provides the statistical base required to develop these tools.

8 Conclusion
The corpus being collected and annotated should provide a wealth of empirical data to assist in second language pedagogy, research, and tool building. We plan to make the corpus and tools publicly available on the web. 9
Acknowledgements
We thank the master teachers Jacqueline Cassidy, Norma Pravec, and Lenore Rosenbluth, who contributed careful labor and thoughtful discussion in providing a tagged data set and tagging guidelines, and the graduate student annotators Jennifer Higgins and Donna Samko.

References

1. Bredenkamp, A., B. Crysmann, and J. Klein. Annotation of error types for German news corpus. In Journees ATALA sur les Corpus Annotes pour la Syntaxe, Treebanks Workshop (1999) Paris, 18-19 juin, pp. 77-84.
2. Fitzpatrick, E. and Seegmiller, S. Experimenting with Error Tagging. The Second North American Symposium on Corpus Linguistics and Language Teaching. Northern Arizona University, Flagstaff, AZ (March 31-April 2, 2000).
3. Granger, S. (ed). Learner English on Computer. (1998) Addison-Wesley Longman.
4. Jelinek, F. Self-organized language modeling for speech recognition. IBM T.J. Watson Research Center, Continuous Speech Recognition Group, Yorktown Heights, NY (1985).
5. Jelinek, F. Markov source modeling of text generation. In The Impact of Processing Techniques on Communications, ed. by J.K. Skwirzinski (Nijhoff, Dordrecht, 1985).
6. Milton, J. Exploiting L1 and interlanguage corpora in the design of an electronic language learning and production environment. In Granger, S. (ed).
7. Milton, J. and N. Chowdhury. Tagging the interlanguage of Chinese learners of English. In Entering Text, ed. by L. Flowerdew and A.K.K. Tong. Language Centre, The Hong Kong University of Science and Technology (1994).

¹ Precision is the measure of errors identified by both annotators divided by the errors identified by the 'non-expert'. (How many tags were correct out of all the errors s/he tagged?) Recall is the measure of errors identified by the non-expert divided by the errors identified by the expert. (Out of all the errors identified, how many did the non-expert get?) Precision and recall show the distance in tagging between the two annotators.
Computing Formalism/Algorithms
IMPROVEMENT OF SYNTHESIS OF CONVERSION RULES BY EXPANDING KNOWLEDGE REPRESENTATION
H. MABUCHI
Iwate Prefectural University, 152-52 Sugo, Takizawa, Iwate, 020-0173, Japan
E-mail: [email protected]
K. AKAMA, H. KOIKE AND T. ISHIKAWA
Hokkaido University, Kita 11, Nishi 5, Kita-ku, Sapporo, 060-0811, Japan
E-mail: [email protected], [email protected]
This paper proposes a natural and efficient method for solving a problem by expanding the space, using the characteristics of the space in which the problem cannot be solved. We deal with the synthesis of conversion rules that simplify a logical circuit as a concrete problem, and by comparing the synthesis of conversion rules in the space before expansion of a knowledge representation with that in the space after expansion, we show that expansion of knowledge representation is efficient for synthesis of conversion rules. A declarative program is treated as a knowledge representation, and the synthesis of conversion rules is performed by equivalent transformation. The expansion of knowledge representation is performed by applying an equivalent transformation rule.
1 Introduction
Each of knowledge representations has been shown to be superior for certain types of problems. However, in some cases a problem can not be solved effectively and automatically in only a certain space of a certain knowledge representation. We therefore consider how to solve such problems effectively and automatically. As a solution, we propose in this paper a natural and efficient method for solving a problem by expanding the space using the characteristics of the space in which a problem can not be solved. We deal with the synthesis of conversion rules that simplify a logical circuit as a concrete problem [1], and by comparing the synthesis of conversion rules in the space before expansion of a knowledge representation with that in the space after expansion, we show that expansion of knowledge representation is efficient for synthesis of conversion rules. The synthesis of conversion rules requires a new conversion rule (i.e., synthesis rule), which is obtained by combining two or more existing successive conversion rules to achieve a new conversion from one state to another. A declarative program is treated as a knowledge representation, and a problem that can not be solved in the space before expansion of declarative
program is solved in an expanded space that uses the characteristics of the space before expansion. The space before expansion of declarative program is the same as the space of logic program [2]. In an expanded space, various data structures, including strings and multisets as well as terms, can be treated. The synthesis of conversion rules is performed by equivalent transformation [3,4]. Equivalent transformation is the conversion of a program into another equivalent program. The expansion of knowledge representation is performed by applying an equivalent transformation rule and enables a solution to be obtained efficiently and automatically and at a low cost. 2
Improvement of Synthesis by Expanding Knowledge Representation
The concept of improvement of synthesis by expanding knowledge representation is shown in Fig. 1. Let the space before expansion be Γ1, and let the space after expansion be Γ2.
Figure 1. Improvement of Synthesis by Expanding Knowledge Representation.
A is a program including conversion rules which are synthesized. Program B including a synthesis rule is obtained from A. However, if this synthesis rule is not useful, program D including a useful synthesis rule must be obtained in Ti. When automatically obtaining D from B is difficult (see chapter 3), we must consider other methods to obtain a useful synthesis rule. The problem, however, is the cost of change. Methods such as not changing the way to synthesize or not changing the knowledge representation could be considered to reduce this cost. We therefore propose a method for solving a problem in an expanded space of Ti using the characteristics of space Y\. Data structures are characterized by a mathematical structure called a specialization system [4], and a program is defined on a specialization system. A specialization system is a theoretical foundation of knowledge representation and determines the objects that are treated in each space. By prescribing a specialization system, Ti and 1^ can be made. 1^ must be established so that
program C including a useful synthesis rule from B can be obtained efficiently and automatically. To achieve this, we expand Ti to allow treatment of various data structures including multisets and constraints. The transformation from Ti into T2 is performed by an equivalent transformation rule that converts representation of Ti into that of I V As mentioned above, program A can be transformed efficiently and automatically into C through B by equivalent transformation in I V 3
Synthesis in the Space before Expansion of Declarative Program
We describe two conversion rules ("andAnd rule" and "noConnection rule"), which are synthesized, from among the many conversion rules that simplify logical circuits. The andAnd rule means that "when an output terminal of an and element is an input terminal of another and element, two and elements become one" and is represented in the space before expansion as follows.

C1: arc(andAnd, Circuit1, Circuit2) ←
      member_rest([and,E,IN], Circuit1, RestCircuit),
      member_rest([and,D,R1], RestCircuit, RR),
      member_rest(D, IN, R2),
      member_rest([and,E,R3], Circuit2, RestCircuit),
      union(R1, R2, R3).
The first argument of predicate arc is the name of the conversion rule, the second argument is a logical circuit before conversion, and the third argument is a logical circuit after conversion. As for [and,E,IN], for example, the element is and, the output is E, and the input is IN. member_rest(X,Y,Z) means that "Z is a list that removes element X from list Y". The noConnection rule means that "when a terminal of an element is not connected to a terminal of another element, the element is removed from the logical circuit" and is represented in the space before expansion as follows.

C2: arc(noConnection, Circuit, RR1) ←
      member_rest([ELEMENT,AA,P], Circuit, RR1),
      notExist(AA, RR1),
      free(AA).
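As a reading aid, here is a small Python analogue of the member_rest relation (an illustration only; the paper's rules are declarative clauses, not Python, and the function name is merely borrowed from the predicate):

```python
# member_rest(X, Y, Z): Z is the list Y with one occurrence of element X removed.
# Enumerating every admissible (X, Z) pair for a given Y mirrors the
# non-deterministic way the relation is used in the clauses above.
def member_rest(y):
    return [(y[i], y[:i] + y[i + 1:]) for i in range(len(y))]

print(member_rest(["a", "b", "c"]))
# [('a', ['b', 'c']), ('b', ['a', 'c']), ('c', ['a', 'b'])]
```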
notExist(AA,RR1) means that AA is not used as a terminal in logical circuit RR1, and free(AA) means that AA is not connected to another terminal. The rule obtained by the synthesis of C1 and C2 is as follows.

C3: newarc(andAndnoConnection, Circuit1, RR1) ←
      member_rest([and,E,IN], Circuit1, RestCircuit),
      member_rest([and,D,R1], RestCircuit, RR),
      member_rest(D, IN, R2),
      member_rest([and,E,R3], Circuit2, RestCircuit),
      union(R1, R2, R3),
      member_rest([ELEMENT,AA,P], Circuit2, RR1),
      notExist(AA, RR1),
      free(AA).
The predicate newarc has the form such that newarc (synthesis rule r, state 1, state 2), and this means that "state 1 is converted into state 2 by r". However, Cz is simply a combination of the body of C\ and that of C2 • By calling one clause Cz instead of calling two clauses Ci and C2, execution time is slightly decreased, but there are no other merits of the synthesis. Thus, C3 in program B is not useful. Therefore, we try to look for a useful synthesis rule (including in program D of Fig. 1) i n T i . Since there are five member_rest literals in the body, we consider reducing the number. Therefore, we consider a transformation in the intersection of plural member_rest literals in the body. For example, suppose that the following possible transformation is selected. Tx : {member_rest(Yl,Y2,Y3), ...(a) member_rest(Y4,Y5,Y2), ...(b) member_rest(Yl,Y5,Yans)} . . . ( c )
I {member_rest(Yl,Y2,Y3), ...(d) member_rest(Y4,Yans,Y3)} . . . ( e ) 7\ means to transform {(a),(b),(c)} into {(d),(e)}. By applying this transformation to Cz, one useful synthesis rule can be obtained. However, finding this transformation is difficult, and this transformation is not equivalent. 4
Example of Improvement of Synthesis by Expanding Knowledge Representation
We look for a useful synthesis rule in an expanded space by applying an equivalent transformation rule to Cz- This processing corresponds to the transformation from B to C in Fig.l. Since there exist five member_rest literals in the body of Cz, we apply the following equivalent transformation rule to these literals.
T2: member_rest(X,Y,Z)
4equal(Y,{X|Z}) equal means Y = {X | Z}. By applying T2 to C3, the following clause C 4 is obtained. Then, in an expanded space, logical circuits can be represented as a set of plural elements. C4 : newarc(andAndnoConnection,Circuitl,RR1)<— equal(Circuitl,{[and,E,{D|R2}],[and,D,Rl]|RR}), equal(Circuit2,{[and,E,R3],[and,D,Rl]|RR}),union(Rl,R2,R3), equal(Circuit2,{[ELEMENT,AA,P]|RR1}), notExist(AA.RRl),free(AA). Here, {[and,E,R3],[and,D,Rl] | RR} is unified with {[ELEMENT,AA,P] | RR1}. Then, the unifications could be as follows. (1) [and,E,R3]0 = [ELEMENT,AA,P]a, {[and.D.Rl] IRR}0 = RRICT (2) [and,D,Rl]0 = [ELEMENT,AA,P]CT,{[and,E,R3]IRR}0 = RRICT
(3) RR0 = {[ELEMENT,AA.P] |RR'}<7,{[and,D,Rl] , [and,E,R3] |RR'}0 == RRl
Here, o, mpl, mql, etc. are input or output terminals. The examples of (1) and (2) are substituted for Ce- Then, there exists a substitution such that {D / {mp2},Rl / {mql, mq2},E / o,R2 / {mpl}, R R / (),R3 / {mql, mq2 , mpl}}, and this substitution is also true concerning the body. As for C5 and CV, there exists no such substitution. Therefore, C& is a useful synthesis rule corresponding to the given examples. 5
Discussion
By expanding knowledge representation, three synthesis rules were automatically obtained. Therefore, one synthesis rule C3 in the space before expansion tacitly contains information on these three synthesis rules. However, it is difficult to obtain one useful synthesis rule from C3 in I V In contrast, in an expanded space, by giving examples, one useful synthesis rule (C§) corresponding to the given examples can be obtained efficiently and automatically. In the representation of a rule, since most of the logical circuits are represented in the body, in the space before expansion, the body is long and it is difficult to understand connections of circuits. Moreover, since the body is unfolded by equivalent transformation, the processing cost is high when the body is long. In contrast, in an expanded space, since most of the logical circuits are represented in the head as a set, the body is short and it is easy to understand the connections of circuits. References 1. Takeuchi A. and Fujita H.. Competitive Partial Evaluation. Workshop on Patial Evaluation and Mixed Computation. (1987) pp. 317-326. 2. Lloyd J.W.. Foundations of Logic Programming. Second Edition. (Springer-Verlag. 1987). 3. Tamaki H. and Sato T.. Unfold/fold Transformation of Logic Programs. Proc. of 2nd ILPC. (1984) pp. 127-138. 4. Akama K.. Shimizu T. and Miyamoto E.. Solving Problems by Equivalent Transformation of Declarative Programs. Journal of Japanese Society for Artificial Intelligence, vol.13. N0.6 (1998) pp. 944-952. 5. Dejong G. and Mooney R.. Explanation-Based Learning: An Alternative View. Machine Learning, vol.1. No.2 (1986) pp. 145-176. 6. Mitchell T.M.. Keller R. and Kedar-Cabelli S.. Explanation-Based Generalization: A unifying view. Machine Learning, vol.1. No.l (1986) pp. 47-80.
A BLOCKS-WORLD PLANNING SYSTEM
BHANU PRASAD AND VORAPAT CHAVANANIKUL
School of Computer and Information Sciences, Georgia Southwestern State University, Americus, GA 31709, USA
E-mail: [email protected]
In this paper we present a planning system for the blocks-world domain. This system is inspired by the way human beings perform real-world tasks. An important component of the system is a whole priority list, which guides the system in selecting suitable sub-goals in solving a given problem. The system is entirely different from the existing systems, which are primarily based on either backtracking or on invariant intermediate states or random selection of sub-goals. This system selects sub-goals in a systematic fashion, as guided by the whole priority list. It generates a plan in a polynomial amount of time. The system has been implemented using Common Lisp. A graphical user interface is incorporated for the convenient specification of user inputs.
1 Introduction
For a given start and goal states, a plan is a sequence of operators (or states) that connects the start state to the goal state. The process of finding this sequence is the task of planning [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. One of the important domains for demonstrating planning systems is blocks-world [4, 8, 9, 12].
Start state: ON(A, TABLE), ON(C, A), ON(B, TABLE)
Goal state: ON(C, TABLE), ON(B, C), ON(A, B)
Figure 1. A sample start and goal state descriptions. In the literature, various versions of blocks-world planning have been widely investigated [4, 8, 9, 12]. The objects in the problem domain include a finite number of cubical blocks and a table large enough to hold all the blocks. Each block is on a single other object (either another block or the table). For each block b, either b (i.e., the top of b) is clear or a single (unique) block a sitting on b. There is a single action called move, which can move a single clear block, either from another block onto the table, or from the top of an object onto another clear block.
386 A problem in this domain is specified by giving the start and goal state descriptions. For example, figure 1 contains a start and goal state descriptions. In the literature, a number of approaches have been presented for blocks-world planning [4, 8, 9, 12]. These systems either require non-deterministic selection of sub-goals [12] or backtracking [9, 12] or need some invariant intermediate state [4] or require hierarchical arrangement among the pre-conditions of the operators [12] or require a case-base of previous plans [1, 2, 3, 5, 6, 10]. Non-deterministic selection is very expensive in terms of time and computational efforts. Though the intermediate state approach is simple, it often generates non-optimal plans. The problem with the hierarchical approach is that the user has to explicitly define the hierarchies. The case-based approaches are efficient but they are good for structured domains [10]. In this paper we present a system for solving blocks-world planning problem. 2
Proposed System
The proposed system is inspired by the way how human beings perform real world tasks such as building construction (buildings are constructed from bottom to top but not vice-versa). The system is based on the observation that, for a given start and goal states, the sub-goals need to be satisfied from the bottommost level to till the topmost. For example, in figure 1, the sub-goal ON(C, TABLE) is at the bottommost level and the sub-goal ON(A, B) is at the topmost level. •|
Input
1
Graphical User interface 1 J r™"f. . ,. " ijf Priority list eenerator p~~ w
Output
Figure 2. Schematic diagram of the system. Once the structure of a given goal is analyzed, then the analysis is used in determining the order in which the sub-goals are considered while generating a plan. The system consists of three modules namely the graphical user interface, the priority list generator and the planner, as shown in figure 2. The system has been implemented using Common Lisp. In the diagram, the arcs represent the direction of data flow. The graphical user interface module accepts user input and converts it into a suitable format and supply the result to the priority list generator. The priority list generator generates a list containing the ordering relations among the sub-goals and supply the output to the planner. The planner generates a plan and supplies the output to the user. Now we see each of these modules in detail.
387 2.1
Graphical user interface
A graphical user interface is developed for the convenient specification of user inputs. Visual Basic® front end is used as the interface. The interface consists of two forms, one for specifying the start state description and another one for the goal state. If a block X is on top of another block Y then this information is represented as ON(X, Y). Here, ON is the name of a predicate. The information regarding the internal representation of user input is passed on to the priority list generator. 2.2
Priority List Generator
The priority list generator generates information regarding the order among the subgoals. This module is based on a keyword parser [6]. Some concepts that are used in the rest of the paper are presented below. Tower: A vertical structure of blocks is called a tower. Example: In figure 3, the goal state consists of three towers. The first tower is made up of the blocks D, H, G and E, the second one is made up of F, B, and A and the third one consists of C. Priority list of a tower: The priority list of a tower is a sequence of elements of the form ON(X, Y), in which the first element is the bottommost sub-goal in that tower and the next element is the next bottom level sub-goal in the tower and so on. Example: In figure 3, the priority list of the first tower in the goal state is: (ON(D, TABLE), ON(H, D), ON(G, H), ON(E, G)) H
E
G
G
A
F
H
B
E
D
F
Start state
C
Goal state
Figure 3. A sample start and goal states. Whole Priority List (WPL) of a state: For a given state description, the collection of priority lists with each list represents a tower in the state is called the whole priority list of the state or simply WPL of the state. Example: The WPL of the goal state in figure 3 is: ((ON(D, TABLE), ON(H, D), ON(G, H), ON(E, G)) (ON(F, TABLE), ON(B, F), ON(A, B)) (ON(C, TABLE))) Bottommost goal of a tower: From WPL, find out the appropriate priority list of the tower and return its first element. Bottommost goal of a tower is updated whenever WPL of the state is updated. Example: The bottommost goal for the first tower in the goal state of figure 3 is: ON(D, TABLE).
388 Bottommost set: It is the union of the bottommost goals of each of the towers in the goal state. Bottommost set is updated whenever the bottommost goal of a tower in the goal state is changed. Example: the bottommost set for the example in figure 3 is: ((ON(D, TABLE), ON(F, TABLE), ON(C, TABLE)). Topmost block of a state: For a given state description, a block which does not support any other block is a topmost block in that state. Example: The topmost blocks of the goal state in figure 3 are: E, A, and C. Height of a block in a state: For a given state and a block, the number of supporting blocks of the block is called the height of the block in the state. Example: In figure 3, the heights of the blocks D, H, G, E in the goal state are respectively 0, 1,2, and 3. 2.2.1
Algorithm for finding WPL
1. Receive goal state description from the user. 2. Convert the user input into the internal form and call it as INPUT 3. WPL
389 2.3
Planner
The planner generates the plan based on the information supplied by the priority list generator. The planning algorithm is presented below. In this algorithm the start state is represented as S and the goal state as G. 1. Create an empty list called PLAN. Create a Boolean variable FLAG. 2. Until the bottommost set is empty do the following { 3. FLAG
Complexity Results
If the number of blocks in the goal state is n then in the worst case, the INPUT is parsed n + (n-1) + (n-2) +...2 + 1 = n(n-l)/2 times. Therefore, the worst case complexity of the WPL algorithm in section 2.2.1 is O(n'). Steps 3, 4, and 5 are the important ones for the algorithm in section 2.3. The worst-case complexity of step 3 is 0(n"). The worst case complexity of step 4 is 0(n3). This is because, in the worst case, there are n elements in the bottommost set. For each of these elements, the move operator can be instantiated and applied in nc2 ways. The worst case complexity of step 5 is again O(n). As a result, the total complexity of the algorithm in section 2.3 is 0(n4). The total complexity of the system is 0(Maximum(n2, n4)) = 0(n4), which is polynomial type. 4
Conclusion and Future Work
In this paper we present a system for solving blocks-world planning problem. In addition, we have proved that the complexity of this algorithm is 0(n4). Since blocks-world planning is NP-hard [9], in some special cases the plans may not be
390 optimal. Now we are investigating the special cases. We are also applying the above results to other NP-hard problems in computer science. References 1. Bhanu Prasad., Planning With Hierarchical Structures, Proceedings of the Australian and New Zealand International Conference on Intelligent Information Systems, Australia, 1995. 2. Bhanu Prasad., Planning With Case-Based Structures, Proceedings of the American Association for Artificial Intelligence (AAAI) Fall Symposium on Adaptation of Knowledge For Reuse, David Aha and Ashwin Ram, Co-Chairs. MIT, Cambridge, USA, 1995. This paper is also available at http://www.aic.nrl.navv.mil/~aha/aaai95-fss/papers.html 3. Bhanu Prasad and Deepak Khemani, A Hierarchical Memory-Based Planner, Proceedings of the 1995 IEEE International Conference on Systems, Man & Cybernetics, Canada, 1995. 4. Bhanu Prasad and Deepak Khemani, Search Reduction in Blocks-World Planning, Proceedings of the Third International Conference on Automation, Robotics, and Computer Vision, Singapore, 1998. 5. Bhanu Prasad and Deepak Khemani, A Memory-Based Hierarchical Planner, Case-Based Reasoning Research and Development, Manuela Veloso and Agnar Aamodt (eds.), Lecture notes in Artificial Intelligence, Springer-Verlag 1313. 6. Bhanu Prasad and Deepak Khemani, Cooperative Memory Structures and Commonsense Knowledge for Planning, Progress in Artificial Intelligence, E. Costa and A. Cardoso (Eds.), Springer-Verlag, Lecture Notes in Artificial Intelligence, 1323. 7. Do. B. and Kambhampati. S., Solving Planning Graph by Compiling it into a Constraint Satisfaction Problem, International Conference on Artificial Intelligence Planning Systems 2000. 8. Gupta. N and Nau. D.S., Complexity Results for Blocks-World Planning, In AAAI-91, 1991. 9. Gupta. N and Nau. D.S., On the Complexity of Blocks-World Planning, Artificial Intelligence, 56(2-3):223-254, 1992. 10. Hammond. K., Case-Based Planning: Viewing Planning as a Memory Task, Acedemic Press, NewYork, 1989. 11. Lotem. A, Nau. D and Hendler. J., Using Planning Graphs for Solving HTN Problems, In AAAI-99, 1999. 12. Nilsson. N.J., Principles of Artificial Intelligence, Morgan Kaufmann Publishers, 1993 13. Zimmerman. T and Kambhampati. S., Exploiting the symmetry in the Planning-graph via Explanation-guided Search, In AAAI-99, 1999.
391 MULTI-COMPUTATION MECHANISM FOR SET EXPRESSIONS
H. K O I K E Hokkaido
Division of System and Information Engineering, University, Kita 11, Nishi 5, Kita-ku, Sapporo 060-0811, E-mail: [email protected]
Japan
K. A K A M A Hokkaido
Center for Information and Multimedia Studies, University, Kita 11, Nishi 5, Kita-ku, Sapporo 060-0811, E-mail: [email protected]
Japan
H. M A B U C H I Iwate Prefectural
Faculty of Software and Information Science, University, 152-52 Sugo, Takizawa, Iwate 020-0173, E-mail: [email protected]
Japan
A set expression is useful for describing problems in which sets of elements satisfying given conditions are treated. In this paper, we propose a new method for representing and computing sets of terms that satisfy given conditions. We introduce an expression for representing sets, called a 'set-of reference', equivalent transformation rules for computation of the expression, and multi-computation space called 'world ! G In this paper, we also show a sample problem that can be solved by our method but cannot be solved by a logic paradigm.
1
Introduction
A set expression is useful for describing problems in many cases [5]. Logic programming languages such as Prolog [2] each have a built-in predicate 'setof' for finding a set of answers to a given condition as an atom or more [1]. However the 'setof predicate cannot treat infinite sets and does not have logical formulas. In this paper, a new method for treating sets is proposed. The method is based on the equivalent transformation (ET) paradigm [3] in which problems including sets are represented by a declarative description (a set of extended clauses) and computation is performed by equivalent transformations. In our method, problems including sets are solved with theoretical validation of correctness without using the usual theory of logic programming. A combination of the multi-computation mechanism called world mechanism and equivalent transformation rules provides flexible computation and avoids infinite loops. We show a sample problem that can be solved by our method,
392
but cannot be solved by logic programming. 2
Declarative Description
In the ET paradigm, a problem is defined by a declarative description. A declarative description is a set of definite clauses. An example of declarative description is as follows: P = {Ci,C2,Ca,C4}U{Qi}. Ql = (yes «— setof(X, [even(X)], S),mem(4, S).) Ci = (mem{A, [A\Z]) «- .) C 2 = [mem(A, [B\Z]) <- mem{A,Z).) C3 = (even{0) <- .) C4 = (even(X) <- X = Y + 2, even(Y).) The declarative description P consists of clauses from Ci to C4 and Qi- The clause Qi is a query and asks whether a set of all even numbers contains 4. The setof atom in Q\ is called a 'set-of reference' and represents a set of all even numbers. In a later section, the setof atom is described in detail. The clauses C\ and C2 define the mem predicate that determines whether the first argument is an element of the second argument. The clauses C3 and C4 define the even predicate that represents even numbers. In the ET paradigm, a declarative description does not have procedural semantics, but declarative semantics. Some problems rise when we attempt to solve P in Prolog systems. One of them is that C4 causes some errors, since the order of atoms in Prolog is significant. A more serious problem is that the setof atom in Qi, which is implemented as a built-in predicate, may cause infinite loops. Using our method, however, the correct answer to P can be obtained in finite time. 3
Declarative Meaning
A declarative description determines a set of ground atoms, which is called a 'meaning'. Definition 1 Let P be a declarative description. The declarative meaning M(P) of P is defined as follows: M(P)d±f 7>(0)Up>] 2 (0) U[r P ] 3 (0)U = --- = U~ = iP>]"(0), where 0 is an empty set and Tp is the "immediate consequence mapping", which is defined as follows. Definition 2 For any set x of ground atoms, g €Tp(x) iff there is a substitution 0 and a definite clause C in P such that CO is a ground clause, g is the head of CO, and all the body atoms of CO belong to x.
393 An ET rule w.r.t P is a rewriting rule that preserves .M(-P). 4
Set-of Reference
In order to represent a problem of treating a set of terms that satisfy a given condition, we propose a 'set-of reference'. Definition 3 For an arbitrary set X, gterm(X) is a mapping that gives the set of all lists that represent the set X. For example, gterm({a, b}) = [[a, b], [b, a]]. Definition 4 Let AT be an atom, x a term in AT, P a declarative description, X a term, QT a set of all ground terms, and S a set of all substitutions. An atom setofp(x,AT,X) represents the following relation. X G gterm({xO € QT \6 € S,AT6 6 M(P)}). Definition 5 Let ATL be a list of atoms, x a term whose elements are all contained in ATL, P a declarative description, X a term, QT a set of all ground terms, and S a set of all substitutions. An atom setofp(x,ATL,X) represents the following relation. X 6 gterm\{xe EQT\0£S, ATLO C M (P)}). For example, let Q = {p(l,2) <- . ,p(3,4) <- .}. setofQ(X, \p(X, Y)], [S\Z]) is true iff Z = [1]. The setof atom represents a 'set-of reference'. 5
ET Rule
Representation of ET rules ET rules rewrite a declarative description P into P' preserving the relation M.(P) = M{P'). An ET rule is represented by the following form: (head) => {a list of procedures} i, (a list of atoms) i; =>• {a list of procedures^ i (a list of atoms) 2 ; =^- {a list of procedures}n, (a list of atoms) n. An atom (head) is called 'head'. A head represents a matching pattern. For i = 1, 2, ..., n, a pair of {a list of procedures}j, which is a list of procedures such as unifications and arithmetic operations, and {a list of atoms}j is called a 'body' of the ET rule. A body can have a (possibly empty) list of procedures and a (possibly empty) list of atoms. An ET rule replaces an atom in a body of a clause matched by head with a list of atoms in a body of the rule if the
394 rule has one body. An ET rule can have one or more bodies. More generally, when an ET rule is applied to a clause the rule makes n copies from the original clause, executes the list of procedures in each body, rewrites atoms in each duplicate clause, and replaces the original clause with new clauses. If the execution of any of the procedures in the duplicated clause fails, then the clause is deleted. World Mechanism The world mechanism is used to compute set-of references. It provides multicomputation space called 'world' that is concurrently processed. Due to this mechanism, computation is divided into some smaller computations, and we can process them in any order when more than one applicable ET rule exists; thus, flexible computation is possible and we can avoid infinite loops. To process computation with the world mechanism, we introduce three rules; initializing rule, referring rule, and deleting rule. The initializing rule creates a new world and puts a new clause into the world. The new clause is made from a setof atom and is transformed by ET rules to obtain a set of answers. The referring rule is used by a world to refer to states of other worlds. The deleting rule deletes objects created by the initializing rule if they are not needed. Fig. 1 shows an example of the flow of computation with the world mechanism. A detailed explanation is given in a later section. Setof Atom A set-of reference setofp(T, C, S) is represented as: setof (T,C,S), where T is a term, C is a sequence of atoms representing conditions, and S is a list of terms. T represents an element of S and is included in C. S represents a set of ground instances of T satisfying C. The setof refers implicitly to a declarative description P by the world mechanism. 6
Problem-Solving Based on the ET Paradigm
In this section, an example of computation for set-of references is presented. The problem of determining whether 4 is an even number or not is considered. The declarative description P, described in section 2, defines the problem. Qi in P is a query that means the above question. Note that Q\ has a setof atom that represents a set of all even numbers.
395 Suppose a declarative description P is given. The following rules are made from P. (rl) mem(X,Y)
=> {Y = [X\Z]}; =*• {Y = [A\Z]},mem(X,Z). (r2) even(X) => {X = 0}; =>X = Y + 2,even(Y). The rules are ET rules since they preserve M(P). In computation, only <3i is transformed by ET rules. Fig. 1 shows the flow of computation. Worldl in Fig. 1 is implicitly created at the beginning of computation. The computation steps are as follows: 1. First, Qx is rewritten into Qi by the initializing rule, and the rule creates World2 and puts clause Ni into World2. The ref atom in Q2 refers to World2. 2. iVi is next rewritten into N2 by (r2). Since there exists a unit clause (ans(0) «— .) in WorZd2, Q2 is applied by the referring rule. The rule substitutes [OjS'2] for the third term S in the ref atom and deletes the unit clause in World2. Thus, Q3 and N3 are obtained. 3. Qz is transformed into Q4 by (rl), and N3 is transformed into N4 by (r2). 4. Since there exists a unit clause (ans(2) «— .) in World2, Q4 is applied by the referring rule. This rule substitutes [2153] for the third term 52 and deletes the unit clause in World2. Thus, Q 5 and A^5 are obtained. 5. A^5 is transformed into A^ by (r2). Then Q§ and N7 are obtained by the referring rule. 6. Q7 is obtained by applying (rl) to the atom in Q67. Qs is obtained by applying (rl) to the atom in Q-?. 8. Finally, since ref atom no longer has any effect on other atoms, the deleting rule is applied and Qg is obtained. Since there is a unit clause in Qs, computation can be terminated and an answer to the query is obtained.
396 7
Conclusions
We have proposed set-of references for describing problems with sets of terms satisfying a given condition and rules for computing set-of references and the world mechanism. Set-of reference is implemented on the world mechanism, enabling correct computation and avoiding infinite loops. Therefore, our method overcomes some problems that cannot be solved by logic programming. In our method, computation is conducted by using ET rules. We can describe computation corresponding to SLD-resolution and other methods by ET rules. Thus, we can describe more efficient algorithms by ET rules that cannot be realized by SLD-resolution. Furthermore, compared with other implementations for set expressions [5], a combination of the world mechanism and ET rules is simpler. References 1. D. Li: A PROLOG DATABASE SYSTEM, Research Studies Press Ltd., 1984. 2. J.W. Lloyd: Foundations of Logic Programming, Second edition, (Springer-Verlag, 1987). 3. K. Akama, T. Shimizu, and E. Miyamoto: Solving Problems by Equivalent Transformation of Declarative Programs, Jounal of Japanese Society for Artificial Intelligence, vol.13, NO.6, 1998, 944-952. 4. T. Yokomori: A Note on the Set Abstraction in Logic Programing Language, Proceedings of The International Conference on Fifth Generation Computer Systems, 1984, 333^340. 5. B. Jayaraman, K. Moon: Subset-logic programs and their implementation, The Journal of Logic Programming 42, 2000, 71-110.
397
World 1
"'{L
yes«-setof(X, [even(X)], S), mera(4, S). By t h e initializing rule The Initializing rule cre< itesworld2. '
w , VVUI
Id 2
Ans(X)«-even(X).
r
Q2<^
. * By (r2) C_ans(0)»-.; ,_-'"'' insTX)*1-X=M+2, even(M).
By t h e referring rule
r
"I
.,--""
yes«-ref(X, W2, S2), mem(4,[0|S2]).
J J
|
By(rl)
By (r2)
yes«-ref(X, W2, S2), mem(4,[S2]). C'ans(2)«^) , ' - ' ' ~ a n s T X ? - X « M + 2 , M=N+2, even(N).
1
.,'''
1
yes«-re«<X, W2, S3), mem(4,[2 |S3]). *"
* By t h e referring rule
6W
_-—"
_____
" ans(X? : : X=M+2, M=N+2,N=P+2,even(P).
By(rl)
r
*{ r Q8<*
1
ans(X)«-X-M+2,M=N+2,N-P+2,even(P). |
yes«~ref(X, W2, S4),mem(4,[4|S4]). |
}"•
By (r2)
Cans(4)«-D
yes<-ref(X, W2, S4), mem(4,[2,4|S4]).
L
}"•
: 1
ans(X)<-X=M+2, M=N+2,even(N).
r-
h 1
ans(X)*-X=M+2, even(M).
By t h e referring rule
-{
1N2
1
1
r
-{
k J
•
yes*-ref(X, W2, S), mem(4,S).
By(rl]
yes—ref(X, W2, S4). 1 By t h e d e l e t i n g rule
Q9J yes*—.
Fig. 1. Flow of Computation.
}"• 1 }"'
PROVING TERMINATION OF ω REWRITING
SYSTEMS
Y. S H I G E T A Toshiba
Corporation,
580-1, Horikawa-cho, Saiwai-ku, Kawasaki, E-mail: [email protected]
212-8520,
Japan
K. A K A M A A N D H. K O I K E A N D T . ISHIKAWA Hokkaido
University, Kita 11, Nishi 5, Kita-ku, Sapporo, 060-0811, E-mail: {akama, koke, ishikawa}@cims.hokudai.ac.jp
Japan
A termination problem of a rewriting system proves the non-existence of an infinite reduction sequence obtained by the rewriting system. This paper formalizes a method for proving termination by abstraction, i.e., reducing an original concrete termination problem to a simpler abstract one and then solving it to prove the original problem's termination. Concrete and abstract rewriting systems in this paper are called w rewriting systems. They include very important systems such as term rewriting systems, string rewriting systems, semi-Thue systems, and Petri Nets.
1
Introduction
A termination problem of a rewriting system is to prove that the rewriting system has no infinite reduction sequence. Proving termination by "brute-force search" [7] can take much (often infinite) time and space, since many (possibly infinitely many) paths must be checked before all paths turn out to be finite. A better method is to try to map the original concrete termination problem to a simpler abstract one, with the aim of deriving useful information for the solution of the original problem. Such a technique is often applied to complicated problems in computer science and artificial intelligence [8,5,3]. The aim of this paper is to formalize such a method to prove termination of various rewriting systems that can be formalized as ω rewriting systems. The class of ω rewriting systems is a very large class of rewriting systems, and most important rewriting systems, including term rewriting systems, semi-Thue systems, and Petri Nets, are ω rewriting systems. The class of ω rewriting systems is defined on axiomatically formulated base structures, called ω structures, which are used to formalize the concepts of "terms," "substitutions," and "contexts" that are common to many rewriting systems. The base domains of abstract rewriting systems must often be defined as domains that differ from the domains of the original concrete rewriting systems. The class of ω rewriting systems is large enough to include both concrete and abstract rewriting systems. Adoption of the class of ω rewriting systems is essential to establish the present theory of termination.
2 ω Structures and ω Rewriting Systems
2.1 ω Structures
In order to formalize the common base structures of term rewriting systems and other rewriting systems, a structure, called an ω structure, was introduced [1,2]. An ω structure is used to formalize the concepts of "terms," "substitutions," and "contexts" that are common to many rewriting systems.
Definition 1 Let Trm, Sub, and Con be arbitrary sets, ε an element in Sub, and □ an element in Con. Let Dom be a subset of Trm. Let f_SS be a mapping from Sub × Sub to Sub, f_CC a mapping from Con × Con to Con, f_TS a mapping from Trm × Sub to Trm, f_TC a mapping from Trm × Con to Trm, and f_CS a mapping from Con × Sub to Con. Then, the eleven-tuple (Trm, Dom, Sub, Con, ε, □, f_SS, f_CC, f_TS, f_TC, f_CS) is called an ω structure when it satisfies the following requirements:
D1 ∀t ∈ Trm : f_TS(t, ε) = t,
D2 ∀t ∈ Trm : f_TC(t, □) = t,
D3 ∀t ∈ Trm, ∀θ1, θ2 ∈ Sub : f_TS(f_TS(t, θ1), θ2) = f_TS(t, f_SS(θ1, θ2)),
D4 ∀t ∈ Trm, ∀c1, c2 ∈ Con : f_TC(f_TC(t, c1), c2) = f_TC(t, f_CC(c1, c2)),
D5 ∀t ∈ Trm, ∀c ∈ Con, ∀θ ∈ Sub : f_TS(f_TC(t, c), θ) = f_TC(f_TS(t, θ), f_CS(c, θ)).
Application of the mappings f_SS, f_CC, f_TS, f_TC, and f_CS is usually denoted in a more readable manner: f_SS(θ1, θ2) is denoted by θ1θ2, f_CC(c1, c2) by c1c2, f_TS(t, θ) by tθ, f_TC(t, c) by tc, and f_CS(c, θ) by cθ. Hence, f_TC(f_TS(t, θ), c) is denoted by (tθ)c. Left associativity is assumed for such notation. For instance, (···((tc1)c2)···cn), which is the result of successive application of c1, c2, ···, cn to t ∈ Trm, is denoted by tc1c2···cn. Thus, the five requirements in Definition 1 can be restated as follows:
D1 ∀t ∈ Trm : tε = t,
D2 ∀t ∈ Trm : t□ = t,
D3 ∀t ∈ Trm, ∀θ1, θ2 ∈ Sub : tθ1θ2 = t(θ1θ2),
D4 ∀t ∈ Trm, ∀c1, c2 ∈ Con : tc1c2 = t(c1c2),
D5 ∀t ∈ Trm, ∀c ∈ Con, ∀θ ∈ Sub : tcθ = tθ(cθ).
2.2 Rewriting Systems on ω Structures
A rewriting system R on an ω structure Ω = (Trm, Dom, Sub, Con, ε, □, f_SS, f_CC, f_TS, f_TC, f_CS) is a set of rewriting rules, where each rule is a pair (l, r) of terms in Trm.
Definition 2 A term u ∈ Dom is rewritten into a term v ∈ Dom by a rule (l, r) ∈ R, denoted by u →_(l,r) v, iff there are θ ∈ Sub and c ∈ Con such that u = lθc and v = rθc.
Definition 3 A term v ∈ Dom is immediately reachable from u ∈ Dom by a rewriting system R, denoted by u →_R v, iff there is a rule (l, r) ∈ R such that u →_(l,r) v.
Definition 4 A term v ∈ Dom is reachable from u ∈ Dom by a rewriting system R, denoted by u →*_R v, iff Dom includes terms s1, s2, ···, sn (n ≥ 1) such that u = s1 →_R s2 →_R ··· →_R sn = v.
The rewriting relation →*_R is the reflexive and transitive closure of →_R. For an arbitrary ω rewriting system R, the set of all pairs (x, y) such that x →_R y will be denoted by [R], i.e., [R] = {(x, y) | x →_R y}. The class of all ω rewriting systems includes very important systems [1,2] such as term rewriting systems, string rewriting systems, semi-Thue systems, and Petri Nets.
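As a concrete illustration, a semi-Thue (string rewriting) system can be read as an ω rewriting system in which terms are strings, substitutions are trivial, and a context is a pair consisting of a prefix and a suffix. The following Python sketch is written under these assumptions and with our own function names and bounds; one_step enumerates the one-step relation of Definition 3 and reachable approximates the reachability relation of Definition 4 by bounded breadth-first search.

    # A minimal sketch: strings as Trm, contexts as (prefix, suffix) pairs.
    # The rule format and the bounded search are assumptions made for illustration.

    def one_step(u, rules):
        """All v with u ->_R v: u = prefix + l + suffix and v = prefix + r + suffix."""
        results = set()
        for l, r in rules:
            start = u.find(l)
            while start != -1:
                results.add(u[:start] + r + u[start + len(l):])
                start = u.find(l, start + 1)
        return results

    def reachable(u, v, rules, max_steps=10):
        """Approximate u ->*_R v by breadth-first search up to max_steps rewrites."""
        frontier = {u}
        for _ in range(max_steps):
            if v in frontier:
                return True
            frontier = {w for s in frontier for w in one_step(s, rules)}
        return v in frontier

    rules = [("ab", "ba")]                  # a toy semi-Thue rule
    print(one_step("aab", rules))           # {'aba'}
    print(reachable("aab", "baa", rules))   # True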
3 Homomorphism
3.1 Definition of Homomorphism
The concept of a homomorphism from an ω structure to an ω structure is introduced.
Definition 5 Let Ω1 and Ω2 be ω structures:
Ω1 = (Trm1, Dom1, Sub1, Con1, ε1, □1, f_SS1, f_CC1, f_TS1, f_TC1, f_CS1),
Ω2 = (Trm2, Dom2, Sub2, Con2, ε2, □2, f_SS2, f_CC2, f_TS2, f_TC2, f_CS2).
Let h_T be a mapping from Trm1 to Trm2, h_S a mapping from Sub1 to Sub2, and h_C a mapping from Con1 to Con2. A triple of mappings (h_T, h_S, h_C) is a homomorphism from Ω1 to Ω2 iff
1. h_T(f_TC1(f_TS1(t, θ), c)) = f_TC2(f_TS2(h_T(t), h_S(θ)), h_C(c)) for all t ∈ Trm1, θ ∈ Sub1, and c ∈ Con1,
2. h_T(Dom1) ⊆ Dom2.
Using the notational convention mentioned earlier, the first requirement for a homomorphism is denoted simply by h_T(tθc) = h_T(t)h_S(θ)h_C(c). In this paper, a triple of mappings (h_T, h_S, h_C) is assumed to be a homomorphism from an ω structure Ω1 = (Trm1, Dom1, Sub1, Con1, ε1, □1, f_SS1, f_CC1, f_TS1, f_TC1, f_CS1) to an ω structure Ω2 = (Trm2, Dom2, Sub2, Con2, ε2, □2, f_SS2, f_CC2, f_TS2, f_TC2, f_CS2). Since h_T is a mapping from Trm1 to Trm2, it can naturally be extended into the following mappings:
h_T : Trm1 × Trm1 → Trm2 × Trm2, (x, y) ↦ (h_T(x), h_T(y)),
h_T : 2^(Trm1 × Trm1) → 2^(Trm2 × Trm2), S ↦ {(h_T(x), h_T(y)) | (x, y) ∈ S}.
Note that, for the sake of simplicity, all these extensions are referred to by the same name h_T. In particular, a rewriting system on Ω1 is transformed into a rewriting system on Ω2 by the mapping h_T : 2^(Trm1 × Trm1) → 2^(Trm2 × Trm2).
In other words, if R is a rewriting system on Ω1, then h_T(R) is a rewriting system on Ω2.
3.2 Relation between Concrete and Abstract Rewriting
Assume that a concrete rewriting system and an abstract rewriting system are in a homomorphic relation (h_T, h_S, h_C), i.e., (h_T, h_S, h_C) is a homomorphism from the concrete rewriting system to the abstract rewriting system. Then, it can be proven that, if x is rewritten into y by the concrete rewriting system R, then h_T(x) is rewritten into h_T(y) by the abstract rewriting system h_T(R).
Proposition 1 (See [3]) Let (h_T, h_S, h_C) be a homomorphism from an ω rewriting system Ω1 to an ω rewriting system Ω2. Let R be a rewriting system on Ω1. If x →_R y, then h_T(x) →_{h_T(R)} h_T(y).
4 Termination of ω Rewriting Systems
4.1 Termination
Let R be an ω rewriting system on Ω. A term t in Dom is non-terminating with respect to R iff there is an infinite sequence of terms t1, t2, ···, tn, ··· in Dom such that t = t1 and t_i →_R t_{i+1} for all i = 1, 2, 3, ···. A term t in Dom is terminating with respect to R iff t is not non-terminating with respect to R. Let D be a subset of Dom. A set D is terminating with respect to R iff all terms in D are terminating with respect to R. An ω rewriting system R is terminating iff Dom is terminating with respect to R.
4.2 Termination Theorem for ω Rewriting Systems
Termination with respect to an ω rewriting system R is determined by the set [R]; i.e., t is non-terminating if and only if there is an infinite sequence of terms t1, t2, ···, tn, ··· in Dom such that t = t1 and (t_i, t_{i+1}) ∈ [R] for all i = 1, 2, 3, ···. Hence, by the homomorphism theorem, the following theorem is obtained.
Theorem 1 [Termination Theorem] Let R be an ω rewriting system on Ω1. Let (h_T, h_S, h_C) be a homomorphism from an ω rewriting system Ω1 to an ω rewriting system Ω2. Then, R is terminating if h_T(R) is terminating.
Proof. Assume that t in Dom1 is non-terminating with respect to R. Then, there is an infinite sequence of terms t1, t2, ···, tn, ··· in Dom1 such that t = t1 and t_i →_R t_{i+1} for all i = 1, 2, 3, ···. By Proposition 1, there is an infinite sequence h_T(t1), h_T(t2), ···, h_T(tn), ··· in Dom2 such that h_T(t) = h_T(t1) and h_T(t_i) →_{h_T(R)} h_T(t_{i+1}) for all i = 1, 2, 3, ···. Hence h_T(t) is non-terminating with respect to h_T(R). This proves that if t in Dom1 is non-terminating with respect to R, then h_T(t) is non-terminating with respect to h_T(R). By contraposition, it follows that t in Dom1 is terminating with respect to R if h_T(t) in Dom2 is terminating with respect to h_T(R). Hence, R is terminating if h_T(R) is terminating. □
4.3 Example
The coffee bean puzzle [3,6] is formulated by an ω rewriting system R = {bb → w, bXw → Xb}. A homomorphism consisting of the mapping h_T that maps a string into the number of b's and w's in the string gives h_T(R) = {2 → 1, 2 + X → 1 + X}. Since h_T(R) includes only decreasing rules, it follows that h_T(R) is terminating. Therefore, by Theorem 1, R is also terminating.
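The abstraction used in this example is easy to mechanize. The sketch below is only an illustration; in particular, writing the variable of the second rule as the letter X inside a Python string is our own encoding. It maps every string to its bean count h_T and checks that each rule of R strictly decreases that count, which is exactly what the abstract rules 2 → 1 and 2 + X → 1 + X express.

    # Sketch: checking that every rule strictly decreases the bean count h_T.
    # Rules are written with 'X' standing for an arbitrary (possibly empty) string;
    # this encoding is an assumption made only for the illustration.

    def h_T(s):
        """Abstraction: number of beans (b's and w's) in a string."""
        return s.count("b") + s.count("w")

    def decreases(rule):
        """True if lhs -> rhs removes at least one bean for every instance of X."""
        lhs, rhs = rule
        # 'X' contributes the same unknown count to both sides, so it cancels out.
        return h_T(lhs.replace("X", "")) > h_T(rhs.replace("X", ""))

    R = [("bb", "w"), ("bXw", "Xb")]            # the coffee bean puzzle
    print(all(decreases(rule) for rule in R))   # True, hence R terminates by Theorem 1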
5 Concluding Remarks
This paper proposes a theoretical foundation for proving termination of ω rewriting systems. The theory comprises the following elements: two ω structures, two ω rewriting systems, two reachability relations on the two ω rewriting systems, a homomorphism between the two ω structures, a homomorphic relation between the two ω rewriting systems, and the termination theorem.
References
1. K. Akama, Common Structure of Semi-Thue Systems, Petri Nets, and Other Rewriting Systems, Hokkaido University Information Engineering Technical Report, HIER-LI-9407 (1994), revised version in IEICE Trans. of Information and Systems, E80-D (12), pp.1141-1148 (1997).
2. K. Akama, An Axiomatization of a Class of Rewriting Systems, Hokkaido University Information Engineering Technical Report, HIER-LI-9409 (1994).
3. K. Akama, H. Mabuchi, Y. Shigeta, Homomorphism Theorem and Unreachability for Omega Rewriting Systems, in Xiao-Shan Gao and Dongming Wang (Eds.), Computer Mathematics, Proceedings of the 4th Asian Symposium on Computer Mathematics (ASCM2000), Lecture Notes Series on Computing Vol.8, pp.90-99 (2000).
4. B. Buchberger, History and Basic Features of the Critical-Pair/Completion Procedure, J. Symbolic Computation 3, pp.3-38 (1987).
5. P. Cousot and R. Cousot, Abstract Interpretation and Application to Logic Programs, J. Logic Programming, 13 (2&3), pp.103-179 (1992).
6. N. Dershowitz and J. Jouannaud, Rewrite Systems, Handbook of Theoretical Computer Science, Chapter 6, pp.243-320 (1990).
7. R.E. Korf, Planning as Search: A Quantitative Approach, Artificial Intelligence 33, pp.65-88 (1987).
8. E.D. Sacerdoti, Planning in a Hierarchy of Abstraction Spaces, Artificial Intelligence 5, pp.115-135 (1974).
SEMANTICS FOR DECLARATIVE DESCRIPTIONS WITH REFERENTIAL CONSTRAINTS
K. AKAMA, H. KOIKE AND T. ISHIKAWA
Hokkaido University, Kita 11, Nishi 5, Kita-ku, Sapporo, 060-0811, Japan
E-mail: {akama, koke, ishikawa}@cims.hokudai.ac.jp
Higher-order relations, such as not and set-of, are useful for knowledge representation, especially for the description of queries to databases. However, it is very difficult to formalize the semantics for correct computation of higher-order relations. In this paper, we introduce a class of constraints, called referential constraints, the meaning of which is related to the meaning of other atoms, and define the semantics of referential constraints. This theory formalizes a general semantics for constraints (simple and referential constraints), based on which we obtain correct computation of many constraints such as not and set-of constraints and first-order constraints.
1
Introduction
Constraints in the body of a definite clause are used to restrict instantiation of the definite clause [2,5] and are useful for representing already known relations that can not be defined by a finite set of definite clauses. Usual constraints, which will be called simple constraints in this paper, can not, however, represent "higher-order relations" such as not and set-of constraints, the meaning of which is related to the computation results of some queries. In this paper a concept of referential constraints is newly defined as an extension of usual constraints. A referential constraint has, as its arguments, more than one declarative description, which is a set of definite clauses, each of which may contain referential constraints in the body. Semantics of referential constraints will be defined together with referential declarative descriptions. This theory is essential to the correct computation of referential declarative descriptions, which include not and set-of constraints and first-order constraints [3,6]. 2 2.1
Declarative Descriptions Terms, Atoms, and Substitutions
Let K, F, V, and R be mutually disjoint sets. The four-tuple of K, F, V, and R is called an alphabet and denoted by S. Each element in the sets K, F, V, and R is called, respectively, a constant, a function, a variable, and a predicate (on E). All concepts in this paper will be defined on the alphabet
406
E. However, reference to the alphabet E is often omitted for simplicity. We assume that terms, atoms (atomic formulas), and substitutions (on E) are denned as usual [5]. The definition of ground terms, ground atoms, instances of terms, instances of atoms are assumed to be the same as the ones in [5]. An object is either a term or an atom. A ground object is either a ground term or a ground atom. A substitution {ii/
Declarative Descriptions
Declarative descriptions consisting of atoms and constraints are inductively defined as follows. Definition 1 [Declarative Description] 1. A constraint is a (m+l)-tuple
(<j),di,d2, • • • ,dm), where
• m > 0, • <j> is a mapping from G\ x G2 x • • • x Gm to {true, false}, with each d (i = 1,2,3, • • •, m) being identical to either Q, QT, or 2g. • di (i = 1,2,3, • • • ,m) is either an atom if Gi is Q, a term if Gi is QT, and a declarative description ifG{ is 1?. A constraint (cj),di,d2, • • • ,dm) is called a simple constraint iff di,d2,- • • ,dm are all objects (terms and atoms). A constraint (
407
only simple definite clauses; otherwise it is called a referential declarative description. D This is an inductive definition. Firstly, simple constraints are defined by 1. Secondly, simple definite clauses are defined by 2. Thirdly, simple declarative descriptions are obtained by 3. Next, referential constraints that contain simple declarative descriptions are defined by 1. Then, new definite clauses containing these referential constraints are added, and new declarative descriptions containing these new definite clauses are defined. Repeating such definition, all declarative descriptions are determined. The set of all constraints is denoted by Con. The set of all definite clauses is denoted by Del. The set of all declarative descriptions is denoted by Dsc. Let C be a definite clause H <— B\, B2, • • •, B„. H and (B\, B2, • • •, Bn) are respectively called the head and the body of C. The head of C is denoted by head(C). The set of all atoms Bi in the body of C and the set of all constraints Bj in C are denoted, respectively, by atom(C) and con(C). Let con be a constraint (<j),di,d2, • • • ,dm). The set of all objects d,- in {d\,d2, • • • ,dm} is denoted by cobj(con). The set of all declarative descriptions d{ in {d\,d2, • • • ,dm} is denoted by dsc(con). The set of all declarative descriptions that appear at the toplevel of C, i.e., {d I d £ dsc(con),con G con(C)}, is denoted by dsc(C). A constraint {<j>,di,d2, • • • ,dm) is a ground constraint iff each d,- (i = 1,2, •••,m) is either a ground object or a declarative description. The set of all ground constraints is denoted by Gcon. A definite clause consisting of only ground atoms and ground constraints is called a ground definite clause or, more simply, a ground clause. The set of all ground clauses is denoted by Gels. 2.3
Examples of Declarative Descriptions
A simple declarative description, i.e., a declarative description that does not contain referential constraints, is shown.
[ Example 1 ] A definition of the even relation is given by the following declarative description.
Peven = { even(0) <- .
even(Y) <- even(X), (fadd2, X, Y). }.
fadd2 is a mapping from G_T × G_T to {true, false} such that fadd2(t, s) = true if t and s are numbers and t + 2 = s, and fadd2(t, s) = false otherwise.
Next, a referential declarative description is shown. The declarative description Peven in the previous example is used in the referential constraint in order to define a predicate odd.
[ Example 2 ] The odd predicate is defined by the following declarative description Podd, which refers to the declarative description Peven defined in Example 1.
Podd = { odd(Z) <- (fnot, even(Z), Peven), nat(Z).
nat(Z) <- (fnat, Z). }.
fnot is a mapping from G × 2^G to {true, false} such that fnot(g, G) = true if g ∉ G, and fnot(g, G) = false if g ∈ G. fnat is a mapping from G_T to {true, false} such that fnat(g) = true if g is a natural number, and fnat(g) = false if g is not a natural number.
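To make the examples concrete, the constraint mappings fadd2, fnot, and fnat can be written as ordinary Boolean functions, and the clauses of Peven and Podd as plain data in which a constraint is a tuple whose first component is such a mapping. The Python encoding below is only an illustrative assumption about how such descriptions might be represented; it is not the representation used in the paper or by the ETI interpreter.

    # A sketch of the constraint mappings of Examples 1 and 2 and of the two
    # declarative descriptions as data.  The encoding is an assumption made for
    # illustration only.

    def fadd2(t, s):
        """true iff t and s are numbers and t + 2 = s."""
        return isinstance(t, int) and isinstance(s, int) and t + 2 == s

    def fnot(g, G):
        """true iff the ground atom g is NOT in the set of ground atoms G."""
        return g not in G

    def fnat(g):
        """true iff g is a natural number."""
        return isinstance(g, int) and g >= 0

    # A clause is (head, [body items]); an atom is ('pred', arg); a constraint is
    # a tuple starting with a mapping.  P_even is simple; P_odd is referential,
    # since its fnot constraint refers to the whole description P_even (at
    # evaluation time that argument is replaced by the meaning M(P_even)).
    P_even = [ (("even", 0), []),
               (("even", "Y"), [("even", "X"), (fadd2, "X", "Y")]) ]
    P_odd  = [ (("odd", "Z"), [(fnot, ("even", "Z"), P_even), ("nat", "Z")]),
               (("nat", "Z"), [(fnat, "Z")]) ]

    print(fadd2(2, 4), fnot(("even", 3), {("even", 0), ("even", 2)}), fnat(5))
    # True True True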
if g is a natural number, if g is not a natural number.
Meaning of Declarative Descriptions
3.1
Specialization by Substitutions
Specialization operation by substitutions to constraints, definite clauses, and declarative descriptions is inductively defined as follows. Definition 2 [Specialization] 1. The result of application of a substitution 8 £ S to a constraint c — {<j>,rfi,d2, • • •, dm), denoted by c0, is defined by c6 = (
409 tions is defined by 1. Then, specialization of new definite clauses containing these referential constraints is added, and specialization of new declarative descriptions containing these new definite clauses is defined. Repeating such definition, specialization of all constraints, all definite clauses, and all declarative descriptions is determined. 3.2
Meaning of Definite Clauses and Declarative Descriptions
The meaning of definite clauses and declarative descriptions is inductively defined as follows. Definition 3 [Meaning] /. A mapping val : Gcon —• {true, false} is defined by val{{(f>,di,d2,--- ,dm)) =
and gi — M{di)
if d, G Dsc.
2. A set Tcon, called the set of all true constraints, is defined by Tcon — {con \ con 6 Gcon, val(con) = true}. 3. The meaning A4(C) of a definite clause C is defined by M(C)
d
= {(head{C0), atom(C0)) \ 0 G sub(C), CO e Gels, con(C6) C Tcon},
where sub(C) is the set of all substitutions on the set of all variables in {head(C)} U atom(C) U cobj(C). 4- A mapping Tp : 2^ —> 2^ for a declarative description P is defined as follows. For any set x of ground atoms, TP{x)d=
{head\ C £P, atom C x, (head, atom) G
M(C)}.
5. The meaning M(P) of a declarative description P in Dsc is defined, using the mapping Tp for P, by oo
M(P)d^def \J[TP]n(9). n=l
410
T h e meaning of a declarative description P is computed as follows. Firstly, val(con) is determined for all simple constraints con. Secondly, the set of true simple constraints is determined as a subset of Tcon. Thirdly, the meaning of simple definite clauses is determined. Fourth, Tp is defined for a simple declarative description P. Fifth, the meaning of declarative descriptions P consisting of only simple definite clauses is determined. Next, val(con) is determined for all referential constraints con t h a t include simple declarative descriptions P. Repeating such operations, the meaning of all declarative descriptions is determined. 4
Concluding Remarks
In order to develop a theoretical foundation for knowledge representation and correct computation with higher-order relations, referential constraints are defined as an extension of usual constraints. Semantics of referential constraints has been defined by assigning a set of ground atoms for each declarative description t h a t may contain simple and referential constraints. Then, correct computation for referential constraints is immediately determined by the equivalent transformation paradigm [1], i.e., declarative descriptions with referential constraints are transformed equivalently preserving their meaning. Based on the theory, we have an interpreter (named E T I [4]), which enables us to correctly compute not, set-of, and first-order constraints. References 1. K. Akama, Y. Shigeta and E. Miyamoto, A Framework of Problem Solving by Equivalent Transformation of Logic Program, J. Japan Soc. Artif. Intell., Vol.12, No.2, pp.90-99 (1997). 2. J. JafFar and J. L. Lassez, Constraint Logic Programming, Technical Report, Department of Computer Science, Monash University, June 1986. 3. H. Koike, K. Akama and H. Mabuchi, Multi-Computation Mechanism for Set Expressions, International Conference on Computing and Information Technologies (ICCIT 2001), (to appear 2001). 4. H. Koike, K. Akama and H. Mabuchi, Equivalent Transformation Language Interpreter ETI, 5th IEEE International Conference on Intelligent Engineering Systems 2001 (INES 2001), (to appear 2001). 5. J.W. Lloyd, Foundations of Logic Programming, Second edition, SpringerVerlag, 1987. 6. T. Yoshida, K. Akama and E. Miyamoto, Program Synthesis from First-order Expressions for Problem Solving in the String Domain, Trans. Information Processing Society, Vol.41, No.SIG 7 (TOM 3), pp.12-22 (2000).
411
SOLVING LOGICAL P R O B L E M S B Y EQUIVALENT T R A N S F O R M A T I O N K. AKAMA AND H. KOIKE Hokkaido
University, Kita 11, Nishi 5, Kita-ku, Sapporo, 060-0811, E-mail: {akama,koke}Qcims.hokudai.ac.jp
Japan
Y. SHIGETA Toshiba
Corporation,
580-1, Horikawa-cho, Saiwai-ku, Kawasaki, E-mail: [email protected]
212-8520,
Japan
H. MABUCHI Iwate Prefectural
University, 152-52 Sugo, Takizawa, Iwate, E-mail: [email protected]
020-0173,
Japan
In logic programming, computation is regarded as inference. In this paper, we propose a new method to solve logical problems by equivalent transformation and develop a theoretical foundation for the correctness of the method. Given a logic program P and a query q, A logical problem (P, q) is formalized as finding the set L(P, q) of all ground instances g of q such that P \= g. The set L(P, q) is represented by t h e declarative semantics of a logic program P' that is produced from P and q. The logical problem (P, q) is solved by transforming P' equivalently into a simpler form, preserving its declarative semantics and utilizing many transformation rules. Inferential (resolution-based) problem solving can be regarded as a special case of the proposed method.
1
Introduction
In logic programming, computation is regarded as inference [4]. Given a logic program P and a query q, Prolog finds substitutions 9 such that P |= q9 by using inference (SLD-resolution). It is widely believed that inference is the unique and best way to solve logical problems. Many extensions of logic programming based on logical inference have also been developed. In this paper, however, we propose a new method to solve logical problems by equivalent transformation, without sticking to inference, and develop a theoretical foundation for the correctness of the method. A logical problem is reformulated as finding the set L(P,q) of all ground instances g of q such that P (= g. The set L(P, q) is represented as a function of the declarative semantics of a logic program P' that is produced from P and q. P' is transformed equivalently into a simpler form, preserving its declarative semantics and utilizing many transformation rules. From the simplified P', the solution of (P, q) is obtained. This method provides a more general class of computation than the
412
logical inference in logic programming does in the sense that any computation by SLD-resolution can be obtained by equivalent transformation and there is more efficient computation by equivalent transformation that is not obtained by SLD-resolution.
2 2.1
Logical Problems Logic Programs
Let A be an alphabet for the predicate logic. Let A be the set of all atoms on A, Q the set of all ground atoms (atoms that do not include variables) on A, and S the set of all substitutions on A. An instance of an atom is an atom obtained by application of a substitution to the atom. A ground instance of an atom is a ground atom that is an instance of the atom. The set of all ground instances of an atom a is denoted by rep(a). A definite clause on A is a formula of the form H «— B\, • • •, Bn (n > 0), where H, Bi, • • •, Bn are elements in A. H and ( 5 i , • • •, Bn) are called the head and the body of the definite clause, respectively. The head of a clause C is denoted by head(C), and the set of all atoms in the body of a clause C is denoted by body(C). Atoms that occur in the body of a definite clause are called body atoms. A definite clause consisting of only ground atoms is called a ground clause. An instance of a definite clause is a definite clause obtained by application of a substitution to all atoms in the definite clause. A ground instance of a definite clause is a ground definite clause that is an instance of the definite clause. A logic program on A is a set of definite clauses on A. A logic program is often called simply a program in this paper. The set of all definite clauses on A and the set of all logic programs on A are denoted by Dclause(A) and Program(A), respectively.
2.2
Interpretation and Model
An interpretation / on A is a subset of Q. A ground clause C is true with respect to an interpretation / iff head(C) 6 / or body(C) <£. I. An interpretation / is a model of a definite clause C iff all ground instances of C are true with respect to / . An interpretation / is a model of a program P iff / is a model of all definite clauses in P.
413
2.3
Logical Consequence
A set Ei of definite clauses is a logical consequence of a set E\ of definite clauses [E\ |= i?2) iff any model of E\ is a model of Ei- A definite clause C is a logical consequence of a set E of definite clauses (E \= C) iff any model of E is a model of C.
2-4
Logical Formalization of Problems
In Prolog, a problem to be solved is specified by a pair of a program P and an atom (called a query) q. To find all substitutions 6 that satisfy P |= (q6 <—) is the aim of the problem °. Computation in Prolog is regarded as solving these problems by "reductio ad absurdum", and is formalized as SLD-resolution. Assume that substitutions 9\, $2, • • •, 0m are obtained by SLD-resolution. The soundness and completeness theorem of SLD-resolution guarantees that the set of these substitutions 0\, 62, • • •, #m is a correct answer to the query q with respect to P in the sense that, the set Ui 6 { l i 2 ,...,m}{ftp|3/9G5}, i.e., the set of all substitutions that are more specific than one of 9\, 82, • • •, 6m, is identical to the set {0\P\=(q9<-)}, i.e., the set of all substitutions 0 that satisfy P (= (q0 <—). In this paper, however, a solution to be found by logic programming is formulated not as a set of substitutions but as a set of ground atoms. More precisely, we introduce the following definition. A pair (P, q) of a logic program P on A and a query q € A is called a logical problem, which requires finding L{P, q) = {g\P\=(g <-), g £ rep(q)}, which is a subset of Q. L(P,q) is called a solution set of the logical problem (P,q). When a substitution 0 is obtained by SLD-resolution, let 6 be regarded as a representative of the set rep(q6), i.e., the set of all ground instances of q6, and consider that all elements of rep(q$) are obtained. Then, the soundness and completeness theorems of SLD-resolution guarantee to compute L(P,q) correctly.
°(a) In the sequel, only definite clauses will be used for logical formulas. Thus, P |= V(q6) in the conventional theory is denoted by P \= (q6 <—).
414
3 3.1
Transformation of Logical Problems Introduction of New Predicates
Let P be a logic program on A, and q an atom on A. Consider a new logic program P ' = P U {4>{q) <— q}, where
Basic Propositions
Three propositions are given for investigating in Section 3.3 the relation between a logic program P on A and a logic program P U {
CM'.
is a model of {
Proposition 2 Let M be any model of P U {4>{q) <— }. If M = M C\Q and M' = M C\Q', then M is a subset ofQ and a model of P, and M' is a subset ofQ' that satisfies <j>{M D rep(q)) C M'. Proposition 3 Let M (C Q) be any model of P, and M' (C Q') a set that satisfies
Transformation of Logical Problems
From Proposition 1, 2, 3, and 4, the next theorem is obtained.
415
T h e o r e m 1 The next two conditions are equivalent. (1) P\=(g<-),gerep(q). (2)PU{
{g\PU{
Representation of Logical Problems using Declarative Semantics Minimal Model
Let P be a logic program. A subset MM(P) of Q is a minimal model of P iff MM{P) is a model of P and MM(P) C M for all model M of P. The following theorems regarding the minimal model of a logic program are well known [4]. T h e o r e m 3 Any logic program P has a minimal model. The minimal model of P is the intersection of all models of P. When a logical consequence is a unit clause, the relation of logical consequence can be represented by the inclusion relation of two sets. T h e o r e m 4 P \= (a <-) «=>• MM(P) D rep(a). 4-2
Declarative Semantics of Logic Programs
Declarative semantics of a logic program will be defined. Firstly, a mapping Tp : 2 e —> 2 e is defined for a logic program P on A. Definition 1 ( M a p p i n g Tp) A mapping Tp : 2 e —>• 2 e for a logic program P on A is defined by TP{I) d= {head(C6) \CeP, 0eS, CO e Gclause(A), body{CB) C J } for each ICQ, where Gclause(A) is the set of all ground clauses on A. Declarative semantics of a logic program P is defined by using the mapping TP. Definition 2 (Declarative semantics of a Program) Let P be a program on A. Declarative semantics of a program P, denoted by M(P), is
416
defined by oo
M(p)^{j[TPn®), n= l
where 0 denotes the empty set. It is already known [4] that declarative semantics of a program P is identical to the minimal model of P. Theorem 5 M(P) = MM(P). 4-3
Representation of the Solution Set of a Logical Problem by using Declarative Semantics
Theorem 6 The solution set L(P,q) of a logical problem (P,q) is equal to
Solving Logical Problems by Equivalent Transformation
5.1
Method of Solving Logical Problems by Equivalent
Transformation
From Theorem 6, the solution set L(P,q) of a logical problem (P,q) can be computed by finding M{PU {
417 4. Obtain t h e solution set L(P,q)
of t h e logical problem (P, q) by
L(P,
Conclusion
A theoretical foundation of solving logical problems by equivalent transformation is developed. Many problems, including the kind of problems solved by Prolog, can be formalized and solved by using this method. In this method, computation is correct as long as all the rules are correct. Various rules are available for correct computation by equivalent transformation, while only definite-clause rules are used in logic programming. Hence, the proposed m e t h o d allows various computation p a t h s compared with Prolog, which is one of the key points for more efficient computation [1]. We have implemented an interpreter called E T I and a compiler called E T C for programming based on equivalent transformation [3]. Using these systems, experiments on integrated processing of syntactic and semantic analysis of n a t u r a l languages [2] and development of automatic generation of programs from specifications [5] have been carried out. References 1. K. Akama, Y. Shigeta and E. Miyamoto, A Framework of Problem Solving by Equivalent Transformation of Logic Program, J. Japan Soc. Artif. Intell., Vol.12, No.2, pp.90-99 (1997). 2. M. Hatayama, K. Akama and E. Miyamoto, Improvement of knowledge processing systems by addition of Equivalent Transformation Rules, J. Japan Soc. Artif. Intell., Vol.12, No.6, pp.861-869 (1997). 3. H. Koike, K. Akama and H. Mabuchi, Equivalent Transformation Language Interpreter ETI, 5th IEEE International Conference on Intelligent Engineering Systems 2001 (INES 2001), (to appear 2001) 4. J.W. Lloyd, Foundations of Logic Programming, Second edition, SpringerVerlag, 1987. 5. T. Yoshida, K. Akama and E. Miyamoto, Program Synthesis from First-order Expressions for Problem Solving in the String Domain, Trans. Information Processing Society, Vol.41, No.SIG 7 (TOM 3), pp.12-22 ! ! (2000)
419 DECIDING THE HALTING PROBLEM AND PRELIMINARY APPLICATIONS TO EVOLUTIONARY HARDWARE, AND HYBRID TECHNOLOGY A.A. ODUSANYA Biomedical Computing Research Group (BIOCORE) School of Mathematical and Information Sciences, Coventry University, Priory Street Coventry, CV1 5FB, UK E-mail: [email protected] The halting problem is a historical problem with computers, this paper puts forward an abstract computational machine that would decide the halting problem, and thus improves the quest for aspects of artificial intelligence, in this case, automated verification of computational devices as in evolutionary computation, and the inherent ability for computers to construct other computer hybrids.
1
Introduction
Turing in 1936 had indicated earlier on that the halting problem was not decidable, in this paper I present a conclusive proof that shows why the halting problem is decidable. 2
Why the halting problem is decidable
The rhetorical question as was put forward largely by the founders of computer science can be rephrased as what precisely can a computing machine do logically, and what can't it do? In other words given a task, is it computable (Turing [1, 2] Godel [3], Hofstadter [4], Hopcroft and Ullman [5], and Chaitin [6])? The halting problem is one instance of the class of undecidable problems (problems that can't be solved by any mechanical procedure). Theorem 1. A (computational) machine can decide the halting problem in finite time, once its steps are accomplished in finite time exactly equal to zero. Proof. The halting problem states that a Turing machine determines whether another Turing machine will halt, once started: and according to Turing [1] (see also Hopcroft and Ullman [5]), intuitively this however is the case when we have atomic steps, that complete in some finite time, no matter how small. Following the example of Turing himself who proposed the abstract one step machine, which later was actualized into serial and parallel computers, we can propose a machine (say a zero-duration) that has each step achieved in zero time. This machine is equivalent
420 to the formally defined Turing machine, except that each transition function §(•) in a Turing machine that runs in some finite time t >= 0, the zero-duration machine would have the same transition function 8(») run in exactly time t = 0. The feasibility of this machine is sound as further discussed. This machine would be able to simulate to completion any Turing machine in exactly zero-time. And thus the halting problem is solvable, since a zero-step machine necessarily halts no matter how many non-empty zero-duration steps are taken. Corollary 1. An instance of this sort of machine is a Language machine. A plausible example of a zero-step machine is a language. In principle a language can be classified as a computational device, given the necessary variations that exist and thus allow distinction and combination, as the symbols. More formally, a language is a computational device of equal power to a Turing machine, and it is since we can represent a Turing machine as a language (see Hopcroft and Ullman [5], for an example of a Turing machine defined as a language for simulation), and a language as a Turing machine (a Turing machine accepting a given a language is representative of that language), then it implies that all languages are equivalent according to the Church's thesis (Hopcroft and Ullman [5]). And thus undecidable languages are the same as decidable languages. And thus the halting problem is indeed decidable (solvable by a mechanical process).
3 3.1
A few consequences of the halting problem in practice The Application to Evolutionary Hardware
There exist examples of evolutionary hardware (Sipper and Ronald [7]) as a further corollary to evolutionary software algorithms. The fitness function or their equivalents in evolutionary hardware, according to the conventional wisdom behind the halting problem, would never be able to adequately verify an offspring. However a window of opportunity for computational hardware successfully verifying subsequent offspring computational hardware exists based on the preceding discussion for example. 3.2
A Universal Theory of Hybrid Technology
The question can a computer decisively construct a hybrid at random that solves a given problem in decidability. It is sufficient to have a machine arbitrarily juxtapose two given machines together in a way that is syntactically correct (a hybrid), and then determine if the resulting hybrid accomplishes a required task. The first part of the juxtaposition is trivial, the second part is a problem in verification: would the hybrid return proper outputs for proper inputs, and would the hybrid halt after every
421 run. (If the hybrid fails the verification test, then another random hybrid is generated.) Theorem 2. (The universal theory of hybrid technology) A machine that solves the halting problem can trivially solve any hybrid problem in any way possible. Proof. The problem can be represented as some languages, and thus is decidable. 4
Conclusion
The physical/chemical elements that would enable the construction of this type of machine has not been discovered to the best reckoning of this writer, but the considerations put forward in this paper ought to highlight the feasibility of improved computers. References 1.
2.
3. 4.
5.
6. 7.
Turing A. M., On computable numbers with an application to the entscheidungsproblem, Proceedings of the London Mathematical Society 2, (1936) pp. 230-265. Turing A. M., On computable numbers with an application to the entscheidungsproblem. A correction, Proceedings of the London Mathematical Society 2, (1937) pp. 544-546. G5del K. On Formally Undecidable Propositions, (Basic Books, New York, 1962). Hofstadter D. R., Godel, Escher, Bach: An Eternal Golden Braid. A Metaphorical Fugue on Minds and Machines in the Spirit of Lewis Carroll, (Penguin Books, Singapore, 1979). Hopcroft J. E. and Ullman J. D., Introduction to automata theory, languages and computation, (Addison-Wesley Publishing Company, Inc., Philippines, 1979). Chaitin G. J., A century of controversy over the foundation of mathematics, Complexity 5, (2000) pp. 12-21. Sipper M. and Ronald E. M. A., A new species of hardware, IEEE Spectrum 37, (2000) pp. 59-64.
423
A N E W A L G O R I T H M FOR T H E C O M P U T A T I O N OF I N V A R I A N T CURVES U S I N G A R C - L E N G T H PARAMETERIZATION K. D. EDOH Department of Computer Science, Montclair State Upper Montclair NJ 07003, USA E-mail: edohk@mail. montclair. edu
University,
J. LORENZ Department of Mathematics and Statistics, University of New Mexico Albuquerque NM 87131, USA E-mail: [email protected] In this paper we introduce a new algorithm for computing invariant curves of a family of dynamical systems using arc-length parameterization of the curves. The main feature is that no smoothness (differentiability) of the curves is required. The algorithm is fast and robust; the results compare favorably with those of existing methods.
1
Introduction
A manifold M is said to be invariant under the diffeomorphism / if for any xo € M we have fn{x0) € M for all integers n. Finding efficient algorithms to compute invariant manifolds in dynamical systems has been a major research topic in recent years. We present a new method to compute invariant curves using arc-length parameterization and performing a sequence of iterations on a set of equally distributed grid points on the curve. In each iteration the grid points are first mapped by / to new points and then a cubic spline interpolation is used to approximate the curve that passes through the new points. Second, a set of equally distributed points on the new curve is determined using the spline interpolation. In addition, we introduce a simple adaptive strategy in which points are added to/removed from the circle to increase the accuracy of our results. Some of the existing methods include the Hadamard graph transform algorithm for computing attracting invariant manifolds [1,2]. In this method computing the invariant curves of Poincare maps requires solving finitely many boundary value problems for each graph transform. This may result in a very large computation time. The direct iteration method used to compute invariant curves in [3] often breaks down when the curve has a fixed point on it. In that case the approximating curve can collapse to the fixed point.
424
Other methods include the polygonal approach by van Veldhuizen [4]. It uses a polar coordinates representation of the invariant circle and may suffer from the unbounded norm of higher order interpolation schemes. The new algorithm was tested on the Van-der-Pol equation and the delayed logistic map. Our results compare favorably with the existing ones. Our method has resulted in a simple, fast, and robust algorithm. With the arc-length parameterization of the curves we eliminate the difficulties and restrictions posed by using polar coordinates. The adaptive scheme introduced into the algorithm has increased the accuracy of our results. As parameters in the diffeomorphism change, the invariant curves typically deform and they may lose their smoothness. This breakdown of smoothness often corresponds to a transition from quasiperiodic to chaotic dynamics. Since this bifurcation (transition) is of great interest for applications, one wants to have an algorithm that can approximate invariant curves that are not smooth. The algorithm presented here does not make use of any tangent information of the invariant curves and is therefore well-suited for path following of non-smooth curves. A restriction of the algorithm is that it requires attractivity of the invariant curves. 2
The N e w M e t h o d
Consider an orientation preserving diffeomorphism f : R2 —> R2, and let r c f l 2 denote a simply closed, continuous curve which is invariant under / , i.e., /(T) = r . Suppose that an approximation to the unknown invariant curve T is denoted by the parameterized curve T*:(X(l),Y(l)),
0<1
(1)
where L is the length of T* and the parameter I is approximately arc-length. (This assumes that T* is piecewise C 1 ; in fact, we use periodic cubic spline interpolation to determine the parameterization functions X(l) and Y(l).) Given an equidistant mesh in the parameter interval, U:0 = lo
= L,
(2)
the discrete points (X(li),Y(li)),i = 0,...,N, form an approximation to T which is also approximately equidistant. We now describe an iteration step that can be used to improve the approximation. Denote pi = (X°ld(k), Yold{k)), i = 0,2,...,N, with p0 = pN- Here old (X (l), Yold(l)) denotes a parameterization of a known approximation to T: r o W : (Xold(l),Yold(l)),
0
425
Figure 1. The mesh points along the invariant curve.
We get the new approximation Γ_new : (X_new(l), Y_new(l)), 0 ≤ l ≤ L_new, as follows:
1. Compute the points q_i = f(p_i) and determine the distances d_i = |q_i - q_{i-1}|, i = 1, ..., N.
2. Compute the values L_new = Σ_{i=1}^{N} d_i, ΔL = L_new / N, and l_i = iΔL, i = 0, ..., N.
3. Modify the (equidistant) mesh l_i in the parameter interval using the following rules:
• If |q_k - q_{k-1}| > 10ΔL for any k = 1, ..., N, then add the mesh point (l_k + l_{k-1})/2, modify N, and go to step 1.
• If |q_k - q_{k-1}| < (1/10)ΔL for any k = 1, ..., N, then delete the mesh point l_k or l_{k-1}, modify N, and go to step 1.
Note that the final mesh l_i is equidistant in the parameter interval.
4. Compute cubic interpolants (X_new(l), Y_new(l)) for the points q_k using the mesh m_i = Σ_{j=1}^{i} d_j, i.e., the mesh m_i reflects that the points q_k are not equidistant.
5. Compute new points p_i = (X_new(l_i), Y_new(l_i)), i = 0, ..., N, by evaluating at the equidistant mesh l_i.
Figure 2. The approximation of the shortest distance from the curve.
Step a) is easy to compute. Step b) is determined as follows: Let Pk = (pi,Pk),
Pk+i = (pl+i,Pk+i)
and tt = (?{>«?)•
Denote s
= (ri+i -PkiPk+i ~Pk)
and 6 = (q\ -p\,q?
-p\)
427
a =
< a,b> —Ti2—
ds = b — aa = Then \ds\ = (dsf + dsl)i Pk+i3
(dsi,ds2)
is the distance of qt from the line through pk and
Results
The method was tested on two problems; cubic spline interpolation was used to interpolate the points qk = f(Pk)3.1
Delayed logistic map
This is a population model that has been commonly used as a test problem. The equation of this model is given by Pn+1 = aPn(l - P B _!)
(3)
where Pn is the scaled population size in the nth generation of a species and a is a parameter reflecting the growth rate of the species. Equation (3) can be rewritten in the form Fa(xn,yn)
= (i„+i,2/ n +i) = (yn,ayn(l
- xn))
(4)
The diffeomorphism i ^ has the fixed points (x*, y*) = (0,0) and ^ ^ ( 1 , 1 ) . The point ^ ^ ( 1 , 1 ) loses stability when a increases from a < 2 to a > 2, and an invariant curve is born through a Neimark-Sacker bifurcation at a = 2. The invariant curves were computed using continuation in a from a = 2.0 to a = 2.25 with N = 400 points. The results are comparable to those of [1,4]. 3.2
Van-der-Pol
equation
The second problem is the Van-der-Pol oscillator. It is used to model electric circuits with a triode valve and also to model some biological problems. We considered the forced oscillator with periodic forcing. The equation of the forced system is given by x + a(x2 -l)x
+ x = 0cos(ut),
a, /? C R .
(5)
Under the transformations [5] p(x) = x3/3 -x,
y = x + ap(x),
(6)
428
Figure 3. The invariant curves for the delayed logistic map.
we obtain the system x = y- ap(x)
,?.
y = —x + Pcos{<jjt).
*• '
This equation has the form i = f(z,t) (8) where z = (x, y) and f{z,t) is periodic with period T = 2-K/U. Using the new method, we follow the invariant circle of the corresponding Poincare map to the parameter value where it collapses into a fixed point. Let K = P/2a and a = (1 — ui2)/a. Figure 4 shows the invariant curves for ft values 0.38 - 0.3925 with a = 0.55 and a = 0.4. The results are in agreement with those of van Veldhuizen [1,6]. Acknowledgments Research on this project has been supported by DOE grant DE-FG0395ER25235.
429
Figure 4. The invariant circles for van der Pol Oscillator.
References 1. K. Edoh, A numerical algorithm for the computation of invariant circles, DIM ACS series in discrete mathematics and theoretical computer science, 34 117 (1997). 2. N. Fenichel, Persistence and smoothness of invariant manifolds for flows, Indiana Univ. Math. J., 21 193 (1971). 3. D.G. Aronson, M.A. Chory, G.R. Hall and R.P. McGehee, Bifurcation from an invariant circle for two parameter families of maps of the plane: a computer-assisted study, Comm. Math. Phys. 83 303 (1982). 4. M. Van Veldhuizen, Convergence results for invariant curve algorithms, Math. Comp. 5 1 , 677 (1987). 5. J. Guckenheimer and P. Holmes, Nonlinear Oscillations, Dynamical Systems and Bifurcation of Vector Fields, Springer- Verlag, New York (1983). 6. M. Van Veldhuizen, A new algorithm for the numerical approximation of an invariant curve, SIAM J Sci. Stat. Comp. 8 951 (1987).
AI/Fuzzy Sets Application and Theory
433 C O M P A R I S O N OF I N T E R V A L - V A L U E D FUZZY SETS, INTUITIONISTIC FUZZY SETS, A N D BIPOLAR-VALUED FUZZY SETS KEON-MYUNG LEE Dept. of Computer Science, Chungbuk National University, and Advanced Information Technology Research CenteriAITrc), Cheongju, 361-763, Korea E-mail: [email protected] KYUNG-MI LEE Dept. of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan KRZYSZTOFJ.CIOS Dept. of Computer Science and Engineering, University of Colorado at Denver, Denver, Colorado, USA There are several kinds of fuzzy set extensions in the fuzzy set theory. Among them, this paper is concerned with interval-valued fuzzy sets, intuitionistic fuzzy sets, and bipolarvalued fuzzy sets. In interval-valued fuzzy sets, membership degrees are represented by an interval value that reflects the uncertainty in assigning membership degrees. In intuitionistic sets, membership degrees are described with a pair of a membership degree and a nonmembership degree. In bipolar-valued fuzzy sets, membership degrees are specified by the satisfaction degrees to a constraint and its counter-constraint. This paper investigates the similarities and differences among these fuzzy set representations.
1
Introduction
Fuzzy sets are a kind of useful mathematical structure to represent a collection of objects whose boundary is vague. In fuzzy sets, membership degrees indicate the degree of belongingness of elements to the collection or the degree of satisfaction of elements to the property corresponding to the collection. There have been proposed several kinds of extensions for fuzzy sets.[l] Type 2 fuzzy sets represent membership degrees with fuzzy sets. L-fuzzy sets are a kind of fuzzy set extension to enlarge the range of membership degree [0,1] into a lattice structure. Interval-valued fuzzy sets represent the membership degree with interval values to reflect the uncertainty in assigning membership degrees. [6] Intuitionistic fuzzy sets have membership degrees that are a pair of membership degree and nonmembership degree. [2] Bipolar-valued fuzzy sets have membership degrees that represent the degree of satisfaction to the property corresponding to a fuzzy set and its counter-property. [4] In this study, we are concerned with these three fuzzy set extensions: interval-valued fuzzy sets, intuitionistic fuzzy sets, and bipolar-valued fuzzy sets. These fuzzy sets have some similarities and some differences in their representation and semantics.
434 This paper is organized as follows: Section 2, 3, and 4 briefly describe the interval-valued fuzzy sets, the intuitionistic fuzzy sets, and the bipolar-valued fuzzy sets, respectively. Section 5 compares the interval-valued fuzzy sets with intuitionistic fuzzy sets, and Section 6 compares intuitionistic fuzzy sets with bipolar-valued fuzzy sets. Section 7 gives some examples to use these fuzzy set representations. Finally Section 8 draws conclusions.
2
Interval-valued Fuzzy Sets
Interval-valued fuzzy sets are an extension of fuzzy sets, where membership degrees of elements can be intervals of real numbers in [0,1]. An interval-valued fuzzy set A is formally defined by membership functions of the form A = {(*, fiA(x)) \xeX] M A W : X - * P([0,1]), where M-tW ' s a closed interval in [0,1] for each x e X.[6] Suppose that A and B are interval-valued fuzzy sets whose membership degrees of elements x are represented like this: HA(x) = [n'A (x),nrA(x)] nB (*) = t / 4 (*)> Ms (*)] The basic set operations for interval-valued fuzzy sets are defined as follows: AuB = [(x,nAuB(*))Ixe X} nAuB(x) = [fi'AKjB(X),/i^B(x)] firAuB(x) =
[i'A„B(x) = max.{iiA(x),nB(x)) AnB
= {(x,/iAnB(JC))Ixe
X}
nAnB(x)
HAnB(x) = min{nA(x),n'B(x)} A=i(x,fiA-(xy)\xe
X] r
n'A(x) = l-v Ax)
n-{x)
=
[fi'AnB(x),fiAnB(*)]
^AnB(x) =
max{nA(x\nB(x)}
=
min[nA{x),fia(x)}
r
[n'-(x),fi A(x)]
^ W = 1"MMW
In interval-valued fuzzy sets, interval values are used as membership degrees in order to express some uncertainties in assigning membership degrees. The larger the interval is, the more uncertainty there are in assigning membership degrees.
3
Intuitionistic Fuzzy Sets
The intuitionistic fuzzy set theory is an extension of the fuzzy set theory by Atanassov[2]. Here we give some basic definitions for the intuitionistic fuzzy sets. Let a set X be the universe of discourse. An intuitionistic fuzzy set A in X is an object having the form A = {(x,fiA(x),vA(x))\xe X},
435 where the functions jiA(x) : X -> [0,1] and vA(x) : X -> [0,1] define the degree of membership and the degree of non-membership respectively of the element x e X to the set A, which is a subset of X, and for every x e X, 0
The amount rcA(x) - 1 - (/iAW + vA(x)) is called the hesitation part or intuitionistic index, which may cater to either membership degree or nonmembership degree. It means that the intuitionistic fuzzy sets are a representation to express the uncertainty in assigning membership degrees to elements. If A and B are two intuitionistic fuzzy sets on the set X, their basic set operations are defined as follows[2]: A u B = {(x, nAuB (x),vAuB (x)) I x e X} AnB
HAUBW = max{nA(x),liB(x)} = {(x,nAnB (x),vAnB (x)) \xeX) MAr,B(x) = min{fiA(x),fiB(x)}
vAuB(x) =
mm{vA(x),vB(x)}
vAnB(x) =
max{vA(x),vB(x)}
A={(x,nA(x),vA(x))\xeX} VA(x)=vA(x) 4
v-(x) = [iA(x)
Bipolar-valued Fuzzy Sets
Bipolar-valued fuzzy sets are an extension of fuzzy sets whose membership degree range is enlarged from the interval [0, 1] to [-1,1]. In a bipolar-valued fuzzy set, the membership degree 0 means that elements are irrelevant to the corresponding property, the membership degrees on (0,1] indicate that elements somewhat satisfy the property, and the membership degrees on [—1,0) indicate that elements somewhat satisfy the implicit counter-property. [4] In bipolar-valued fuzzy sets, two kinds of representation are used: canonical representation and reduced representation. In the canonical representation, membership degrees are expressed with a pair of a positive membership value and a negative membership value. That is, the membership degrees are divided into two parts: positive part in [0, 1] and negative part in [-1, 0]. In the reduced representation, membership degrees are presented with a value in [—1, 1]. The following gives the definitions for those representation methods. Let X be the universe of discourse. The canonical representation of a bipolar-valued fuzzy set A on the domain X has the following shape: A = {(x,(fiA(x\n^(x)))\xe P
X) N
H A(x): X->|ft 1] ft A (x): X-» [-1,0] The positive membership degree HA(x) denotes the satisfaction degree of an element x to the property corresponding to a bipolar-valued fuzzy set A, and the negative
436 membership degree fiA (x) denotes the satisfaction degree of x to some implicit counter-property of A. If ^ ' ( x ^ O a n d [iA(x) = 0, it is the situation that x is regarded as having only positive satisfaction for A. If \xpA (x) = 0 and /xA (x) * 0, it is the situation that x does not satisfy the property of A but somewhat satisfies the counter-property of A. In the canonical representation, it is possible for elements x to be fipA (x) * 0 and \iA (x) * 0 when the membership function of the property overlaps that of its counter-property over some portion of the domain. The reduced representation of a bipolar-valued fuzzy set A on the domain X has the following shape: A = {(x,fiRA(x))\xeX]
AI*:X-»[-1,1]
The membership degree/if (x) for the reduced representation can be derived from its canonical representation as follows:
MAW =
P ifH =0 y H-AA(x) '
tf{x) *A W f(/J.A(x),^lA(x))
otherwise
Here f(nA(x),/j.A(x)) is an aggregation function to merge a pair of positive and negative membership values into a value. Such aggregation functions f(lip(x),iJ.A(x)) can be defined in various ways. The choice of the aggregation function may depend on the application domains. [4] Suppose that there are two bipolar-valued fuzzy sets A and B expressed in the canonical representation as follows: A = Hx,(jifo),nZ{x)))\ XBX) B = {(x,(nPB(x),^(x)))\ xe X] The set operations for bipolar-valued fuzzy sets are defined as follows: A u B = {(x, fiAuB (x)) I x e X}
/iAuB(x) = (ppwB(x),
fi^B(x))
PAUB(*) = ma x{j"£W>VB (*)) f^Aua(x) = minf/i A (x),HB (x)} AnB
= {{x,nAnB(x))\xe
X]
HAnB(.x) = mm{nP(.x),(iPB(x)} A = {(x,fij(x)) I xe X} A*£(x) = l - j u ; ( x )
5
tJ.AnB(x)^(nPnB(x),^nB(x)) fiAnB(x) =
max{^(x),fiB(x)}
n-(x) = 0 i | ( x ) , / i f (x)) /i£(x) = - l - / i ? ( x )
Comparison of Interval-valued Fuzzy Sets with Intuitionistic Fuzzy Sets
Intuitionistic fuzzy sets can be regarded as another expression for interval-valued fuzzy sets. According to this interpretation, we can convert an intuitionistic fuzzy set into an interval-valued fuzzy set as follows:
437
Intuitionistic fuzzy sets A = {(x, fiA(x), vA(x)) I x s X}, Interval valued fuzzy sets A = {(x,[nA(x),nrA(x)])\xe X} where, fi'A(x) = )iA(x) firA(x) = 1 -v A (x) From the correspondence between boundary values of interval membership degrees in interval-valued fuzzy sets and the pairs of membership and nonmembership degrees in intuitionistic fuzzy sets, we can deduce that the basic set operations for interval-valued fuzzy sets and intuitionistic fuzzy sets have the same roles. To begin with, let us see the case of union operations. AUB
= {(X,[^'AUB(X),HAWB(X)])\XB
X)
HAwB(x) = mzK{n'A(x),n'B (x)} liAuB (x) = m a x { ^ (x),jUB (x)} The lower bound HAuB(x) = m&x.{n'A(x), fi'B(x)} of interval-valued fuzzy set union can be transformed by the correspondence relationship fi'A(x) = nA(x) like this: fiAKjB(x) = max{n'A(x),LiB(x)} = max{nA(x),HB(x)} = l*AuB(x) This is the same with the union fiAwB (x) of the intuitionistic fuzzy sets. The upper bound /J.AKJB(X) = max{/i^(x),/*g(x)}can be transformed by the relationship lxrA (x) = 1 - vA (x) as follows: ^B(x) = m&x{nA{x),ixB(x)} = max{l-v/,(x),l-vB(x)} =\-mm{vA(x),vB(x)} When we rewrite the above equation using the relationship /nA(x) =l-vA(x), we can see that the upper bound of the union operation of interval-valued fuzzy sets corresponds to the nonmembership degree vAuB(x) =min{v /1 (x),v B (x)}. It means that both union operations of interval-valued fuzzy sets and intuitionistic fuzzy sets are the same. In a similar way, we can prove that the intersection operations for both kinds of fuzzy sets are the same. The following shows the equivalence in negation operations. A={(x,[iiA-(x),nA-(x)])\xeX}
nL(x) = l-nrA(x)
nrA(x) =
l-^'A(x)
r
fJ.'A(x) and l^ A(x) can be rewritten as follows: li'A{x) =
\-vA{x)=\-{\-vA(x))=vA(x)
r
H A(x) = l-n'A(x) = l-nA(x) We can see that n'A(x) and Mj(x) correspond to HA(x) and v^(x) respectively. From those observations, we can see that interval-valued fuzzy sets and intuitionistic fuzzy set have the same expressive power and the same basic set operations. 6
Comparison of Intuitionistic Fuzzy Sets with Bipolar-valued Fuzzy Sets
When we compare a bipolar-valued fuzzy set A = {(x, (/iA(x),fiA(x))) I x e X} with an intuition-istic fuzzy set A = {(x, )J.A{x), vA(x)) I x e X] under the conditions
438 \ipA (x) = nA(x) and fiA (x) = -v A(x) , bipolar-valued fuzzy sets and intuitionistic fuzzy sets look similar each other. However, they are different each other in the following senses: In bipolar-valued fuzzy sets, the positive membership degree fiA (x) characterizes the extent that the element x satisfies the property A, and the negative membership degree fiA(x) characterizes the extent that the element x satisfies an implicit counter-property of A. On the other hand, in intuitionistic fuzzy sets, the membership degree fiA(x) denotes the degree that the element x satisfies the property A and the membership degree vA(x) indicates the degree that x satisfies the Tier-property of A. Since a counter-property is not usually equivalent to notproperty, both bipolar-valued fuzzy sets and intuitionistic fuzzy sets are the different extensions of fuzzy sets. Their difference can be manifested in the interpretation of an element x with membership degree (0, 0). In the perspective of bipolar-valued fuzzy set A, it is interpreted that the element x does not satisfy both the property A and its implicit counter-property. It means that it is indifferent (i.e., neutral) from the property and its implicit counter-property. In the perspective of intuitionistic fuzzy set A, it is interpreted that the element x does not satisfy the property and its nof-property. When we regard an intuitionistic fuzzy set as an interval-valued fuzzy set, the element with the membership degree (0, 0) in intuitionistic fuzzy set has the membership degree [0, 1] in interval-valued fuzzy set. It means that we have no knowledge about the element. On the other hand, their set operations union, intersection, and negation are also different each other. These things differentiate bipolar-valued fuzzy sets from intuitionistic fuzzy sets. The intuitionistic fuzzy set representation is useful when there are some uncertainties in assigning membership degrees. The bipolar-valued fuzzy set representation is useful when irrelevant elements and contrary elements are needed to be discriminated. 7
Examples
This section gives some examples to use the three fuzzy set representations for a fuzzy concept frog's prey. The next is an interval-valued fuzzy set for frog's prey: frog's prey = {(mosquito, [1,1]), (dragonfly, [0.4,0.7]), (turtle, [0,0]), (snake, [0,0])) The following shows an intuitionistic fuzzy set corresponding to the above intervalvalued fuzzy set: frog's prey = {(mosquito, 1, 0), (dragonfly,
0.4,0.3), (turtle, 0, 1), (snake, 0, 1)}
From those examples, we can see that interval-valued fuzzy sets and intuitionistic fuzzy sets have the same expressive power. The next shows a bipolar-valued fuzzy set for frog's prey: frog's prey = {(mosquito, (1,0)), (dragonfly, (0.4,0)), (turtle, (0,0)),(snake, (0,-1))}
439 For the element snake, the above interval-valued fuzzy set and the intuitionistic fuzzy set have 0 membership degree which just means that snake does not satisfy the property corresponding to frog's prey despite that snake is a predator offrog. On the other hand, the above bipolar-valued fuzzy set has -1 membership degree which indicates that snake satisfies some counter-property with respect to frog's prey. Meanwhile, interval-valued fuzzy sets and intuitionistic fuzzy sets can express uncertainties in assigning membership degrees to elements.
8
Conclusions
This paper compared three fuzzy set representations: interval-valued fuzzy sets, intuitionistic fuzzy sets, and bipolar-valued fuzzy sets. It showed that interval-valued fuzzy sets and intuitionistic fuzzy sets have the same expressive power and the same basic set operations. Interval-valued fuzzy sets and intuitionistic fuzzy sets can represent uncertainties in membership degree assignments, but they cannot represent the satisfaction degree to counter-property. On the other hand, bipolar-valued fuzzy sets can represent the satisfaction degree to counter-property, but they cannot express uncertainties in assigning membership degrees.
9
Acknowledgements
The works was supported by the Korea Science and Engineering Foundation through the Advanced Information Technology Center(AITrc). References 1. 2. 3.
4. 5. 6.
H.-J. Zimmermann, Fuzzy Set Theory and Its Application, Kluwer-Nijhoff Publishing, 1985. K. T. Atanassov, Intuitionistic Fuzzy Sets, Fuzzy Sets and Systems, Vol.20, pp.87-96, 1986. T. Ciftcibasi, D. Altunay, Two-Side (Intuitionistic) Fuzzy Reasoning, IEEE Trans, on System, Man, and Cybernetics -Part A, Vol.28, No.5, pp.662-677, 1998. K.-M. Lee, Bipolar-valued fuzzy sets and their operations, Fuzzy Sets and Systems (accepted). H. Bustince, Construction of intuitionistic fuzzy relations with predetermined properties, Fuzzy Sets and Systems, Vol.109, pp.379-403, 2000. G. J. Klir, T. A. Folger, Fuzzy Sets, Uncertainty, and Information, Prentice-Hall Editions, 1988.
441 I N T R O D U C I N G USER C E N T R E D DESIGN INTO A H Y B R I D INTELLIGENT INFORMATION SYSTEM METHODOLOGY KATE ASHTON and SIMON L. KENDAL School ofCET, St Peter's Campus, University of Sunderland, Sunderland SR6 ODD UK E-mail: [email protected] Hybrid intelligent information systems (HIIS) present a special case in the context of integrated systems in that both intelligent and conventional component systems are integrated. To date methodologies for development of these increasingly important systems concentrate on essential strategies for integration of component knowledge-based and conventional system technology. This is expected to be an absorbing issue in the early stages of HIIS methodology evolution, but integration of usability now forms an interesting challenge. The HIIS methodology known as HyM, currently undergoing development at Sunderland University, has been extended to include user centred design and the potential for responding to a varied user population by means of intelligent interface technology. Targets were set to preserve the ethos of the original methodology while providing seamless integration into it. The modular structure of HyM was exploited to achieve integration of volatile user knowledge without interference with stable data and knowledge essential for core system functions. Modularity and object-orientation promoted economical integration of a shared user modelling component. The extended HyM methodology was applied to enhancement of a HIIS. Evaluation indicates that smooth and seamless extension of HyM to incorporate user centred design has been achieved.
1. Introduction Hybrid intelligent information systems (HIIS) are integrated computer systems in which conventional information systems and knowledge-based systems co-operate to share knowledge and data. HIIS are potentially complex interactive systems, the variety of their components giving rise to the possibility of disparate user populations. User involvement is a major factor in the development of most successful computer systems [15]. This feature is absent from HIIS. They are, therefore, important and legitimate targets for the development of user centred design in their methodologies. It is expected to have increased significance in HIIS but so far this is largely unexplored. The HyM methodology for HIIS currently under development at Sunderland University was targeted to investigate the potential for incorporating user centred design. Its fundamental modularity was exploited to provide for this feature and for integration of a simple, consistent, shared user model representing user populations. 2.
Hybrid Intelligent Information Systems (HIIS): a Special Case
Hybrid intelligent information system (HIIS) are computer systems that consist essentially of an integration of a variety of potentially physically separate (although co-operating) conventional and knowledge based systems. This integration of reasoning with conventional processes may result in development problems caused by the fundamentally different structures required by differing components. HIIS
442
are likely to process very large volumes of data from various sources [6], to require more levels of security than conventional systems and to inherit problems or strengths from radically different component technologies and integration techniques. The nature and complexity of these systems suggests that both user and stakeholder populations may be large and disparate. User centred design in this context is a relevant consideration. 3.
Importance of User Centred Design
User centred design implies that system users are made a central issue throughout the design process. It remains a salient topic [17]. The advisability of involving users in system design is well documented [15,10]. User involvement varies from participation only at evaluation stages, to right through the entire life cycle [10]. Shackel believes human factors to be paramount, advocating that design must start with the end user [15]. However, user centred design does not require end users to be given priority over data and knowledge needed to deliver valid systems. 3.1
HIIS Users
Interactive HIIS and their methodologies reported in literature have been examined to gather information about user populations. Although increasing in number, HIIS are still not commonplace. Few references to potential system users occur as methodologies concentrate on software engineering techniques [2,4,8,11,12]. Evidence of user populations that consist of distinct groups is reported by Fedra [6]. The user population of HMISD, a hybrid diagnostic system developed according to HyM [3], consists of patients, nurses, audiologists, secretaries and doctors (consultant or general). Such disparate groups cannot be accommodated by one set of help or information messages (the canonical option). An alternative is to make use of intelligent interface technology. 4.
Intelligent Interface Technology
Intelligent interface technology (IIT) incorporates a wide range of methods where some intelligence is applied to user interface design and implementation by means of a user model [1]. Complexity of HIIS and disparity of their user populations makes them an interesting target for introducing IIT. However, increasing the complexity of already complex systems must detract from the advantages of tailoring system response to users. Only a judicious choice of user model has the potential for minimising this effect. 5.
HyM: An Object-Orientated HIIS Methodology
The HyM prototype methodology [9] already under development at Sunderland
443
University organises, refines, develops and builds upon earlier methodologies (waterfall process, prototyping and model-based approaches). Integration procedures are controlled and facilitated by making full use of the object-orientation paradigm throughout its life cycle in the processing of procedural information and declarative logic. A salient feature of the HyM life cycle is the combined analysis and design stage. These features, established before the start of the user centred design extension, provide the potential method by which its integration may be achieved. 5.1
User Centred Design in HyM
Figure 1: HyM Life Cycle with Expanded Analysis and Design Detail
In the HyM the integrated analysis and design stage provided the location for smooth extension to accommodate an interface design cycle (Figure 1). Core system engineering deals with the processing of stable data and knowledge in the system cycle, including the user only as a member of the design team. The interface cycle extension allows separation of volatile user issues. The two separate communicating cycles now address clearly distinguishable issues. The system cycle models the iterative development of component systems within the HIIS, processing data or knowledge essential for developing valid systems. The interface cycle contains all aspects of user-system communication and accommodates user interface design and
444
procedures deemed necessary for its satisfactory development. The interface cycle, therefore, models a close interaction between the HIIS and its components, users and designers and hence incorporates principles of human computer interaction. Two distinguishable user tasks are advocated, a restricted role as a member of the design team, responsible only for core system development and a separate and wider role in interface design. 5.1.1
Interface Cycle
The interface cycle (Figure 1) models an iterative process consisting of interface analysis, interface design and interface evaluation. The one-way connection between model evaluation and interface analysis indicates that, at each pass through the system cycle, information from increasingly valid HIIS components becomes available. At some stage, in depth user analysis is appropriate. The interface cycle is then an important parallel development with priority equal to the system cycle. User Modelling in HyM A user model is an essential part of intelligent interface technology. It is the system's model of a set of those characteristics of its users that affect their interaction, the model being clearly distinguishable from other system knowledge. In HyM the user model is an integral part of the interface cycle (Figure 1). It is modelled as an optional subset of each stage resulting in expansion as in Figure 3. Information accumulating at interface analysis enables a decision to be made on incorporating IIT, system designers having the option of rejecting the technology. If it is accepted, design and implementation follow
Figure 2: Integrating User Modelling into HyM In the case of HIIS, the user model's role is clear: to model users for the purpose of tailoring system responses. A striking feature about a chunk of data characterising HIIS users is its capacity to exploit advantages of frame systems and stereotypes.
445 The object-orientated nature of the HyM methodology provides an ideal medium for constructing a shared model by means of the same paradigm It allows sharing, consistency, modularity and integration by message passing to already established system classes. Stereotypical models of users' experience seem to be the representation most used by system builders [7]. They have been applied to user modelling from adaptive hypermedia systems [5,14] to computer integrated manufacturing systems [13]. 6.
Evaluation Issues
To evaluate ideas arising in the course of HyM extension they were applied to the HMISD system including construction of an object-orientated user model representing group and individual users. The object-orientated approach to stereotype construction enabled HIIS systems to communicate with the hierarchy of user classes created. One user model was accessible by all HIIS components. Empirical evaluation with users confirmed that disparate groups with different interests existed in the HMISD user population and were then accommodated. Evaluation of the extended methodology made use of the GQM paradigm [16] to reveal inconsistencies between priorities of the extended HyM methodology and life cycle and those of the earlier version 1 methods. This evaluation indicated that consistent integration of user centred design into HyM had been achieved. 7.
Conclusions
User centred design is so far largely absent from increasingly important HIIS methodologies. A HIIS methodology (HyM) has been extended to include this. An important feature of the extended methodology is the separation of volatile user data from stable core system data and knowledge by accommodating them in separate, communicating core system and interface cycles. Accommodating the enhancement in the crucial analysis and design stage makes usability central in the extended methodology. Human factors are not paramount in the enhanced HyM; they have status equal with that of system data and knowledge. Accommodation of disparate user populations of HIIS was provided for by optional incorporation of IIT. Objectorientation enabled user model and interface to be integrated smoothly. HyM is a versatile HIIS methodology promoting seamless extension and integration of new methods and components.
446
References 1. 2.
3.
4. 5.
6. 7. 8.
9.
10. 11.
12.
13.
14. 15.
Benyon D. R., Murray, D. M. (Editorial) Special Issue on Intelligent Interface Technology: Editor's Introduction. Interacting with Computers 12 (2000) Bravo-Aranda G., Hernandez-Rodriguez, F. et al Knowledge-Based System Development for Assisting Structural Design. Advances in Engineering Software 30 (1999). Chen X., Kendal, S. L. et al Development and Implementation of a Hybrid Medical information System. Medical Informatics Europe '96 J. Brender et al (Eds.) IOS Press (1996). Chen Z., Zhang H., Zhu et al An Integrated Intelligent System for Ceramic Kilns. Expert Systems with Applications 16 (1999). Di Lascio L. Fischetti E. et al., A fuzzy-based Approach to Stereotype Selection in Hypermedia. User Modeling and User Adapted Interaction 9(4) (1999). Fedra K A., Decision Support for Natural Resources Management: Models, GIS and Expert Systems. A.I. Applications 9(3) (1995). Hook K., Steps to Take Before Intelligent Interfaces Become Real Interacting with Computers 12 (2000). Karunaratna, D. D., Gray, W.A. et al., Establishing a Knowledge Base to Assist Integration of Heterogeneous Systems. Advances in Databases - 16th British National Conference on Databases BNCOD Proceedings (1998) Kendal S. L., Chen X. et al., HyM: a Hybrid Methodology for the Development of Integrated Hybrid Intelligent Information Systems, Proceedings of Fusion 2000. Third International Conference on Information Fusion Paris, (2000). Madsen K. H. The Diversity of Usability Practices. Communications of the ACM. 42(5), (1999). Matthews K. B., Sabbald A. R. et al., Implementation of a Spatial Decision Support System for rural Land Use Planning: Integrating Geographic Information System and Environment Models with Search and Optimisation Algorithms. Computers and Electronics in Agriculture 23 (1999). Molina M., Sierra J.L., et al., Reusable Knowledge-Based Components for Building Software Applications: A Knowledge Modelling Approach. International Journal of Software Engineering and Knowledge Engineering (3) (1999). Monfared R. P., Hodgson A. et al., Implementing a Model-Based Generic User Interface for Computer Integrated manufacturing Systems. Proceedings of the Institute of Mechanical Engineers Part B - Journal of Engineering Manufacture 212: (7) (1998). Pagesy R. et al., Improving Knowledge Navigation with Adaptive Hypermedia. Medical Informatics and the Internet in Medicine 25 (1) (2000). Shackel B (1997) Human-Computer Interaction: Whence and Whither? Journal of the American Society for Information Science 48 (11) (1997).
447
16. van Solingen R., Berghout E. The Goal/Question/Metric Method: a practical guide for quality improvement of software development. Publisher McGraw-Hill Companies, (1999) 17. Vredenburg K. Increasing Ease of Use. Communications of the ACM 42(5) (1999).
449 TOWARDS HYBRID KNOWLEDGE AND SOFTWARE ENGINEERING S. KENDAL, X. CHEN School ofCET, St. Peter's Campus, University of Sunderland, Sunderland UK, SR6 ODD E-mail: simon. kendal@sunderland. ac. uk Software Engineers face many requirements for the development of large-scale and complex systems. A new challenge is the study of Hybrid Intelligent Information Systems (HIISs) that integrate conventional software systems and knowledge-based systems. This paper describes a hybrid methodology HyM for the development of such large-scale hybrid systems, which combines conventional software system development models with knowledge-based system development approaches. The method provides a hybrid life-cycle process model to combine the waterfall process, incremental development, rapid prototyping and model-based approaches, which results in a hybrid knowledge and software engineering approach to systems development.
1
Introduction
Over past few years, the development of large-scale and complex hybrid systems has generated much interest in the artificial intelligent (AI) community, for several reasons [5]: Many current knowledge-based systems (KBSs) are very large and complex, consisting of both intelligent system components and database systems. This has demonstrated a clear need to develop and support the seamless integration of knowledge based systems with conventional information systems. Therefore, it is important for a systematic approach to be suitable for the development of different components. There is no a single method in software engineering and knowledge engineering that perfectly covers all phases and aspects of system engineering. Conversely, the use of several independently developed methods has a number of drawbacks such as inconsistency, redundancy, increase of change effort and possible loss of information. In an attempt to provide at least a partial solution to these problems, we propose a hybrid knowledge and software engineering methodology HyM for the development of large-scale and complex hybrid AI / conventional systems. This provides a hierarchical architecture with three levels and a hybrid life-cycle process model to combine the conventional waterfall process, incremental development, rapid prototyping and model-based approaches. Recently researchers have suggested several process models and approaches for the development of large-scale intelligent systems. Gillies [7] described a strategy to avoid ill-defined requirements and reduce time scales. However, in this model, analysis and design are still two independent phases. Complete requirement
450
specifications are required before the design phase can start. Thus this model still has some of the limitations inherent in the waterfall model. Other models, also provide a hybrid process for the development of complex software systems [2]. The incremental development life-cycle is an improved rapid prototyping model where each delivered increment provides needed operational capability. This shifts the management emphasis from developmental products to risk assessment. The incremental development model can reduce the frequency of loops and effort in the conventional rapid prototyping model. The spiral model emphasises the use of three process models together to develop different parts of the system. However these models almost always assume that analysis is a static process that can be separated from design and is independent of any implementation consideration. The development of Hybrid Intelligent Information Systems (HIISs) requires a gradual shift from analysis concerns to design concerns [8]. Model-based methodologies [9,11,12] have been suggested for the development of KBSs. These approaches provide many advantages however they mostly emphasise the problems in KBS development and give a few considerations to conventional software systems development. A life-cycle model is proposed that incorporates advantages of the evolutionary approaches and systematic model-based approaches. 2
The Hierarchical Architecture of Complex Software Systems
A hierarchical architecture concept for a complex software system is proposed to support the development of HIISs, as Figure 1 illustrates. There are three levels in this architecture, the repository level, component level, and hybrid intelligent information system level. Following is a brief explanation of these levels. 2.1
Repository level
The lowest level is the repository level. This level deals with coding technologies, transactions implementation and repositories of data and knowledge. Repositories act as basic building blocks of a system component. They contain descriptions of various types of data and knowledge that are produced, managed, exchanged and maintained in a software system. A key challenge for repositories is an ability to handle and manage many types of data and knowledge. This requires powerful means for representing and mapping different data and knowledge models at multiple levels of abstraction.
451 2.2
Component level
The component level consists of those relatively independent system components based on models in a complex software system. A component may integrate data,
Figure 1. A hierarchical architecture of complex software systems Modelling is one of the most important technologies to determine a model design and implementation in developing a component. There are two major modelling activities, conceptual modelling (model analysis) and formal modelling (model design) which are associated with model transformation, design and implementation. 2.3
Hybrid intelligent information system level
In the HIIS level, various system components are combined to configure a hybrid system. This level deals with techniques for building an ideal system architecture and producing good interoperability among the system components. Potentially a hybrid system could be an abstract entity made up entirely from physically independent, distributed, and parallel processed components working on multi platform environments. All co-operating towards some larger goal.
452
Working above the HIIS level requires techniques to support systems integration and co-operation. One approach is to use multi-agent systems [1] or to develop systems with sharable component libraries [6]. The hierarchical architecture, proposed here, provides two views for the development of complex systems. From the technical view, these levels are independent of each other, e.g., there seems no direct relationship between techniques of database design in the lowest level and modelling technologies in the component level. On the system view they are interrelated, e.g., every HIIS consists of components and their data, knowledge and procedures. Many current software techniques can also be mapped onto these levels. From the view of the software development process, the conventional top-down process starts in the HIIS level and the bottom-up process begins from the lowest level. Model-based processes are used to form the component level. 3
Proposed Hybrid Process Model
A new life-cycle process model is proposed that supports hybrid knowledge and software engineering. This combines four conventional process models: waterfall process, incremental development, rapid prototyping and model-based approaches, as shown in Figure 2.
Figure 2.
The HyM life-cycle
This process model consists of two iteration sub-processes: internal and external. The external process is a cross between the waterfall life-cycle and incremental prototyping. The internal process is a rapid prototyping process, which crosses phases of requirements analysis and system design, i.e. there is a gradual move from analysis phase to design phase. The internal process includes steps
453
related to system models: model analysis, model design, model evaluation and similar steps related to the development of the interface. This hybrid life-cycle model has many benefits for the development of hybrid information systems. It encourages strategic decisions within the feasibility study phase of the project to promote good project control. It promotes the smooth transition from analysis to design. The internal iteration process is a rapid prototyping model suitable for small knowledge module development and allows for thorough evaluation. When a system component is modelled into a data model or a procedure model, little iteration is required. For a knowledge component the iteration process is completed in a few cycles with the component being prototyped in software and having finally passed a strict quality control review to ensure that the reasoning is complete and at an appropriate depth. Using model-based concepts, the new process can model and partition system components based on the objectoriented paradigm. The new life-cycle process overcomes those problems in the waterfall life-cycle when developing a KBS and the problems associated with the use of rapid prototyping when developing a conventional software system. Finally, separation of stable functional requirements from volatile user considerations facilitates the development of re-usable repositories and components 4
Applications
The HyM methodology [10,3]. integrates four existing methods using two integration approaches: intra-process and inter-process. In the requirements analysis phase, a structured method is applied to function analysis, an information modelling method is applied to data analysis, and a knowledge acquisition method is applied to knowledge analysis. An intra-process approach is then used to integrate these techniques. Finally, an object-oriented method is applied to the design and implementation of hybrid information systems. Using this methodology, a hybrid medical information system for dizziness (HMISD), a complex medical domain, was developed [4]. Following evaluation the use of this system is being expanded to other regional hospitals. 5
Conclusions
Along with rapidly increasing requirements to develop large-scale and complex intelligent systems, new technologies, introduced daily, profoundly impact on developing applications and will require equally profound changes in software system architectures and development process models. In this paper, we propose a hybrid knowledge and software engineering approach, consisting of a hierarchical architecture and a hybrid life cycle process model, for the development of largescale and complex hybrid AI / conventional software systems.
454 References 1. Aylett, R.; Brazier, F.; Jennings, N. et al, Agent Systems and Applications, The Knowledge Engineering Review, Vol.13, No.3, (1998). 2. Boehm, B.W. A Spiral Model of Software Development and Enhancement, IEEE Computer, Vol.21, No.5. (1988). 3. Chen, X.; Kendal S.; Potts I and Smith P, Towards an Integrated Method for Hybrid Information System Development, IEE Proceedings on Software Engineering, Vol.144, No. 5-6, (1997). 4. Chen, X; Vaughan-Jones, R.; Hawthorne, M. et al, HMISD: an Hybrid Medical Information System for Dizziness, Proceedings of the First European Conference on Health Informatics., (1995). 5. Gaspari, M.; Moffa, E. and Stuff, A. An Open Framework for Cooperative Problem Solving, IEEE Expert, (1995). 6. Gennari, J.; Stein, A. and Musen, A. Reuse for Knowledge-Based Systems and CORBA Components, Proceedings of Knowledge Acquisition Workshop (KAW'96), (1996). 7. Gillies, A. The Integration of Expert Systems into Mainstream Software, Chapman & Hall Computing, (1991). 8. Harmon, P. and Hall, C. Intelligent Software Systems Development - An IS Manager's Guide, John Wiley & Son, Inc, (1993). 9. Lee, J. and Yen J., Enhancing the Software Life Cycle of Knowledge-Based Systems Using a Task-Based Specification Methodology, International Journal of Software Engineering and Knowledge Engineering, Vol.3, No.l., (1993). 10. Kendal S., Chen X. and Masters A., HyM: a Hybrid Methodology for the Development of Integrated Hybrid Intelligent Information Systems. Proceedings of Fusion 2000 - 3rd International Conference On Information Fttf/on. Paris, (2000). 11. Pour, G. Towards Component-Based Software Engineering, Proceedings of the 22nd IEEE Annual International Conference on Computer Software and Applications, (1998). 12. Schreiber, G.; Welinga, B. and Breuker, J. CommonKADS: A Comprehensive Methodology for KBS Development, IEEE Expert, December, (1994). 13. Song, X. and Osterweil, L. Experience with an Approach to Comparing Software Design Methodologies, IEEE Transactions on Software Engineering, Vol.20, No.5., (1994).
455 DYNAMICAL COMPUTING, COMMUNICATION, DEVELOPMENT AND HIERARCHICAL INFERENCE
H. M. HUBEY Department of Computer Science, Montclair State University, Upper Montclair, New Jersey, 07043, USA E-mail: [email protected] P. B. IVANOV International Science and Technology Center, 9 Luganskaya Street, P. O. Box 25, Moscow, 115516, Russia E-mail: [email protected] A model of computing is suggested, combining the approach of analytical mechanics with the principles of a general psychological theory of activity. Thus reformulated, the traditional picture of computation allows generalizations of interest for distributed and parallel computing, artificial intelligence, or consciousness studies. The notion of hierarchical computing is discussed, stressing the communicative aspect; the directions of increasing the complexity of both computational universe and the computing agents are indicated. The idea of computability is reconsidered in the light of the new approach. The basic principles of hierarchical logic are presented as a tool for constructing generic formal systems.
1
Introduction
Using a computer, one has to arrive to useful results starting from some raw material. The principal question is that of computability. First computers were relatively simple, and the famous Godel theorems reformulated for various formal systems [1-3] indicated the limits of primitive sequential computing. With the development of the Internet, the problem of computers talking to each other gained importance, and the rapid development of parallel computing and peer-to-peer technologies requires a different theoretical picture reflecting the present situation. The inherent insufficiency of the traditional logical systems in a complex environment has been demonstrated by Hubey [4]. In studies of human behavior, computer analogies are still popular, which may hinder the inverse process, understanding computation as a primitive analog of consciousness. A general theory of activity developed in Russian psychology since 1920s [5,6] could provide a solid framework for analysis of the communities of computers. The key principle of this theory, sociality of development, perfectly reflects the practices of the World Wide Web, and may serve as a source of ideas in designing efficient computer protocols approaching conscious communication. Hierarchical structures and systems are necessary for efficient computation in a developing world [7]. However, the general principles of hierarchical organization
456 are still poorly explicated in the literature, and the relation of hierarchy to development is far from being well understood. In this paper, we present a summary of hierarchical approach to computation. A general model of dynamical computing serves to translate the traditional static notions into a language more suitable for description of motion and development. Then we consider communicating computers and demonstrate how the opposition of the inner and outer world appears. We also present a formal scheme of hierarchy, replacing the traditional idea of inference with the directed construction.
2
Dynamics of computation
Traditionally, theories of computing were developed as formal models of an isolated computer operating in an essentially static world. Such an approach complies with the classical paradigm of mathematical study, but its application to real computation can only be limited, since the results of one computation serve to shape many other computation processes. That is why alternative pictures of computing may be useful. 2.1
The configuration space
Every computation occurs in some universe, so that successive operations would change the state of that universe. We admit that its distinct states can be somehow specified, and the collection of all the possible states forms what physicists usually call a configuration space X, which may be modeled with some mathematical structure (e.g. a finite set, a Euclidean space, a Hilbert space, a functional space, or a manifold). Points x of the configuration space X represent both the possible initial data and the possible outcomes of computation. 2.2
The agent
The agent is a device that can perform computation. According to A. N. Leontiev's theory [6,8], we distinguish the following levels of any agent's functioning. 2.2.1
Operations
An elementary operation changes the state of the universe, which is naturally represented as transition from one point of the configuration space to another. In every particular state (point x) there is a variety of admissible changes; by analogy to analytical mechanics [9,10], we will call it the tangent space to X in point x, Tx, the union of all the Tx is called the tangent space to X and denoted with T. Configuration space X together with all the tangent spaces Tx forms the phase space of the system, similar to a stratified manifold. Different agents are represented by different tangent spaces T.
457 The points of X that can be connected with a single operation are considered as adjacent. With thus introduced notion of relative adjacency, points adjacent for one agent may be not adjacent for another. For instance, different processors may emulate each other's functionality on the microprogrammatic level. 2.2.2
Actions
In this model, a computation process is represented by a trajectory in the phase space. Formally, there is a mapping d: X —»T, so that every point x of X corresponds to a single element dx from T^ . Given the initial and final states, xx and jcf, one can choose an admissible trajectory to arrive from xt to xs; the class of such trajectories is called an action. That is, contrary to operations that connect only adjacent points, actions link distant points via a sequence of operations. Different agents may have different classes of admissible trajectories, and die same action may either be unavailable to some agents, or be realized in different ways. The range of possible actions is intimately related to the nature of the agent, and it can usually be derived from a few fundamental principles. Thus, in classical mechanics, the principle of minimum action normally selects a single trajectory for fixed Xj and xt. The same holds for quantum mechanics, but the trajectory in a functional or operator space is considered instead of the usual 3-dimesional space. 2.2.3
Activities
In simple configuration spaces, only operations and actions are possible. In a more complex case, the points of the space X form a number of classes X t ; any trajectory connecting the points of the same classes Xj and X 2 belongs to the same action class, which is called activity. An activity is like higher-level operation, connecting adjacent classes; on the other hand, activity is non-local, since it demands action. For an example of activity, one can consider an infinite trajectory in some configuration space X: the points on the trajectory belong to the same class, and any action that can be represented by a finite segment of that trajectory connects that class to itself, hence belonging to the same activity. Yet another important example: if the initial and final states are structured, any action transforming a component of the initial state to some component of the final state will be a representative of the same activity. Such partial actions (iterations) may fail to converge; however, such an activity often leads to quite acceptable results (e.g. using asymptotic series expansions in special function approximation). 2.3
The computable world
Every agent encounters certain initial (boundary) conditions and operates following its built-in logic. However, due to the limited operational capacity, the agent cannot achieve any point of the world at all. Some points are unachievable because they are
458 not connected to the initial state by any admissible trajectory; some other points are only asymptotically approached; there may also be dynamic singularities that cannot lie on any admissible trajectory regardless of the initial conditions. That is, any single agent can only span a subspace of the full configuration space X; this subspace is called the world W of that particular agent, in given conditions. A world is an analog of a dynamic flow (a bunch of trajectories) in analytical mechanics. In the simplest case, the world reduces to a single trajectory. This individual world may have structure quite different from the structure of the configuration space in general, actualizing only a part of the possibilities available. For instance, in a Euclidean configuration space, the individual world may form a sphere, a torus, or a fractal. Also, the worlds spanned by the same agent with different initial conditions may be quite different. The agent can never break out of its individual world unless there are other agents, and hence a hierarchy of agents operating in a hierarchical world.
3
Hierarchical computing
The very distinction of the levels of operation, action and activity is already introducing hierarchy in the model, implying a hierarchical organization of both the configuration space and the agent. In this section, we consider communication as the source of hierarchical development, which gives way to numerous implications of importance in distributed computing and artificial intelligence. Let there be two agents Al and A2 operating in the same computational universe. Since a point in the configuration space X is a distinct state of the universe, and since, in this model, any operation changes the state of the whole universe, the two agents cannot act simultaneously, save in the trivial cases, and sequential operation is the only possibility: ... —»*i —>A\ —> x 2 —>^2 —»-*3 —>A\ —>*4—> —
This means that, from the viewpoint of each agent, the state of the universe between successive operations or actions may "spontaneously" change, which is impossible in the traditional approach to computability. That is, the activity of another agent results in discontinuities of individual trajectories, up to switching to an entirely different class of trajectories (activity). Similarly, assuming a universe developing according to some natural laws, we arrive to the necessity of accounting for the regularities of such development in the individual computation processes. However, in this work, we are mostly interested in agent-produces changes in the universe and do not consider naturally developing worlds. Agents Ai and A2 exist in their individual worlds Wj and W 2 , in general, spanning different parts of the whole configuration space X . This leads to a number of useful notions characterizing the possible relations between the worlds. Non-
459
intersecting worlds W! and W 2 imply that agents A | and A2 cannot operate together; if one of them works, the other must be stopped. An operation of agent Ai is A2compliant iff the resulting state of the universe belongs to W 2 . Such operations do not change the activity of agent A2 , rather influencing the timing of an action; alternatively, they could be called boosts. An operation of agent A i is A2-compatible iff it results in a point x that can lie on some trajectory of A 2 , maybe with different initial conditions. In other words, there are points of X adjacent to x in A i . Obviously, all A2-compliant operations of Aj are also A2-compatible. Existence of non-compliant operations means that the actual configuration space of an agent does not coincide with the whole X and hence is reducible to some subspace of X. However, as it will be shown below, there are no such domain limitations in hierarchical agents. Indeed, one can consider sequential operations performed by different agents as a higher-level operation performed by an agent A consisting of both A, and A2: ... —» JCJ —> ( A ! — > x 2 — > A 2 ) —>JC 3 —» ...
The intermediate state x2 (point of X) can be interpreted as internal for A, and the elementary operation of A transforming xx into x3 is a composition of the operations of A i and A 2 ; the point JC2 of X, beside being a specific state of the universe, represents a particular composition of operations. Points of X that serve as internal for some agent A (and hence mediate communication of lower-level agents) are called products. State s of the universe that is exclusively used to switch activity from one agent to another is called a symbol. Alternatively, one can consider a hierarchy of operations. The original tangent space T now contains only direct operations, while there also are indirect operations mediated by other agents. In the above example, x3 may be unachievable for A] with any direct operation, but it becomes achievable with a second-order operation involving another agent. Hierarchical agents imply hierarchical worlds composed of many individual worlds, plus the points x achievable via collective actions. In the above scheme, the points Xi and agents A, become interchangeable: ... -»Ai —>(*i —>A2 —>x2) —>A3 —>... Like points x may become internal for hierarchical agents, transformations of the world (operations and agents) can be considered as occurring in the interior of a higher-level point of the hierarchical configuration space. The difference between agents and the states of the computational universe hence becomes relative.bFrom the hierarchical viewpoint, one could consider any action as an operation of a higher-level agent arising from self-communication: ... —> JC] —» (A —> x2 —» A ) —> * 3 —> ...
460 The agent A thus becomes composed of two specialized components: one of them (the afferent component) transforming an outer state of the universe x\ into an inner state of the agent x2 , and the other (efferent) component producing an outer state x3 from the inner state. 4
Hierarchical inference
So far, we considered hierarchical computing in a static universe, so that only its state could be changed. Beside the already mentioned natural development, this picture can be complicated by new objects produced by the agents. Once the states of the universe become represented by some other states (symbols), the operations on symbols may develop in a very complicated area. After all, agents do not stop on symbolic computation, and pass to material production, which enormously extends the configuration space and opens new direction of hierarchical development. Formally, this process could be modeled in a peculiar logic, containing the following rules of inference. 1.
(Reflexivity) If there is an object O, there is a link —» of this object to itself:
o->o 2.
3. 4. 5.
6.
(Unfolding links) For any link —> there is an object O' mediating it, so that —» is equivalent to —»(?'—> ; the resulting links are different from the original and denoted with the same arrow merely for brevity. (Folding links) The reverse of (2): any mediated link can be folded in a higherlevel link. (Abstraction) For any linked objects Ot —> 02 , there is an object O representing the link. (Unfolding objects) Any object mediating a link —» 0 ' - > is a contracted form of a triad of input, inner state, output: —» (S' —> C" —>/?)—> ; this rule might be replaced with an equivalent: —»0'—» implies —> (S' —>R') —>, and then -> (S' -> C" -> R1) -> by rule (2). (Refoldability) —> (0\ —> 02) —> is equivalent to —> Oi) —» ( 0 2 —> , with a proper re-interpretation of links.
The entity obeying these laws is called a hierarchy. This set of rules is not minimal, and there may be many equivalent formal systems. Explicitly specifying the levels of hierarchy for both objects and links, one can construct rather complex structures, then fold them into simple schemes, and unfold in different way. As it is easy to see, objects do not differ much from links in a hierarchy as a whole, while they will certainly be different in every hierarchical structure. Obviously, no hierarchy can be complete, since any element can be unfolded in a complex structure, producing additional elements and additional types of links. Hierarchical logic is a method of construction, rather than construction itself.
461 However, a hierarchy possesses a kind of absolute integrity, since every element is related to each other, and the hierarchy can always be unfolded in a structure, in which these elements are connected with a direct link. One could put forward the hypothesis that any static formal system, as known in modern mathematics, can be obtained via hierarchical development, as one of the possible unfoldings.
5
Conclusions
We have outlined an alternative approach to computation based on the hierarchical ideas. This approach conveniently links the traditional notions of analytical mechanics to the studies of human behavior within a general psychological theory of activity. Such a synthesis may be productive enough, to give birth to various nonstandard theories of computation and inference, efficient methods of distributed and parallel computing, new forms of artificial intelligence. Even if not so, it presents one more possible conceptualization, which is not reducible to any known mathematical structure, rather being a tool for reconstructing any integrity at all.
References 1. Mendelson, E. Introduction to mathematical logic (Princeton, NJ: D. van Nostrand, 1964). 2. Cutland, N. Computability: An introduction to recursive function theory. (Cambridge: Cambridge Univ., 1980) 3. Mesarovic M. D. and Takahara Y. General Systems Theory: Mathematical Foundations (N.Y.: Academic, 1975) 4. Hubey H. M. The Diagonal Infinity (Singapore: World Scientific, 1998) 5. L.Vygotsky, Thought and language (Cambridge, MA: MIT Press, 1986) 6. Leontiev A. N. Activity, Consciousness and Personality. (Englewood Cliffs, NJ: Prentice Hall, 1978) 7. Efimov E. I. Intellectual Problem Solvers. (Moscow: Nauka, 1982) 8. Ivanov P. B. A hierarchical theory of aesthetic perception: Scales in the visual arts Leonardo Music Journal, 5 (1995) pp. 49-55 9. Arnold V. I. Mathematical Methods of Classical Mechanics (Moscow: Nauka, 1979) 10. Dobronravov V. V. Foundations of Analytical Mechanics (Moscow: Vysshaya Shkola, 1976)
Imaging Applications
465 CATADIOPTRIC SENSORS FOR PANORAMIC VIEWING R. ANDREW HICKS, RONALD K. PERLINE AND MEREDITH L. COLETTA Department of Mathematics and Computer Science Drexel University Philadelphia PA 19104, USA E-mail: [email protected] We describe a family of reflective surfaces for panoramic viewing which achieves approximately cylindrical projections. The requirement of satisfying the single viewpoint constraint restricts the surface type to conies; in contrast, relaxing this requirement allows us to obtain a novel class of sensors which give a highly accurate approximation of a cylindrical projection. Design parameters for these sensors enable control of the region of space to be imaged, therefore increasing effective resolution by excluding unwanted portions of the scene.
1
Introduction
A panoramic image is one that provides a "360 degree" field of view. There are numerous ways to capture such images, from wide-angle lenses to cameras with slits that rotate around a piece of film wrapped on a cylinder. An approach which has recently generated much interest within the computer vision community is the use of catadioptric sensors, which consist of combinations of cameras with conventional lenses (dioptrics) and curved mirrors (catoptrics). In this paper, our attention will be focused on this type of sensor. Their usefulness in panoramic imaging stems from the following idea: if one points a camera at a curved mirror (usually convex), the field of view of the camera can be increased. The image may then be digitized and numerically transformed as desired, including the generation of various different projections from the acquired image. For example, one might use the sensor to generate a perspective projection of a small portion of the image. A major asset of this type of panoramic sensor is that it operates in real-time, facilitating video applications. A consequence of the increased field of view of a catadioptric sensor is that image resolution tends to low. To make this precise, we define the resolution of the sensor to be the total number of pixels in the image divided by the total solid angle (steradians) that have been imaged. (Recall that solid angle simply refers to the area of a region on a unit sphere; hence a hemisphere corresponds to an angle of 27r. A true omnidirectional sensor can view a whole sphere - a solid angle of 4TT.) Suppose one chooses a fixed camera with a conventional glass lens; this choice effectively fixes accumulated pixels and resolution. If one augments the sensor by introducing a reflective surface
466
component which increases the imaged solid angle, the resolution obviously decreases. Decreased resolution is a major disadvantage for important applications such as stereo imaging and tracking. It is therefore worthwhile to design sensors which maximize resolution by more "efficiently" imaging the view sphere, and ignoring regions not of interest to the observer. The shape of the catoptric component of a catadioptric sensor is crucial in determining not only the field of view, but also the types of projections that may be mimicked by the sensor (possibly coupled with digitally transforming the image). An important example is the class of surfaces that yield a single effective viewpoint. We will say that a sensor has a single effective viewpoint if it measures the intensity of light passing through some fixed point in space, in every possible direction. This point, which is known as the effective viewpoint, acts as a sort of virtual center of projection. If a sensor does have a single effective viewpoint, then perspective images with respect to that point may be recovered. This may be achieved by backprojecting the image onto the plane of choice. For example, consider a parabolic mirror being viewed by an orthographic-type" camera. This sensor design, due to Nayar *, exploits the fact that the focus of a parabola can play the role of a single effective viewpoint if the parabola is viewed orthographically.
Horizon line Figure 1. Here we see a schematic depiction of an image from a parabolic sensor. The region in side of the dotted circle is of no use when creating a panoramic image, and so those pixels are "wasted".
Suppose we wish to create a panoramic image with such a parabolic sensor, where the region on the view sphere of interest lies between parallel "Two basic projection models for a camera are orthographic and perspective. In the orthographic case the camera is assumed to detect light rays that are parallel to one another in a fixed direction. In the perspective case, the rays detected are all those that pass through a fixed point called the pinhole or center of projection.
467
latitudinal lines - imagine the region of space which is swept out by the rotating beam of light of a lighthouse, the "beam sweep". A large fraction of the image obtained by the parabolic sensor is devoted to the camera (centered in the middle), and its immediate neighborhood - an area likely not of interest to the observer. Thus, valuable pixels are wasted; the resolution of the sensor is not optimized. See figure (1) for a diagram illustrating this phenomenon. To circumvent this difficulty, we have designed an "exotic" (non-conic) rotationally symmetric reflective surface which gives an approximate cylindrical projection of a specified "beam sweep". Recall that a cylindrical projection is a mapping from the world to the plane obtained by choosing a center of projection on the axis of symmetry of a cylinder and projecting points in the world onto the cylinder along the lines that contain the center of projection. Then the cylinder is "unrolled" by a software transformation to obtain a 2-dimensional image (see figure (2)). The design specification for our mirror translates mathematically into the requirement that the cross-sectional profile of the surface satisfy a certain differential equation (see section 3).
World point projected onto the cylinder
. World point
Cut the cylinder along a line
Figure 2. The cylindrical projection
Any rotationally symmetric mirror will have the property that horizontal circles on a cylinder (whose symmetry axis coincides with that of the mirror) will be imaged without distortion, up to scaling. Our mirror enjoys the additional property that vertical lines along a given cylinder are imaged without distortion, that is, they are subjected to a simple linear scaling. We have observed experimentally that this property corresponds to a certain robustness for the unwarping process - the sensor is designed to image objects at a certain distance, but continues to work reasonably well for objects at varying distances.
468
2
Previous Work
There are numerous systems for creating panoramic images: wide-angle and fisheye lenses, mechanical means, stitching images, etc. We will not survey all of these methods, since we are interested in catadioptric sensors. The reader interested in these systems may consult the surveys by Svoboda and Pajdla 2 and Yagi 3. An early use of mirrors for panoramic imaging is a patent by Rees 4, who proposes the use of a hyperbolic mirror for television viewing. Another patent is by Greguss 5, which is a system for panoramic viewing based on an annular lens combined with mirrored surfaces. Nalwa 6 describes a panoramic sensor that uses flat mirrors and multiple cameras and has a single effective viewpoint. Chahl and Srinivasan 7 describe a mirror that has the property that there is a linear relationship between the angle of incidence of the light and its angle of elevation. The mirror is described using a differential equation. Hicks and Bajcsy 8 consider a mirror which provides perspective images without digital unwarping. Nayar 1 has described a true omni-directional sensor, with the goal of reconstructing perspective views. Nayar and Peri 9 investigate two-mirror systems with a single effective viewpoint.

3 The Polar Sensor
Figure 3. Derivation of the differential equation (the vertical line x = c and the line y = ax + b are labeled).
We now derive the differential equation that describes our panoramic mirrors for cylindrical projection. In figure (3) we see a schematic of the mirror geometry. The cross section of the mirror is the graph of a function f(x). We assume that the mirror is being viewed orthographically (see the footnote below) and that the image is formed on the z-axis. We fix the vertical line at x = c and assume also that the mirror shape is such that the point (0, x) is the image of a point (c, T(x)), where we require T(x) = ax + b; this corresponds to a linear relationship between distance measured along the vertical line x = c and the "film" located along y = 0. Thus the image obtained from such a sensor can be used to create a cylindrical projection by applying a simple polar coordinate transformation; hence we refer to this sensor as the polar sensor. θ is the angle between the normal to the curve and the light ray, where we assume of course that the angle of incidence is equal to the angle of reflection. We derive our equation by computing tan(2θ) in two different ways and setting the two expressions equal to each other.
Figure 4. Cross sections of mirrors for use with orthographic projection, with scaling factors a = 1 (a straight line of slope 1), a = 3, and a = 9, all with b = 0 and c = 10.
On one hand, since tan(θ) = f'(x), we have tan(2θ) = 2f'(x)/(1 - f'(x)²). On the other hand, from the diagram, tan(2θ) = (c - x)/(f(x) - T(x)). Here we are mostly interested in the case when T(x) = ax + b. Thus we have our basic equation:

\[
\frac{2 f'(x)}{1 - f'(x)^2} \;=\; \frac{c - x}{f(x) - a x - b}.
\]
Solutions to this equation tend to look like straight lines or concave-up monotonic functions (see figure (4)). In figure (5) we see a panoramic view of a scene obtained from a prototype of our catadioptric sensor. In this case the "vertical field of view" is about 45 degrees. The image quality decreases towards the top of the image, where there is a discontinuity in the unwarping process.
Footnote: This is a reasonable approximation for a camera with a telecentric lens. A similar derivation is possible for the case in which the camera projection is assumed to be perspective.
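As an illustration of how a profile can be computed numerically (this is our own sketch, not the authors' code; the initial height f0, the integration range, and the choice of root are assumptions), one can solve the quadratic R t² + 2t − R = 0 for t = f'(x), where R = (c − x)/(f(x) − ax − b), and then integrate:

```python
import numpy as np
from scipy.integrate import solve_ivp

def mirror_profile(a, b, c, f0, x_span=(0.1, 2.0), n=200):
    """Integrate 2 f'/(1 - f'^2) = (c - x)/(f - a*x - b) for the cross
    section f(x), starting from an assumed initial height f(x0) = f0."""
    def fprime(x, f):
        R = (c - x) / (f[0] - a * x - b)
        # Roots of R*t^2 + 2*t - R = 0 are t = (-1 +/- sqrt(1 + R^2)) / R.
        # The branch with |t| < 1 is taken here, written in a form that is
        # stable as R -> 0; depending on the mirror orientation the other
        # branch may be the physically relevant one.
        return [R / (1.0 + np.sqrt(1.0 + R * R))]
    xs = np.linspace(*x_span, n)
    sol = solve_ivp(fprime, x_span, [f0], t_eval=xs, max_step=0.01)
    return sol.t, sol.y[0]

# e.g. the a = 3, b = 0, c = 10 case of figure (4); f0 is a guessed value.
x, f = mirror_profile(a=3.0, b=0.0, c=10.0, f0=0.5)
```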
Figure 5. On the left we see an image obtained directly from our sensor. On the right we see the same image unwarped.
4 Conclusions
Using differential equation techniques, we have introduced a new class of mirrors for panoramic imaging. By employing these mirrors, one can construct catadioptric sensors which efficiently image a given latitudinal range of the view sphere. More generally, extensions of these techniques can be employed to construct catadioptric sensors that meet specific design constraints while optimizing their effective resolution. We are currently investigating these research directions.

References
1. S. Nayar. Catadioptric omnidirectional camera. In Proc. Computer Vision and Pattern Recognition, pages 482-488, 1997.
2. T. Svoboda and T. Pajdla. Panoramic cameras for 3D computation. In Proc. of the Czech Pattern Recognition Workshop, pages 63-70, 2000.
3. Y. Yagi. Omnidirectional sensing and its applications. IEICE Trans. on Information and Systems, E82-D(8), pages 568-579, 1990.
4. D. Rees. Panoramic television viewing system. United States Patent 3,505,465, April 1970.
5. P. Greguss. Panoramic imaging block for three-dimensional space. United States Patent 4,566,736, January 1986.
6. V. Nalwa. A true omnidirectional viewer. Technical Report, Bell Laboratories, Holmdel, NJ, USA, 1996.
7. J.S. Chahl and M.V. Srinivasan. Reflective surfaces for panoramic imaging. Applied Optics, 36:8275-8285, 1997.
8. R. Hicks and R. Bajcsy. Catadioptric sensors that approximate wide-angle perspective projections. In Proc. Computer Vision and Pattern Recognition, pages 545-551, 2000.
9. S. Nayar and V. Peri. Folded catadioptric cameras. In Proc. Computer Vision and Pattern Recognition, pages 217-223, 1999.
HIGH-PERFORMANCE COMPUTING FOR THE STUDY OF EARTH AND ENVIRONMENTAL SCIENCE MATERIALS USING SYNCHROTRON X-RAY COMPUTED MICROTOMOGRAPHY

HUAN FENG
Department of Earth and Environmental Studies, Montclair State University, Upper Montclair, NJ 07043, USA
E-mail: fengh@mail.montclair.edu

KEITH W. JONES
Environmental Sciences Department, Brookhaven National Laboratory, Upton, NY 11973, USA
E-mail: [email protected]

MICHAEL MCGUIGAN, GORDON J. SMITH, JOHN SPILETIC
Information Technology Division, Brookhaven National Laboratory, Upton, NY 11973, USA
E-mails: [email protected], [email protected], [email protected]
Synchrotron x-ray computed microtomography (CMT) is a non-destructive method for examination of rock, soil, and other types of samples studied in the earth and environmental sciences. The high x-ray intensities of the synchrotron source make possible the acquisition of tomographic volumes at a rate high enough to require the application of high-performance computing techniques for data reconstruction to produce the three-dimensional volumes, for their visualization, and for data analysis. These problems are exacerbated by the need to share information between collaborators at widely separated locations over both local and wide-area networks. A summary of the CMT technique and examples of applications are given here, together with a discussion of the applications of high-performance computing methods to improve the experimental techniques and analysis of the data.
1 Introduction
Materials studied in the earth and environmental sciences are generally very inhomogeneous and complex. Investigation of the three-dimensional properties of these materials is essential. However, there are relatively few non-destructive methods by which these properties can be measured. These methods include laser confocal microscopy, magnetic resonance imaging, and x-ray computed tomography. The use of x-ray computed tomography is particularly powerful since it gives information on x-ray attenuation coefficients and thus can distinguish between different minerals, pore space, and liquid-filled pore space in specimens that can range from a few millimeters to many centimeters in size. Our purpose here is to describe the application of computed tomography techniques to these problems based on the use of synchrotron radiation and to discuss the application of high-performance computing to improve the technique.
2 Instrumentation and Method
A schematic diagram of the CMT apparatus at the Brookhaven National Synchrotron Light Source (NSLS) is shown in Figure 1 [7]. The x rays are detected with a thin scintillator of CsI(Na) or YAG:Ce. A mirror/lens system is used to focus light from the scintillator onto a charge-coupled device (CCD) camera. Blurring effects caused by scattering of the x rays in the scintillator are minimized by the small depth-of-field of the magnifying lens. The spatial resolution of the system is energy dependent and of the order of 0.005 mm. The CCD cameras employed chips with 1317 x 1035 and 3072 x 2048 pixels. In practice, the pixels are often binned to reduce the size of the tomographic volumes and thereby ensure practicable times for data acquisition and reconstruction. Monoenergetic beams are used at energies up to about 20 keV, and filtered white beam is used for measurements at higher energies to obtain higher x-ray intensities. Typical data collection times are of the order of 1-2 hours.
Figure 1. A schematic diagram of the apparatus.
Data acquisition produces a set of camera frames taken as the sample is rotated in a series of steps determined by the number of pixels desired in the volume. The procedure covers an angular range of 180 degrees with respect to the incident x-ray beam; the number and size of files depend on the sample size and desired spatial resolution.

The data reconstruction proceeds in 3 phases. Phase 1 applies a whitefield normalization and any filters needed, and writes files containing data from all views for a single slice (horizontal row of pixels), one file per slice. Phase 2 processes each slice independently. It applies the view-by-view air value normalization, optionally applies a filter to reduce the ring artifacts, computes the location in the images of the center of rotation, and converts the data to a sinogram. Phase 3 is the actual reconstruction. It generates a square array with dimensions of the horizontal row size. After this, the reconstruction is completed and the visualization process begins. This is a much more varied process and depends strongly on the particular sample being analyzed.
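As a rough, self-contained illustration of the three phases (our own NumPy sketch, not the beam-line production code, which is different; the function names, the simple ramp-filtered backprojection, and the synthetic disk test are all assumptions), a minimal version might look like:

```python
import numpy as np

def normalize_frames(frames, whitefield):
    """Phase 1: whitefield normalization; returns -log of the transmission,
    i.e. line integrals of the attenuation coefficient (Beer-Lambert)."""
    trans = np.clip(frames.astype(float) / whitefield.astype(float), 1e-6, None)
    return -np.log(trans)

def frames_to_sinograms(projections):
    """Phase 2 (core step): regroup (n_views, n_rows, n_cols) projections into
    one sinogram per horizontal slice, shape (n_rows, n_views, n_cols)."""
    return np.transpose(projections, (1, 0, 2))

def reconstruct_slice(sinogram, angles_deg):
    """Phase 3: simple filtered backprojection of one slice.
    sinogram has shape (n_views, n_det); returns an n_det x n_det array."""
    n_views, n_det = sinogram.shape
    ramp = np.abs(np.fft.fftfreq(n_det))                      # ramp filter
    filtered = np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1).real
    coords = np.arange(n_det) - n_det / 2
    xx, yy = np.meshgrid(coords, coords)
    recon = np.zeros((n_det, n_det))
    for view, ang in zip(filtered, np.deg2rad(angles_deg)):
        t = xx * np.cos(ang) + yy * np.sin(ang) + n_det / 2   # detector coordinate
        recon += np.interp(t, np.arange(n_det), view, left=0.0, right=0.0)
    return recon * np.pi / n_views

# Sanity check with an analytic phantom: every projection of a centered disk of
# radius r is 2*sqrt(r^2 - s^2), independent of the view angle.
n_det, r = 64, 20
s = np.arange(n_det) - n_det / 2
proj = np.where(np.abs(s) < r, 2.0 * np.sqrt(np.clip(r**2 - s**2, 0.0, None)), 0.0)
angles = np.linspace(0.0, 180.0, 90, endpoint=False)
slice_img = reconstruct_slice(np.tile(proj, (angles.size, 1)), angles)
# slice_img is approximately 1 inside the disk and 0 outside.
```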
3 Experimental Results and Discussion
We have used the CMT apparatus to investigate many different materials relevant to the earth sciences. In particular, sandstones are of wide interest, and typical data are presented here to show a specific example of the usefulness of synchrotron CMT. The data can be presented in several ways. A volume representation showing a 3-D view of sandstone from the Vosges region of France is shown in Figure 2. The grain structure of the material is clearly visible. A view of a single section through the sample is shown in Figure 3. This type of view helps to highlight the pore-grain relationships. The data can then be processed according to the measured attenuation coefficients to display the data in binary form representing either solid or pore space. Analysis of this data then gives the two-dimensional correlation function, porosity, permeability, and tortuosity. The measured microgeometry also can be used as a realistic basis for fluid flow calculations [2].
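As a hedged illustration of that post-processing step (our own sketch; the variable names, the thresholding step, and the periodic-boundary estimator are assumptions, and quantities such as permeability and tortuosity require more elaborate calculations than shown here), the porosity and a one-axis two-point correlation function can be estimated directly from the binary volume:

```python
import numpy as np

def porosity(pore):
    """Fraction of voxels flagged as pore space (boolean array)."""
    return pore.mean()

def two_point_correlation(pore, axis=0):
    """S2(r): probability that two voxels separated by r along `axis` are both
    pore, estimated with periodic boundaries via an FFT autocorrelation.
    S2(0) equals the porosity."""
    f = pore.astype(np.float64)
    n = f.shape[axis]
    F = np.fft.fft(f, axis=axis)
    corr = np.fft.ifft(F * np.conj(F), axis=axis).real / n
    other_axes = tuple(i for i in range(f.ndim) if i != axis)
    return corr.mean(axis=other_axes)

# Toy usage on a random synthetic volume (a real volume would come from
# thresholding the reconstructed attenuation coefficients).
rng = np.random.default_rng(0)
vol = rng.random((64, 64, 64)) < 0.2           # ~20% "pore space"
print(porosity(vol))                           # ~0.2
print(two_point_correlation(vol, axis=0)[:5])  # decays from ~0.2 toward ~0.04
```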
Figure 2. Isosurface rendering of a three-dimensional tomographic volume of a sample of sandstone from the Vosges. The color scale indicates the values of the measured absorption coefficients. The pore space is shown as blue.
Figure 3. Single section through the sandstone volume shown in Figure 2. Absorption coefficients are indicated by the color scale.
4 High Performance Computing
The present system used at BNL and similar systems at the NSLS, APS, ESRF, and other synchrotrons have demonstrated their worth in varied experiments on environmental and earth science topics. However, the system performance and usefulness can be vastly improved by use of high-performance computing technology. First, present-day CCD cameras can produce 15-20 frames per second with a size of about 1000 x 1000 pixels. Assuming that about 1500 frames will be necessary to acquire data for a tomogram, approximately 3 gigabytes of raw data are produced per tomogram (at 2 bytes per pixel). The goal is then to carry out all steps of the reconstruction process in the time needed to acquire the data, or about 1500 frames / 15 frames/s = 100 s. Second, in order to be able to adjust experimental parameters in near real time, it is necessary to produce data displays on about the same time scale. There needs to be control of the display by the experimenter so that different views are feasible and thresholding can be set to display changes in pore structures or fluid motion.
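The data-volume and time-budget figures above follow from simple arithmetic on the numbers quoted in the text; a few lines make the bookkeeping explicit:

```python
frames_per_tomogram = 1500
pixels_per_frame = 1000 * 1000
bytes_per_pixel = 2
frame_rate = 15.0                      # frames per second

raw_gb = frames_per_tomogram * pixels_per_frame * bytes_per_pixel / 1e9
budget_s = frames_per_tomogram / frame_rate

print(f"raw data per tomogram: {raw_gb:.1f} GB")        # 3.0 GB
print(f"reconstruction time budget: {budget_s:.0f} s")  # 100 s
```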
There are other aspects of the problem that include high-speed networking, rapid data storage, and remote viewing that present challenges to computer science. Here we concentrate our attention on the first two points, data reconstruction and viewing. We have developed methods for doing these tasks on the time scales required by recourse to parallel computation techniques. X-ray computed microtomography is a highly compute-intensive and memory-intensive application. The large volumes and small grid spacing required for micrometer resolution push the limits of even the most powerful workstations. For this reason we are applying high-performance parallel computation and data compression for remote access of tomographic data sets. The two main areas that need to be addressed are reconstruction and visualization. Reconstruction takes many projections obtained from the high-resolution CCD camera and uses an FFT transform method developed by Robert Marr at Brookhaven to form a three-dimensional gridded data set. Visualization of the data set either as a volume or an isosurface is used to closely inspect the sample and extract new features. This is typically done within a complete visualization package such as OpenDX or VTK. Reconstruction of large data sets can take several hours. Visualization on large grids can also take tens of minutes just to form a particular view of an isosurface. The use of parallelization to address these problems is one way to achieve the necessary speedups for fast reconstruction and visualization. Basically one divides up the data set across multiple processors and reassembles the final result by synchronization and communication among processors. There are two main protocols for parallelization: Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI). MPI is the more recent of the two protocols and is becoming a standard, particularly on clusters of Linux computers. Parallel reconstruction using MPI is currently being used at the APS, and this technique can also be applied to data from the NSLS beam line. The results of applying parallel visualization using MPI to the NSLS beam line data are described here.
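A minimal sketch of this data decomposition with mpi4py is shown below (our own illustration, not the APS or NSLS production code; the round-robin slice assignment and the placeholder per-slice computation are assumptions). It would be run with, e.g., `mpiexec -n 4 python script.py`.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def reconstruct_slice(sinogram):
    """Placeholder for the per-slice reconstruction (e.g. filtered
    backprojection); here it just returns a scalar summary."""
    return float(sinogram.sum())

n_slices = 1024
my_slices = range(rank, n_slices, size)       # round-robin slice assignment

# In a real run each rank would read its own sinograms from shared storage;
# here they are synthetic so the example is self-contained.
my_results = [(i, reconstruct_slice(np.ones((90, 64)))) for i in my_slices]

# Gather the partial results on rank 0 and reassemble them in slice order.
gathered = comm.gather(my_results, root=0)
if rank == 0:
    volume = dict(pair for part in gathered for pair in part)
    print(f"reconstructed {len(volume)} slices on {size} MPI processes")
```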
4.1 Parallel Visualization
Parallel visualization refers to the use of multiple computers for the graphical depiction of large data sets. It has its origins in parallel rendering, where one breaks up the image to be rendered into smaller pieces and has each processor render its own part, with all the pieces brought together for the final composite image [6, 8]. For example, Bailey [5] has ported Pixar's RenderMan rendering software to the parallel Intel Paragon machine. The free ray tracing program POV has also been ported to Linux clusters using MPI and PVM and achieved near-linear speedups comparable with more expensive specialty supercomputers. Recently, parallel visualization has gone a step beyond this by breaking up the data set itself. This data-parallel model allows large data sets of high resolution, such as the x-ray tomography data sets discussed here, to be manipulated by using the cumulative memory and processing power of a Linux cluster. The data-parallel model has been implemented in OpenDX 4.1.2 and VTK using MPI [1, 3].

4.1.1 OpenDX
OpenDX is an open-source visualization package that can be applied to a wide variety of data sets [1]. Currently, we use the software to give a quick view of x-ray tomographic data sets, extract isosurfaces and slices, and convert from NetCDF data format to a data format that can be read into VTK. Recently, a port of the software using MPI has been achieved, and we are planning to apply this version to x-ray tomography. Figure 4 shows a screen capture of the OpenDX application we have developed. OpenDX builds applications by dragging modules of specific functionality into the visual programming editor canvas and linking them together to form a network. The network shown uses an isosurface module, an export module to do the NetCDF conversion, and an image module for the final rendering. The data set shown is a high-resolution x-ray tomographic data set of the thigh bone of a rat used in osteoporosis studies.
Figure 4. OpenDX application for isosurface extraction and conversion from NetCDF data format to VTK format.
4.1.2 VTK
The Visualization Toolkit (VTK) is a multiplatform visualization package with C++, Tcl/Tk, and Python bindings. It is known for its high-quality implementation of the latest algorithms in computer graphics. Its most recent version includes a parallel MPI implementation of a subset of its modules [3, 4]. Here we discuss two programs built on this subset which we have applied to x-ray tomographic data sets. The first program, Paralso, breaks up the data set according to the command line arguments. It then computes isosurfaces for each piece on separate processors and then renders the final image with colors to indicate the work of the separate processors. An example for a tomographic data set is shown in Figure 5. The computation was done on a four Xeon processor SGI 1400 Linux server. The second MPI program, DataParallelism, breaks up the data more directly by using an image reader module. It includes its own sample data and explicit timing functions to measure the performance of the parallel isosurface computation. The results for the sample data are shown in Figure 6 and indicate nearly a factor of three speedup using all four processors.
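A speedup of roughly three on four processors is consistent with a largely but not entirely parallel workload; a quick Amdahl's-law estimate (our own back-of-the-envelope calculation, not a measurement from the paper) puts the parallel fraction near 0.9:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n processors with parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n)

# Invert the observed ~3x speedup on 4 processors:
# S = 1 / ((1 - p) + p/N)  =>  p = (1 - 1/S) / (1 - 1/N)
S, N = 3.0, 4
p = (1.0 - 1.0 / S) / (1.0 - 1.0 / N)
print(f"implied parallel fraction: {p:.2f}")                       # about 0.89
print(f"predicted speedup on 8 CPUs: {amdahl_speedup(p, 8):.1f}")  # about 4.5
```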
Figure 5. Output from the VTK program Paralso. The four colors indicate the portion of the isosurface computed by each of the four processors on the Linux cluster.
Figure 6. Graph showing the speedup using the parallel MPI VTK software DataParallelism to compute isosurfaces (speedup vs. number of processors).
4.2 Remote Visualization
The use of large computer facilities to perform x-ray tomographic reconstruction and visualization is of no use unless the final rendering can be delivered to the scientist's desktop. This can mean delivery over a high-speed intranet or more remote delivery connecting dispersed researchers over the Internet or a wide-area network. We are currently testing two software packages designed to perform remote visualization. Both packages use compression algorithms to speed up transmission to the desktop. The first remote visualization software, VizServer from SGI, transmits OpenGL visualization remotely from a multipipe SGI Onyx2 computer to SGI, Sun, or Linux clients. The software was able to achieve frame rates useful for interactive viewing (approximately 10 frames/sec) for typical data sets. The Onyx2 we used had 2 GB of RAM, 6 processors, and 2 pipes. As only one pipe is available for vizserving, this means at most one user at a time can access the data remotely from this system. A second remote visualization software package has recently been released by TGS as part of OpenInventor 3.0. OpenInventor is a high-level C++ and Java graphics toolkit built on top of OpenGL. This software works on a variety of platforms and doesn't require separate pipes for each user. It delivers OpenInventor applications directly to the desktop, and these can include x-ray tomographic data in Inventor and VRML format.
Another approach to remote visualization is the construction of a VRML server to deliver 3-D models directly over the Internet. VTK itself can be used to construct such a server in conjunction with a VRML browser. Currently we are using the Cosmo Player and ParallelGraphics VRML browsers. The latter can deliver on a variety of platforms, including wireless PDA devices.
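A minimal sketch of the exporting side of such a server is shown below (our own illustration, not the authors' implementation; a sphere stands in for an isosurface extracted from a tomographic volume, and off-screen rendering may require an appropriately built VTK). The resulting .wrl file is what a VRML browser such as those mentioned above would fetch from the server.

```python
import vtk

# A sphere stands in for an isosurface extracted from a tomographic volume.
source = vtk.vtkSphereSource()
source.SetThetaResolution(32)
source.SetPhiResolution(32)

mapper = vtk.vtkPolyDataMapper()
mapper.SetInputConnection(source.GetOutputPort())

actor = vtk.vtkActor()
actor.SetMapper(mapper)

renderer = vtk.vtkRenderer()
renderer.AddActor(actor)

window = vtk.vtkRenderWindow()
window.SetOffScreenRendering(1)   # avoid popping up a window, if supported
window.AddRenderer(renderer)
window.Render()

# Export the scene as VRML; a web server can then deliver scene.wrl on request.
exporter = vtk.vtkVRMLExporter()
exporter.SetRenderWindow(window)
exporter.SetFileName("scene.wrl")
exporter.Write()
```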
4.3 Stereoscopic Viewing
To understand the three-dimensional structure obtained from high-resolution x-ray computed tomography, it is useful to have a stereoscopic presentation of the data. Currently we are using a passive system with two Barco projectors and polarized filters [9]. The projectors are connected to an Onyx2 computer and the visualization is rear-projected on a special screen that preserves polarization. The final visualization is suitable for group viewing in a visualization theatre with inexpensive polarized glasses. At the desktop, stereoscopic viewing can be done using a page-flipping method and active glasses. Both methods can be used with the OpenDX or VTK visualization toolkits. The ParallelGraphics VRML browser can also be operated for remote stereoscopic visualization on the Internet.

5 Conclusions
In this paper we briefly described the use of synchrotron CMT for investigation of earth and environmental science samples and then described improvements to the CMT system by application of high-performance computing methods. We used OpenDX and MPI versions of VTK to perform the visualizations and achieved a factor of three increase in speed using a four-processor Linux cluster. We also studied the application of visualization server software for desktop delivery and achieved reasonable frame rates for typical data sets. In the future, we will apply MPI versions of OpenDX and the use of VRML servers in order to make these high-performance techniques widely available.

6 Acknowledgments
We wish to thank C. Law, B. Geveci, and S. Murtagh for assistance with parallel VTK, and C. Holmes for assistance with SGI VizServer, as well as Betsy Dowd for help with the CMT data sets. Work supported under DOE Contract DE-AC02-98CH10886.
References
1. Abram, G. and Treinish, L. An extended data-flow architecture for data analysis and visualization. Proceedings of Visualization, 263, IEEE Computer Society Press (1995).
2. Adler, P. M., Thovert, J.-F., Jones, K. W., Peskin, A. M., Siddons, D. P., Andrews, B., and Dowd, B. Determination of topology and transport in porous media using synchrotron computed microtomography (abstract). Presented at the 1996 Spring Meeting of the American Geophysical Union, Baltimore, Maryland (1996).
3. Ahrens, J., Law, C., Schroeder, W., Martin, K., and Papka, M. A parallel approach for efficiently visualizing extremely large, time-varying datasets. LANL Technical Report LAUR-00-1620.
4. Ahrens, J., Martin, K., Geveci, B., Law, C., and Papka, M. Large-scale visualization using parallel data streaming. (preprint)
5. Bailey, M. Parallel RenderMan. http://www.sdsc.edu (1996).
6. Crockett, T. W. Parallel rendering. In SIGGRAPH '98 "Parallel Graphics and Visualization Technology" Course #42 (1998).
7. Dowd, B. A., Andrews, A. B., Marr, R. B., Siddons, D. P., Jones, K. W., and Peskin, A. M. Advances in x-ray computed microtomography at the NSLS. Presented at the 47th Annual Denver X-Ray Conference, Colorado Springs, Colorado, August 3-7, 1998. In Advances in X-Ray Analysis, Vol. 42, Plenum Publishing Corp., New York, New York (1999).
8. Ellsworth, D. Polygon Rendering for Interactive Visualization on Multicomputers. PhD thesis, University of North Carolina at Chapel Hill (1997).
9. Smith, G. and Andrews, B. ITD's BNL Visualization Lab, http://www.itd.bnl.gov/visualization/vis3.html, ITD, BNL, Upton, NY (1997).
AUTHOR INDEX
Abdul Rahman Araby, N., 261; Akama, A., 379, 391, 399, 405, 411; Antoniou, G., 221, 229; Ashton, K., 441; Barat, R., 283; Beidler, J., 213; Benigno, D., 87; Benyoub, A., 121; Bitincka, L., 221; Bloor, C., 87, 93, 105; Carmo, L.F., 9; Celik, L., 229; Chavananikul, V., 385; Chen, X., 449; Christou, C., 127; Cios, K., 433; Cockton, G., 87, 93; Coletta, M.L., 465; Daoudi, E.M., 121, 177; Davis, B., 87; De Arriaga, F., 247; Delicato, F.C., 9; Doherty, E., 81, 87; Dorronsoro, J.R., 239; Eachus, H.T., 111; Edelson, W., 255; Edoh, K.D., 423; El Alami, M., 247
Feng, H., 471; Fitzpatrick, E., 369; Gargano, M., 255; Gibson, R., 345; Gnanayutham, P., 93; Gutierrez, A., 201, 207, 269; Han, C.Y., 195; He, B., 311; He, Y., 255; Hicks, R.A., 465; Honan, P.J., 303; Hou, W.-C., 317; Hubey, H.M., 15, 71, 275, 455; Ishikawa, T., 379, 399, 405; Ivanov, L., 165; Ivanov, P., 15, 455; Jenq, J., 157, 185; Jones, K.W., 471; Jorgenson, J., 293, 333; Junker, A., 111; Kaneko, K., 275; Kawaguchi, A., 3; Keerthi, S.S., 31; Kendal, S., 441, 449; Kimm, H., 143; Koike, H., 379, 391, 399, 405, 411
Layton, H., 55; Lee, Keon-Myung, 433; Lee, Kyung-Mi, 433; Lejk, E., 105; Li, J., 293; Li, W.N., 157, 185; Lopez, V., 239; Lorenz, J., 423; Luczaj, J.E., 195; Mabuchi, H., 379, 391, 411; Maj, S.P., 99, 359; Manneback, P., 177; Manikopoulos, C., 127, 293, 311; McGuigan, M., 471; Middleton, W., 105; Moore, L., 55; Mowshowitz, A., 3; Murthy, K.R.K., 31; Murty, M.N., 31; Odusanya, A.A., 419; Perline, R., 465; Pirmez, L., 9; Pitman, E.B., 55; Prasad, B., 385; Prim, M., 23; Rizzo, J., 81, 87; Rosar, M., 63; Santa Cruz, C., 239; Seegmiller, S., 369
Shigeta, Y., 399, 411; Sigüenza, J.A., 239; Sigura, I., 275; Singh, P., 261; Slanvetpan, T., 283; Smith, G.J., 471; Somolinos, A., 201, 207, 269; Spiletic, J., 471; Stevens, J.G., 283; Stephenson, G., 81; Su, M., 317; Sypniewski, B.P., 353; Tomaszewski, S., 229; Tureli, U., 303; Tuszynski, J., 41; Ucles, J., 333; Ugena, U., 247; Veal, D., 99, 359; Wang, D., 137, 151; Wang, H., 317; Zaritski, R., 55; Zbakh, M., 177; Zhang, H., 317; Zhang, P., 275; Zhang, Z., 333; Ziavras, S., 127
ISBN 981-02-4759-1