ADVANCED TOPICS IN SCIENCE AND TECHNOLOGY IN CHINA
Zhejiang University is one of the leading universities in China. In Advanced Topics in Science and Technology in China, Zhejiang University Press and Springer jointly publish monographs by Chinese scholars and professors, as well as invited authors and editors from abroad who are outstanding experts and scholars in their fields. This series will be of interest to researchers, lecturers, and graduate students alike. Advanced Topics in Science and Technology in China aims to present the latest and most cutting-edge theories, techniques, and methodologies in various research areas in China. It covers all disciplines in the fields of natural science and technology, including but not limited to, computer science, materials science, life sciences, engineering, environmental sciences, mathematics, and physics.
Xingui He, Shaohua Xu
Process Neural Networks: Theory and Applications
With 78 figures
ZHEJIANG UNIVERSITY PRESS
Springer
Authors
Prof. Xingui He
School of Electronic Engineering and Computer Science
Peking University
100871, Beijing, China
E-mail: hexg@cae.cn
Prof. Shaohua Xu
School of Electronic Engineering and Computer Science
Peking University
100871, Beijing, China
E-mail: xush62@163.com
Based on an original Chinese edition: Guocheng Shenjing Yuan Wangluo, Science Press, 2007.
ISSN 1995-6819    e-ISSN 1995-6827
Advanced Topics in Science and Technology in China
ISBN 978-7-308-05511-6
Zhejiang University Press, Hangzhou
ISBN 978-3-540-73761-2    e-ISBN 978-3-540-73762-9
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2008935452
© Zhejiang University Press, Hangzhou and Springer-Verlag Berlin Heidelberg 2009 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. springer.com
Cover design: Frido Steinen-Broo, eStudio Calamar, Spain
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The original idea for this book came, perhaps surprisingly, from a conference on applications of agricultural expert systems. During the conference, the ceaseless reports and repetitious content made me think that the problems the attendees discussed so intensely, whatever kind of crop planting was involved, could be regarded as the same problem, i.e. a "functional problem" from the viewpoint of a mathematician. To achieve certain planting indexes, e.g. output or quality, whatever the crop grown, the different means of control performed by the farmers, e.g. reasonable fertilization and control of illumination, temperature, humidity, concentration of CO2, etc., can all be seen as diverse time-varying control processes starting at sowing and ending at harvest. They can equally be seen as the inputs to the whole crop growth process. The yield or the quality index of the plant can then be considered as a functional dependent on these time-varying processes, and the pursuit of high quantity and high quality becomes the problem of finding an extremum of that functional. At that time, my research interest focused on computational intelligence, mainly including fuzzy computing, neural computing, and evolutionary computing, so I thought of neural networks immediately. I asked myself why not study neural networks whose inputs and outputs could both be time-varying processes, and why not study more general neural networks whose inputs and outputs could be multivariate functions or even points in some functional space. Traditional neural networks only describe the instantaneous mapping relationship between input values and output values. These new neural networks, however, can describe the accumulation or aggregation effect of the inputs on the outputs along the time axis. This new ability is very useful for solving many problems, including high-tech applications in agriculture, and for an elaborate description of the behavior of a biological neuron. The problems that traditional neural networks solve are function approximation and function optimization; the problems we now need to solve are functional approximation and functional optimization, which are more complicated. However, as a mathematician my intuition told me that these problems could be resolved under certain definite constraints and that there might be the prospect of broader applications in the future. In my research during the following years, I was attracted by these issues. Apart from numerous engineering tasks (e.g. I had assumed responsibility in China for manned airship engineering), almost all the rest of my time was spent on this study. I presented the
concept of the "Process Neural Network (PNN)", which is elaborated in this book. In recent years we have done further work on the theories, algorithms, and applications of process neural networks. We have solved some basic theoretical issues, including the existence of solutions under certain conditions, the continuity of process neural network models, and several approximation theorems (the theoretical foundations on which process neural network models can be applied to various practical problems), and we have investigated the computational capability of PNNs. We have also put forward some useful learning algorithms for process neural networks, and achieved some preliminary applications, including process control of chemical reactions, oil recovery, dynamic fault inspection, and communication alert and prediction. It is gratifying to obtain these results in just a few years. However, the research is arduous and there is a long way to go. Besides summarizing the aforementioned preliminary achievements, this monograph will highlight some issues that still need to be solved. On completing this book, I would like to express my sincere thanks to my many students for their hard work and contributions throughout these studies. I also wish to thank those institutes and persons who generously provided precious data and supported the actual applications.
Xingui He
Peking University, Beijing
April 2009
Contents
1 Introduction
1.1 Development of Artificial Intelligence
1.2 Characteristics of Artificial Intelligent System
1.3 Computational Intelligence
1.3.1 Fuzzy Computing
1.3.2 Neural Computing
1.3.3 Evolutionary Computing
1.3.4 Combination of the Three Branches
1.4 Process Neural Networks
References

2 Artificial Neural Networks
2.1 Biological Neuron
2.2 Mathematical Model of a Neuron
2.3 Feedforward/Feedback Neural Networks
2.3.1 Feedforward/Feedback Neural Network Model
2.3.2 Function Approximation Capability of Feedforward Neural Networks
2.3.3 Computing Capability of Feedforward Neural Networks
2.3.4 Learning Algorithm for Feedforward Neural Networks
2.3.5 Generalization Problem for Feedforward Neural Networks
2.3.6 Applications of Feedforward Neural Networks
2.4 Fuzzy Neural Networks
2.4.1 Fuzzy Neurons
2.4.2 Fuzzy Neural Networks
2.5 Nonlinear Aggregation Artificial Neural Networks
2.5.1 Structural Formula Aggregation Artificial Neural Networks
2.5.2 Maximum (or Minimum) Aggregation Artificial Neural Networks
2.5.3 Other Nonlinear Aggregation Artificial Neural Networks
2.6 Spatio-temporal Aggregation and Process Neural Networks
2.7 Classification of Artificial Neural Networks
References

3 Process Neurons
3.1 Revelation of Biological Neurons
3.2 Definition of Process Neurons
3.3 Process Neurons and Functionals
3.4 Fuzzy Process Neurons
3.4.1 Process Neuron Fuzziness
3.4.2 Fuzzy Process Neurons Constructed using Fuzzy Weighted Reasoning Rule
3.5 Process Neurons and Compound Functions
References

4 Feedforward Process Neural Networks
4.1 Simple Model of a Feedforward Process Neural Network
4.2 A General Model of a Feedforward Process Neural Network
4.3 A Process Neural Network Model Based on Weight Function Basis Expansion
4.4 Basic Theorems of Feedforward Process Neural Networks
4.4.1 Existence of Solutions
4.4.2 Continuity
4.4.3 Functional Approximation Property
4.4.4 Computing Capability
4.5 Structural Formula Feedforward Process Neural Networks
4.5.1 Structural Formula Process Neurons
4.5.2 Structural Formula Process Neural Network Model
4.6 Process Neural Networks with Time-varying Functions as Inputs and Outputs
4.6.1 Network Structure
4.6.2 Continuity and Approximation Capability of the Model
4.7 Continuous Process Neural Networks
4.7.1 Continuous Process Neurons
4.7.2 Continuous Process Neural Network Model
4.7.3 Continuity, Approximation Capability, and Computing Capability of the Model
4.8 Functional Neural Network
4.8.1 Functional Neuron
4.8.2 Feedforward Functional Neural Network Model
4.9 Epilogue
References

5 Learning Algorithms for Process Neural Networks
5.1 Learning Algorithms Based on the Gradient Descent Method and Newton Descent Method
5.1.1 A General Learning Algorithm Based on Gradient Descent
5.1.2 Learning Algorithm Based on Gradient-Newton Combination
5.1.3 Learning Algorithm Based on the Newton Descent Method
5.2 Learning Algorithm Based on Orthogonal Basis Expansion
5.2.1 Orthogonal Basis Expansion of Input Functions
5.2.2 Learning Algorithm Derivation
5.2.3 Algorithm Description and Complexity Analysis
5.3 Learning Algorithm Based on the Fourier Function Transformation
5.3.1 Fourier Orthogonal Basis Expansion of the Function in L2[0,2π]
5.3.2 Learning Algorithm Derivation
5.4 Learning Algorithm Based on the Walsh Function Transformation
5.4.1 Learning Algorithm Based on Discrete Walsh Function Transformation
5.4.2 Learning Algorithm Based on Continuous Walsh Function Transformation
5.5 Learning Algorithm Based on Spline Function Fitting
5.5.1 Spline Function
5.5.2 Learning Algorithm Derivation
5.5.3 Analysis of the Adaptability and Complexity of a Learning Algorithm
5.6 Learning Algorithm Based on Rational Square Approximation and Optimal Piecewise Approximation
5.6.1 Learning Algorithm Based on Rational Square Approximation
5.6.2 Learning Algorithm Based on Optimal Piecewise Approximation
5.7 Epilogue
References

6 Feedback Process Neural Networks
6.1 A Three-Layer Feedback Process Neural Network
6.1.1 Network Structure
6.1.2 Learning Algorithm
6.1.3 Stability Analysis
6.2 Other Feedback Process Neural Networks
6.2.1 Feedback Process Neural Network with Time-varying Functions as Inputs and Outputs
6.2.2 Feedback Process Neural Network for Pattern Classification
6.2.3 Feedback Process Neural Network for Associative Memory Storage
6.3 Application Examples
References

7 Multi-aggregation Process Neural Networks
7.1 Multi-aggregation Process Neuron
7.2 Multi-aggregation Process Neural Network Model
7.2.1 A General Model of Multi-aggregation Process Neural Network
7.2.2 Multi-aggregation Process Neural Network Model with Multivariate Process Functions as Inputs and Outputs
7.3 Learning Algorithm
7.3.1 Learning Algorithm of General Models of Multi-aggregation Process Neural Networks
7.3.2 Learning Algorithm of Multi-aggregation Process Neural Networks with Multivariate Functions as Inputs and Outputs
7.4 Application Examples
7.5 Epilogue
References

8 Design and Construction of Process Neural Networks
8.1 Process Neural Networks with Double Hidden Layers
8.1.1 Network Structure
8.1.2 Learning Algorithm
8.1.3 Application Examples
8.2 Discrete Process Neural Network
8.2.1 Discrete Process Neuron
8.2.2 Discrete Process Neural Network
8.2.3 Learning Algorithm
8.2.4 Application Examples
8.3 Cascade Process Neural Network
8.3.1 Network Structure
8.3.2 Learning Algorithm
8.3.3 Application Examples
8.4 Self-organizing Process Neural Network
8.4.1 Network Structure
8.4.2 Learning Algorithm
8.4.3 Application Examples
8.5 Counter Propagation Process Neural Network
8.5.1 Network Structure
8.5.2 Learning Algorithm
8.5.3 Determination of the Number of Pattern Classifications
8.5.4 Application Examples
8.6 Radial-Basis Function Process Neural Network
8.6.1 Radial-Basis Process Neuron
8.6.2 Network Structure
8.6.3 Learning Algorithm
8.6.4 Application Examples
8.7 Epilogue
References

9 Application of Process Neural Networks
9.1 Application in Process Modeling
9.2 Application in Nonlinear System Identification
9.2.1 The Principle of Nonlinear System Identification
9.2.2 The Process Neural Network for System Identification
9.2.3 Nonlinear System Identification Process
9.3 Application in Process Control
9.3.1 Process Control of Nonlinear System
9.3.2 Design and Solving of Process Controller
9.3.3 Simulation Experiment
9.4 Application in Clustering and Classification
9.5 Application in Process Optimization
9.6 Application in Forecast and Prediction
9.7 Application in Evaluation and Decision
9.8 Application in Macro Control
9.9 Other Applications
References

Postscript

Index
1 Introduction
As an introduction to this book, we will review the development history of artificial intelligence and neural networks, and then give a brief introduction to, and analysis of, some important problems in the fields of current artificial intelligence and intelligent information processing. This book will begin with the broad topic of "artificial intelligence", next examine "computational intelligence", then gradually turn to "neural computing", namely "artificial neural networks", and finally explain "process neural networks", whose theories and applications will be discussed in detail.
1.1 Development of Artificial Intelligence

The origins of artificial intelligence (AI) date back to the 1930s-1940s. For more than half a century, the field of artificial intelligence has made remarkable achievements, but at the same time it has experienced many difficulties. To give a brief description of the development of artificial intelligence, most events and achievements (except for artificial neural networks) are listed in Table 1.1. The main purpose of artificial intelligence (AI) research is to use computer models to simulate the intelligent behavior of humans and even animals, to simulate brain structures and their functions, and the human thinking process and its methods. Therefore, an AI system generally should be able to accomplish three tasks: (a) to represent and store knowledge; (b) to solve various problems with the stored knowledge; (c) to acquire new knowledge while the system is running (that is, the system has the capability of learning or knowledge acquisition). AI has been developing rapidly over the past 50 years. It has been widely and successfully applied in many fields, such as machine learning, natural language comprehension, logic reasoning, theorem proving, expert systems, etc.
Table 1.1 The milestones of artificial intelligence

Date | Leading players | Description and significance of event or production
1930s-1940s | Frege, Whitehead, and Russell | Established a mathematical logic system and gave us new ideas about computation
1936 | Turing | Established automata theory, promoted the research of "thinking machine" theory, and proposed the recursive function based on discrete quantities as the basis of intelligent description
1946 | Turing | Pointed out the essence of the theory "thinking is computing" and presented formal reasoning in the process of symbolic reasoning
1948 | Shannon | Established information theory, which held that human psychological activities can be researched in the form of information, and proposed some mathematical models to describe human psychological activities
1956 | McCarthy et al. | Proposed the term "artificial intelligence" (AI) for the first time, which marks the birth of AI based on the symbol processing mechanism
1960 | McCarthy | Developed the list processing language LISP, which could deal with symbols conveniently and was later applied widely in many research fields of AI
1964 | Robinson | Proposed the resolution principle, which marks the beginning of research into machine proving of theorems in AI
1965 | Zadeh | Proposed the fuzzy set and pointed out that the membership function can describe fuzzy sets, which marked the beginning of fuzzy mathematics research; in particular, binary Boolean logic was extended to fuzzy logic
1965 | Feigenbaum | Proposed an expert system which used a normative logical structure to represent expert knowledge with enlightenment, transparency, and flexibility, and which was widely applied in many fields
1977 | Feigenbaum | Proposed knowledge engineering, which uses the principles and methods of AI to solve application problems, and established expert systems by developing intelligent software based on knowledge
Along with the continuous extension of AI application fields, and with the problems to be solved becoming more and more complex, traditional AI methods based on a symbol processing mechanism have encountered more and more difficulties when solving problems such as knowledge representation, pattern information processing, the combinatorial explosion, etc. Therefore, it is of practical significance to seek theories and methods that have intelligent characteristics such as self-organization, self-adaptation, and self-learning, and that are suitable for large-scale parallel computation. Almost at the same time as the above research activities, some scientists were also seeking methods of representing and processing information and knowledge from different viewpoints and research domains. In 1943, the physiologist
McCulloch and the mathematician Pitts abstracted the first mathematical model of artificial neurons [1] by imitating the information processing mechanism of biological neurons, which marked the beginning of artificial neural network research based on connectionism. In 1949, the psychologist Hebb proposed the Hebb rule [2], which achieves learning by modifying the connection intensity among neurons and gives the neuron the ability to learn from the environment. In 1958, Rosenblatt introduced the concept of the perceptron [3]. From the viewpoint of engineering, this was the first time that an artificial neural network model was applied to information processing. Although the perceptron model is simple, it has characteristics such as distributed storage, parallel processing, learning ability, and continuous computation. In 1962, Widrow proposed an adaptive linear element model (Adaline) [4] that was successfully applied to adaptive signal processing. In 1967, Amari implemented adaptive pattern classification [5] by using gradient methods. The period from 1943 to 1968 can be considered the first flowering of artificial neural network research. In this period there were many more important research achievements, but we have not listed all of them here. In 1969, Minsky and Papert published Perceptrons [6], which indicated the limitations of the function and processing ability of the perceptron, namely that it cannot even solve simple problems such as "XOR". The academic reputation of Minsky and the rigorous discussion in the book led their viewpoints to be accepted by many people, and this made some scholars who had engaged in artificial neural networks earlier turn to other research fields. Research in artificial neural networks came into a dormant period that lasted from 1969 to 1982. Although research in neural networks encountered a cold reception, many scholars still devoted themselves to theoretical research. They proposed many significant models and methods, such as Amari's neural network mathematical theory [7] (in 1972), Anderson et al.'s BSB (Brain-State-in-a-Box) model [8] (in 1972), Grossberg's adaptive resonance theory [9] (in 1976), etc. In the early 1980s, the physicist Hopfield proposed a feedback neural network (the HNN model) [10] (in 1982) and successfully solved the TSP (Traveling Salesman Problem) by introducing an energy function. Rumelhart et al. proposed the BP algorithm in 1986, which effectively solved the adaptive learning problem [11] of feedforward neural networks. From 1987 to 1990, Hinton [12], Hecht-Nielsen [13], Funahashi [14], and Hornik et al. [15] separately presented the approximation capability theorem of the multi-layer BP network, which proved that multi-layer feedforward neural networks can approximate any L2 function. This theorem established the theoretical basis for the practical application of neural networks, and helped the theory and application of neural networks to mature gradually. Artificial neural networks came into a second flowering of research and development. In 1988, Linsker proposed a new self-organizing theory [16] based on perceptron networks, and formed the maximum mutual information theory based on Shannon's information theory. In the 1990s, Vapnik and his collaborators proposed a network model called the Support Vector Machine (SVM) [17-19] according to the structural risk minimization principle based on learning theory with limited samples, and it was widely applied
to many problems such as pattern recognition, regression, density estimation, etc. In recent years, many novel artificial neural network models have been established and broadly applied in areas such as dynamic system modeling [20,21], system identification [22], adaptive control of nonlinear dynamic systems [23,24], time series forecasting [25], fault diagnosis [26], etc. In 2000, we published process neuron and process neural network (PNN) models after years of intensive study [27,28]. The input signals, connection weights, and activation thresholds of process neurons can be time-varying functions, or even multivariate functions. On top of the spatial weighted aggregation of traditional neurons, an aggregation operator over time (or even more factors) is added so that the process neuron can process spatio-temporal multidimensional information. This expands the input-output mapping relationship of neural networks from function mapping to functional mapping, and greatly improves the expressive capability of neural networks. A series of basic theorems (including an existence theorem, approximation theorems, etc.) for process neural networks have been proved and some related theoretical problems have been solved. Practice shows that PNN models have broad applications in many actual process-related signal processing problems. These will be the core content of this book. At present, there are thousands of artificial neural network models, of which more than 40 are primary ones. The application scope of these models covers various fields including scientific computation, system simulation, automatic control, engineering applications, economics, etc., and they show the tremendous potential and development trends of artificial neural networks. However, most present neural networks are traditional neural networks with only spatial aggregation and have no relation with time. Traditional AI methods based on symbol processing mechanisms and neural networks based on connectionism are two aspects of AI research, and each of them has its own advantages and limitations. We believe that the combination of both methods can draw strengths from each to offset the weaknesses. For example, the setting and connection mode of neural network nodes (neurons) can definitely connect the solving goal with the input variables. We once observed that specific reasoning rules can be considered as network nodes (neurons) and "reasoning" can be converted into "computing". At the same time, according to the rules described by knowledge in the practical field, the connection mode and activation thresholds among the network nodes can be properly chosen and modified to express more reasonable logical relationships among the described problems, and the corresponding expert system can be designed in terms of the structure of a neural network. The term AI, as its name suggests, involves making "intelligence" artificially, or even making an intelligent system. Its short-term goal is to implement intelligence simulation in an existing computer and endow the computer with some intelligent behavior, while its long-term goal is to manufacture an intelligent system and endow it with intelligence similar to (or perhaps exceeding in some aspects) that of animals or human beings. Using AI to study autocorrelation problems in the human brain seems to be a paradox in logic and involves complex recursive processes in
mathematics, and is highly difficult. The problem is that how the brain works might never be understood in some sense, because the brain itself is also changing and developing while people are studying it. If some aspects of the brain at some time were studied clearly, the brain function at that time might develop further, the former state might change again, and this would no longer be the same as the original research objective. However, such a spiral research result is still very significant and can be applied to various practical problems. Therefore, we think that, on the one hand, AI should have a long-term research goal that can be gradually approximated; on the other hand, we still need to propose various short-term goals, and these goals should not deviate from practical applications to reach for that which is beyond our grasp. The development history of AI in this respect has already given us many lessons, which are worth remembering by AI researchers. In short, the development of artificial intelligence has experienced ups and downs during the past 60 years. Because of the increased demands in scientific fields and practical applications, we believe that AI will undergo further development, play a more important role in the advancement of science and technology by tackling human and other problems that are difficult to solve with traditional methods at present, and make great contributions to producing intelligent systems for human beings in the future.
1.2 Characteristics of Artificial Intelligent System

What system can be called an intelligent system? This is a question that we should answer before setting about researching intelligent systems; it amounts to setting up a research goal. Of course, the understanding of this question changes dynamically, and we cannot answer it completely in a moment. In fact, we can first find some rough answers from an analysis of the intelligent behavior of biological systems.
(1) An intelligent system is a memory system

From the perspective of neurophysiology, memory is the storage capacity and the processing procedure for information obtained from outside or produced internally. A large amount of information comes from the outside world, through the sense organs, into the brain. The brain does not store all the information that the sensory organs directly receive, but only the information obtained through learning or that is of some significance. Therefore, an intelligent system must have memory storage capacity; otherwise, it will lose its object and cannot store processing results, just as a person who has completely lost his memory no longer has intelligence. In addition, an artificial intelligent system is not completely identical to the human brain; the latter has a powerful memory ability, although memories decay gradually, so the former should simulate the latter in both memory and forgetting in some way.
(2) An intelligent system is a computation system

Cognitive science considers that "cognition is computing": it combines intelligence with computation closely and forms a new concept, computational intelligence. Computation refers to the process by which we carry out various operations and combinations (digital or analog) repeatedly on a certain symbol set according to some rules. The acquisition, representation, and processing of knowledge can all come down to a computation process. Therefore, an artificial intelligence system should also have this computing capability to accomplish the corresponding functions. In the Chinese language, there is an alias for the computer, "electronic cerebra" (electronic brain), which is of great significance. Carrying out various digital or analog operations rapidly is the strong point of the computer, so a computer is quite suitable for simulating some intelligent behaviors. However, there are troubles and problems when we directly use current digital machines or analog machines to handle fuzzy information or qualitative data, and indeed sometimes they cannot handle it at all, so we expect to use a digital machine that has an analog operation component. Such a machine is different from a general digital/analog mixed machine; it should have a uniform digital/analog mixed memory in which to deposit the processing object, and its processor should possess a uniform mixed processing ability for this mixed information. We believe that research on the computing capability of an intelligent system is very important and worth strengthening, and that research and development of the computing capability of an intelligent system (such as fuzzy neural computation) will greatly promote basic research on intelligence, and even the whole development of computer science.

(3) An intelligent system is a logical system

Traditional logic is binary logic, and it is fully utilized in the von Neumann computer, but in fact the reasoning logic of humans does not strictly abide by binary logic. Especially when the cognition of something is unclear or not completely clear, we only describe it by a qualitative or fuzzy concept, and handle it with a qualitative method or by fuzzy logic. Therefore, an artificial intelligent system should be able not only to carry out routine logical reasoning, but also to represent and process various qualitative and fuzzy concepts that are described by natural language, and then execute the corresponding qualitative or fuzzy reasoning. Consequently, an artificial intelligent system becomes a strong logical processing system. In addition to logical reasoning, the system should also be able to execute complex logical judgments, and adopt appropriate actions or reactions according to the judgments. The current computer is competent for binary logic or finite multi-valued logic, but is helpless when it comes to some continuous-valued logic (for example, fuzzy logic) and qualitative logical reasoning. We need the digital/analog unified hybrid computer mentioned above to meet these demands.

(4) An intelligent system is a perceptive system

An important characteristic of a biological system is that it can perceive the outside
environment through various sensory organs, acquire various bits of information, and make responses based on the information received. Much research on artificial intelligent systems has been done on perceiving the outside environment through various sensors, e.g. in a variety of robot systems. It should be said that this perception not only acquires information from the outside environment through sensors, but also pretreats the information. An artificial neural network perceptron, especially a multi-layer perceptron, has strong processing ability (for instance, a BP network with a single hidden layer can approximate any function in L2) and can complete this pretreatment. Perception is a "black box" problem and belongs to the bottom level of cognitive behavior. A neural network provides an effective approach to solving such a black box problem. Perception is the basis on which an intelligent system understands the outside environment, so its simulation system should also have this ability.

(5) An intelligent system is an interactive system

Biological systems need to interact with the outside environment. Here we do not consider physical interactions, but only discuss information and knowledge communication. Commonly, a biological system cannot complete the acquisition and processing of knowledge at one time; it often needs to supplement and continuously modify the acquired knowledge according to outside circumstances, and verify the correctness of the knowledge obtained from the outside environment in order to perfect itself. Thereby, in principle, interactivity is a necessary function of an artificial intelligent system. In seeking self-improvement according to changes in the environmental conditions or the practical requirements of users, the system must interact with the outside environment. If a user wants to control the behavior of the system or give the system some necessary information at any moment while using it, the intelligent system must have its own interactive ability and a convenient man-machine (or machine-machine) interactive interface and means.

(6) An intelligent system is a learning system

Learning is a process by which a biological system acquires knowledge through interaction with the outside environment, and learning ability is an important factor in intelligence. There are different levels of learning ability, ranging from the low level of the conditioned reflex to the high level of imparting language knowledge, so an artificial intelligent system should also be divided into different levels to simulate the learning process of a biological system. Learning and memorizing interact: the learning result needs to be memorized, while significant memory is acquired by learning training samples repeatedly. In the process of learning, knowledge can be acquired in two ways. One is to learn knowledge from teachers or judge the concept by specific hint information to accumulate and update knowledge. The other way does not need a teacher's guidance; it is "independent", which means that the system can modify the knowledge stored in a neural system according to observation and learning from the environment, to accord better with the inherent rules of the environment and the essential characteristics of the outside
environment. A system without learning ability cannot be called an intelligent system, but only a memorizer. It is because the intelligent system has learning ability that it is able to acquire knowledge from the outside constantly, just like a biological system. In addition, it can process acquired knowledge, reject useless or outdated knowledge, modify old knowledge, add new knowledge, and constantly improve its own intelligence level. The system can show strong adaptive ability and fault tolerance because of its learning ability. At the same time, the system will not be paralyzed in case of a local breakdown or error, and will not suffer large deviations due to interference from the outside environment. Consequently, it can improve its ability to adjust to changes in the environment by learning constantly.

(7) An intelligent system is a self-organizing system

Self-organization, self-adaptation, and self-perfection are important characteristics of a biological system. From a macroscopic perspective, the nervous system in the brain of a biological system can not only memorize various acquired knowledge, but also understand new unknown information by self-learning, and adapt itself to various complex environments. From a microscopic perspective, the brain's neural network system can reconstruct and reform its neural network in the process of adapting to the environment. Therefore, an artificial intelligent system should also have the characteristics of self-organization and self-adaptability, so that it can learn from an unknown environment or independently simulate some learning mechanism such as competition, and adjust and reorganize its system structure properly.

(8) An intelligent system is an evolutionary system

Learning and evolution are two concepts that are interrelated but different from each other, because learning is an individual behavior, while evolution is a sort of group behavior. Because an intelligent system has learning ability, each individual in an intelligent system can acquire experience and knowledge through interaction with the constantly changing environment so as to adapt to the changes. However, the learning abilities of various individuals are different. As a biological group, they also adjust themselves constantly to changes in the environment and change their functions from simple to complex and from low class to high class. This development process is just the so-called "evolutionary process". Similarly, self-organization is also just individual behavior, but together with learning ability it supports the evolution of the whole species. The group in an artificial intelligent system should have the ability to simulate the process of biological evolution; therefore, an intelligent system is an evolutionary system. By virtue of its evolutionary ability, the intelligent system group can constantly improve its adaptation to the environment, and thus it has a strong competitive ability.

(9) An intelligent system is a thinking system

Thinking is a brain function unique to primates, and only human beings have real
thinking ability. Thinking is generally divided into logical thinking and image-based thinking, which are controlled by the two hemispheres of the brain respectively. In a narrow sense, thinking is often equated with association; in a broad sense, thinking can be considered as the various activities and abilities of the brain. The characteristics of an intelligent system outlined above, such as memory ability, computing capability, logical reasoning ability, perception ability, interaction ability, learning ability, self-organizing ability, the evolutionary characteristic, etc., can all be considered as the basis of the brain's more advanced thinking activity. It is the ideal and aim of artificial intelligence scholars to achieve an intelligent system with thinking. Though this aim is great, the difficulties to be encountered are many, and the way to go is still long, we believe that as long as we propose reasonable intermediate targets and persistently search for the correct way, the great aim will be realized gradually.
1.3 Computational Intelligence

Biological species make progress and are optimized by natural competition. How artificial intelligence simulates this evolutionary process is worth studying. For example, evolutionary computation simulates the process of biological evolution in nature, and there are highly parallel and multi-directional optimization algorithms that can overcome the fatal weakness of a single-locus descent algorithm, namely that it easily falls into a local extremum. In recent years, the research and application results of various genetic algorithms and evolutionary algorithms have attracted great attention in the artificial intelligence field. Computational intelligence is a quite active and relatively successful branch of the artificial intelligence field at present. Computational intelligence is a subject that acquires and expresses knowledge and simulates and implements intelligent behavior by means of computing. At present, the three most active fields in computational intelligence are fuzzy computing, neural computing, and evolutionary computing, as well as their combination and mutual mingling.
1.3.1 Fuzzy Computing

Fuzzy computing is based on fuzzy set theory. It starts with a domain and carries out various fuzzy operations according to certain fuzzy logic and reasoning rules.

(1) Fuzzy set and fuzzy logic

In 1965, while researching the problem that in the objective world there are many fuzzy concepts and fuzzy phenomena which are difficult to describe by classic binary logic or finite multi-valued logic, Zadeh proposed fuzzy set theory [29], which
provided a cogent descriptive and analytical tool and opened a scientific way forward for solving fuzzy problems. In fact, fuzzy logic is a method for analyzing and handling inaccurate and incomplete information. Using fuzzy sets, human thinking and reasoning activities can be simulated more naturally to a certain extent. A fuzzy set has flexible membership relations and allows an element to belong partly to the set, which means that the membership of an element in a fuzzy set can be any value from 0 to 1. In this way, some fuzzy concepts and fuzzy problems can be expressed easily and reasonably by fuzzy sets. Logic is the theoretical basis of a human being's thinking and reasoning, and is the science of the relationship between antecedent and conclusion. In fact, people often handle logical reasoning where the relationship between the antecedent and the conclusion is not clear-cut but includes various kinds of fuzziness. Therefore, logic is divided into precise logic and fuzzy logic. Abstractly speaking, any logic can be regarded as an algebra whose elements are conjunctive logical formulas with certain truth values and whose operations are composed of some logical operations (such as "and", "or", "not") and reasoning rules (such as the syllogism). Each logic has some axioms with which to reason whether a conjunctive logical formula is a theorem of this logic or not. In artificial intelligence, we often adopt rules that express the relationship between antecedent and conclusion to describe certain knowledge, and then adopt logical reasoning or computing to solve problems. Fuzzy computing generally refers to various computing and reasoning methods involving fuzzy concepts. For example, suppose that there are $K$ fuzzy if-then rules, and that rule $k$ has the form: if $x_1$ is $A_{k1}$, $x_2$ is $A_{k2}$, ..., and $x_n$ is $A_{kn}$, then $y_1$ is $B_{k1}$, $y_2$ is $B_{k2}$, ..., and $y_m$ is $B_{km}$, where $A_{ki}$ and $B_{kj}$ are fuzzy sets in the universes of discourse $U_i$ and $V_j$ respectively, and $X=(x_1,x_2,\ldots,x_n)^{\mathrm{T}}\in U_1\times U_2\times\cdots\times U_n$ and $Y=(y_1,y_2,\ldots,y_m)^{\mathrm{T}}\in V_1\times V_2\times\cdots\times V_m$ are respectively the inputs and outputs of the fuzzy logical system. The above reasoning process can be completed by fuzzy computing.

(2) Weighted fuzzy logic

In traditional fuzzy logic, if there are multiple antecedents, the truth value of the antecedent conjunction is generally defined as the minimum of the truth values of all the sub-formulas. Although this fuzzy logic reflects some objective principles to a certain degree, sometimes it does not correspond with the practical situation. Often in the reasoning process, the degree of importance of each antecedent to the conclusion is different, and traditional fuzzy logic cannot embody the relative importance of each sub-condition. To solve this problem, we proposed weighted fuzzy logic in 1989 [30]. A weighted fuzzy logic can be denoted by a 4-tuple WFL={E, A, O, R}, where E denotes a set of atomic logical formulas; O={negation, weighted conjunction, implication}, where a weighted conjunctive logical formula is a formula that starts from E and executes the operations in O a finite number of times; A denotes a set made up of some weighted conjunctive logical formulas and is called the axiom set; and R={the first syllogism, the second syllogism}. A theorem in weighted fuzzy logic is a weighted conjunctive logical formula obtained by repeatedly carrying out the reasoning
rules in R a finite number of times, starting from A. The reasoning rules of the two syllogisms are described as follows.

The first syllogism reasoning rule: given that the truth degree of the logical formula $x_i$ is $T(x_i)$ ($-1\le T(x_i)\le 1$, $i=1,2,\ldots,n$) and that the truth degree of the weighted implication $\bigwedge_{i=1}^{n} w_i x_i \to y$ is $T\!\left(\bigwedge_{i=1}^{n} w_i x_i \to y\right)$, where $\sum_{j=1}^{n} w_j = 1$, the truth degree of the logical formula $y$ is

$$T(y)=T\!\left(\bigwedge_{i=1}^{n} w_i x_i \to y\right)\times\sum_{j=1}^{n} w_j\times T(x_j). \qquad (1.1)$$

The second syllogism reasoning rule: when

$$T\!\left(\bigwedge_{i=1}^{n} w_i x_i \to y\right)+\sum_{j=1}^{n} w_j\times T(x_j)\ge 1,$$

the truth degree of the logical formula $y$ is

$$T(y)=T\!\left(\bigwedge_{i=1}^{n} w_i x_i \to y\right)+\sum_{j=1}^{n} w_j\times T(x_j)-1. \qquad (1.2)$$

But when $T\!\left(\bigwedge_{i=1}^{n} w_i x_i \to y\right)+\sum_{j=1}^{n} w_j\times T(x_j)<1$, we cannot use this weighted
implication to reason. When computing the truth value of the weighted conjunction, we adopt arithmetic addition ($\oplus$) and arithmetic multiplication ($\otimes$), which can in fact be replaced by proper "union" and "intersection" operations, such as "max" and "min", to obtain various kinds of "weighted fuzzy logic".

(3) Fuzzy computational logic

In weighted fuzzy logic, the introduction of the weighted conjunction operation solves, to some degree, the problem of losing information after the conjunction operation of an ordinary set, so that it can express the unequal influence of the truth degree of each sub-item in a conjunction formula on the truth degree of the whole formula. In 1990, based on weighted fuzzy logic and employing the idea that "reasoning is computing" in logic, we introduced fuzzy computational logic, which has stronger expressive ability [31]. Fuzzy computational logic is also denoted by a 4-tuple FCL={E, A, O, R}, where E is a set of atomic logical formulas, A is a set of axioms, O is a set of logical operations (negation, conjunction, disjunction, implication, the universal and existential quantifiers, and the arithmetic operations $\oplus$ and $\otimes$), and R is a set of reasoning rules such as the fuzzy syllogism. We can obtain all the logical conjunctive formulas of fuzzy computational logic if we start from E and execute the operations in O a finite number of times, and we obtain theorems in this logic by using the syllogisms to reason repeatedly, starting from A. The expressive ability of fuzzy computational logic is very strong, and it can be used to describe and denote various kinds of fuzzy knowledge.
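To make the weighted fuzzy reasoning rules above more concrete, the following short Python sketch evaluates the two weighted syllogisms of Eqs. (1.1) and (1.2) for given truth degrees. The function names and the numerical values are illustrative assumptions only and do not come from the original text.

```python
def first_syllogism(t_impl, weights, truths):
    """Eq. (1.1): T(y) = T(weighted implication) * sum_j w_j * T(x_j)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    weighted_truth = sum(w * t for w, t in zip(weights, truths))
    return t_impl * weighted_truth

def second_syllogism(t_impl, weights, truths):
    """Eq. (1.2): applicable only when T(implication) + sum_j w_j*T(x_j) >= 1."""
    weighted_truth = sum(w * t for w, t in zip(weights, truths))
    if t_impl + weighted_truth < 1.0:
        return None  # the second syllogism does not apply in this case
    return t_impl + weighted_truth - 1.0

# Illustrative numbers: two antecedents with importance weights 0.7 and 0.3.
weights = [0.7, 0.3]
truths = [0.9, 0.4]          # truth degrees T(x_1), T(x_2)
t_impl = 0.8                 # truth degree of the weighted implication
print(first_syllogism(t_impl, weights, truths))   # 0.8 * (0.63 + 0.12) = 0.6
print(second_syllogism(t_impl, weights, truths))  # 0.8 + 0.75 - 1 = 0.55
```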
1.3.2 Neural Computing

Neural computing is inspired by biology; it is a parallel, non-algorithmic information processing model established by imitating the information processing mechanism of a biological neural system. Neural computing presents the human brain model as a non-linear dynamic system with an interconnected structure, i.e. an artificial neural network simulates the human brain mechanism to implement computing behavior. In this interconnected mechanism, it is unnecessary to establish an accurate mathematical model in advance. The solving knowledge of the artificial neural network is represented by the distributed storage of connection weights among a great many interconnected artificial neurons, and the input-output mapping relationship is established by learning the samples in given sample sets. At present, various artificial neurons and artificial neural networks can be used as models for neural computing, such as the MP neuron model, the process neuron model, the BP neural network, the process neural network, etc. In neural computing there are two key steps, namely constructing a proper neural network model and designing a corresponding learning algorithm according to the practical application. It has already been proved that any finite problem (a problem that can be solved by a finite automaton) can be solved by a neural network and vice versa, so the solving capacity of a neural network is equal to that of a finite automaton. In the continuous case, a multi-layer feedforward neural network can approximate any multivariate function $f\colon R^n\to R^m$ in $L^2$ (where $R^n$ is n-dimensional real number space). The neural computing problem will be expounded in detail later in this book.
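As a minimal illustration of the kind of computation described above (not a specific model from this book), the following Python sketch evaluates a single weighted-sum neuron and a tiny one-hidden-layer feedforward network. All parameter values and names are illustrative assumptions.

```python
import math

def neuron(inputs, weights, threshold):
    """Classic weighted-sum neuron: y = f(sum_i w_i * x_i - theta), sigmoid f."""
    s = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return 1.0 / (1.0 + math.exp(-s))

def feedforward(x, hidden_layer, output_weights, output_threshold):
    """One hidden layer of neurons followed by a single output neuron."""
    h = [neuron(x, w, theta) for (w, theta) in hidden_layer]
    return neuron(h, output_weights, output_threshold)

# Illustrative parameters: 2 inputs, 2 hidden neurons, 1 output neuron.
hidden = [([0.5, -0.3], 0.1), ([0.8, 0.2], -0.2)]
print(feedforward([1.0, 0.5], hidden, [1.2, -0.7], 0.05))
```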
1.3.3 Evolutionary Computing

Many phenomena in nature or in the objective world can profoundly enlighten our research; a very good example is the simulation of the laws of biological evolution to solve rather complex practical problems. Here, better solutions are gradually produced by simulating a natural law, without describing all the characteristics of the problem explicitly. Evolutionary computing is just such a generalized solving method based on this kind of thinking: it adopts simple coding techniques to express complex structures, and guides the system to learn or to determine the search direction through simple genetic operations and optimization by natural selection over a group of codes. Because evolutionary computing organizes the search by means of a population, it can search many regions of the solution space at the same time, and it has intelligent characteristics such as self-organization, self-adaptability, and self-learning, as well as the characteristic of parallel processing. These characteristics mean that evolutionary computing has not only high learning efficiency, but also simplicity, ease of operation, and generality. Hence, it has earned attention from a broad range of people. Evolutionary algorithms are a class of random search algorithms learned from natural selection and genetic mechanisms in the biological world. They mainly comprise three algorithms, namely the genetic algorithm (GA), evolutionary programming (EP), and evolutionary strategy (ES), and they can be used to solve
such problems as optimization and machine learning. Two primary characteristics of evolutionary computing are the population search strategy and information exchange among individuals in a population. Because of the universality of evolutionary algorithms, they have broad applications and are especially suitable for handling complex and non-linear problems that are difficult to solve by traditional search algorithms. Next, we briefly introduce GA, EP, and ES.

(1) Genetic algorithm

The genetic algorithm (GA) [32] is a computing model simulating the biological genetic process. As a global optimization search algorithm, it has many remarkable characteristics, including simplicity and ease of generalization, great robustness, suitability for parallel processing, a wide application scope, and so on. GA is a population operation that takes all the individuals in the population as its objects. Selection, crossover, and mutation are the three main operators of GA; they constitute the so-called genetic operations that other traditional algorithms do not possess. GA mainly involves five basic elements: (a) the coding of individual parameters; (b) the setting of the initial population; (c) the design of the fitness function; (d) the design of the genetic operations; (e) the setting of the control parameters (mainly referring to the scale of the population, the probability of genetic operations on individuals in the population, etc.). These five elements constitute the core content of GA. In nature the evolutionary and genetic process is infinite and endless, but a learning algorithm must be given a termination criterion, and when it is met the individual with the maximal fitness value in the population serves as the solution to the problem. In GA, the execution sequence for the operations of selection, crossover, and mutation can be serial or parallel; the flow (coding and formation of the initial population, detection and evaluation of each individual's fitness, genetic operations, and the termination test) is shown in Fig. 1.1, and a minimal sketch of this loop is given below. Many researchers have improved and extended Holland's basic GA according to practical application requirements. GA has been broadly applied to many fields, such as function optimization, automatic control, image recognition, machine learning, etc. [33-37], and has become one of the common algorithms in computational intelligence technology.

Fig. 1.1 The GA flow chart
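The following Python sketch shows one way the selection-crossover-mutation loop described above can be realized. It is offered as an illustration under assumed settings (binary strings, a toy "count the ones" fitness function, tournament selection); none of the names or parameters come from the book.

```python
import random

def fitness(bits):
    """Toy fitness: maximize the number of 1-bits (the 'OneMax' problem)."""
    return sum(bits)

def tournament(pop):
    """Selection: pick the fitter of two randomly chosen individuals."""
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    """One-point crossover of two bit strings."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:]

def mutate(bits, rate=0.01):
    """Flip each bit with a small probability."""
    return [1 - b if random.random() < rate else b for b in bits]

def genetic_algorithm(n_bits=20, pop_size=30, generations=50):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):                       # termination criterion
        pop = [mutate(crossover(tournament(pop), tournament(pop)))
               for _ in range(pop_size)]               # selection, crossover, mutation
    return max(pop, key=fitness)                       # best individual as the solution

print(genetic_algorithm())
```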
(2) Evolutionary programming
The evolutionary programming (EP) method was first proposed by Fogel et al. in the 1960s [38]. They thought that intelligent behavior should include the ability to predict the surrounding state and to make a proper response in terms of a determinate target. In their research, they described the simulated environment as a sequence composed of symbols from a finite character set, and expected the response to be the evaluation of the current symbol sequence that obtains the maximum income, where the income is determined by the next-arising symbol in the environment and its predefined benefit target. In EP, a finite state machine (FSM) is often used to implement such a strategy, and a group of FSMs evolves to give a more effective FSM. At present, EP has been applied in many fields such as data diagnosis, pattern recognition, numerical optimization, control system design, neural network training, etc., and has achieved good results. EP is a structured description method, and its essence is to describe problems by a generalized hierarchical computing program. This generalized computing program can dynamically change its structure and size in response to the surrounding state, and it has the following characteristics when solving problems: (a) the results are hierarchical; (b) as the evolution continues, the individuals constantly develop dynamically towards the answers; (c) the structure and size of the final answers need not be determined or limited in advance, because EP will automatically determine them according to the practical environment; (d) the inputs, intermediate results, and outputs are natural descriptions of the problems, and preprocessing of the input data and post-processing of the output results are not needed, or needed less. Many engineering problems can come down to computer programs producing corresponding outputs for given inputs, so EP has important applications in practical engineering fields [39-45].

(3) Evolutionary strategy
In the early 1960s, when Rechenberg and Schwefel carried out wind tunnel experiments, the parameters used to describe the shape of the test object were difficult to optimize by traditional methods during design, so they adopted the idea of biological mutation to change the values of the parameters randomly, and obtained ideal results. Thereafter, they carried out an in-depth study and development of this method and formed another branch of evolutionary computation, the evolutionary strategy (ES) [46]. Currently, ES mainly takes two forms: (μ+λ) selection and (μ,λ) selection. The (μ+λ)-ES produces λ offspring from the μ individuals in the population by means of mutation and crossover, and then compares these μ+λ individuals so as to select the μ best individuals; the (μ,λ)-ES selects the μ best individuals directly from the newly produced λ (λ>μ) offspring. In contrast to GA, ES operates directly in the solution space, emphasizes
self-adaptability and diversity of behavior from parents to offspring in the evolution process, and adjusts the search direction and step length adaptively.
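As a concrete, hedged illustration of the (μ+λ) selection scheme just described (the objective function and all parameter values are assumptions made for this example, not taken from the text), a minimal Python sketch might look like this:

```python
import random

def objective(x):
    """Toy objective to minimize: the sphere function."""
    return sum(v * v for v in x)

def mu_plus_lambda_es(dim=5, mu=5, lam=20, sigma=0.3, generations=100):
    # Start from mu random parent vectors in the solution space.
    parents = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(mu)]
    for _ in range(generations):
        # Each offspring mutates a randomly chosen parent by Gaussian noise.
        offspring = [[v + random.gauss(0, sigma) for v in random.choice(parents)]
                     for _ in range(lam)]
        # (mu + lambda) selection: keep the best mu of parents and offspring together.
        parents = sorted(parents + offspring, key=objective)[:mu]
    return parents[0]

best = mu_plus_lambda_es()
print(best, objective(best))
```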
1.3.4 Combination of the Three Branches

Fuzzy systems, neural networks, and evolutionary algorithms are considered to be the three most important and leading-edge areas within the field of artificial intelligence in the 21st century. They constitute so-called intelligent computing or soft computing. All of them are theories and methods that imitate biological information processing patterns in order to acquire intelligent information processing ability. Here, a fuzzy system stresses the brain's macro functions, such as language and concepts, and logically processes semantic information, including fuzziness, according to membership functions and the serial and parallel rules defined by humans. A neural network emphasizes the micro network structure of the brain, and adopts a bottom-up method to deal with pattern information that is difficult to endow with semantics, using complex connections among large numbers of neurons according to a parallel distributed pattern formed by learning, self-organization, and non-linear dynamics. An evolutionary algorithm is a probabilistic search algorithm that simulates the evolutionary phenomena of biology (natural selection, crossover, mutation, etc.). It adopts a natural evolutionary mechanism to perform a complex optimization process, and can solve various difficult problems quickly and effectively. It can be said that fuzzy systems, neural networks, and evolutionary algorithms have similar goals but different methods. Therefore, combining these methods can draw on their individual strengths to offset their weaknesses and form new processing patterns. For example, the learning process of a neural network requires a search in a large space in which many local optima exist, so it is sometimes difficult to solve a large-scale training problem for a neural network. Meanwhile, a genetic algorithm is very suitable for carrying out large-scale parallel searches and can find a global optimal solution with high probability. Thus, we can improve the performance of the learning algorithm of a neural network by combining it with a genetic algorithm. Combining fuzzy logic with a neural network, we can construct various fuzzy neural network models that not only mimic a human being's logical thinking, but also have the ability to learn. For example, the fuzzy computing (reasoning) network proposed by us in 1994 can execute a fuzzy semantic network and soakage computing [47]. Furthermore, the combination of a neural network and a genetic algorithm can construct a neural network whose connection weights evolve continually with changes in the environment, and it can simulate biological neural networks much more vividly. This continually evolving neural network can do various things in operation: (a) perceive changes in the environment, change its network parameters correspondingly via evolution (e.g. by adopting an evolutionary algorithm), and find a new network structure and learning algorithm (the key lies in giving the
algorithm or structure a proper coding (gene), as well as in the evaluation method for network performance); (b) When the network performance cannot meet demand, it automatically starts some learning algorithm, improves the parameters or structure of the network, and enhances its self-adaptability. The crossing or combination of subjects can often lead to the discovery of new technologies and methods, and to innovation. For example, we can combine fuzzy systems, neural networks, and evolutionary algorithms to establish a fuzzy neural network with evolutionary capability that implements and expresses human intelligent behavior effectively.
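Below is a minimal sketch of the GA-plus-neural-network idea mentioned above: the weights of a tiny feedforward network are evolved by selection, crossover, and mutation instead of gradient descent. The network encoding, the XOR task, and all parameter values are illustrative assumptions, not the authors' own scheme.

```python
# Minimal sketch (illustrative only) of evolving the weights of a tiny
# feedforward network with a genetic-style algorithm instead of gradient descent.
import math
import random

# XOR training set used purely as a toy example.
SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def forward(w, x):
    # w packs 9 weights: 2x2 input->hidden, 2 hidden biases, 2 hidden->output, 1 output bias.
    h = [math.tanh(w[0] * x[0] + w[1] * x[1] + w[4]),
         math.tanh(w[2] * x[0] + w[3] * x[1] + w[5])]
    return math.tanh(w[6] * h[0] + w[7] * h[1] + w[8])

def error(w):
    return sum((forward(w, x) - y) ** 2 for x, y in SAMPLES)

def evolve(pop_size=60, generations=300, sigma=0.4):
    pop = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=error)
        elite = pop[: pop_size // 4]            # selection of the best individuals
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(9)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [v + random.gauss(0, sigma) for v in child]  # mutation
            children.append(child)
        pop = elite + children
    return min(pop, key=error)

best = evolve()
print("training error:", error(best))
```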
1.4 Process Neural Networks

At present, most real-valued artificial neural network models are constructed based on the MP neuron model, and the system inputs are time-independent constants, i.e. the relationship between the inputs and outputs of the network is an instantaneous, geometric-point-type correspondence. However, research in biological neurology indicates that the output change of a synapse is affected by the relative timing of input pulses in a biological neuron and depends on an input process lasting for a certain time. Furthermore, in some practical problems, the inputs of many systems are also processes, functions depending on spatio-temporal change, or even multivariate functions relying on multiple factors; the system outputs are related not only to the current inputs, but also to a cumulative effect over a period of time. When we use a traditional neural network model to handle the inputs and outputs of a time-varying system, the common method is to convert the time relation into a spatial relation (a time series) before processing. However, this results in rapid expansion of the network size, and traditional neural networks still have difficulty with learning and generalization over large numbers of samples. At the same time, this approach makes it hard to satisfy the real-time demands of the system and to reflect the cumulative effect of the time-varying input information on the output.

With these problems in mind, we proposed and established a new artificial neural network model, the process neural network (PNN), by extending traditional neural networks to the time domain or even the multi-factor domain. A PNN can directly deal with process data (time-varying functions) and adapts easily to many practical process-related problems. In this monograph, we discuss the process neural network in depth, study its theories, related algorithms, and various applications, and address a variety of unresolved issues that need further research. Finally, we extend the neural network to a generalized abstract space, i.e. regard a neural network as a special mapping between points in different (or the same) abstract spaces, and
consequently unify all kinds of neural network models proposed by mathematicians in the past.
References

[1] McCulloch W.S., Pitts W.H. (1943) A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5(1):115-133
[2] Hebb D.O. (1949) The Organization of Behavior: A Neuropsychological Theory. Wiley, New York
[3] Rosenblatt F. (1958) Principles of Neuro-Dynamics. Spartan Books, New York
[4] Widrow B. (1962) Generalization and information storage in networks of Adaline neurons. In: Self-Organizing Systems. Spartan, Washington DC, pp.435-461
[5] Amari S.A. (1967) Theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers 16(3):299-307
[6] Minsky M.L., Papert S.A. (1969) Perceptrons. MIT Press, Cambridge MA
[7] Amari S. (1972) Characteristics of random nets of analog neuron-like elements. IEEE Transactions on Systems, Man, and Cybernetics 5(2):643-657
[8] Anderson J.A. (1972) A simple neural network generating interactive memory. Mathematical Biosciences 14:197-220
[9] Grossberg S. (1976) Adaptive pattern classification and universal recoding. I: Parallel development and coding of neural feature detectors. Biological Cybernetics 23(3):121-134
[10] Hopfield J.J. (1982) Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, U.S.A. 79(2):554-558
[11] Rumelhart D.E., Hinton G.E., Williams R.J. (1986) Learning representations by back-propagating errors. Nature 323(9):533-536
[12] Hinton G.E., Nowlan S.J. (1987) How learning can guide evolution. Complex Systems 1(3):495-502
[13] Hecht-Nielsen R. (1989) Theory of the back-propagation neural network. Proceedings of the International Joint Conference on Neural Networks 1:593-605
[14] Funahashi K. (1989) On the approximate realization of continuous mappings by neural networks. Neural Networks 2(3):183-192
[15] Hornik K., Stinchcombe M., White H. (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks 3(5):551-560
[16] Linsker R. (1988) Towards an organizing principle for a layered perceptual network. Neural Information Processing Systems 21(3):485-494
[17] Boser B.E., Guyon I.M., Vapnik V.N. (1992) A training algorithm for optimal margin classifiers. In: Haussler D., Ed. Proceedings of the 5th Annual ACM
Workshop on Computational Learning Theory. ACM Press, Pittsburgh, PA, pp.144-152
[18] Vapnik V.N. (1995) The Nature of Statistical Learning Theory. Springer, New York
[19] Vapnik V.N. (1998) Statistical Learning Theory. Wiley, New York
[20] Han M., Wang Y. (2009) Analysis and modeling of multivariate chaotic time series based on neural network. Expert Systems with Applications 36(2):1280-1290
[21] Abdelhakim H., Mohamed E.H.B., Demba D., et al. (2008) Modeling, analysis, and neural network control of an EV electrical differential. IEEE Transactions on Industrial Electronics 55(6):2286-2294
[22] Al Seyab R.K., Cao Y. (2008) Nonlinear system identification for predictive control using continuous time recurrent neural networks and automatic differentiation. Journal of Process Control 18:568-581
[23] Tomohisa H., Wassim M.H., Naira H., et al. (2005) Neural network adaptive control for nonlinear nonnegative dynamical systems. IEEE Transactions on Neural Networks 16(2):399-413
[24] Tomohisa H., Wassim M.H., Naira H. (2005) Neural network adaptive control for nonlinear uncertain dynamical systems with asymptotic stability guarantees. In: 2005 American Control Conference, pp.1301-1306
[25] Ghiassi M., Saidane H., Zimbra D.K. (2005) A dynamic artificial neural network model for forecasting time series events. International Journal of Forecasting 21(2):341-362
[26] Tan Y.H., He Y.G., Cui C., Qiu G.Y. (2008) A novel method for analog fault diagnosis based on neural networks and genetic algorithms. IEEE Transactions on Instrumentation and Measurement 57(11):1221-1227
[27] He X.G., Liang J.Z. (2000) Process neural networks. In: World Computer Congress 2000, Proceedings of Conference on Intelligent Information Processing. Tsinghua University Press, Beijing, pp.143-146
[28] He X.G., Liang J.Z. (2000) Some theoretical issues on procedure neural networks. Engineering Science 2(12):40-44 (in Chinese)
[29] Zadeh L.A. (1965) Fuzzy sets. Information and Control 8:338-353
[30] He X.G. (1989) Weighted fuzzy logic and wide application. Chinese Journal of Computer 12(6):458-464 (in Chinese)
[31] He X.G. (1990) Fuzzy computational reasoning and neural networks. Proceedings of the Second International Conference on Tools for Artificial Intelligence. Herndon, VA, pp.706-711
[32] Holland J. (1975) Adaptation in Natural and Artificial Systems. Univ. of Michigan Press, Ann Arbor
[33] Malheiros-Silveira G.N., Rodriguez-Esquerre V.F. (2007) Photonic crystal band gap optimization by genetic algorithms. Microwave and Optoelectronics Conference, SBMO/IEEE MTT-S International, pp.734-737
[34] Feng X.Y., Jia J.B., Li Z. (2000) The research of fuzzy predicting and its application in train's automatic control. Proceedings of the 13th International Conference on
Pattern Recognition, pp.82-86
[35] Gofman Y., Kiryati N. (1996) Detecting symmetry in grey level images: the global optimization approach. Proceedings of 2000 International Workshop on Autonomous Decentralized Systems 1:889-894
[36] Fogarty T.C. (1989) The machine learning of rules for combustion control in multiple burner installations. Proceedings of the Fifth Conference on Artificial Intelligence Applications, pp.215-221
[37] Matuki T., Kudo T., Kondo T. (2007) Three dimensional medical images of the lungs and brain recognized by artificial neural networks. SICE Annual Conference, pp.1117-1121
[38] Fogel L.J., Owens A.J., Walsh M.J. (1966) Artificial Intelligence Through Simulated Evolution. Wiley, New York
[39] Swain A.K., Morris A.S. (2000) A novel hybrid evolutionary programming method for function optimization. Proceedings of the 2000 Congress on Evolutionary Computation 1:699-705
[40] Dehghan M., Faez K., Ahmadi M. (2000) A hybrid handwritten word recognition using self-organizing feature map, discrete HMM, and evolutionary programming. Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks 5:515-520
[41] Li X.L., He X.D., Yuan S.M. (2005) Learning Bayesian networks structures from incomplete data based on extending evolutionary programming. Proceedings of 2005 International Conference on Machine Learning and Cybernetics 4:2039-2043
[42] Lieslehto J. (2001) PID controller tuning using evolutionary programming. Proceedings of the 2001 American Control Conference 4:2828-2833
[43] Li Y. (2006) Secondary pendulum control system based on genetic algorithm and neural network. IEEE Control Conference, pp.1152-1155
[44] Jose J.T., Reyes-Rico C., Ramirez J. (2006) Automatic behavior generation in a multi-agent system through evolutionary programming. Robotics Symposium, IEEE 3rd Latin American, pp.2-9
[45] Gao W. (2004) Comparison study of genetic algorithm and evolutionary programming. Proceedings of 2004 International Conference on Machine Learning and Cybernetics 1:204-209
[46] Back T., Hoffmeister F., Schwefel H.P. (1991) A survey of evolution strategies. Proceedings of the Fourth ICGA. Morgan Kaufmann Publishers, Los Altos, CA, pp.2-9
[47] He X.G. (1996) Fuzzy reasoning network and calculation inference. Journal of Software (10):282-287 (in Chinese)
2 Artificial Neural Networks
The modern computer has strong computing and information-processing capabilities, and in some respects it can be said to have already exceeded the capabilities of the human brain. It plays an important role in human society in daily life, production, and scientific research. However, current computer hardware and software systems are still based on the von Neumann architecture. They can only mechanically solve actual problems by using predefined programs, and their capability falls short of that of humans on certain problems, such as adaptive pattern recognition, behavior perception, logical thinking, analysis and processing of incomplete and fuzzy information, and independent decision-making in a complex environment. What is more, they lack the mechanism and capability for adaptive learning from the environment and active adaptation to it.

Neurological research indicates that the human brain is an information-processing network system formed by the complex mutual connection of a huge number of basic units (biological neurons); this network system is highly complex, nonlinear, and uncertain, and has a highly parallel processing mechanism. Each neuron cell is a simple information-processing unit whose state is determined by its own condition and the external environment, and it has a definite input-output transformation mechanism. The human brain has capabilities such as memorizing, computing, logical reasoning and thinking, perception and learning from the environment, evolving with the environment, etc. Therefore, by imitating the organizational structure and the running mechanism of the human brain, we seek new methods of information representation, storage, and processing, and construct a new information-processing system, closer to human intelligence, to solve problems that are difficult to solve using traditional methods. This will greatly extend the application areas of computers and promote the advancement of science. It will also provide a tentative way to explore a completely new computer system.
2.1 Biological Neuron

The biological brain is a complex interconnected network made up of billions of nerve cells (neurons). The human brain has approximately 10^10-10^11 neurons, and each neuron is interconnected with 10^3-10^5 other neurons (including itself). The brain is thus a huge and complex network system. In general, the structure of a neuron can be divided into three parts: soma, dendrites, and axon, as depicted in Fig. 2.1 [1,2].
Fig. 2.1 Biological neuron (dendrites, synapses, soma, and axon)
On one side of the soma are many dendrites forming a tree shape; on the other side is the axon. Many branches of the axon connect with the dendrites of other neurons. The junction between an axon branch and a dendrite is called a synapse. A neuron accepts electrical or biochemical transmissions from the axon branches of other neurons via its dendrites (input). After weighted processing by the corresponding synapses, the input signals undergo aggregation superposition and nonlinear activation at the axon hillock at the back of the soma. Under certain conditions (for example, when the intensity of the aggregated signal exceeds a certain threshold value), the neuron generates an output signal by activation. This signal is transferred to other connected neurons by branches of the axon, and the next stage of information processing begins. The synapses of the neuron are the key units in neural information processing; they not only transform an input pulse signal into a potential signal, but also have an experience memory function and can carry out weighted processing of the input signal according to memory.

The differences in information-processing methods between the brain and the von Neumann architecture are as follows: (a) Their information storage modes are different. The biological brain has no separate, centralized storage or arithmetic unit; each neuron combines the functions of storage and computing. Various kinds of information are distributed and stored in the synapses of different neurons, and information processing in fine-grained distribution is completed by numerous neurons. (b) The biological brain does not need a program for solving problems, that is, it does not create a model in advance when solving practical problems, but directly changes the memory parameters (connection weights) of the synapses of a neuron to
acquire the knowledge for solving certain problems by learning. (c) The information (the processing object) processed by the biological brain is not completely certain and accurate, but has obvious fuzziness and randomness. The processing object can be either a discrete quantity or a continuous quantity. (d) The processing method used by the biological brain can be digital, analog, an organically mixed digital/analog (D/A) method, or a random processing method. Therefore, the brain and the current computer differ greatly in their information-processing methods. With the addition of random processing and mixed D/A processing, the whole process becomes complex, and it is usually non-repeatable. (e) The switching time of a brain neuron is several milliseconds (of the order of 10^-3 s), which is millions of times longer than that of a current computer (of the order of 10^-10 s). However, the human brain can produce an accurate response to a complex stimulus in less than one second. This indicates that although the processing and transmission speed of a single neuron is rather slow, the brain can respond quickly because of its high parallelism.

The brain is made up of many simple neurons and is very simple in microstructure, but it can solve very complex problems. More incredibly, the brain has stupendous creativity, and this is worth noting by students of artificial intelligence. It is certain that we can learn much from research on the structure of the brain, e.g. for artificial neural networks. Now let us start by providing a mathematical model of a neuron.
2.2 Mathematical Model of a Neuron

In the above, we have briefly analyzed the structure and information-processing mechanism of the biological neuron, to provide the biological basis for constructing a mathematical model of an artificial neuron. Obviously, it is impossible to faithfully simulate every characteristic of the biological neuron in a current computer, and we must make various reasonable simplifications. In current research on neural networks, the neuron is the most essential information-processing unit of the network. Generally, the mathematical model can be depicted as in Fig. 2.2.

Fig. 2.2 Artificial neuron model

In Fig. 2.2, x_i (i=1,2,...,n) is the input signal from the ith of n external neurons to neuron j; w_ij is the connection weight between the ith external neuron and neuron j; θ_j is
the activation threshold of neuron j; f is the activation function (also called an effect function, generally nonlinear); y_j is the output of this neuron. The relationship between the inputs and the output of a neuron is

    y_j = f( Σ_{i=1}^{n} w_ij·x_i − θ_j ),    (2.1)
where f can be a nonlinear activation function, such as a sign function or a continuous sigmoid function. It can be seen from the above that the mathematical model of the neuron simulates the information processing of a biological neuron to a certain extent, but it has two disadvantages: (a) The information processing does not refer to time: there is no time delay between the inputs and the outputs, and the relationship between them is a momentary, point-to-point correspondence. (b) The accumulation effect of the inputs on the outputs is not taken into consideration: the output at a given moment depends only on the current inputs, without reference to earlier inputs. Nevertheless, for convenience of discussion and research, we first consider this kind of simple neuron model and its corresponding neural network.
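A minimal sketch of the neuron model of Eq. (2.1) follows: a weighted aggregation of the inputs, subtraction of the activation threshold, and a nonlinear activation. The sigmoid choice and the example values are illustrative assumptions.

```python
# A minimal sketch of the neuron model in Eq. (2.1): weighted aggregation of the
# inputs, subtraction of the activation threshold, then a nonlinear activation.
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def neuron_output(x, w, theta, f=sigmoid):
    # x: input signals x_1..x_n, w: connection weights w_1..w_n,
    # theta: activation threshold, f: activation function.
    u = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return f(u)

# Example: three inputs with illustrative weights and threshold.
print(neuron_output([0.5, 1.0, -0.2], [0.8, -0.4, 0.3], theta=0.1))
```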
2.3 Feedforward/Feedback Neural Networks

Various artificial neural networks are constructed by connecting several artificial neurons together according to a particular topological structure. At present, there are tens of primary neural network models. According to the connection method among the neurons and the direction of information flow in the network, neural network models can be divided into two kinds. One is the feedforward neural network, which has only forward information transfer and no feedback. The other is the feedback neural network, which has not only forward transfer of information but also reverse (feedback) transfer.
2.3.1 Feedforward/Feedback Neural Network Model

A feedforward neural network is made up of one input layer, several middle layers (hidden layers), and one output layer. A typical structure with a single hidden layer is shown in Fig. 2.3. A feedforward neural network may contain several hidden layers; the neurons of each layer accept output information only from the neurons of the preceding layer.
Fig. 2.3 A feedforward neural network with a single hidden layer (input layer, hidden layer, output layer)
Each directed connection line between neurons has one connection weight. A connection weight can be zero, which means that there is no connection. For simplicity and uniformity, in the diagram of a feedforward neural network the neurons of the previous layer are connected with all the neurons of the following layer. In a feedback neural network, any two neurons can be connected, including self-feedback of a neuron. A typical structure is shown in Fig. 2.4.
Fig. 2.4 Feedback neural network
In Fig. 2.4, w_ij (solid line) is the connection weight of the forward-transferring network nodes, and v_ji (dashed line) is the connection weight of the feedback-transferring nodes of the network. In this network, not every neuron necessarily has an initial input, and the connections between neurons need not be complete. In a feedback neural network, the input signal is repeatedly transferred among the neurons from a certain initial state and, after being transformed a number of times, gradually tends to either a particular steady state or a periodic oscillation state. In current neural network research, the most popular and effective model is the feedforward neural network. It is quite successful in many domains, such as pattern recognition, classification and clustering, adaptive control and learning, etc. In research on neural networks that combine feedforward and feedback,
due to the complexity of the structure, the problem of feedback information processing must be considered in the operation mode, and in some cases time must even be quantized. There are, therefore, many difficulties but few achievements. However, the information-processing mode of animal brains belongs to this type, and many applications create a strong demand for research on feedback neural networks, so this research becomes imperative.
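The repeated state transfer described above can be illustrated with a small sketch: starting from an initial state, the activations are propagated through the recurrent weights until they settle (or keep oscillating). The weight matrix, activation choice, and convergence tolerance are illustrative assumptions.

```python
# Minimal sketch (illustrative) of repeated state transfer in a feedback network:
# from an initial state, activations are propagated through the recurrent weights
# until they settle into a (possibly steady) state.
import math

def tanh_vec(v):
    return [math.tanh(u) for u in v]

def run_feedback_net(W, state, steps=100, tol=1e-6):
    # W[i][j] is the weight from neuron j to neuron i (self-feedback allowed).
    for _ in range(steps):
        new_state = tanh_vec([sum(W[i][j] * state[j] for j in range(len(state)))
                              for i in range(len(W))])
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state            # converged to a steady state
        state = new_state
    return state                        # may still be oscillating

W = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.2],
     [-0.3, 0.2, 0.0]]
print(run_feedback_net(W, [1.0, -0.5, 0.2]))
```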
2.3.2 Function Approximation Capability of Feedforward Neural Networks

When an artificial neural network is applied as a computing model, its computing capability and the sort of problems it can solve should be considered first. Second, as learning by a neural network can be regarded as a special process of function fitting or approximation, and the network's solutions are generally inexact, the precision of its solutions and its function approximation capability should be considered. An example of a MISO (multi-input, single-output) feedforward neural network with a single hidden layer is shown in Fig. 2.5.
Fig. 2.5 MISO feedforward neural network with a single hidden layer
The relationship between the inputs and outputs from the input layer to the hidden layer is

    o_j = f( Σ_{i=1}^{n} w_ij·x_i − θ_j ),  j = 1,2,...,m.    (2.2)
The relationship between the inputs and output from the hidden layer to the output layer is

    y = g( Σ_{j=1}^{m} v_j·o_j − θ ).    (2.3)
Integrating Eqs. (2.2) and (2.3), the mapping relationship between the inputs and output of a feedforward neural network is
    y = g( Σ_{j=1}^{m} v_j·f( Σ_{i=1}^{n} w_ij·x_i − θ_j ) − θ ).    (2.4)
In Eqs. (2.2)-(2.4), x_1, x_2, ..., x_n are the multidimensional inputs of the system; o_j (j=1,2,...,m) is the output of the jth neuron in the hidden layer; f is the activation function of the hidden layer; θ_j is the activation threshold of the jth neuron in the hidden layer; w_ij and v_j are the connection weights from the input layer to the hidden layer and from the hidden layer to the output neuron, respectively; θ is the activation threshold of the output neuron; g is the activation function of the output neuron.

Obviously, the input-output relationship of a feedforward neural network can be considered a mathematical function, and the problem of learning can be considered a special problem of function approximation or fitting, where the class of approximating functions is the set composed of the above neural networks. Therefore, in order to establish that neural network models can solve various application problems, it should be demonstrated in theory that the above models can approximate the required input-output relationships (mathematical functions); otherwise, there is no universality for solving problems. Hitherto, under certain conditions, many approximation theorems for neural networks have been proved. We now cite some of the famous theorems.

(1) Hecht-Nielsen Approximation Theorem [3]
Suppose that Q is a bounded closed set. For any ε>0 and any L2 function f: R^n→R^m (R is the set of real numbers) defined on Q, there exists a feedforward neural network with two hidden layers (shown in Fig. 2.6) such that ||f−y|| < ε, where y is the output of the network.
Fig. 2.6 A feedforward neural network with two hidden layers used for L2 function approximation
(2) Hornik Approximation Theorem [4]
Hornik Theorem 1 Suppose that the activation function g(·) of the hidden nodes is any continuous non-constant function R→R; then a three-layer feedforward neural network with sufficiently many hidden nodes (shown in Fig. 2.7) can uniformly approximate any continuous function on any compact set D⊂R^n with any precision.
Fig. 2.7 Structure of a three-layer feedforward neural network used for D→R continuous function approximation
Hornik Theorem 2 Suppose that the activation function g(·) of the hidden nodes is any continuous non-constant function; then a three-layer feedforward neural network with sufficiently many hidden nodes can approximate any measurable function on R^n with any precision.

(3) Funahashi Approximation Theorem [5]

Suppose that g(·) is a bounded, monotonically increasing, continuous function; D is a compact subset (bounded closed set) of R^n; and F is a continuous mapping D→R^m. Then for any F and ε>0 there is a feedforward neural network f with k (k≥3) layers, whose hidden-layer activation function is g(·) and which realizes a mapping D→R^m, such that

    max_{x∈D} ||f(x) − F(x)|| < ε,
where ||·|| is any norm on R^m. The structure of the network is shown in Fig. 2.8.
Fig. 2.8 A feedforward neural network with multiple hidden layers used for D→R^m approximation
2.3.3 Computing Capability of Feedforward Neural Networks

Computing Capability Theorem The computing capability of a feedforward neural network is equivalent to that of a Turing machine.

In 1995, Liu and Dai proved that the computing capability of the linear threshold unit neural network is equivalent to that of a Turing machine [6]. As a linear threshold
unit neural network is a quite simple feedforward neural network model, the computing capability of a feedforward neural network whose activation function is a sigmoid function, a Gaussian function, etc. will not be smaller than that of a Turing machine. On the other hand, the operations used in a feedforward neural network are "+", "·", "f", and their compositions, which can all be carried out by a Turing machine. Therefore, the computing capability of a feedforward neural network will not be greater than that of a Turing machine. Hence, the computing capability of a feedforward neural network is equivalent to that of a Turing machine.
2.3.4 Learning Algorithm for Feedforward Neural Networks

Learning (or training) for a neural network is not simply a matter of memorizing the mapping relationship between the inputs and outputs of the learning samples, but of extracting, by learning from finite sample data, the internal rules about the environment that are hidden in the samples. At present there are many learning algorithms for feedforward neural networks, among which the error back-propagation algorithm (BP algorithm) and its various improved forms are applied most extensively and effectively. A multi-layer feedforward neural network model that adopts the BP algorithm is generally called a BP network, and its learning process is made up of two parts: forward propagation of input information and back-propagation of error. Forward-propagated input information is transferred from the input layer to the output layer after processing in the hidden layers; the state of the neurons in each layer only influences the state of the neurons in the next layer. If the expected output is not obtained at the output layer, the process shifts to back-propagation, and error signals are returned along the original pathway of the neural connections. On the way back, the connection weights of each layer are modified one by one. Through successive iterations, the error between the expected output signals of the network and the practical output signals of the system is brought within an allowable range.

A learning algorithm for a neural network is often related to some function approximation algorithm, especially to iterative algorithms that make the approximation error gradually smaller. In fact, the above-mentioned BP algorithm corresponds to a gradient descent algorithm in function approximation. Once we know this principle, we can construct various learning algorithms for neural networks according to different function approximation algorithms.
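The following is a minimal sketch of BP training for a single-hidden-layer network of the kind in Fig. 2.5: forward propagation of the inputs, then back-propagation of the output error with gradient-descent weight updates. The sigmoid activations, squared-error criterion, learning rate, and XOR example are illustrative assumptions rather than the exact algorithm presented later in the book.

```python
# Minimal sketch (illustrative) of BP training for a single-hidden-layer network:
# forward propagation, then back-propagation of the squared output error.
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def train_bp(samples, n_in, n_hidden, lr=0.5, epochs=5000):
    # Hidden weights w[j][i], hidden thresholds theta[j], output weights v[j], output threshold b.
    w = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    theta = [random.uniform(-1, 1) for _ in range(n_hidden)]
    v = [random.uniform(-1, 1) for _ in range(n_hidden)]
    b = random.uniform(-1, 1)
    for _ in range(epochs):
        for x, t in samples:
            # Forward propagation (cf. Eqs. (2.2)-(2.4)).
            o = [sigmoid(sum(w[j][i] * x[i] for i in range(n_in)) - theta[j])
                 for j in range(n_hidden)]
            y = sigmoid(sum(v[j] * o[j] for j in range(n_hidden)) - b)
            # Back-propagation of the error (gradient of the squared error).
            delta_y = (y - t) * y * (1 - y)
            for j in range(n_hidden):
                delta_j = delta_y * v[j] * o[j] * (1 - o[j])
                v[j] -= lr * delta_y * o[j]
                for i in range(n_in):
                    w[j][i] -= lr * delta_j * x[i]
                theta[j] += lr * delta_j
            b += lr * delta_y
    return w, theta, v, b

xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
train_bp(xor, n_in=2, n_hidden=3)
```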
2.3.5 Generalization Problem for Feedforward Neural Networks

Generally, when modeling a certain object using a neural network, the input and output data samples of this object are divided into two groups: one group is called the learning sample, and the other is called the test sample. The learning sample is used to obtain the model by learning and training; the test sample is used to test the "generalization error" of the learned model. If the generalization
error of the model is small, the generalization capability of the model is strong; on the contrary, if the generalization error of the model is large, the generalization capability of the model is weak. The "approximation error" between the practical object and the model is described by the learning error and the generalization error. In fact, the generalization error of the model should refer to the error between the practical object and the model over all possible input/output samples. Therefore, when a neural network is trained, the reasonable selection of the learning sample has a great influence on the generalization capability of the model. In terms of model structure, the generalization capability of a neural network, especially a multi-layer feedforward neural network, is closely related to many factors, such as the degree of complexity of the actual data source, the number and distribution of the learning samples, the structure and scale of the network, the learning algorithm, etc.

In conclusion, the generalization capability of a neural network can be improved through two aspects: the network structure and the learning sample set. For the network structure, the main question is how to improve the robustness and fault tolerance of the network and ascertain its proper information capacity through the following aspects: the network model, the connection structure of the neurons, the number of hidden layers and of neurons in each hidden layer, the learning algorithm, etc. For the learning sample set, one should consider whether the selected sample set covers all the different situations in the research objective, whether the distribution of inputs is reasonable, and how many samples are needed to ensure that the generalization error satisfies the demands. For instance, the following problems are worth studying:

(a) If the research object (system) is complex, nonlinear, and highly uncertain, and different individuals of the same class of objects show obvious differences, we can design a suitable sampling experiment, enlarge the coverage and density of the sample, and express the nonlinear dynamic characteristics of the research object as completely as possible. Thus we can improve the generalization effect, i.e. diminish the approximation error on the test sample set.

(b) Since a neural network is a black-box system, its modeling completely depends on input and output data, so the quality and distribution of the learning sample set are important to the generalization capability of the network. As in practice we can only obtain a finite data sample within a given scope and conditions, and noise pollution and analysis errors reduce the quality of the sample data, we should, when selecting the learning sample, construct a complete data collection and analysis mechanism to improve confidence in the learning sample.

(c) The mismatch between the network scale and the degree of complexity (information capacity) of the practical system is also an important factor influencing the generalization capability of the network. At present, the structure and scale of a neural network cannot be ascertained by any mature theory, but have to be decided by experience and repeated experiments. Although neural networks have a general approximation property, the proof of this conclusion is based on the premise of an unlimited network scale and sample size.
If the network scale is too small, the information capacity is low, and the network cannot completely approximate complex objects. If the scale is too large, it will induce over-fitting and reduce the
robustness and fault tolerance of the network. In some cases a fuzzy logic system is equivalent to a neural network. Accordingly, in practical applications we can first obtain the fuzzy relationship between the inputs and outputs of the research object from prior knowledge and understanding of the practical system. Based on this relationship, the neural network structure can be defined preliminarily and then gradually modified and completed by validation against the sampled data. In this way the structural and property parameters of the neural network model can correspond well with and match the system characteristics of the research object.

(d) The essence of neural network training is to simulate the mapping relationship between the inputs and outputs of a practical system in a certain data environment. If the data environment of a trained network changes markedly, we must retrain the network, i.e. redetermine the new mapping relationship of the research object, to ensure the generalization capability of the network.

(e) For a given group of sample data, we should study how to properly divide it into the learning sample and the test sample so as to obtain the minimum approximation error of the neural network, through learning, over the whole sample set.
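A minimal sketch of such a division follows: the data are split into a learning sample and a test sample, and the error on the held-out test sample is used as an estimate of the generalization error. The split ratio, seed, and error measure are illustrative assumptions.

```python
# Minimal sketch (illustrative) of dividing a data set into a learning sample and a
# test sample; the error on the held-out test sample estimates the generalization
# error of a trained model.
import random

def split_samples(samples, test_fraction=0.25, seed=0):
    data = samples[:]
    random.Random(seed).shuffle(data)
    n_test = int(len(data) * test_fraction)
    return data[n_test:], data[:n_test]      # (learning sample, test sample)

def mean_squared_error(model, samples):
    # 'model' is any trained input -> output mapping, e.g. a network trained with BP.
    return sum((model(x) - y) ** 2 for x, y in samples) / len(samples)

# Typical usage (illustrative):
#   learn_set, test_set = split_samples(all_samples)
#   ... train the model on learn_set ...
#   generalization_error = mean_squared_error(model, test_set)
```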
2.3.6 Applications of Feedforward Neural Networks

As neural networks need not build accurate mathematical or physical models in advance in order to solve a problem, they are broadly applied in fields that lack prior theory and knowledge, or where it is difficult to build accurate mathematical or physical models, such as scientific research, engineering computing, and other facets of daily life. Because of the characteristics of its information-processing mechanism, the feedforward neural network has the following important and successful applications.
(1) Pattern recognition

Pattern recognition is one of the earliest and most successful applications of the feedforward neural network. By learning the training sample set, a neural network can automatically extract and memorize the essential characteristics of various sample patterns, form a discriminant function by adaptively combining multiple characteristics, and solve various complex pattern recognition problems, such as automatic diagnosis of mechanical failure [7], handwritten character recognition [8], discrimination of sedimentary microfacies in petroleum geology [9], and phoneme recognition [10].
(2) Classification and clustering

Classification and clustering are common problems in signal processing and combinatorial analysis. When the classes are known, assigning samples to them is called "classification"; when the number of classes is unknown, merging samples into classes in the most reasonable way is called "clustering". For classification, the BP
network acts as a classifier with learning and adaptive mechanisms, learning and extracting the pattern features of the various classes. For clustering, the classification structure of the research objects does not need to be known beforehand; objects can be grouped according to similarities among them, unrestricted by the current level of study of the objects or by prior knowledge. The feedforward neural network (such as an unsupervised self-organizing map network), adopting a self-organizing competitive learning algorithm, is an excellent clusterer and is broadly applied in many fields including data mining, association analysis, etc. [11-13]

(3) Forecasting and decision-making
As a feedforward neural network has a learning mechanism for its environment, adaptive capability, and continuity, a network that has learned some knowledge about the related domain acts as a prediction model that can analyze development trends of objects according to changes in their external conditions. At the same time, a neural network model is based on case learning and can convert the knowledge and information acquired from learning into facts and rules in the process of reasoning; therefore, it can be used for decision-making. At present, neural networks have been applied to trend prediction in economic development [14], environmental prediction [15,16], intelligent decision support [17], stock market trend prediction [18,19], earthquake prediction [20], performance forecasting for a refrigeration system [21], etc.

(4) System identification and adaptive control
System identification and adaptive control are other important applications of feedforward neural networks. System identification based on a neural network uses the nonlinear transformation mechanism and adaptability of the network and regards it as an equivalent model of the system to be identified, such that, based on the input and output data of the system, the practical system and the identification model produce the same output under the same initial conditions and given inputs. Moreover, a feedforward neural network can serve as the controller of a practical system; it can exert effective adaptive control under system uncertainty or disturbance and make the control system achieve the required dynamic and static characteristics [22-24].

(5) Modeling and optimization
Feedforward neural networks have good learning capability and nonlinear transformation mechanisms. They can effectively accomplish simulation modeling for problems such as sensing systems and automatic production processes, where it is difficult to build accurate models using mathematical formulas. Moreover, they can also be applied to system structure design, optimization, etc. [25,26]. As a feedforward neural network has good function approximation and computing capability, it has also been broadly applied in other practical fields such as scientific computing, image processing [27,28], etc.
2.4 Fuzzy Neural Networks

The signals processed by the biological nervous system are, to some extent, fuzzy and qualitative hybrid analog signals. They are handled not by simple numerical calculation but by combining the environmental activation signal with the existing knowledge in the nervous system; the information-processing mechanism of the neural network realizes logical reasoning and computing. A fuzzy neural network can integrate fuzzy logical reasoning with the nonlinear transformation mechanism and learning capability of a neural network so as to simulate the information-processing mechanism and process of the biological neural network more closely.
2.4.1 Fuzzy Neurons

There are two kinds of fuzzy neuron models. Model I is obtained by directly fuzzifying the non-fuzzy neuron; Model II is described by fuzzy rules. The structure of Model I, obtained by directly fuzzifying or generalizing the non-fuzzy neuron, is shown in Fig. 2.9.
Fig. 2.9 Structure of fuzzy neuron Model I
In Model I, the inputs, the connection weights, the activation thresholds, the aggregation operation, and the nonlinear activation function (also called the effect function) are all fuzzified, and can respectively be various fuzzy numbers, fuzzy operations, or fuzzy functions. Therefore, the output of the neuron is fuzzy too. As with the non-fuzzy neuron, this fuzzy neuron performs an aggregation operation on the (fuzzy or precise) inputs after the weighting operation, and then computes the neuron's output according to the activation threshold and the activation function.

Fuzzy neuron Model II is designed according to the weighted fuzzy logic proposed by the authors. Semantically, it denotes a weighted fuzzy logical rule whose premise and conclusion are fuzzy predicates with fuzzy sets as arguments. In this fuzzy neuron, the input information (fuzzy or precise) is related to the output by a weighted fuzzy logical rule. The reasoning rule denoted by the fuzzy neuron is stored in the structural connection parameters of the neuron and its
aggregation operation mechanism. The output predicate is composed of the current input predicates and the past experience weights according to a certain rule. The structure of Model II is shown in Fig. 2.10.
Fig. 2.10 Structure of fuzzy neuron Model II
2.4.2 Fuzzy Neural Networks

Obviously, fuzzy logic has an outstanding feature: it can naturally and directly express the logical meanings habitually used by humans, so it is well suited to direct or high-level knowledge representation. On the other hand, it is difficult for fuzzy logic to express the complex nonlinear transformation relationship between quantitative data and process variation. A neural network can achieve adaptability through a learning mechanism and automatically acquire knowledge expressed by the available data (accurate or fuzzy). However, since this knowledge is expressed indirectly by the "connection weights" or "activation thresholds" of the network, it is difficult to determine its meaning directly, and semantic interpretation is not straightforward. It is obvious that both fuzzy logic and neural networks have advantages and disadvantages, and we can easily see that the advantages and disadvantages of fuzzy logic and neural computing are complementary in a certain sense. Fuzzy logic is more suitable for top-down analysis and design when constructing intelligent systems, while a neural network is more suitable for improving and perfecting the performance of an intelligent system from the bottom up after it has been initially designed. Therefore, if fuzzy logic and a neural network can be combined harmoniously, they can have complementary advantages, that is to say, the inherent disadvantages of one field can be compensated for by the other.

One good combination is to adopt the fuzzy neurons described in the above section to construct neural networks. Obviously, a knowledge base expressed by fuzzy rules can be conveniently expressed by a network composed of one or more such fuzzy neurons. Another combination is to adopt fuzzy logical rules to control the structure and the values of the property parameters of a fuzzy neural network. For example, some learning parameters may change according to fuzzy reasoning rules during the learning or running process of the fuzzy neural network. The parameters u and d in the RPROP algorithm are originally fixed constants; the original algorithm is greatly
improved after adopting a fuzzy control method that lets the parameters change during the run. In fact, the fuzzy control method can be extended to continuously control and modify other components of the neural network, including the connection weights, the activation threshold, the aggregation method, or even the dynamic adjustment of the activation function. Here the key is to design, acquire, and ascertain the fuzzy control rules, which is a design problem dependent on the actual application. For instance, in the learning course of a general fuzzy neural network, a method of modifying the fuzzy connection weights that adopts not fuzzy computing but fuzzy logical rules is vital and worth researching; the main difficulty lies in how to produce appropriate fuzzy modification rules according to the semantics of the problem. There are also other methods for combining fuzzy logic and neural networks, for example:

(a) Fuzzy operator neural network [29]. This is a fuzzy neural network model whose neuron aggregation operator is a fuzzy operator satisfying the commutative, associative, and zero laws, with uniform approximation capability for continuous functions.

(b) Monomer fuzzy neural network [30]. This is a fuzzy neural network model that modifies the operators <·,Σ> of the traditional neural network to the operators
<∨, ∧>.
(c) Simplex and mixed fuzzy neural network [31]. This includes both traditional neurons and fuzzy neurons, and has both accurate and fuzzy information-processing capability.

(d) Fuzzy max-min operator neural network [32]. This is composed of fuzzy max-min operator neurons. The fuzzy max-min operator neuron refers to the following memory storage system

    y = (w_1 ∧ x_1) ∨ (w_2 ∧ x_2) ∨ ... ∨ (w_n ∧ x_n),    (2.5)

where the operators ∧ and ∨ satisfy: for any a, b ∈ A ⊆ [-1,1],

    a ∧ b = sgn(ab)·min(|a|,|b|),  a ∨ b = sgn(ab)·max(|a|,|b|),

with sgn(x) = 1 for x > 0, sgn(x) = 0 for x = 0, and sgn(x) = -1 for x < 0; x_1, x_2, ..., x_n are the n inputs, x_i ∈ [0,1]; w_1, w_2, ..., w_n are the connection weights corresponding to the above n input channels, w_i ∈ [-1,1].

Different combination modes can give rise to different fuzzy neural networks, but there are two main methods according to function, i.e. the combining pattern based on "differentia" and the integration pattern based on "sameness". The former integrates the advantages of both fuzzy logic and neural networks, and makes the fuzzy system or the neural network extend to extra special functions based on the
original function. The latter integrates them based on the similarity between fuzzy systems and neural networks.
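The following short sketch illustrates the fuzzy max-min operator neuron described above: the signed "and"/"or" operators, and a max-of-mins aggregation over the weighted inputs. The aggregation form follows the reconstruction of Eq. (2.5) given above and should be checked against the cited source; the example values are illustrative.

```python
# Illustrative sketch of the fuzzy max-min operator neuron: the signed min/max
# operators and a max-of-mins aggregation over the weighted inputs (the exact
# aggregation form follows the reconstructed Eq. (2.5) and is an assumption).
def sgn(x):
    return 1 if x > 0 else (-1 if x < 0 else 0)

def fuzzy_and(a, b):
    # a AND b = sgn(ab) * min(|a|, |b|)
    return sgn(a * b) * min(abs(a), abs(b))

def fuzzy_or(a, b):
    # a OR b = sgn(ab) * max(|a|, |b|)
    return sgn(a * b) * max(abs(a), abs(b))

def max_min_neuron(x, w):
    # x_i in [0, 1], w_i in [-1, 1]; aggregate the (w_i AND x_i) terms with OR.
    result = fuzzy_and(w[0], x[0])
    for wi, xi in zip(w[1:], x[1:]):
        result = fuzzy_or(result, fuzzy_and(wi, xi))
    return result

print(max_min_neuron([0.7, 0.2, 0.9], [0.5, -0.8, 0.3]))
```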
2.5 Nonlinear Aggregation Artificial Neural Networks

In the aggregation operation of the traditional neuron, the aggregation operator is generally the linear weighted summation of the input signals. In fact, in the information processing of a biological neuron, the effect of an external perception signal or of a signal transferred from other neurons is not always a linear weighted aggregation, but often a particular nonlinear aggregation. We now consider several effective nonlinear aggregation artificial neural network models.
2.5.1 Structural Formula Aggregation Artificial Neural Networks

In a biological neuron, some input signals produce activation while others produce inhibition. Consequently, we naturally construct the following artificial neuron mathematical model with structural formula aggregation:
    y = f( Σ_i w_i·x_i / Σ_i v_i·x_i − θ ),    (2.6)
where the numerator Σw_i·x_i denotes the activation of the neuron by the input signals; the denominator Σv_i·x_i denotes the inhibition of the neuron by the input signals; their effects can be adjusted through the connection weight coefficients. When the external input signals only activate but do not inhibit the neuron, then Σv_i·x_i = 1, and the structural formula aggregation neuron reduces to the traditional neuron model, i.e. the traditional neuron can be regarded as a special case of the structural formula aggregation neuron. The structure of a structural formula aggregation artificial neural network is similar to that of the traditional feedforward neural network; the difference is that the neurons in the network are structural formula aggregation neurons. This network model fits objects whose outputs contain singular values with higher efficiency and precision than the general neural network.
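A minimal sketch of the structural formula aggregation neuron of Eq. (2.6) follows; the tanh activation and example values are illustrative assumptions.

```python
# Minimal sketch (illustrative) of the structural formula aggregation neuron in
# Eq. (2.6): the excitatory weighted sum is divided by the inhibitory weighted sum
# before the threshold and activation function are applied.
import math

def structural_neuron(x, w, v, theta, f=math.tanh):
    # Assumes the inhibitory sum is nonzero.
    excite = sum(wi * xi for wi, xi in zip(w, x))   # numerator: activation
    inhibit = sum(vi * xi for vi, xi in zip(v, x))  # denominator: inhibition
    return f(excite / inhibit - theta)

# When the inhibitory part is fixed at 1, the model reduces to the traditional neuron.
print(structural_neuron([0.4, 0.6], w=[0.9, -0.2], v=[0.5, 0.5], theta=0.1))
```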
2.5.2 Maximum (or Minimum) Aggregation Artificial Neural Networks

Different external factors generally differ in how strongly they stimulate and influence the neuron. Under some conditions, a certain important factor may determine the output of the neuron, and thus we can use the following maximum (or
minimum) aggregation artificial neural network model to express this information-processing mechanism.

The maximum aggregation artificial neural network model is

    y = f( max_i(w_i·x_i) − θ ).    (2.7)

The minimum aggregation artificial neural network model is

    y = f( min_i(w_i·x_i) − θ ).    (2.8)
A neural network composed of maximum (or minimum) aggregation neurons is called a maximum (or minimum) aggregation artificial neural network. This model is particularly suited for decision support, sensitive factor analysis, etc.
2.5.3 Other Nonlinear Aggregation Artificial Neural Networks

In fact, we can construct various nonlinear aggregation artificial neural models according to the actual demands of practical problems and the constitutive principles of artificial neural networks. For example,
    y = f( Σ_i w_i·x_i / max_i(w_i·x_i) − θ ),    (2.9)

    y = f( Σ_i w_i·x_i / min_i(w_i·x_i) − θ ),    (2.10)

    y = f( min_i(w_i·x_i) / max_i(w_i·x_i) − θ ),    (2.11)

    y = f( max_i(w_i·x_i) / min_i(w_i·x_i) − θ ),    (2.12)

    y = f( Π_i w_i·x_i − θ ),    (2.13)

    y = f( exp(Π_i w_i·x_i) − θ ).    (2.14)
Different types of aggregation artificial neurons have different information-processing mechanisms for the external input signals. A neural network consisting of the above neurons, or of several different types of neurons arranged in a certain hierarchical structure, can emphasize the different characteristics of different neurons in information processing. This is to a certain extent similar to a basis composed of different types of functions in function approximation, and can increase the flexibility and adaptability of neural networks in solving practical problems.
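The sketch below illustrates a few of the nonlinear aggregations from Eqs. (2.7)-(2.14): the same weighted inputs can be aggregated by a maximum, a minimum, a ratio, or a product instead of the usual weighted sum. The activation choice and example values are illustrative assumptions.

```python
# Illustrative sketch of several nonlinear aggregation neurons from Eqs. (2.7)-(2.14):
# the weighted inputs are aggregated by max, min, a min/max ratio, or a product
# before the threshold and activation are applied.
import math

def aggregate_neuron(x, w, theta, mode, f=math.tanh):
    terms = [wi * xi for wi, xi in zip(w, x)]
    if mode == "max":                  # Eq. (2.7)
        u = max(terms)
    elif mode == "min":                # Eq. (2.8)
        u = min(terms)
    elif mode == "min_over_max":       # Eq. (2.11)
        u = min(terms) / max(terms)
    elif mode == "product":            # Eq. (2.13)
        u = math.prod(terms)
    else:
        raise ValueError("unknown aggregation mode")
    return f(u - theta)

x, w = [0.5, 0.8, 0.2], [0.6, -0.3, 0.9]
for mode in ("max", "min", "min_over_max", "product"):
    print(mode, aggregate_neuron(x, w, theta=0.1, mode=mode))
```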
2.6 Spatio-temporal Aggregation and Process Neural Networks
As mentioned above, the artificial neural network (ANN) models that have been researched so far are mostly based on the theoretical framework of PDP (Parallel Distributed Processing). The inputs of these ANNs are constants independent of time, that is, the inputs at any instant are just geometric-point-type instantaneous inputs (a value or a vector). However, neurophysiological experiments and biological research indicate that the variation in the output of a synapse is related to the relative timing of the input pulses in a biological neuron, i.e. the output of a neuron depends on an input process that lasts for some time. The output of the neuron is related not only to the spatial aggregation and activation threshold function of the input signals, but also to a temporal cumulative effect of the input process. Moreover, in practical problems, the inputs of many systems are also processes or functions changing with time. For example, in a real-time control system, the inputs are continuous signals changing with time, and the outputs not only depend on the spatial weighted aggregation of the input signals, but are also related to the temporal cumulative effect over the input process interval. For variational problems, the domain of definition of the functional is generally a process interval related to time. For optimization problems, multifactor optimization that depends on time can also be classified as a condition with process inputs. It can be said that the traditional artificial neuron M-P model simulates fairly well the spatial weighted aggregation effect and the activation threshold function of biological neurons in information processing, but lacks another important characteristic of the biological neuron: the temporal cumulative effect [33].

In order to solve problems like dynamic signal processing and nonlinear continuous system control, many scholars have presented neural network models that can process time-varying information, such as delay unit networks [34], spatio-temporal neural models [35], recurrent networks [36], and partial feedback networks [37]. When solving problems with process inputs and time-order dependency in the system, these models usually implement the delay between inputs and outputs by an external time-delay link, i.e. a time-discretized loop network is constructed. However, this makes the system structure complicated and brings many problems that are difficult to foresee to the construction of the learning algorithm and to the convergence and stability of the algorithm. At the same time, the models and learning algorithms listed above are still based in essence on traditional neural networks and do not change the information-processing mechanism of the artificial neuron. Therefore, we simulate the way the biological neural system processes external input information and extend the aggregation operation mechanism and the activation mode of the neuron to the time domain. Giving the artificial neuron the ability to process spatio-temporal two-dimensional information at one time has important practical significance. In the 1990s, the authors started to research neural networks whose
inputs and outputs are all time-varying processes, and in 2000 the concept and model of the process neuron and the process neural network were published for the first time. A process neuron works by simulating the dynamic principle that external stimulation of a biological neural system may last some time, and that the biological neuron processes information through the synthesis, coordination, and accumulation of many time-varying input signals over time-delay intervals. The inputs and the weights of the process neuron can both be time (process) functions. It adds a temporal cumulative aggregation operator to the spatial aggregation operation of the traditional neuron. Its aggregation operation and activation can simultaneously reflect the spatial aggregation and the temporal cumulative effect of time-varying input signals, i.e. the process neuron can process spatio-temporal two-dimensional information at one time.

The basic information-processing units composing an ANN system are neurons, and the information-processing mechanism of the neuron is the key to the character and information-processing capability of the network. The connection weights of the network can only be adjustable parameters or functions, while the aggregation operations (spatial and temporal) and the activation effect of the activation threshold must be completed within a neuron. From this point of view, the process neuron simulates the information-processing mechanism of a biological neuron quite well. A process neural network is a network model composed of process neurons and general non-time-varying neurons according to a certain topological structure. Like the traditional neural network, a process neural network can be divided into feedforward and feedback networks according to the connection mode and the presence of feedback in the information transfer among neurons. In fact, according to differences in the topological structure of the network, the mapping relationship between inputs and outputs, the connection weights, the activation threshold styles, and the learning algorithms, we can construct many kinds of process neural network models to suit different practical problems.

The process neural network breaks the synchronous, instantaneous restriction that the traditional neural network model imposes on inputs and outputs, which makes the problem more general and the application fields of artificial neural networks broader. Many practical applications can be classified as problems of this kind, such as the simulation modeling of nonlinear dynamic systems, nonlinear system identification, control process optimization, classification and clustering of continuous signals, the simulation and control of a polymerization chemical reaction process, fault diagnosis of continuous systems (analysis of fault causes), factor analysis (determination of the primary and secondary factors or causes, also called reverse reasoning), and function fitting and process approximation. The neural network with process inputs is an extension of the traditional artificial neural network into the time domain and is a generalized artificial neural network model. The traditional artificial neural network can be regarded as a special case of the process neural network, which has broad adaptability for solving the multitudinous problems in practice that involve process inputs and outputs.
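The following is a minimal sketch of the process neuron just described, where the inputs and weights are time-varying functions and the aggregation adds a temporal accumulation on top of the spatial weighted sum. The form y = f( integral over [0,T] of the sum of w_i(t)·x_i(t) dt, minus θ ) is a simplified reading of the description above; the functions, interval, and step count are illustrative assumptions.

```python
# Minimal sketch (illustrative) of a process neuron: the inputs x_i(t) and weights
# w_i(t) are time-varying functions, and the aggregation adds a temporal accumulation
# (here a simple numerical integration over [0, T]) on top of the spatial weighted sum.
import math

def process_neuron(x_funcs, w_funcs, theta, T=1.0, steps=100, f=math.tanh):
    dt = T / steps
    acc = 0.0
    for k in range(steps):
        t = k * dt
        # Spatial aggregation at time t, accumulated over the process interval.
        acc += sum(w(t) * x(t) for w, x in zip(w_funcs, x_funcs)) * dt
    return f(acc - theta)

# Example with two time-varying inputs and weights (all functions illustrative).
x_funcs = [lambda t: math.sin(2 * math.pi * t), lambda t: t]
w_funcs = [lambda t: 1.0 - t, lambda t: 0.5]
print(process_neuron(x_funcs, w_funcs, theta=0.1))
```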
2.7 Classification of Artificial Neural Networks

So far, many kinds of artificial neural network models have been proposed, each with its own structural characteristics and information-processing method. According to the construction elements of the neural network, artificial neural networks can be classified along the following nine dimensions; it can be said that all the various existing neural networks are covered by these nine dimensions.

(a) Input type. Inputs can be divided into a simple type (integer, real, string, etc.), a structured type (complex number, tuple, etc.), a predicate, a function (especially a time-varying function or a multivariate function), and even a point in some functional space or abstract space. Moreover, the above inputs can be further divided into accurate, fuzzy, uncertain, or incomplete inputs, etc.

(b) Output type. Outputs can be divided into a simple type (integer, real, string, etc.), a structured type (complex number, tuple, etc.), a predicate, a function (especially a time-varying function or a multivariate function), and even a point in some functional space or abstract space. Moreover, the above outputs can be further divided into accurate, fuzzy, uncertain, or incomplete outputs, etc.

(c) Connection weight type. Connection weights can be divided into a simple type, a structured type, a function (especially a time-varying function or a multivariable function), or even a functional, etc. Moreover, the above connection weights can be further divided into accurate, fuzzy, uncertain, or incomplete connection weights, etc.

(d) Activation threshold type. Activation thresholds can be divided into a simple type, a structured type, a function (especially a time-varying function or a multivariable function), or even a functional, etc. Moreover, the above activation thresholds can be further divided into accurate, fuzzy, uncertain, or incomplete thresholds, etc.

(e) Aggregation function type. Aggregation functions can be divided into arithmetical (further divided into linear and nonlinear), logical, compound, and even functional types, etc. Moreover, the above aggregation functions can be further divided into accurate and fuzzy aggregation functions, etc., including various aggregation functions built from the T operator and S operator of fuzzy mathematics. The whole aggregation process of the neuron on the input signals can be divided into spatial aggregation, multi-factor aggregation, temporal accumulation, etc.

(f) Activation function type. There are many types of activation function. Generally they are nonlinear functions or functionals, and can be further divided into accurate and fuzzy activation functions; they can also be time-varying functions.

(g) Connection structure type. Connection structures are generally divided into two classes, i.e. pure feedforward and feedback.
(h) Learning algorithm type. There are many kinds of learning algorithms. According to the type of operation they adopt, they can be divided into three types: computing (including functional computation or computation in abstract spaces), logic, and reasoning.

(i) Process pattern of time type. The process pattern of time can be divided into a continuous class and a discrete class (also called quantization).

There are two aims of this classification. One is to summarize existing research results and make them standardized and systematized, and at the same time make the understanding of the problems clearer. The other is to highlight, by taking permutations and combinations of the possible values of the classification factors (a multi-dimensional array composed of classification factors), those neural network models with significant factor combinations that have not yet been studied or applied. According to this aim, there are nine classification factors, and they yield thousands of combinations in all, among which many significant combinations exist. We believe that many neural networks corresponding to these combinations have not been researched thoroughly yet and are worth the attention of researchers. We especially point out that proposing this classification of neural networks, which covers the various existing neural networks, is a main contribution of this book. The subject of this book, the "process neural network", is just one kind of these numerous networks. Certainly, it has great importance and significance.
3 Process Neurons
In this chapter, we will begin to discuss in detail the process neural network (PNN) which is the subject of the book. First, the concept of the process neuron is introduced. The process neuron is the basic information-processing unit that constitutes the PNN, and the model used to form it and its operating mechanism determine the properties and information-processing ability of the PNN. In this chapter, we mainly introduce a general definition and basic properties of the process neuron, and the relationship between the process neuron and mathematical concepts, such as compound functions, functional functions, etc.
3.1 Revelation of Biological Neurons

Neurophysiological experiments and research in biology indicate that the information-processing characteristics of the biological neural system include the following main aspects: the spatial aggregation function, the multi-factor aggregation function, the temporal accumulation effect, the activation threshold characteristic, self-adaptability, excitation and inhibition characteristics, delay characteristics, and conduction and output characteristics [1-3]. From the definition of the M-P neuron model, we know that the traditional ANN simulates some characteristics of biological neurons, such as spatial weighted aggregation, self-adaptability, conduction and output, etc., but that it lacks a description of the time delay, the accumulation effect and the multi-factor aggregation function. In practical information processing in the biological neural system, the memory and the output of the biological neurons not only depend on the spatial aggregation of each piece of input information, but are also related to time delay and accumulation effects, or even to other multi-factor aggregation functions. Therefore, the process neuron model we want to construct should simulate these important information-processing characteristics of biological neurons.
3.2 Definition of Process Neurons

In this section, we first define a simple process neuron, which temporarily excludes the multi-factor aggregation ability. This process neuron is made up of four operations: time-varying process (or function) signal input, spatial weighted aggregation, temporal effect accumulation, and threshold-activated output. It differs from the traditional M-P neuron model in two ways. First, the inputs, connection weights and activation threshold of the process neuron can be time-varying functions; second, the process neuron has an accumulation operator, which makes its aggregation operation express both the spatial aggregation of the input signals and the cumulative process of the time effect. The structure of the process neuron model is shown in Fig. 3.1.
Fig. 3.1 A general model of a process neuron

In Fig. 3.1, x1(t), x2(t), ..., xn(t) are the time-varying input functions of the process neuron; w1(t), w2(t), ..., wn(t) are the corresponding weight functions; K(·) is the aggregation kernel function of the process neuron, which can transform and process the input signals according to the inherent character of the actual system; f(·) is the activation function, which usually is a linear function, a Sigmoid function, a Gaussian function, etc. The process neuron can be divided into two basic mathematical models according to the order of the spatial aggregation and temporal accumulation operations. The relationship between the inputs and outputs of the process neuron is described below.

Model I:
y = f\left(\sum\left(\int\left(K(W(t), X(t))\right)\right) - \theta\right).    (3.1)
In Eq. (3.1), X(t) is the input function vector, W(t) is the corresponding connection weight function vector, y is the output, \theta is the activation threshold (which can also be time-varying), "\sum" denotes some spatial aggregation operation (such as a weighted sum, max or min), and "\int" denotes some temporal accumulation operation (such as the integral over t). The process neuron described by Eq. (3.1) first performs temporal weighted accumulation on the external time-varying input signals, i.e. implements the weighted temporal accumulation for each time-varying input signal,
then performs spatial aggregation on the temporal accumulation results, and finally outputs the result by computing the activation function f. Its structure is shown in Fig. 3.2.
Fig. 3.2 Process neuron model I

Model II:
y = f\left(\int\left(\sum\left(K(W(t), X(t))\right)\right) - \theta\right).    (3.2)
The process neuron denoted by Eq. (3.2) first performs spatial weighted aggregation when carrying out the temporal-spatial aggregation operation, i.e. implements the spatial aggregation of the multiple input signals at each time point, then performs temporal accumulation on the spatial aggregation results, and finally outputs the result by computing the activation function f. This process neuron is more often used in applications. Its structure is shown in Fig. 3.3.
Fig. 3.3 Process neuron model II

It should also be noted that f, K, \sum and \int can be diversified operators, and that they are not always exchangeable. Therefore, Model I is not equivalent to Model II. For instance, if we suppose that \sum is a weighted sum, \int is the integral, f = sign, and K(u, v) = u \cdot v, then Eq. (3.1) becomes

y = \mathrm{sign}\left(\sum\left(\int W(t) \cdot X(t)\, dt\right) - \theta\right),    (3.3)

and Eq. (3.2) becomes

y = \mathrm{sign}\left(\int\left(\sum W(t) \cdot X(t)\right) dt - \theta\right).    (3.4)
Further, the process neuron can be extended to the condition that its inputs and outputs are all time-varying process functions, for example
y(\tau) = f\left(\int_{\tau}\left(\sum\left(K(W(t), X(t))\right)\right) - \theta\right),    (3.5)

or

y(\tau) = f\left(\sum\left(\int_{\tau}\left(K(W(t), X(t))\right)\right) - \theta\right),    (3.6)

where "\int_{\tau}" is a temporal accumulation operator depending on \tau, for instance the integral over the time interval [0, \tau] or [\tau-k, \tau]. This kind of process neuron can be used to constitute complex process neural networks with multiple hidden layers. For brevity, we now use "\oplus" and "\otimes" to denote respectively the spatial aggregation operator and the temporal accumulation operator in Eqs. (3.1) and (3.2); then the mapping relationship between the inputs and output of the process neuron denoted by Fig. 3.2 is
y = f\left(\left(W(t) \oplus X(t)\right) \otimes K(\cdot) - \theta\right),    (3.7)

and the relationship between the inputs and output of the process neuron denoted by Fig. 3.3 is

y = f\left(\left(W(t) \otimes X(t)\right) \oplus K(\cdot) - \theta\right).    (3.8)
For instance,
W(t) \oplus X(t) = \sum_{i=1}^{n} w_i(t) x_i(t),    (3.9)

A(t) \otimes K(\cdot) = \int_{0}^{T} A(t) K(t)\, dt,    (3.10)

where [0, T] is the input process interval of the time-varying signals and K(\cdot) is an integrable function over the interval [0, T]; or, more generally, suppose that K(\cdot) is a mono-functional and define

A(t) \otimes K(\cdot) = K(A(t)).    (3.11)
Generally, the weight functions W(t) = (w_1(t), w_2(t), \ldots, w_n(t)) and the temporal weighted kernel function (functional) K(\cdot) are both supposed to be continuous, and actually are in most applications. In Eq. (3.7), if the spatial aggregation operation is taken as a weighted sum, the temporal (process) accumulation operation is taken as the integral, and K(\cdot) = 1, then the formula can be rewritten as

y = f\left(\int_{0}^{T} \sum_{i=1}^{n} w_i(t) x_i(t)\, dt - \theta\right).    (3.12)
The process neuron described by Eq. (3.12) is called a special process neuron, whose operation consists of weighted multiplication, summation, integration, and activation functions. In fact, the spatial aggregation operator "\oplus" and the temporal accumulation operator "\otimes" can take other operations of various forms. For example, "\oplus" can be "max" and "min", or a "T-operator" and "S-operator"; "\otimes" can be a convolution, a varying-parameter integration, etc.; the activation function f can be any bounded function. Thus, the process neuron described by Eq. (3.7) or Eq. (3.8) is a class of very broad process neurons and is called the generalized process neuron. The adaptability and the information-processing capability of the process neuron for handling different practical problems mainly depend on the forms of the spatio-temporal accumulation and aggregation operators, which should be carefully selected in practical applications. The process neuron can produce a process memory of the characteristics of the time-varying input signals by learning the training samples. The process memory is reflected in the connection weight functions of the process neuron. In Eq. (3.12), if T = 0, x_i(t) = x_i and w_i(t) = w_i, then it can be simplified as

y = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right).    (3.13)

This is a non-time-varying traditional neuron. It is obvious that the traditional neuron is a special case of the process neuron. Next, we will discuss the process neuron and some interrelated mathematical concepts, such as the relationship between neurons, functionals, and multivariate functions, etc.
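As an illustration of Eq. (3.12), the following is a minimal numerical sketch (not part of the original text): the time-varying inputs and weight functions are sampled on a grid over [0, T], the spatial aggregation is a weighted sum, the temporal accumulation is approximated by the trapezoidal rule, and the activation f is assumed to be a sigmoid. All function names and parameter values are illustrative assumptions.

import numpy as np

def special_process_neuron(x_funcs, w_funcs, theta, T=1.0, steps=200):
    """Evaluate y = f( integral_0^T sum_i w_i(t) x_i(t) dt - theta ) numerically.

    x_funcs, w_funcs: lists of callables t -> float (time-varying inputs/weights).
    theta: activation threshold; T: input process interval; steps: grid size.
    """
    t = np.linspace(0.0, T, steps)
    # Spatial aggregation at each time point: sum_i w_i(t) x_i(t)
    spatial = sum(np.array([w(ti) for ti in t]) * np.array([x(ti) for ti in t])
                  for x, w in zip(x_funcs, w_funcs))
    # Temporal accumulation: trapezoidal approximation of the integral over [0, T]
    accumulated = np.trapz(spatial, t)
    # Activation (a sigmoid is used here as an example of f)
    return 1.0 / (1.0 + np.exp(-(accumulated - theta)))

# Example with two time-varying inputs and weight functions
y = special_process_neuron(
    x_funcs=[lambda t: np.sin(2 * np.pi * t), lambda t: t],
    w_funcs=[lambda t: 1.0 - t, lambda t: 0.5],
    theta=0.1)
print(y)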
3.3 Process Neurons and Functionals

From the definition of the special process neuron, we know that the input of the process neuron is a time-varying function (or function vector), and that the output is a real value. Therefore, from the mathematical perspective, the process neuron is actually a kind of functional. Furthermore, in Eq. (3.12), if the activation function f is a linear function and the activation threshold \theta = 0, then the process neuron is a linear functional. If we use F to denote the functional relationship represented by the process neuron, we obtain

F(a_1 X_1(t) + a_2 X_2(t) + \cdots + a_K X_K(t))
  = \int_{0}^{T} W(t) \cdot \left(a_1 (X_1(t))^{\mathrm T} + a_2 (X_2(t))^{\mathrm T} + \cdots + a_K (X_K(t))^{\mathrm T}\right) dt
  = a_1 \int_{0}^{T} W(t) \cdot (X_1(t))^{\mathrm T}\, dt + a_2 \int_{0}^{T} W(t) \cdot (X_2(t))^{\mathrm T}\, dt + \cdots + a_K \int_{0}^{T} W(t) \cdot (X_K(t))^{\mathrm T}\, dt
  = a_1 F(X_1(t)) + a_2 F(X_2(t)) + \cdots + a_K F(X_K(t)),
where X_k(t) = (x_{k1}(t), x_{k2}(t), \ldots, x_{kn}(t)) is an n-dimensional vector of input functions, W(t) = (w_1(t), w_2(t), \ldots, w_n(t)) is an n-dimensional vector of weight functions, and a_k is a real constant. In fact, the process neuron defined by Eq. (3.2) can also be directly extended to the case of time-varying inputs and outputs, for example

y(\tau) = f\left(\int_{0}^{\tau} \sum_{i=1}^{n} w_i(t) x_i(t)\, dt - \theta\right).    (3.14)
Then the inputs and outputs of the process neuron are all time-varying functions, i.e. the process neuron denoted by Eq. (3.14) is a functional with variable parameters. The mapping mechanism of the traditional artificial neuron is a kind of function relationship. Function theory and function approximation methods greatly promote research on traditional artificial neural networks. The mapping mechanism of the process neuron is a kind of functional relationship, so we can also discuss some properties of process neural networks in detail by virtue of functional theory, and study the learning and generalization problems of PNNs by virtue of the functional approximation idea. This is of great significance for research on the mapping mechanism and applicability of the process neuron.
3.4 Fuzzy Process Neurons

In practice, we often meet processing problems with process fuzzy information, such as ECDM process control [4], grinding process fuzzy control system design [5], steam temperature regulation in a coal-fired power plant [6], machining process modeling [7], etc. If we define a kind of fuzzy process neuron by combining the information-processing method of the process neuron with fuzzy reasoning rules, it will improve the information-processing ability of artificial neurons. Two methods can be used to construct a fuzzy process neuron. One is to directly fuzzify the process neuron, combining the nonlinear transformation mechanism of the process neuron for time-varying information with fuzzy logical reasoning methods, and establish a fuzzy computing model that can deal with process information. The other is to represent a fuzzy reasoning rule over process information as a fuzzy process neuron, i.e. each fuzzy process neuron denotes one fuzzy process-reasoning rule, so that multiple fuzzy process neurons can constitute a fuzzy process neural network according to a certain structure, i.e. construct a fuzzy process logical reasoning system (rule set). The following discussion focuses on domains with process fuzzy information (fuzzy time-varying systems); a non-fuzzy system can be regarded as a special case of a fuzzy system.
3.4.1 Process Neuron Fuzziness

Suppose that \tilde{A}_1, \tilde{A}_2, \ldots, \tilde{A}_K are fuzzy sets in a domain U, and the membership functions on the acceptance domain are \mu_{\tilde{A}_1}(\cdot), \mu_{\tilde{A}_2}(\cdot), \ldots, \mu_{\tilde{A}_K}(\cdot) respectively. The fuzzy process neuron is made up of weighted inputs of fuzzy process signals, a fuzzy aggregation operation, and a fuzzy activation output. Its structure is shown in Fig. 3.4.
Fig. 3.4 Fuzzy process neuron
In Fig. 3.4, the neuron input X(t) = (x_1(t), x_2(t), \ldots, x_n(t)), t \in [0, T], can be time-varying functions or process fuzzy information; the connection weight of the fuzzy process neuron \tilde{W}(t) = (\tilde{w}_1(t), \tilde{w}_2(t), \ldots, \tilde{w}_n(t)) can be used to denote membership functions or belief functions; "\tilde{\otimes}" and "\tilde{\oplus}" are two fuzzy dual aggregation operators corresponding to spatial aggregation and temporal accumulation respectively, such as max and min, or an S-operator and a T-operator; f is the fuzzy activation function, and \tilde{y} is the output of the fuzzy process neuron. According to Fig. 3.4, the relationship between the inputs and the output of this fuzzy process neuron is

\tilde{y} = f\left(\tilde{\oplus}\left(X(t)\, \tilde{\otimes}\, \tilde{W}(t)\right) - \tilde{\theta}(t)\right).    (3.15)

In Eq. (3.15), \tilde{\theta}(t) is the fuzzy activation threshold of the fuzzy process neuron, and it can also be a time-varying fuzzy function. As the inputs, connection weights, activation threshold, aggregation/accumulation operations and activation function of the process neuron are all fuzzified, and can respectively be fuzzy sets, fuzzy operations and fuzzy functions, the output of the process neuron can be a fuzzy numerical value or a fuzzy function. Similarly to the information-processing mechanism of the non-fuzzy process neuron, all the input functions (fuzzy or accurate) of this fuzzy process neuron are correspondingly aggregated/accumulated after weighting, and the output of the neuron is obtained according to the activation threshold and the activation function.
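The following is a minimal numerical sketch (an illustration, not from the original text) of Eq. (3.15) under one concrete choice of operators: min is assumed for combining each weighted fuzzy input, max for the spatial aggregation over inputs, the supremum over the discretized interval for the temporal accumulation, and a clip to [0, 1] as the activation. All names and values are assumptions.

import numpy as np

def fuzzy_process_neuron(x_funcs, w_funcs, theta=0.2, T=1.0, steps=100):
    """Sketch of Eq. (3.15) with max/min fuzzy operators.

    Spatial aggregation: max over i of min(w_i(t), x_i(t)) at each time point.
    Temporal accumulation: sup (max) over the discretized interval [0, T].
    Activation: clip to [0, 1] so the output stays a membership degree.
    """
    t = np.linspace(0.0, T, steps)
    # min(w_i(t), x_i(t)) combines each weighted fuzzy input; max aggregates over i
    spatial = np.max(
        [np.minimum([w(ti) for ti in t], [x(ti) for ti in t])
         for x, w in zip(x_funcs, w_funcs)], axis=0)
    accumulated = np.max(spatial)          # sup over the time interval
    return float(np.clip(accumulated - theta, 0.0, 1.0))

y = fuzzy_process_neuron(
    x_funcs=[lambda t: np.exp(-t), lambda t: t],
    w_funcs=[lambda t: 0.8, lambda t: 0.6])
print(y)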
3.4.2 Fuzzy Process Neurons Constructed Using a Fuzzy Weighted Reasoning Rule [8-10]

Semantically, this kind of process neuron denotes a weighted fuzzy logical rule whose precondition and conclusion include fuzzy predications of process information. In this fuzzy process neuron, the inputs/output carrying process fuzzy information are connected by a weighted fuzzy logical rule. The knowledge and rules of the domain are stored in the fuzzy connection weights and the aggregation operator, and the output predications are formed by combining the current input predications with the existing experience weights according to certain rules. One fuzzy process neuron corresponds to one weighted fuzzy logical rule with process information. Its structure is shown in Fig. 3.5.

Fig. 3.5 Fuzzy reasoning process neuron
The process neuron denoted by Fig. 3.5 corresponds to a fuzzy reasoning rule that contains process information, and is denoted as

IF P_1(t)\,(w_1) \wedge P_2(t)\,(w_2) \wedge \cdots \wedge P_n(t)\,(w_n) THEN Q(t)\,(cf, \tau),    (3.16)

where P_i(t), Q(t) (t \in [0, T]) are fuzzy logical predications whose truth values are taken in the interval [-1, 1]; the fuzzy connection weights satisfy w_i \ge 0 (each weight can be a function dependent on time) and \sum_{i=1}^{n} w_i = 1; cf is the confidence of the rule; and \tau is the applicable threshold (0 < \tau \le 1), which means that the rule can be applied only when the truth degree of the precondition satisfies p = \sum_{i=1}^{n} w_i \, T(P_i(t)) \ge \tau, where T(P_i(t)) is the truth degree of P_i(t), i = 1, 2, \ldots, n. The truth degree of the output predication Q(t) of the neuron is T(Q(t)) = F(p, cf), where F is the activation function. The outputs of the weighted fuzzy reasoning rule (namely the fuzzy neuron) are the predication Q(t) and its truth degree T(Q(t)).
3.5 Process Neurons and Compound Functions

The inputs, connection weights and activation thresholds of the process neuron can all be temporal functions, or even multivariate functions. If the spatial aggregation operator and the temporal accumulation operator of the process neuron are taken as appropriate operation forms, then the process neuron is a unitary continuous compound function.

Theorem 3.1 Suppose that the spatial aggregation operator of the process neuron is a weighted sum, the temporal accumulation operator is a varying-parameter integral, and the activation function is a Sigmoid function (or another monotonically increasing, continuously differentiable function). If the input is a unitary continuous function x(t) (t \in [0, 1]) and the connection weight is w(t) \in C[0, 1], then the mapping capability of a process neuron is equal to that of a unitary continuous compound function.

Proof Take any t \in [0, 1], and define

y(t) = f\left(\int_{0}^{t} w(\tau) x(\tau)\, d\tau - \theta\right),

i.e. the output of the process neuron is y(t), which is denoted as y(t) = F(x(t)), where F is the mapping relationship of the function expressed by the process neuron. Therefore, the process neuron satisfies the mapping relationship of a compound function.

Conversely, suppose there is a unitary compound function relationship g_1: t \to x(t), g_2: x(t) \to y(x(t)) (t \in [0, 1]), i.e. g_2(g_1(t)) = y(t). Take x(t) as the input function of the process neuron; then the expected output is defined as

y(t) = f\left(\int_{0}^{t} w(\tau) x(\tau)\, d\tau - \theta\right),    (3.17)

where w(t) and \theta are the undetermined weight function and activation threshold, and x(t) and y(t) are known functions. As the activation function f is a Sigmoid function, which is monotonically increasing and continuously differentiable, we can obtain

f^{-1}(y(t)) = \int_{0}^{t} w(\tau) x(\tau)\, d\tau - \theta, \qquad \left(f^{-1}(y(t))\right)' = w(t) x(t).

Let

w(t) = \left(f^{-1}(y(t))\right)'/x(t), \qquad \theta = -f^{-1}(y(0)),

i.e. ascertain the connection weight and the activation threshold in advance according to the given x(t) and y(t); then we obtain

f\left(\int_{0}^{t} w(\tau) x(\tau)\, d\tau - \theta\right)
  = f\left(\int_{0}^{t} \frac{\left(f^{-1}(y(\tau))\right)'}{x(\tau)} \cdot x(\tau)\, d\tau + f^{-1}(y(0))\right)
  = f\left(\int_{0}^{t} \left(f^{-1}(y(\tau))\right)' d\tau + f^{-1}(y(0))\right)
  = f\left(f^{-1}(y(t))\right) = y(t).
Thus, the mapping relationship of a unitary continuous compound function can be represented by a process neuron.

In this chapter, the definition and some basic properties of the process neuron were studied. As the temporal-spatial aggregation operators of the process neuron can take diversified operating forms, and the activation threshold can be a time-varying function or a parameter of other types, the process neuron has many types, which constitute different mapping relationships between the inputs and outputs. Therefore, the process neural networks constituted by process neurons according to a definite topological structure have a strong function-transforming ability, and have great flexibility and adaptability for modeling and solving practical problems.
References
[1] Tsoi A.C. (1994) Locally recurrent globally feedforward networks: a critical review of architectures. IEEE Transactions on Neural Networks 5(2):229-239
[2] Ou Y.K., Liu W.F. (1997) Theoretical frame based on neural network of biometric-model of nerve cells. Beijing Biomedical Engineering 16(2):93-101 (in Chinese)
[3] Zhang L.I., Tao H.W., Holt C.E., Harris W.A., Poo M. (1998) A critical window for cooperation and competition among developing retinotectal synapses. Nature 395(6697):37-44
[4] Grzegorz S., Maria Z.S., Adam R. (2004) Building of rules base for fuzzy-logic control of the ECDM process. Journal of Materials Processing Technology 149(1-3):530-535
[5] Zhang X.L., Zhao H.W., Qi Y.M., Wang L. (2008) Grinding process fuzzy control system design and application based on MATLAB. In: Fifth International Conference on Fuzzy Systems and Knowledge Discovery 2008 3:311-315
[6] Shin S.D., Kim Y.G., Lee B.K., Bae Y.C. (2004) Design of fuzzy controller for the steam temperature process in the coal fired power plant. International Journal of Fuzzy Logic and Intelligent Systems 4(2):187-192
[7] Al-Wedyan H., Demirli K., Bhat R. (2001) A technique for fuzzy logic modeling of machining process. In: IFSA World Congress and 20th NAFIPS International Conference 5:3021-3026
[8] Zhou C.G., Liang Y.C., Yang Z.M. (2002) A fuzzy neural network based on fuzzy weighted reasoning method. Proceedings of 2000 International Workshop on Autonomous Decentralized Systems 1:190-195
[9] He X.G. (1990) Fuzzy computational reasoning and neural networks. Proceedings of the Second International Conference on Tools for Artificial Intelligence pp.706-711
[10] Chen S.M. (2002) Weighted fuzzy reasoning using weighted fuzzy petri nets. IEEE Transactions on Knowledge and Data Engineering 14(2):386-397
4 Feedforward Process Neural Networks
A feedforward process neural network is a basic process neural network model. It is an information-forward-propagation network model that consists of process neurons (including the traditional time-invariant neuron as a special case) connected according to certain topological structures. The inputs/outputs, the connection weights among neuron nodes and the activation thresholds of a feedforward process neural network can be time-varying functions. Through training, the network can memorize the structure parameters (or functions) and characteristic parameters (or functions) learned from the environment, and embody the procedural pattern characteristics and the transformation mechanism of the system. It has a strong time-varying information-processing ability and a strong nonlinear mapping capability between the inputs and outputs of time-varying systems. It has broad applicability for building models and solving practical problems, such as pattern recognition of process signals, dynamic system simulation and process control, etc. In this chapter, a general feedforward process neural network model and a process neural network model based on weight function basis expansion are introduced, and the basic properties of the network, such as continuity, functional approximation capability, computing capability, etc., are studied.
4.1 Simple Model of a Feedforward Process Neural Network

For convenience of discussion, the multi-input-single-output network model with only one process neuron hidden layer is considered first. In fact, this model is easy to extend to the multi-input-multi-output situation. Suppose that the input layer of the process neural network has n nodes, the middle layer (process neuron hidden layer) has m nodes, and the output node is a common time-invariant neuron [1]. The network input is X(t) = (x_1(t), x_2(t), \ldots, x_n(t)). Its topological structure is shown in Fig. 4.1. In Fig. 4.1, PN is the process neuron defined by Eq. (3.2); w_{ij}(t) is the connection weight function from input layer node i to hidden layer node j of the
Fig. 4.1 Process neural network with one process hidden layer
process neuron; v_j is the connection weight from hidden layer node j to the output node; g is the activation function of the output layer neuron; and y is the output of the system. If the spatial aggregation operation of the process neuron is defined as a weighted sum, the time (process) aggregation operation as the integral and the unitary functional K(\cdot) = 1, then the (narrow-sense) process neural network can be shown as in Fig. 4.2.
Fig. 4.2 Narrow-sense process neural network
As seen from Fig. 4.2, the mapping relationship between the inputs and the output of the network is

y = g\left(\sum_{j=1}^{m} v_j f\left(\int_{0}^{T}\left(\sum_{i=1}^{n} w_{ij}(t) x_i(t)\right) dt - \theta_j^{(1)}\right) - \theta\right),    (4.1)

where [0, T] is the input process interval; \theta_j^{(1)} is the activation threshold (which can also be a time-varying function) of hidden layer node j; f is the process neuron activation function; g is the output neuron activation function; \theta is the activation threshold of the output neuron. Here f and g are both nonlinear functions. Thus, this process neural network model expresses a complex nonlinear transformation mechanism.

The model denoted by Eq. (4.1) shows not only the integrative dependence of the output on the n inputs (called "spatial aggregation" of the inputs), but also the temporal accumulation effect of the inputs on the output (called "temporal accumulation" of the inputs). The spatio-temporal mapping relationship between the inputs and the output is necessary when describing many actual applications. For example, the dependence of the yield of crops on illumination, temperature, moisture, fertilizer, etc. during growth is not a simple instantaneous point-to-point mapping. It should be said that the yield of crops depends synthetically not only on the various growth factors, but also on the accumulation effect of these factors on the crops during growth.
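To make Eq. (4.1) concrete, the following is a minimal numerical sketch (an illustration, not from the original text) of the forward pass of a narrow-sense process neural network with one process-neuron hidden layer; the time integral is approximated by the trapezoidal rule on a sampled grid, and sigmoid activations are assumed for both f and g. All names and shapes are illustrative.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def pnn_forward(x, w, v, theta_hidden, theta_out, t):
    """Forward pass of Eq. (4.1) on sampled signals.

    x: inputs sampled on the grid t, shape (n, steps).
    w: weight functions sampled on t, shape (m, n, steps).
    v: hidden-to-output weights, shape (m,).
    theta_hidden: hidden thresholds, shape (m,); theta_out: output threshold.
    """
    # Spatial aggregation at each time point, then temporal accumulation over [0, T]
    spatial = np.einsum('mns,ns->ms', w, x)          # sum_i w_ij(t) x_i(t)
    accumulated = np.trapz(spatial, t, axis=1)       # integral over t for each hidden node
    hidden = sigmoid(accumulated - theta_hidden)     # f(...)
    return sigmoid(np.dot(v, hidden) - theta_out)    # g(...)

# Example with n = 2 inputs, m = 3 hidden process neurons, 100 sample points
t = np.linspace(0.0, 1.0, 100)
x = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 2, 100))
print(pnn_forward(x, w, v=rng.normal(size=3),
                  theta_hidden=np.zeros(3), theta_out=0.0, t=t))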
4.2 A General Model of a Feedforward Process Neural Network

Now consider a general model of a feedforward process neural network with multiple hidden layers in the multi-input-multi-output situation. In this model, the network can include hidden layers with many different types of process neuron (with different temporal accumulation operators, spatial aggregation operators, activation threshold types and activation functions), and it may also include several time-invariant neuron hidden layers (a time-invariant neuron is a special case of the process neuron). The structure of the multi-input-multi-output network with L hidden layers is shown in Fig. 4.3.
Fig. 4.3 A general model of a process neural network
In Fig. 4.3, PN_i is a process neuron in the ith hidden layer. Its spatial aggregation operator can be, for example, a weighted sum of the multiple input signals, a max or min of weighted values, etc.; its temporal accumulation operator can be an integral with varying limits, a convolution, etc.; its activation threshold can be a time-varying function or a value; and its activation function can be a Sigmoid function, a step function, etc. For the time-invariant neuron (\Sigma, g_k), its spatial aggregation operator, activation function g_k, etc. may also take different forms. Therefore, we can build different feedforward process neural network models according to specific requirements. In a general model of the feedforward process neural network with multiple inputs, multiple outputs and multiple hidden layers, the information transfer between different hidden layers should satisfy the definition of the input/output signal
type of each kind of neuron in the network model. If the model includes both process neurons and time-invariant neurons, then, according to the definition of the input/output signal type for the process neuron and the time-invariant neuron and the forward-propagation characteristics of information flow in the multi-layer feedforward neural network model, the hidden layers with process neurons should generally be placed before the hidden layers with time-invariant neurons when feedforward process neural networks with several types of neurons are constructed.
4.3 A Process Neural Network Model Based on Weight Function Basis Expansion

Because the inputs and connection weights of the process neural network can both be time-varying functions, and the process neuron has an accumulation operator over time (for a continuous system, generally taken as an integral operation), the mapping mechanism and the computation course of a process neural network are quite different from those of a common time-invariant neural network, and the computational complexity is also increased. Meanwhile, because of the arbitrariness of the form of the connection weight functions in the network, if the function types are not limited to a certain range, the weight functions are very difficult to determine through learning from the training sample set. In order to solve this problem, consider a process neural network model whose weight functions can be expanded in a group of known basis functions.

Suppose that the network connection weights are continuous functions, namely w_{ij}(t) \in C[0, T]. The basis functions in the space C[0, T] have many optional forms, and there are good methods to represent a weight function as an expansion in finitely many basis functions under the precondition of specified precision. Therefore, we might as well construct a process neural network model whose weight functions can be expanded in basis functions, and consequently a process neural network which can be trained by virtue of existing learning algorithms.

Assume that X(t) \in U \subset R^n, t \in R, f(X) \in V \subset R^m, and U is a compact set; denote by C(U, V; n, m) the set of continuous mapping functionals from U to V. For convenience, let m = 1, so we get a multi-input-single-output system (it can be easily extended to the situation m > 1). Suppose that the weight functions in the process neural network can be expanded in a group of basis functions B(t) of C[0, T], i.e. limit the weight function forms to a correspondingly simple function class. The basis B(t) may be finite or numerable, orthonormal or non-orthonormal. For a finite basis, the structure of the process neural network is shown in Fig. 4.4. In Fig. 4.4, the weight functions are first expanded in the basis, and the operation rule in each middle sub-layer gives
Fig. 4.4 Process neural network based on weight function basis expansion

w_i(t) = \sum_{l=1}^{L} w_i^{(l)} b_l(t),    (4.2)

A(t) = \sum_{i=1}^{n} w_i(t) x_i(t),    (4.3)

y = f\left(\int_{0}^{T} A(t) K(t)\, dt - \theta\right),    (4.4)

or, in a general way, we have

y = f\left(K(A(t)) - \theta\right),    (4.5)
where b_1(t), b_2(t), \ldots, b_L(t) are a group of finite basis functions in C[0, T]; L is the number of basis functions; w_i^{(l)} is the expansion coefficient of w_i(t) corresponding to the basis function b_l(t); \theta is the activation threshold of the process neuron; and f is the activation function of the process neuron. The function K(\cdot) (a functional in Eq. (4.5)) can be determined according to practical needs. The activation functions of the hidden layer units can be the same or different. A network possessing the same type of activation function is called a regular process neural network, and a network possessing different types of activation functions is called a mixed process neural network. The network denoted by Eq. (4.4) or Eq. (4.5) is actually a process neural network model based on weight function basis expansion.

Next, a process neural network model with a single hidden layer based on weight function basis expansion will be discussed. We still focus on a multi-input-single-output network whose topological structure is the same as that in Fig. 4.1. Each node unit in its middle hidden layer is a process neuron as shown in Fig. 4.4, and the output node of the network is still a time-invariant neuron. The input-output relationship of the process neural network with a single hidden layer based on weight function basis expansion is
y = g\left(\sum_{j=1}^{m} v_j f\left(\int_{0}^{T} \sum_{i=1}^{n}\sum_{l=1}^{L} w_{ij}^{(l)} b_l(t) x_i(t)\, dt - \theta_j^{(1)}\right) - \theta\right),    (4.6)

where w_{ij}^{(l)} is the expansion coefficient of w_{ij}(t) corresponding to the basis function b_l(t). If in the output layer g(u) = u and \theta = 0, and we let

w_{ij}(t) = \sum_{l=1}^{L} w_{ij}^{(l)} b_l(t),    (4.7)

u_j(X(t)) = \int_{0}^{T} \sum_{i=1}^{n} w_{ij}(t) x_i(t)\, dt - \theta_j^{(1)},    (4.8)

then the process neural network with linear output is

y = \sum_{j=1}^{m} v_j f\left(u_j(X(t))\right).    (4.9)

As seen from Eq. (4.9), process neural networks are a class of functionals. In Eq. (4.6), w_{ij}(t) is represented as an expansion in a group of finite basis functions. In fact, the basis functions can have various forms, i.e. a finite or numerable basis, an orthonormal or non-orthonormal basis, a continuous or discrete basis, etc. Thus, the relationship between the inputs and output of process neural networks based on weight function basis expansion can generally be described as

W(t) = W * B(t),    (4.10)

o = \int W(t) * X(t)\, dt = W * \int B(t) * X(t)\, dt,    (4.11)

y = g\left(\sum v * f\left(o - \theta^{(1)}\right) - \theta\right),    (4.12)

where B(t) is the basis function vector; W is the matrix of expansion coefficients with respect to the basis; o is the result of the spatio-temporal aggregation operation on the input signals of the process neurons; and y is the output of the network.
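As an illustration (not part of the original text) of how the basis expansion reduces the functional computation to ordinary numerics, the sketch below assumes a small polynomial-like basis; the key point is that the integrals of b_l(t) x_i(t) over [0, T] are computed once, after which the network output of Eq. (4.6) only involves finite sums over the coefficients w_ij^(l). Shapes and values are assumptions.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def basis_pnn_forward(x, basis, w_coef, v, theta_hidden, theta_out, t):
    """Eq. (4.6) with weight functions expanded in a finite basis.

    x: sampled inputs, shape (n, steps); basis: sampled basis functions, shape (L, steps).
    w_coef: expansion coefficients w_ij^(l), shape (m, n, L).
    """
    # B[l, i] = integral over [0, T] of b_l(t) x_i(t) dt (computed once)
    B = np.trapz(basis[:, None, :] * x[None, :, :], t, axis=2)
    # u_j = sum_i sum_l w_ij^(l) B[l, i] - theta_j   (cf. Eq. (4.8))
    u = np.einsum('jnl,ln->j', w_coef, B) - theta_hidden
    return sigmoid(np.dot(v, sigmoid(u)) - theta_out)

t = np.linspace(0.0, 1.0, 200)
x = np.vstack([np.sin(2 * np.pi * t), t ** 2])                   # n = 2 inputs
basis = np.vstack([np.ones_like(t), t, t ** 2])                  # L = 3 basis functions
rng = np.random.default_rng(1)
print(basis_pnn_forward(x, basis, w_coef=rng.normal(size=(4, 2, 3)),
                        v=rng.normal(size=4), theta_hidden=np.zeros(4),
                        theta_out=0.0, t=t))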
4.4 Basic Theorems of Feedforward Process Neural Networks

The properties of a process neural network are the theoretical foundation of its effectiveness for practical applications. In this section, some basic properties of the feedforward process neural network [2] will be discussed and described in the form of theorems.
4.4.1 Existence of Solutions

For any group of learning samples, the existence of solutions is determined by whether there exists a process neural network that satisfies the mapping relationship of the samples with certain accuracy.

Theorem 4.1 Suppose that {X_k(t), d_k} (k = 1, 2, \ldots, K) is a group of independent identically distributed learning samples, where X_k(t) \in (C[0, T])^n and d_k \in R. Then, for a given error precision, when there are enough hidden layer units, the process neural network based on weight function basis expansion defined by Eq. (4.6) has a solution that satisfies the input-output relationship, and the solution can be obtained by a numerical computation method.

Proof From Eq. (4.6), the relationship between the inputs and the outputs of a process neural network based on weight function basis expansion with a single process neuron hidden layer is

y = g\left(\sum_{j=1}^{m} v_j f\left(\sum_{i=1}^{n}\sum_{l=1}^{L} w_{ij}^{(l)} \int_{0}^{T} b_l(t) x_i(t)\, dt - \theta_j^{(1)}\right) - \theta\right).    (4.13)

Denote \beta_{li} = \int_{0}^{T} b_l(t) x_i(t)\, dt; then Eq. (4.13) can be rewritten as

y = g\left(\sum_{j=1}^{m} v_j f\left(\sum_{i=1}^{n}\sum_{l=1}^{L} w_{ij}^{(l)} \beta_{li} - \theta_j^{(1)}\right) - \theta\right).    (4.14)

Given K independent identically distributed learning samples {X_k(t), d_k} (k = 1, 2, \ldots, K), where the kth sample input function is X_k(t) = (x_1^{(k)}(t), x_2^{(k)}(t), \ldots, x_n^{(k)}(t)) and the expected output is d_k, then with a certain error precision the mapping relationship between the inputs and the outputs of the process neural network should satisfy the following equation system. (The inverse function of g exists because g is monotone increasing and continuous; we might as well suppose it is the identity function. At the same time, suppose \theta = 0 and denote \theta_j^{(1)} = \theta_j.)

\sum_{j=1}^{m} v_j f\left(\sum_{i=1}^{n}\sum_{l=1}^{L} w_{ij}^{(l)} a_{li}^{(k)} - \theta_j\right) = d_k, \qquad k = 1, 2, \ldots, K,    (4.15)
where v_j, w_{ij}^{(l)}, \theta_j are the adjustable parameters to be determined, and a_{li}^{(k)} = \int_{0}^{T} b_l(t) x_i^{(k)}(t)\, dt. If the number of the hidden layer nodes is not limited, then when 2m + n \times m \times L + 1 \ge K, according to nonlinear programming theory, solutions of the above equation system exist. The specific solution course is given as follows. In fact, for the above nonlinear equation system we can adopt a numerical method (iterative approximation) to get an approximate solution satisfying accuracy demands. Define the objective function
Q = \sum_{k=1}^{K} h_k^2(W), \qquad h_k(W) = \sum_{j=1}^{m} v_j f\left(\sum_{i=1}^{n}\sum_{l=1}^{L} w_{ij}^{(l)} a_{li}^{(k)} - \theta_j\right) - d_k,    (4.16)

where Q is a function of the network parameters v_j, w_{ij}^{(l)}, \theta_j. If we denote

W = \left(w_{11}^{(1)}, w_{11}^{(2)}, \ldots, w_{11}^{(L)}, w_{21}^{(1)}, w_{21}^{(2)}, \ldots, w_{nm}^{(1)}, w_{nm}^{(2)}, \ldots, w_{nm}^{(L)}, v_1, v_2, \ldots, v_m, \theta_1, \theta_2, \ldots, \theta_m\right)^{\mathrm T},

then the objective function can be rewritten as

Q(W) = \sum_{k=1}^{K} h_k^2(W).    (4.17)
The above equation is a typical least-squares optimization, and we can adopt quite mature algorithms from nonlinear optimization theory, e.g. the Newton iteration algorithm [3], to seek the solution W^* for which Q equals zero. Suppose that W^{(s)} is an approximate solution of the equation Q(W) = 0. Expanding the objective function Q(W) in a Taylor series near W^{(s)} and adopting a second-order approximation, we have

q_s(\delta) = Q(W^{(s)}) + \nabla Q(W^{(s)})^{\mathrm T}\delta + \frac{1}{2}\,\delta^{\mathrm T}\nabla^2 Q(W^{(s)})\,\delta,    (4.18)

where \delta = W - W^{(s)}. According to the necessary condition for an extremum, the minimum point of q_s(\delta) should satisfy the equation

\nabla q_s(\delta) = 0.    (4.19)

From Eq. (4.18), the minimum point of q_s(\delta) is

\delta^{(s)} = -\left[\nabla^2 Q(W^{(s)})\right]^{-1}\nabla Q(W^{(s)}).    (4.20)

If we denote W^{(s+1)} = W^{(s)} + \delta^{(s)}, we get a new approximate solution W^{(s+1)} of Q(W) = 0. The Newton iteration formula is

\delta^{(s)} = -\left[\nabla^2 Q(W^{(s)})\right]^{-1}\nabla Q(W^{(s)}), \qquad W^{(s+1)} = W^{(s)} + \delta^{(s)}.    (4.21)
Next, the gradient vector \nabla Q(W) and the Hessian matrix \nabla^2 Q(W) of Q(W) are computed. If we introduce the vector function

g(W) = \left(h_1(W), h_2(W), \ldots, h_K(W)\right)^{\mathrm T},

then Eq. (4.17) can be rewritten as

Q(W) = g(W)^{\mathrm T} g(W).    (4.22)

Define the Jacobi matrix of g(W) as

J(W) = \left(\frac{\partial h_k}{\partial W_p}\right)_{K \times M}, \qquad k = 1, \ldots, K,\ p = 1, \ldots, M,    (4.23)

where W_p runs over all M = n \times m \times L + 2m components of W (the w_{ij}^{(l)}, the v_j and the \theta_j); then the gradient vector of Q(W) is

\nabla Q(W) = 2 J(W)^{\mathrm T} g(W),    (4.24)

and the Hessian matrix of h_k(W) is

\nabla^2 h_k(W) = \left(\frac{\partial^2 h_k}{\partial W_p\, \partial W_q}\right)_{M \times M}.    (4.25)
Hence the Hessian matrix of Q(W) is

\nabla^2 Q(W) = 2 J(W)^{\mathrm T} J(W) + 2\sum_{k=1}^{K} h_k(W)\,\nabla^2 h_k(W).    (4.26)
Consider next the convergence of the Newton iteration Eq. (4.21). Although the Newton method has a high convergence rate, it is only locally convergent; its convergence can be ensured only when the starting point W^{(0)} approximates the solution W^* closely enough. Because W^* is precisely what is sought, it is difficult to verify whether W^{(0)} is close to W^*. At the same time, Q(W^{(s)}) is not always monotonically decreasing, so we cannot ensure that {W^{(s)}} converges to W^*. In order to make Q(W^{(s)}) decrease monotonically and guarantee the global convergence of {W^{(s)}}, the method of dynamically modifying the iteration step can be adopted. In Eq. (4.21), introduce an adjustable step factor \alpha_s with

Q\left(W^{(s)} + \alpha_s\delta^{(s)}\right) = \min_{\alpha \ge 0} Q\left(W^{(s)} + \alpha\delta^{(s)}\right),

and the resulting algorithm is referred to as the damped Newton method [4], which has global convergence. The iteration formula for the damped Newton method is

\delta^{(s)} = -\left(J(W^{(s)})^{\mathrm T} J(W^{(s)}) + \sum_{k=1}^{K} h_k(W^{(s)})\,\nabla^2 h_k(W^{(s)})\right)^{-1} J(W^{(s)})^{\mathrm T} g(W^{(s)}), \qquad W^{(s+1)} = W^{(s)} + \alpha_s\delta^{(s)},    (4.27)

in which \alpha_s satisfies

Q\left(W^{(s)} + \alpha_s\delta^{(s)}\right) = \min_{\alpha \ge 0} Q\left(W^{(s)} + \alpha\delta^{(s)}\right).    (4.28)

From the foregoing analysis, when the number of the hidden layer nodes of the network satisfies 2m + n \times m \times L + 1 \ge K, solutions of Eq. (4.15) exist. Moreover, since Q(W) \ge 0, the global optimal solution given by the damped Newton method of Eq. (4.27) is a solution of Eq. (4.15). Thus, the proof is completed.
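The following is a generic sketch (not from the original text) of the damped Newton iteration of Eqs. (4.21) and (4.28) applied to a toy least-squares objective; finite-difference derivatives and a simple backtracking search stand in for the analytic gradient/Hessian and for the exact minimization over the step factor. All names and numbers are illustrative assumptions.

import numpy as np

def damped_newton(Q, w0, iters=50, eps=1e-5, tol=1e-10):
    """Damped Newton iteration W_{s+1} = W_s + alpha_s * delta_s (cf. Eqs. (4.21), (4.28)).

    Q: scalar objective; w0: starting point. Gradient and Hessian are taken by
    finite differences, and alpha_s is chosen by backtracking (a stand-in for the
    exact line minimization over alpha).
    """
    w = np.asarray(w0, dtype=float)
    n = w.size
    for _ in range(iters):
        grad = np.array([(Q(w + eps * e) - Q(w - eps * e)) / (2 * eps)
                         for e in np.eye(n)])
        hess = np.array([[(Q(w + eps * ei + eps * ej) - Q(w + eps * ei - eps * ej)
                           - Q(w - eps * ei + eps * ej) + Q(w - eps * ei - eps * ej))
                          / (4 * eps ** 2) for ej in np.eye(n)] for ei in np.eye(n)])
        delta = -np.linalg.solve(hess, grad)
        alpha = 1.0
        while alpha > 1e-8 and Q(w + alpha * delta) > Q(w):   # damping step
            alpha *= 0.5
        w = w + alpha * delta
        if np.linalg.norm(alpha * delta) < tol:
            break
    return w

# Toy objective Q(W) = sum_k h_k(W)^2 with h_k(W) = W[0]*a_k + W[1] - d_k
a, d = np.array([0.0, 1.0, 2.0]), np.array([1.0, 3.0, 5.0])
Q = lambda w: np.sum((w[0] * a + w[1] - d) ** 2)
print(damped_newton(Q, w0=[0.0, 0.0]))   # approaches (2, 1)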
4.4.2 Continuity

To show the continuity of the process neural network is to answer the question of whether the mapping relationship of the process neural network is continuous; in other words, whether a very small variation of the network inputs produces only a very small variation of the outputs.

Theorem 4.2 Suppose that two inputs of a process neural network defined by Eq. (4.1) are respectively X(t), X^*(t) \in U \subset (C[0, T])^n, and the corresponding outputs are y, y^* \in V \subset R. If f and g are continuous, then for any \varepsilon > 0 there exists \delta > 0 such that when \|X(t) - X^*(t)\| < \delta, |y - y^*| < \varepsilon holds.
Proof In Eq. (4.1), denote

W = \max_{i,j}\sup_{0 \le t \le T}\left|w_{ij}(t)\right|, \qquad u_j = \int_{0}^{T}\sum_{i=1}^{n} w_{ij}(t) x_i(t)\, dt - \theta_j^{(1)}.

As g is continuous, for any \varepsilon > 0 there exists \delta_1 > 0 such that when \left|\sum_{j=1}^{m} v_j f(u_j) - \sum_{j=1}^{m} v_j f(u_j^*)\right| < \delta_1, |y - y^*| < \varepsilon holds. In the following, we prove that for this \delta_1 > 0 there exists \delta > 0 such that when \|X(t) - X^*(t)\| < \delta, the above inequality holds. Because f is continuous, for \delta_1 > 0 there exists \delta_2 > 0 such that when

\left|u_j - u_j^*\right| < \delta_2, \qquad j = 1, 2, \ldots, m,    (4.29)

we have

\left|f(u_j) - f(u_j^*)\right| < \delta_1 / (m \cdot V), \qquad j = 1, 2, \ldots, m,    (4.30)

where V = \max_j(1, |v_j|). Thus, whenever X(t), X^*(t) and the selected \delta > 0 satisfy

\left\|X(t) - X^*(t)\right\| < \delta < \delta_2 / (n \cdot T \cdot W),    (4.31)

we have

\left|u_j - u_j^*\right| = \left|\int_{0}^{T}\sum_{i=1}^{n} w_{ij}(t)\left(x_i(t) - x_i^*(t)\right) dt\right| \le n \cdot T \cdot W \cdot \left\|X(t) - X^*(t)\right\| < \delta_2,

so that \left|\sum_{j} v_j f(u_j) - \sum_{j} v_j f(u_j^*)\right| \le \sum_{j} |v_j|\left|f(u_j) - f(u_j^*)\right| < \delta_1, and hence |y - y^*| < \varepsilon. Thus, the proof is completed.

As we all know, a traditional neural network is a continuous model. Actually, a traditional neural network is a special case of the process neural network.
Theorem 4.3 The traditional neural network is a special case of the process neural network.

Proof In

y = g\left(\sum_{j=1}^{m} v_j f\left(\int_{0}^{T}\left(\sum_{i=1}^{n} w_{ij}(t) x_i(t)\right) dt - \theta_j^{(1)}\right) - \theta\right),

if we let T = 0, x_i(t) = x_i and w_{ij}(t) = w_{ij}, then this can be simplified as

y = g\left(\sum_{j=1}^{m} v_j f\left(\sum_{i=1}^{n} w_{ij} x_i - \theta_j^{(1)}\right) - \theta\right).

This is a time-invariant traditional feedforward neural network with a single hidden layer. Thus, the proof is completed.
4.4.3 Functional Approximation Property

Functional approximation capability is an important property of a process neural network, and it determines the applicability and the modeling capability of the process neural network for solving problems. In order to discuss the functional approximation property of the process neural network, two definitions are given as follows.

Definition 4.1 Suppose that K(\cdot): R^n \to V \subset R is an arbitrary continuous function from R^n to R, denoted K \in C(R^n). Define the functional class

\Sigma_n(K) = \left\{f: U \to V \,\middle|\, f(x(t)) = \int_{0}^{T} K(x(t))\, dt,\ x(t) \in U \subset R^n,\ f(x) \in V \subset R\right\}.

Definition 4.2 Suppose that X(t) = (x_1(t), x_2(t), \ldots, x_n(t))^{\mathrm T}, where x_i(t) \in C[0, T], i = 1, 2, \ldots, n. If |x_i(t_1) - x_i(t_2)| \le L_i |t_1 - t_2| with L_i \ge 0 for any t_1, t_2 \in [0, T], then x_i(t) is said to satisfy the Lipschitz condition; if \|X(t_1) - X(t_2)\| \le L_x |t_1 - t_2| with L_x \ge 0, then X(t) is said to satisfy the Lipschitz condition; if \|K(X(t_1)) - K(X(t_2))\| \le L_K \|X(t_1) - X(t_2)\|, then K(\cdot) \in C(R^n) is said to satisfy the Lipschitz condition.

Research on the traditional neural network has already proved the following well-known approximation theorem.

Lemma 4.1 [4] For any continuous function g \in C(R^n) there exists a feedforward neural network with only one hidden layer which can approximate g with any chosen accuracy.

Theorem 4.4 (Approximation Theorem 1) For any continuous functional G(x(t)) \in \Sigma_n(K) defined by Definition 4.1 and any \varepsilon > 0, if G(x(t)) satisfies the Lipschitz condition, then there exists a process neural network P such that \|G(x(t)) - P(x(t))\| < \varepsilon.
G( x(t)) =
r
K(x(t))dt.
T=l, K
(4.34)
Without loss of generality, let is regarded as the composite function with respect to t, and the integral interval is divided into N equal parts , here t;=i/N (i=l ,
FeedforwardProcess Neural Networks
65
2, . .., N) is the partition point , then N
G(x(t»)= L
i =1
f
Let functional G(x(t» = ~ K(x(t) N i=\
l' K(x(t»)dt. t
(4.35)
i-I
be the approximation of G(x(t», then
t
I
_ IN . 1 N IG(x(t»)-G(x(t»)1 = ~ K(x(t»)dt- N~K(X(ti»)
s
tit
K(x(t»)dt-
~K(X(t;»)1
(4.36)
Because K(x(t» is continuous with respect to t, by the interval mean value theorem, there exists ~;E [(i-l)IN, i/NJ such that (4.37) Therefore,
(4.38)
where L K and L, are respectively the Lipschitz constants of K(x) about x and x(t) about t. Therefore,
G(x(t») =
1K(x(t»)dt=-N1 LK(x(t))+O(l/ N). r
N
(4.39)
i=O
Denote X(ti)=X(i). Becau se K(x(i»:Rn ---+V is a continuous function in C(Rn ) , according to Lemma 4.1 , it can be approximated by a traditional neural network, and based on Theorem 4.3, this traditional feedforward neural network can certainly be replaced by a process neural network Pi, i.e. (4.40) where f:i>O is an arbitrarily small value, i=l, 2, ... , N. We might as well let f:;
No, and we have
66
Process Neural Networks
I
I LK( N e G ( X(t ))-x(t)) <-.
I
Denote
N
i:O
(4.41)
2
(4.42) then IG( x(t ))-P( x(t ))I= G(x(t ))-
I
s
N G(X(t ))- NI ~K(X(t))
I
III
I N
I N
I N
N~K(X(t))+ N~K(x(ti ))- N~P;(X(t))
I
N (X(t)) - NI ~P; N (X(t)) I + N ~K
< e / 2 + e / 2 = e. (4.43) P(x(t)) is solved.
Lemma 4.2
There exists a continuous functional K( ·) such that for any X(t)=(x ](t) ,
X2(t ),.. .,xn(t)) and x*( t) = (x; (z), x; (t) ,..., x: (t)) , when x(t):j.x*(t), K(x(t)}fK(x*ct))
holds.
Proof Consider the ith component Xj(t)E qo, 11 of x(t), according to the universal approximation theorem for a polynomial [5], Xj(t) can be approximated by the following polynomial series Xi (t) =
L
a i/ j
(4.44)
,
j:O
where a jjER, i=l , 2, ... , n;j=O, 1,2, ... , and we may as well assume (4.45) where dijE {O, l}, PijEN is the natural number set, and SjjE (0, 1). Let n . ...n··u«, E {0,1,...,9 }, k=l, ...,N , PI}·· = n..YoY, N ; n'Jo E {O, l }, n.. 'he. J Sij = suo SUI ... su• .. ., S'11 E {O, 1, ... , 9}, k=O,I, ... aij (j=I, 2, ... ) are arranged as follows niONO
SiOo
SiO,
nil NI
Silo
Sil,
Sill
n'I 2I
ni2N,
silo
Si2 1
Si2.
nij,
n ijNj
s..'io
s..IlJ
sih
aiO:
djO;'Pi yiOY
ajl :
~1 /nilO/ill
aj2:
di2
n1lo
aij:
dij
nYo.
I
".
...
SiO•
... (4.46)
...
Let z be the number whose decimal expansion is obtained by arranging the entries of the array in Eq. (4.46) along the arrowed diagonals from the top left corner to the bottom right corner, with i running from 1 to n.    (4.47)

That is, the decimal digits of z are arranged according to the arrowheads from the top left corner to the bottom right corner in the array of Eq. (4.46), and i is arranged from 1 to n. Obviously, z \in [0, 1]. The mapping relationship x(t) \to z is denoted as K and is clearly a one-to-one mapping. It can be seen from the construction of z that K(\cdot) is continuous. Thus, the lemma is proved.

Theorem 4.5 (Approximation Theorem 2) For any continuous functional G(x(t)): U \subset C(R^n) \to V \subset R^n and \varepsilon > 0, if the input function x(t) can be expanded into a series of the form of Eq. (4.44), then there exists a process neural network P such that \|G(x(t)) - P(x(t))\| < \varepsilon.
Proof According to the construction method of Lemma 4.2, a continuous function, whose inputs are like Eq. (4.44), can be mapped into a real number defined by Eq. (4.47). Consequently, the functional approximation problem of a process neural network is converted to the function approximation problem of a traditional neural network. According to Lemma 4.1, the theorem can be proved.
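The following is a short sketch (not from the original text) of the constructive idea behind Theorem 4.4: the functional G(x) = \int_0^1 K(x(t)) dt is approximated by the Riemann-type mean (1/N) sum_i K(x(t_i)) of Eq. (4.39). In the theorem each K(x(t_i)) would itself be realized by a (process) neural network P_i; here K is called directly to keep the illustration short, and all names are assumptions.

import numpy as np

def functional_riemann_approx(x_func, K, N=100):
    """Approximate G(x) = integral_0^1 K(x(t)) dt by (1/N) * sum_i K(x(t_i)) (cf. Eq. (4.39))."""
    t = np.arange(1, N + 1) / N
    return np.mean([K(x_func(ti)) for ti in t])

# Example: x(t) = (sin(2 pi t), t), K(u) = u_1^2 + u_2, so G(x) = 1/2 + 1/2 = 1
x_func = lambda ti: np.array([np.sin(2 * np.pi * ti), ti])
K = lambda u: u[0] ** 2 + u[1]
print(functional_riemann_approx(x_func, K))   # close to 1.0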
4.4.4 Computing Capability
Theorem 4.6 The computing capability of narrow-sense process neural networks is equivalent to that of a Turing machine.

Proof In the narrow-sense process neural network defined by Eq. (4.6), the operators used are "+", "\cdot", "\int" and their compound operations. All these operations can be implemented by a Turing machine (the machine only needs to seek an approximate value for the "\int" operation), so the computing capability of a process neural network is not greater than that of a Turing machine. On the other hand, the computing capability of a neuron with a linear threshold is equivalent to that of a Turing machine [6], and the neuron with a linear threshold is a special case of a process neuron, so the computing capability of the process neural network is not less than that of a Turing machine. Hence, the computing capability of the process neural network is equivalent to that of a Turing machine.
4.5 Structural Formula Feedforward Process Neural Networks
In practical signal processing, usually there are many time-varying process signals with a singular value, such as various pulse signals which electronic instruments and electronic components generate, e.g. the electrocardiogram signals in a health check,
etc. Learning and generalization of a time-varying function with singular values are always difficult problems in artificial neural networks. Generally, the number of hidden layers of the network and the number of neuron nodes in the hidden layers are increased to satisfy the control precision of the system output error. In this way, the learning time becomes longer, the network redundancy is increased, and the stability and the generalization capability become worse. Aiming to solve this problem, a structural formula process neural network model can be constructed; it originates from the rational function approximation property in function approximation theory and the nonlinear transformation capability of the process neural network for time-varying functions. In the course of function approximation, the approximation property and the fitting capability of a structural (rational) formula are much greater than those of a linear function. Similarly, a structural formula process neural network is superior to a common process neural network in flexibility when approximating a process function with singular values and in reaction sensitivity near singular value points, and it can strengthen the learning quality and the generalization capability for time-varying sample functions with singular values. Structural formula process neural networks are a generalized form of the process neural network model structure.
4.5.1 Structural Formula Process Neurons

The structural formula process neuron is composed of two process neurons that appear dually. Logically, it is divided into a numerator and a denominator, and the output is obtained after rational formula combination. Its structure is shown in Fig. 4.5. In Fig. 4.5, the process neuron PN_u represents the numerator part of the structural formula process neuron; PN_d represents the denominator part; w_{iu}(t) is the connection weight function from each input node to the numerator process neuron node in the dual layer; v_{id}(t) is the connection weight function from each input node to the denominator process neuron node in the dual layer. The structural formula synthesis node takes the PN_u output as the numerator and the PN_d output as the denominator, and then synthesizes them into a structural formula form; y is the output of the structural formula process neuron.
Fig. 4.5 Structural formula process neuron
4.5.2 Structural Formula Process Neural Network Model

Consider a structural formula process neural network model that has an input layer, a structural formula process neuron dual layer, a structural formula synthesis layer, and an output layer [7]. It can be regarded as an improved and generalized model of a process neural network, and its structure is shown in Fig. 4.6.
Fig. 4.6 Structural formula feedforward process neural network

In Fig. 4.6, the solid lines represent the connections between the nodes in the input layer and the numerator units in the dual layer, and the dashed lines represent the connections between the nodes in the input layer and the denominator units in the dual layer. The input layer has n nodes for the input time-varying functions X(t) = (x_1(t), x_2(t), \ldots, x_n(t)). The process neuron dual layer has m units, each logically divided into a numerator part and a denominator part, so that the numerator and the denominator process neurons appear in pairs. w_{ij}(t) and v_{ij}(t) are respectively the connection weight functions from input layer node i to the numerator and denominator process neuron nodes of unit j in the dual layer (i = 1, 2, \ldots, n; j = 1, 2, \ldots, m). The structural formula synthesis layer has m units that complete the structural formula synthesis between the output of the numerator process neuron and the output of the denominator process neuron in the dual layer (that is, they perform division). The output layer is an ordinary time-invariant neuron node, u_j is the connection weight from the synthesis layer to the output layer, and y is the output of the network. The structural formula process neural network is quite suitable for situations in which the output of the approximated object has singular values. Actually, when a time-varying process signal is processed, the input signals of the system may have an activation effect and simultaneously an inhibition effect. Here, the numerator process neurons of a processing unit in the dual layer are used to receive activation information from the input signals and the denominator process neurons are used to receive inhibition information. The activation information and the inhibition information are synthesized into a structural formula by the structural formula
synthesis layer, which can improve the efficiency of the network in extracting process pattern characteristics and enhance the reaction sensitivity of the process neural network model near singular values. The relationship between the inputs and the outputs of each layer in the structural formula process neural network is as follows. The inputs of the network are
The output of the numerator unit in the process neuron dual layer is
(4.48)
The output of the denominator unit in the dual layer is (4.49)
The operation in the structural formula synthesis layer is (4.50)
The output of the output layer node is (4.51)
In Fig. 4.6, if v_ij(t) = 0 and h_dj = 1, then the structural formula process neural network is simplified to an ordinary process neural network, and a narrow-sense process neural network is a special case of the structural formula process neural network. Therefore, Theorem 4.7 holds.
Theorem 4.7  Structural formula process neural networks have all the properties in Theorems 4.1-4.6, namely existence, continuity, functional approximation capability, and computing capability.
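To make the above mapping concrete, the following sketch evaluates a structural formula process neuron layer numerically. It is an illustration only, not the authors' implementation: it assumes sigmoid activations, approximates the temporal integrals by the trapezoidal rule on a uniform grid over the process interval, and realizes the synthesis layer as the division of numerator output by denominator output; all function and variable names are the sketch's own.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def structural_formula_pnn(x, w_u, w_d, u_out, t, theta_u=0.0, theta_d=0.0, eps=1e-8):
    """Hypothetical forward pass of a structural formula process neural network.

    x    : (n, T) sampled input functions x_i(t)
    w_u  : (m, n, T) numerator weight functions
    w_d  : (m, n, T) denominator weight functions
    u_out: (m,) output-layer weights u_j
    t    : (T,) time grid over the process interval
    """
    # spatial aggregation + temporal accumulation for numerator and denominator units
    num = sigmoid(np.trapz((w_u * x).sum(axis=1), t, axis=-1) - theta_u)   # (m,)
    den = sigmoid(np.trapz((w_d * x).sum(axis=1), t, axis=-1) - theta_d)   # (m,)
    s = num / (den + eps)            # structural formula synthesis layer (division)
    return float(u_out @ s)          # ordinary time-invariant output node

# toy usage: n=2 inputs, m=3 dual units, 101 time samples on [0, 1]
t = np.linspace(0.0, 1.0, 101)
x = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
rng = np.random.default_rng(0)
y = structural_formula_pnn(x, rng.normal(size=(3, 2, 101)),
                           rng.normal(size=(3, 2, 101)),
                           rng.normal(size=3), t)
print(y)
```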
4.6 Process Neural Networks with Time-varying Functions as Inputs and Outputs
In practical problems such as real-time system control and process simulation [8, 9], both the inputs and the outputs of many systems are time-dependent functions. In order to build simulation models of these problems, traditional methods generally need to establish and solve complicated mathematical models or empirical statistical formulas. However, some problems are complex, nonlinear systems affected by many factors. Especially in some new research fields, due to the lack of prior theory and knowledge, difficulties such as hard modeling, poor adaptability, and great solving difficulty arise when traditional methods are adopted. A process neural network model with time-varying functions as its inputs and outputs will be constructed next, which can effectively realize the complex mapping relationship between the inputs and the outputs of a system by using the nonlinear mapping capability of the process neural network on time-varying input and output signals [10, 11].
4.6.1 Network Structure
Suppose that the inputs of a process neural network are X(t) = (x_1(t), x_2(t), ..., x_n(t)), and the expected output is y(t) (a multi-input-single-output system, which in fact can easily be extended to a multi-input-multi-output system), where x_i(t), y(t) ∈ C[0, T]. b_1(t), b_2(t), ..., b_L(t) are a group of orthonormal basis functions which satisfy the fitting precision requirements. Assume that the expansion of the expected output y(t) based on these basis functions is
y(t) = \sum_{l=1}^{L} c_l b_l(t).   (4.52)
The process neural network to solve problems with time-varying inputs and outputs is designed as a four-layer feedforward model, and its topological structure is n-m-L-1. The input layer has n nodes for inputting the n time-varying functions x_1(t), x_2(t), ..., x_n(t) to the network. The first hidden layer is a process neuron layer consisting of m node units for completing the spatial weighted aggregation and temporal accumulation operations on the n input functions and the extraction of sample process pattern characteristics. The second hidden layer is a common time-invariant hidden layer with L node units (here, L is the number of basis functions in the y(t) expansion) for approximating the coefficient vector in the basis function expansion of the system output y(t). The fourth layer is the output layer, and its output is \sum_{l=1}^{L} y_l^{(2)} b_l(t). Fig. 4.7 shows the topological structure of the network.
Fig. 4.7 Process neural network with time-varying functions as its inputs and outputs
As seen from Fig. 4.7, the relationship between the inputs and the outputs of each layer of the network is as follows. The inputs of the system are X(t) = (x_1(t), x_2(t), ..., x_n(t)), t ∈ [0, T].
The outputs of the first hidden layer are
y_j^{(1)} = f( \sum_{i=1}^{n} \int_0^T w_{ij}(t) x_i(t) dt - \theta_j^{(1)} ),  j = 1, 2, ..., m,   (4.53)
where y_j^{(1)} is the output of the jth neuron in the first hidden layer; w_{ij}(t) is the connection weight function between the input layer and the first hidden layer; \theta_j^{(1)} is the activation threshold of the jth neuron in the first hidden layer; [0, T] is the input process interval; and f is the activation function in the first hidden layer. The outputs of the second hidden layer are
y_l^{(2)} = \sum_{j=1}^{m} v_{jl} y_j^{(1)},  l = 1, 2, ..., L,   (4.54)
where y_l^{(2)} is the output of the lth neuron in the second hidden layer, and v_{jl} is the connection weight from neuron j in the first hidden layer to neuron l in the second hidden layer. In the synthesizing output layer, assume that b_l(t) (l = 1, 2, ..., L) are the basis functions in Eq. (4.52); then the system output is
y(t) = \sum_{l=1}^{L} y_l^{(2)} b_l(t).   (4.55)
According to Eqs. (4.53)-(4.55), the mapping relationship between the inputs and the outputs of the process neural network with time-varying functions as its inputs and outputs is
y(t) = \sum_{l=1}^{L} ( \sum_{j=1}^{m} v_{jl} f( \sum_{i=1}^{n} \int_0^T w_{ij}(\tau) x_i(\tau) d\tau - \theta_j^{(1)} ) ) b_l(t).   (4.56)
In this model, weighted summation is adopted as the spatial aggregation operator and integration over time as the temporal accumulation operator. In fact, other appropriate aggregation operators can be used in actual instances. Hence, the above model actually defines a class of process neural networks that can be employed in different situations.
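A small numerical sketch of this forward mapping may help. The code below is illustrative rather than definitive: it assumes a sigmoid activation in the process neuron layer, trapezoidal integration over [0, T], and a short hand-picked basis for the output expansion; the three steps mirror the forms assumed above for Eqs. (4.53)-(4.55).

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def tv_io_pnn(x, w, theta1, v, basis, t):
    """Sketch of the forward mapping of Eqs. (4.53)-(4.55).

    x     : (n, T) sampled inputs x_i(t)
    w     : (m, n, T) weight functions w_ij(t)
    theta1: (m,) thresholds of the process neuron layer
    v     : (L, m) weights v_jl to the time-invariant layer
    basis : (L, T) sampled basis functions b_l(t)
    t     : (T,) time grid
    """
    y1 = sigmoid(np.trapz((w * x).sum(axis=1), t, axis=-1) - theta1)  # (m,), cf. (4.53)
    y2 = v @ y1                                                        # (L,), cf. (4.54)
    return y2 @ basis                                                  # y(t), cf. (4.55)

t = np.linspace(0.0, 1.0, 200)
basis = np.vstack([np.ones_like(t), np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
x = np.vstack([t, t ** 2])
rng = np.random.default_rng(1)
y_t = tv_io_pnn(x, rng.normal(size=(4, 2, 200)), rng.normal(size=4),
                rng.normal(size=(3, 4)), basis, t)
print(y_t.shape)  # (200,) -- a time-varying output
```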
4.6.2 Continuity and Approximation Capability of the Model
Theorem 4.8 (Continuity Theorem)
Suppose that X(t), X*(t) ∈ U ⊂ (C[0, T])^n are two input functions of the process neural network P defined by Eq. (4.56), and the corresponding outputs are respectively y(t), y*(t) ∈ V ⊂ C[0, T]. If f is continuous, then for any ε > 0 there exists δ > 0 such that when ||X(t) - X*(t)|| < δ, ||y(t) - y*(t)|| < ε holds.
Proof  In Eq. (4.56), let
W = \max_{i,j,l} |w_{ij}^{(l)}|,  V = \max_{j,l} |v_{jl}|,  B = \max_{l} \sup_{0 \le t \le T} |b_l(t)|.
Because f is continuous, for any ε_1 > 0 there exists δ_1 > 0 such that, when
| ( \sum_{i=1}^{n} \int_0^T ( \sum_{l=1}^{L} w_{ij}^{(l)} b_l(t) ) x_i(t) dt - \theta_j^{(1)} ) - ( \sum_{i=1}^{n} \int_0^T ( \sum_{l=1}^{L} w_{ij}^{(l)} b_l(t) ) x_i^*(t) dt - \theta_j^{(1)} ) |
= | \sum_{i=1}^{n} \int_0^T ( \sum_{l=1}^{L} w_{ij}^{(l)} b_l(t) ) ( x_i(t) - x_i^*(t) ) dt |
\le \sum_{i=1}^{n} \int_0^T \sum_{l=1}^{L} |w_{ij}^{(l)}| |b_l(t)| |x_i(t) - x_i^*(t)| dt
\le n T L W B \delta < \delta_1,
we have
| f(u_j^{(1)}) - f(u_j^{(1)*}) | < ε_1,   (4.57)
where u_j^{(1)} and u_j^{(1)*} denote the activation arguments of the jth hidden process neuron for X(t) and X*(t) respectively. Here, if δ > 0 satisfies ||X(t) - X*(t)|| < δ ≤ \delta_1 / (n T L W B), then for l = 1, 2, ..., L,
| \sum_{j=1}^{m} v_{jl} f(u_j^{(1)}) - \sum_{j=1}^{m} v_{jl} f(u_j^{(1)*}) | = | \sum_{j=1}^{m} v_{jl} ( f(u_j^{(1)}) - f(u_j^{(1)*}) ) | \le \sum_{j=1}^{m} |v_{jl}| | f(u_j^{(1)}) - f(u_j^{(1)*}) | < m V ε_1,
i.e.,
| y_l^{(2)} - y_l^{(2)*} | < m V ε_1.
For any ε > 0, let ε_1 = ε / (L m V B); then, when ||X(t) - X*(t)|| < δ, we have
||y(t) - y*(t)|| = || \sum_{l=1}^{L} y_l^{(2)} b_l(t) - \sum_{l=1}^{L} y_l^{(2)*} b_l(t) || = || \sum_{l=1}^{L} ( y_l^{(2)} - y_l^{(2)*} ) b_l(t) || \le \sum_{l=1}^{L} | y_l^{(2)} - y_l^{(2)*} | B < L m V B ε_1 = ε.
Thus, the proof is completed.
Theorem 4.9 (Approximation Theorem 3)  For any continuous functional G(X(t)) ∈ F: U ⊂ (C[0, T])^n → V ⊂ C[0, T] and ε > 0, there exists a process neural network P, defined by Eq. (4.56) with time-varying functions as its inputs and outputs, which satisfies ||G(X(t)) - P(X(t))|| < ε.
Proof  Let X(t) ∈ U, y(t) ∈ V, and G(X(t)) = y(t). For any ε > 0, b_1(t), b_2(t), ..., b_L(t) are a group of orthonormal basis functions in C[0, T] which satisfy the fitting precision demand of y(t), and
|| y(t) - C * B(t) || \le ε/2,
where C = (c_1, c_2, ..., c_L) is the vector of expansion coefficients.
From the approximation theorem (Theorem 4.5), for any continuous functional G_1: U_1 ⊂ (C[0, T])^n → V_1 ⊂ R and ε_1 > 0, there exists a process neural network P_1 which satisfies
| G_1(X(t)) - P_1(X(t)) | < ε_1.
Theorem 4.5 can be extended to the situation of the process neural network with multiple inputs and multiple outputs: for any continuous functional G_2: U_2 ⊂ (C[0, T])^n → V_2 ⊂ R^L and ε_1 > 0, there exists a process neural network P_2 satisfying
|| G_2(X(t)) - P_2(X(t)) || < ε_1.
Since C = (c_1, c_2, ..., c_L) ∈ V_2, there exists P_2 which satisfies
|| C - P_2(X(t)) || < ε_1.
Define a process neural network P:
P(X(t)) = P_2(X(t)) * B(t),
in which B(t) = (b_1(t), b_2(t), ..., b_L(t)), and "*" denotes the inner product operation. Denote B = \max_{l} \sup_{0 \le t \le T} |b_l(t)|. From the definition of G(X(t)) and P, we have
|| G(X(t)) - P(X(t)) || = || y(t) - P(X(t)) || = || y(t) - C * B(t) + C * B(t) - P(X(t)) ||
\le || y(t) - C * B(t) || + || C * B(t) - P(X(t)) || \le ε/2 + || C * B(t) - P_2(X(t)) * B(t) ||
= ε/2 + || ( C - P_2(X(t)) ) * B(t) || \le ε/2 + || C - P_2(X(t)) || B \le ε/2 + ε_1 B.
Here, let ε_1 = ε / (2B); then ||G(X(t)) - P(X(t))|| < ε. Thus, the proof is completed.
4.7 Continuous Process Neural Networks
In this kind of process neural network, the process neuron with continuous time functions as its inputs and outputs is first defined. Its spatial aggregation operator is still defined as the spatial weighted summation of the multi-input time-varying signals, and the temporal accumulation operator is taken as a parameter-varying integral with time. In this way, the aggregation/accumulation operations and the activation mode of the process neuron can simultaneously reflect the spatial aggregation effect of the external time-varying input signals and the stage-time accumulative effect during the input course, and can also implement a nonlinear real-time (or time-unit-delayed) mapping relationship between the inputs and the outputs [12]. These process neurons can constitute complex process neural networks with multiple hidden layers conforming to certain topological structures. Using the nonlinear transformation mechanism of the artificial neural network to establish the mapping relationship between time-varying inputs and outputs in a complex nonlinear continuous system directly gives it broad adaptability for many practical problems whose inputs and outputs are both continuous time functions.
4.7.1 Continuous Process Neurons
A continuous process neuron is defined as a process neuron with continuous time functions as its inputs and outputs. This process neuron is composed of the operations of time-varying input signal weighting, spatial aggregation, temporal accumulation, and activation output. The spatial aggregation operator adopts the weighted summation of the multi-input signals, and the temporal accumulation operator adopts a parameter-varying integral with time. The structure of the continuous process neuron is shown in Fig. 4.8.
Fig. 4.8 Continuous process neuron
In Fig. 4.8, x_1(t), x_2(t), ..., x_n(t) are the continuous time-varying input functions of the process neuron; w_1(t), w_2(t), ..., w_n(t) are the corresponding connection weight functions; "Σ" is the spatial aggregation operator of the process neuron and is taken as the weighted summation of the multi-input signals; "∫" is the temporal accumulation operator of the process neuron and adopts a parameter-varying integral with time; f(·) is the activation function, which can be a Sigmoid function, a Gauss function, or any other form of bounded function. In Fig. 4.8, the mapping relationship between the inputs and the outputs of the continuous process neuron is
y(t) = f( \int_0^t \sum_{i=1}^{n} w_i(\tau) x_i(\tau) d\tau - \theta(t) ),   (4.58)
where θ(t) is the activation threshold of the process neuron and is a time-dependent function. In fact, the spatio-temporal aggregation operators of the continuous process neuron can also take other forms, e.g. corresponding to the input time point t, the spatial aggregation operation can adopt a maximum or minimum operation, an S-operator or a T-operator; the temporal accumulation operator can be a convolution, a maximum or a minimum operation, etc. on the interval [0, t]. As seen from Eq. (4.58), the continuous process neuron model expresses both the spatial weighted aggregation of the time-varying input signals and the accumulation of the stage-time effect of the inputted time-varying signals before time t, and can realize a synchronous mapping relationship between the inputs and the outputs. Taking account of spatio-temporal aggregation with several time-unit delays, Eq. (4.58) can be extended and rewritten as
(4.59)
where δ is the time granularity, k is a non-negative integer, and t - kδ ≥ 0. The process neurons defined by Eqs. (4.58) and (4.59) can be used to establish complex process neural network models with multiple hidden layers, in which the time-varying information flow is transferred in a real-time or a delayed mode in each layer of the network.
4.7.2 Continuous Process Neural Network Model
According to a certain topological structure, process neurons defined by Eq. (4.58) or Eq. (4.59) and other types of neurons can construct process neural networks with continuous time functions as inputs and outputs. Neurons of the same type have the same structure, share the same mechanism and learning algorithm, and carry out the same aggregation/accumulation operations in the network. At the same time, the information transfer between hidden layers of neurons should meet the requirements of the input/output signal type corresponding to each kind of neuron in the network model. In order to simplify the discussion, consider a feedforward continuous process neural network model with one process neuron hidden layer defined by Eq. (4.58) whose output-layer activation function is linear. Fig. 4.9 shows the topological structure of the network.
Fig. 4.9 Continuous process neural network
In Fig. 4.9, x_1(t), x_2(t), ..., x_n(t) are the continuous input functions of the process neural network; w_ij(t) (i = 1, 2, ..., n; j = 1, 2, ..., m) is the connection weight function between input layer node i and hidden node j; v_j(t) (j = 1, 2, ..., m) is the connection weight function from hidden node j to the output node, which can also be a time-invariant adjustable parameter; y(t) is the output of the system. According to Fig. 4.9, the mapping relationship between the inputs and the outputs of the network is
y(t) = \sum_{j=1}^{m} v_j(t) f( \int_0^t \sum_{i=1}^{n} w_{ij}(\tau) x_i(\tau) d\tau - \theta_j(t) ),   (4.60)
where [0, T] is the input process interval of the time-varying signals; f is the uniform activation function of the hidden process neurons; θ_j(t) is the activation threshold function of the hidden process neuron node j.
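The following sketch evaluates this continuous network on a time grid. It is an assumption-laden illustration, not a reference implementation: it uses a sigmoid activation, approximates the parameter-varying integral of Eq. (4.60) by a cumulative trapezoidal rule, and samples all weight and threshold functions on the same grid.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def cumtrapz0(y, t):
    """Cumulative trapezoidal integral with value 0 at t[0]."""
    dt = np.diff(t)
    out = np.zeros_like(y)
    out[..., 1:] = np.cumsum(0.5 * (y[..., 1:] + y[..., :-1]) * dt, axis=-1)
    return out

def continuous_pnn(x, w, theta, v, t):
    """Sketch of the continuous process neural network of Eq. (4.60).

    x    : (n, T) inputs x_i(t);  w: (m, n, T) weights w_ij(t)
    theta: (m, T) threshold functions theta_j(t); v: (m, T) output weights v_j(t)
    """
    u = cumtrapz0((w * x).sum(axis=1), t) - theta   # u_j(t), the accumulation up to time t
    return (v * sigmoid(u)).sum(axis=0)             # y(t)

t = np.linspace(0.0, 1.0, 100)
x = np.vstack([np.sin(2 * np.pi * t), t])
rng = np.random.default_rng(2)
y_t = continuous_pnn(x, rng.normal(size=(3, 2, 100)), rng.normal(size=(3, 100)),
                     rng.normal(size=(3, 100)), t)
print(y_t.shape)  # (100,)
```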
4.7.3 Continuity, Approximation Capability, and Computing Capability of the Model
Theorem 4.10 (Continuity Theorem)  Suppose that two inputs to the continuous process neural network defined by Eq. (4.60) are X(t), X*(t) ∈ U ⊂ (C[0, T])^n and the corresponding outputs are respectively y(t), y*(t) ∈ Z ⊂ C[0, T]. If f is continuous, then for any ε > 0 there exists δ > 0 such that when ||X(t) - X*(t)|| < δ, ||y(t) - y*(t)|| < ε holds.
Proof  In Eq. (4.60), denote
W = \max_{i,j} \sup_{0 \le t \le T} |w_{ij}(t)|,   (4.61)
V = \max_{j} \sup_{0 \le t \le T} |v_j(t)|,   (4.62)
u_j(t) = \int_0^t \sum_{i=1}^{n} w_{ij}(\tau) x_i(\tau) d\tau - \theta_j(t),   (4.63)
u_j^*(t) = \int_0^t \sum_{i=1}^{n} w_{ij}(\tau) x_i^*(\tau) d\tau - \theta_j(t).   (4.64)
From Eqs. (4.63) and (4.64), for any t ∈ [0, T] we have
|| u_j(t) - u_j^*(t) || = || \int_0^t \sum_{i=1}^{n} w_{ij}(\tau) ( x_i(\tau) - x_i^*(\tau) ) d\tau ||
\le \int_0^t \sum_{i=1}^{n} W ||X(t) - X^*(t)|| d\tau \le n W T ||X(t) - X^*(t)||.   (4.65)
Thus, for any δ_1 > 0, as long as X(t), X*(t) satisfy
||X(t) - X^*(t)|| < \delta_1 / (n W T),   (4.66)
then
||u_j(t) - u_j^*(t)|| < \delta_1.   (4.67)
Because f is continuous, for any δ_2 > 0, from the arbitrariness of δ_1, there exists δ_1 > 0 such that
||f(u_j(t)) - f(u_j^*(t))|| < \delta_2.   (4.68)
Therefore
||y(t) - y^*(t)|| = || \sum_{j=1}^{m} v_j(t) f( \int_0^t \sum_{i=1}^{n} w_{ij}(\tau) x_i(\tau) d\tau - \theta_j(t) ) - \sum_{j=1}^{m} v_j(t) f( \int_0^t \sum_{i=1}^{n} w_{ij}(\tau) x_i^*(\tau) d\tau - \theta_j(t) ) ||
= || \sum_{j=1}^{m} v_j(t) ( f(u_j(t)) - f(u_j^*(t)) ) || \le V \sum_{j=1}^{m} || f(u_j(t)) - f(u_j^*(t)) || < V m \delta_2.   (4.69)
So, for any ε > 0, if the selected δ > 0 makes \delta_2 \le ε / (m V), then
||y(t) - y^*(t)|| < ε.   (4.70)
Thus, the proof is completed.
Theorem 4.11  A traditional feedforward neural network and the process neural network defined by Eq. (4.1) are both special cases of the continuous process neural networks defined by Eq. (4.60).
Proof  Assume that the input process interval of the process neural network defined by Eq. (4.60) is [0, T]. If we let T = 0, and in Eq. (4.60) let x_i(t) = x_i, w_ij(t) = w_ij, v_j(t) = v_j, θ_j(t) = θ_j, then Eq. (4.60) is simplified as
y = \sum_{j=1}^{m} v_j f( \sum_{i=1}^{n} w_{ij} x_i - \theta_j ).   (4.71)
This is a traditional feedforward neural network with a single time-invariant hidden layer and a linear output. In Eq. (4.60), if the integral interval of the process neuron is fixed as [0, T], the activation threshold θ_j(t) is taken as a time-invariant parameter θ_j, and the connection weights between the hidden layer process neuron nodes and the output node are time-invariant parameters v_j, then the process neural network defined by Eq. (4.60) can be simplified as
y = \sum_{j=1}^{m} v_j f( \sum_{i=1}^{n} \int_0^T w_{ij}(t) x_i(t) dt - \theta_j ).   (4.72)
This is the feedforward process neural network with a single hidden layer defined by Eq. (4.1). Thus, the proof is completed.
Theorem 4.12 (Approximation Theorem)  For any continuous functional G(x(t)): U ⊂ (C[0, T])^n → Z ⊂ C[0, T] and ε > 0, if the function y(t) in Z satisfies the Lipschitz condition, i.e., |y(t_1) - y(t_2)| \le L_y |t_1 - t_2| (where L_y > 0 is the Lipschitz constant), then there exists a process neural network P defined by Eq. (4.60) such that
||G(x(t)) - P(x(t))|| < ε.
Proof  Choose any y(t) ∈ Z. Because y(t) is continuous and satisfies the Lipschitz condition, for any ε > 0 the interval [0, T] can be properly divided into N parts such that |y(t) - y(t'_k)| < ε/2 for any t ∈ (t_{k-1}, t_k], where t'_k = (t_{k-1} + t_k)/2, k = 1, 2, ..., N, and U_k is a piecewise function set defined on the sub-interval [t_{k-1}, t_k]. From Theorem 4.5, for the functional G*_k(x(t)) there exists a single hidden layer process neural network P*_k (whose inputs are time-varying functions or function vectors and whose output is a time-invariant value) which satisfies
(4.73)
where x^{(k)}(t) = x(t), t ∈ [t_{k-1}, t_k]; k = 1, 2, ..., N. The function approximation capability of a neural network is usually in direct proportion to the number of hidden layers and the number of nodes in the hidden layers; as a result, the number of hidden layer nodes can be chosen properly so that the P*_k (k = 1, 2, ..., N) have the same topological structure and satisfy Eq. (4.73) at the same time. Denote
G*_k(x^{(k)}(t)) - P*_k(x^{(k)}(t)) = e_k.
Define the functionals in the interval [0, T]
(4.74)
(4.75)
then
G(x(t)) = \sum_{k=1}^{N} G_k(x(t)).   (4.76)
Denote
G^*(x(t)) = \sum_{k=1}^{N} G^*_k(x(t)),   (4.77)
then
||G(x(t)) - G^*(x(t))|| \le \sum_{k=1}^{N} ||G_k(x(t)) - G^*_k(x(t))|| < ε/2.   (4.78)
First, the process neural network P_k with continuous functions as its inputs and outputs is constructed in the interval [t_{k-1}, t_k]. P_k is constructed by adopting the process neurons defined by Eq. (4.58), and its input process interval is [t_{k-1}, t_k]. Its network structure, connection weight functions, and activation thresholds are the same as those of P*_k. Obviously, P_k is a process neural network (P_k: (C[t_{k-1}, t_k])^n → C[t_{k-1}, t_k]) defined on [t_{k-1}, t_k] with continuous time functions as its inputs and outputs. From Eq. (4.60) and the integral mean value theorem, we have
P_k(x^{(k)}(t)) = \sum_{j=1}^{m} v_j^{(k)} f( \int_{t_{k-1}}^{t} \sum_{i=1}^{n} w_{ij}^{(k)}(\tau) x_i^{(k)}(\tau) d\tau - \theta_j^{(k)} )
= \sum_{j=1}^{m} v_j^{(k)} f( \sum_{i=1}^{n} w_{ij}^{(k)}(\xi_k) x_i^{(k)}(\xi_k) (t - t_{k-1}) - \theta_j^{(k)} ),  \xi_k ∈ [t_{k-1}, t_k],   (4.79)
where v_j^{(k)} and w_{ij}^{(k)}(t) are the connection weights (functions) and \theta_j^{(k)} is the activation threshold of the process neural network P*_k. Moreover,
G^*_k(x^{(k)}(t)) = P^*_k(x^{(k)}(t)) + e_k = \sum_{j=1}^{m} v_j^{(k)} f( \int_{t_{k-1}}^{t_k} \sum_{i=1}^{n} w_{ij}^{(k)}(\tau) x_i^{(k)}(\tau) d\tau - \theta_j^{(k)} ) + e_k
= \sum_{j=1}^{m} v_j^{(k)} f( \sum_{i=1}^{n} w_{ij}^{(k)}(\xi_k) x_i^{(k)}(\xi_k) (t_k - t_{k-1}) - \theta_j^{(k)} ) + e_k,  \xi_k ∈ [t_{k-1}, t_k],   (4.80)
so
|| G^*_k(x^{(k)}(t)) - P_k(x^{(k)}(t)) ||
= || \sum_{j=1}^{m} v_j^{(k)} f( \sum_{i=1}^{n} w_{ij}^{(k)}(\xi_k) x_i^{(k)}(\xi_k) (t_k - t_{k-1}) - \theta_j^{(k)} ) - \sum_{j=1}^{m} v_j^{(k)} f( \sum_{i=1}^{n} w_{ij}^{(k)}(\xi_k) x_i^{(k)}(\xi_k) (t - t_{k-1}) - \theta_j^{(k)} ) + e_k ||
\le || \sum_{j=1}^{m} v_j^{(k)} ( f( \sum_{i=1}^{n} w_{ij}^{(k)}(\xi_k) x_i^{(k)}(\xi_k) (t_k - t_{k-1}) - \theta_j^{(k)} ) - f( \sum_{i=1}^{n} w_{ij}^{(k)}(\xi_k) x_i^{(k)}(\xi_k) (t - t_{k-1}) - \theta_j^{(k)} ) ) || + e_k.   (4.81)
As f is continuous, for the given ε/(2N) there exists δ_1 > 0 such that, when
| \sum_{i=1}^{n} w_{ij}^{(k)}(\xi_k) x_i^{(k)}(\xi_k) (t - t_{k-1}) - \sum_{i=1}^{n} w_{ij}^{(k)}(\xi_k) x_i^{(k)}(\xi_k) (t_k - t_{k-1}) |
\le \sum_{i=1}^{n} | w_{ij}^{(k)}(\xi_k) x_i^{(k)}(\xi_k) | |t_k - t| < \delta_1,
there is
|| G_k(x^{(k)}(t)) - P_k(x^{(k)}(t)) || \le ε / (2N);   (4.82)
here we only need |t_k - t_{k-1}| to be small enough. Next, a process neural network P with continuous functions as its inputs and outputs is constructed in the interval [0, T]. Define an activation function g(t, Λ): for any t ∈ [0, T], when t ∈ (t_{k-1}, t_k], g = f(Λ_k), where Λ_k is the spatio-temporal aggregation result for the input signals in the interval [t_{k-1}, t] of the process neuron defined by Eq. (4.58) without activation threshold; f is the Sigmoid function, the Gauss function, or any other form of bounded function; k = 1, 2, ..., N. A process neural network P is defined as follows: w_ij(t) is the connection weight function between an input node and a hidden layer node; v_j(t) is the connection weight function between the hidden layer node and the output node; the activation threshold function is θ_j(t); and the activation function is g(t, Λ), that is,
P(x(t)) = \sum_{k=1}^{N} P_k(x^{(k)}(t)),
and then
|| G(x(t)) - P(x(t)) || = || G(x(t)) - G^*(x(t)) + G^*(x(t)) - P(x(t)) ||
= || G(x(t)) - \sum_{k=1}^{N} G^*_k(x^{(k)}(t)) + \sum_{k=1}^{N} G^*_k(x^{(k)}(t)) - \sum_{k=1}^{N} P_k(x^{(k)}(t)) ||
\le || G(x(t)) - \sum_{k=1}^{N} G^*_k(x^{(k)}(t)) || + || \sum_{k=1}^{N} G^*_k(x^{(k)}(t)) - \sum_{k=1}^{N} P_k(x^{(k)}(t)) ||
< ε/2 + \sum_{k=1}^{N} || G^*_k(x^{(k)}(t)) - P_k(x^{(k)}(t)) ||
< ε/2 + N ε / (2N) = ε.
P(x(t)) is just what is required. Thus, the theorem is proved.
Theorem 4.13 (Computing Capability Theorem)
The computing capability of a continuous process neural network defined by Eq. (4.60) is equivalent to that of a Turing machine.
Proof  The computing capability of the process neural network cannot be greater than that of a Turing machine. At the same time, from Theorem 4.3, the computing capability of the process neural network defined by Eq. (4.6) is equivalent to that of a Turing machine. From Theorem 4.11, the process neural network defined by Eq. (4.6) is a special case of the process neural network defined by Eq. (4.60); hence the computing capability of the process neural network defined by Eq. (4.60) cannot be less than that of a Turing machine. As a result, the computing capability of the continuous process neural network is equivalent to that of a Turing machine.
4.8 Functional Neural Network
Each feedforward neural network model used at present, such as a time-invariant neural network, a fuzzy neural network, various process neural networks, etc., has different structural characteristics and information processing methods, and the types of input/output variables are also different. However, from the functional viewpoint, all of them can be regarded as functional operators that denote certain input/output mapping relationships in some functional spaces. Therefore, all the discussed neural networks can be generalized into a uniform neural network model, i.e., a general functional model of the neural network. Various embodiments of this abstract functional neural network model will give multifarious specific neural networks, including the process neural network.
4.8.1 Functional Neuron
In this section, a functional neuron is presented whose inputs and outputs are both points in some functional space or another (such as a metric space, normed linear space, Banach space, or Hilbert space) and which has corresponding activation functionals and aggregation operators. In practice, it is a mathematical abstraction of all the neuron models researched thus far. From a mathematical viewpoint, all the existing neurons can be considered as its special cases. The structure of the functional neuron is shown in Fig. 4.10.
Fig. 4.10 A general model of a neuron
In Fig. 4.10, X = (x_1, x_2, ..., x_n) is the input of the neuron, y is the output, and they are respectively points in the functional spaces A, B (for example, A, B can be a real number set, a time-varying function space, or a fuzzy set space). M = (M_1, M_2, ..., M_n), C, and T are respectively the revision, aggregation, and activation operators in the corresponding spaces. For instance, the revision operator M can be a weighted operation, a min operation, or a membership function operation; the aggregation operator C can be an accumulative summation operation, a max operation, a T-mode operator operation, an operator sequence operation (e.g. the spatio-temporal aggregation operator sequence "⊕", "⊗" of process neural networks), or a multivariate aggregation operator; the activation operator T is usually a linear or nonlinear functional. When the operators in the functional neuron model are replaced with specific instances, they constitute the various neurons that have been discussed and researched up to now. The relationship between the inputs and the outputs of a functional neuron is
y = T(C(M(X))).   (4.83)
The specific forms of several familiar neuron models are given next.
Example 1  Multi-input-single-output MP neuron model. This corresponds to the situation in the general model of a functional neuron in which A, B are real number sets, M(X) = W*X, C = Σ, T = f. Here, W is a weight vector, Σ is the summation operation, and f is an activation function which generally adopts the Sign or Sigmoid function.
Example 2  Fuzzy neuron with real number inputs. This corresponds to the situation in the general model of a functional neuron in which A, B are non-fuzzy sets, M(X) = μ(X) = (μ_1(x_1), μ_2(x_2), ..., μ_n(x_n)), i.e. a membership function, and C is the
accumulative summation operator or a T-mode operator, and T = f is a linear function.
Example 3  Fuzzy neuron with fuzzy inputs. This corresponds to the situation in the general model of a functional neuron in which X ∈ A is a set made up of membership functions, y ∈ B ⊆ [0, 1], and M(X) = G(X) = (G_1(x_1), G_2(x_2), ..., G_n(x_n)). Here G_i(x_i) is the weighted operation on the connection of the ith synapse, C is the accumulative summation or T-mode operator, and T = f is a linear function.
Example 4  Process neuron. This corresponds to the situation in the general model of a functional neuron in which A, B are time-varying function sets, M(X) = W*X, C = (⊕, ⊗), and T = f. Here, W is a weight vector function, ⊕ is a spatial aggregation operator, ⊗ is a temporal accumulation operator, and f is an activation function which generally adopts a Sigmoid or Gauss function.
Example 5  Multivariate function neuron. This corresponds to the situation in the general model of a functional neuron in which A, B are multivariate function sets, M(X) = W*X, C = (⊕, ⊗), and T = f. Here, W is a weight vector function, ⊕ is a spatial aggregation operator, ⊗ is a temporal accumulation operator, and f is an activation function that generally adopts a Sigmoid or Gauss function. The neuron with multivariate functions as its inputs is a straightforward extension of the process neuron. Neural networks composed of these neurons are very useful in resolving practical problems and will be discussed in the latter part of this book.
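The pluggable-operator view above can be expressed directly in code. The sketch below is only illustrative: functional_neuron and the two instantiations (an MP neuron and a process neuron) are hypothetical names, and the operator choices follow Examples 1 and 4 under the stated assumptions (sigmoid activation, trapezoidal temporal accumulation).

```python
import numpy as np

def functional_neuron(X, M, C, T):
    """General functional neuron: revision M, aggregation C, activation T."""
    return T(C(M(X)))

# Example 1: MP neuron -- M is a weighted operation, C is summation, T is a sigmoid.
w = np.array([0.5, -1.0, 2.0])
mp = lambda x: functional_neuron(x, lambda z: w * z, np.sum,
                                 lambda u: 1.0 / (1.0 + np.exp(-u)))
print(mp(np.array([1.0, 0.5, -0.2])))

# Example 4: process neuron -- M weights time-varying inputs, C aggregates over
# space (sum) and time (integral), T is again a sigmoid.
t = np.linspace(0.0, 1.0, 101)
W = np.vstack([np.ones_like(t), t])               # weight functions w_i(t)
X = np.vstack([np.sin(2 * np.pi * t), t ** 2])    # input functions x_i(t)
pn = lambda x: functional_neuron(
    x, lambda z: W * z,
    lambda z: np.trapz(z.sum(axis=0), t),
    lambda u: 1.0 / (1.0 + np.exp(-u)))
print(pn(X))
```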
4.8.2 Feedforward Functional Neural Network Model A general model of feedforward functional neural networks consists of some functional neurons, which are shown in Fig. 4.10 and are forward connected, according to certain topological structures. A typical connection structure of a feedforward network is a three-layer network with a single hidden layer, as shown in Fig. 4.11.
Fig. 4.11 A general model of feedforward functional neural networks
The feedforward functional neural network denoted by Fig. 4.11 is an abstract model. When a specific neural network is constructed in a practical application, the functional neuron can be replaced with specific neurons according to needs, such as MP neurons, fuzzy neurons, process neurons, etc. Assume that the number of units
in the input layer, the hidden layer, and the output layer are respectively n, m, q, and the input layer is an identical transformation layer that directly sends the input signals to the hidden layer without performing any transformation on them. The operators of the hidden-layer neurons are respectively M_h: R^n → R^n, C_h: R^n → R^m, T_h: R^m → R^m; the operators of the output-layer neurons are respectively M_o: R^m → R^m, C_o: R^m → R^q, T_o: R^q → R^q. Then the relationship between the inputs and the outputs of the whole network is
Y = T_o(C_o(M_o(T_h(C_h(M_h(X)))))),   (4.84)
or is represented as
Y = T_o ∘ C_o ∘ M_o ∘ T_h ∘ C_h ∘ M_h (X),   (4.85)
where "∘" denotes a compound operation and R is the field space of the studied problems, such as a real number space, a fuzzy set, or a time-varying function space. If we mark FN = T_o ∘ C_o ∘ M_o ∘ T_h ∘ C_h ∘ M_h, then FN: R^n → R^q and
Y=FN(X) .
(4.86)
Therefore, a three-layer feedforward neural network is equivalent to the mapping problem of a compound operator. The properties of the network can be described by a compound operator FN. In practical applications, different FN operators can be chosen to carry out computation for different problems .
4.9 Epilogue
In this chapter, a general model of the feedforward process neural network was defined, and the transformation mechanism and the mapping relationship of the model with respect to time-varying information were illustrated. On this basis, network models such as the process neural network based on weight function basis expansion, the structural formula process neural network, the process neural network with time-varying functions as inputs and outputs, and the continuous process neural network were constructed. Theoretical results on the process neural network, such as the existence of solutions, continuity, functional approximation ability, and computing capability, were analyzed and proved. All of these establish the theoretical foundation for the practical application of the process neural network. Finally, all the neural networks discussed so far were generalized into a uniform neural network model, i.e., a general functional model of the neural network. A process neural network is an extended form of a traditional neural network that incorporates the time domain and reflects the information processing mechanism
of a biological nervous system from the viewpoint of neural bio-informatics. Since the feedforward process neural network has good properties such as functional approximation capability, learning ability, and self-adaptive modeling ability for time-varying environments, we will see in the following chapters that it is well suited to solving dynamic system information processing problems.
References
[1] He X.G., Liang J.Z., Xu S.H. (2001) Learning and applications of procedure neural networks. Engineering Science 3(4):31-35 (in Chinese)
[2] He X.G., Liang J.Z. (2000) Some theoretical issues on procedure neural networks. Engineering Science 2(12):40-44 (in Chinese)
[3] Guan Z., Lu J.P. (1998) Foundation of Numerical Analysis. Higher Education Press, Beijing (in Chinese)
[4] Hornik K. (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2):251-257
[5] Xu L.Z., Wang R.H., Zhou Y.S. (1983) Theories and Methods of Function Approximation. Shanghai Science & Technology Press, Shanghai (in Chinese)
[6] Liu X.H., Dai R.W. (1995) The equivalence of neural network with linear threshold unit with Turing machine. Chinese Journal of Computer 18(6):438-442 (in Chinese)
[7] Xu S.H., He X.G., Wang B. (2006) A structural formula process neural networks and its applications. Journal of Computer Research and Development 43(12):2088-2095 (in Chinese)
[8] Al-Ghusain I.A., Huang J., Hao O.J., Lim B.S. (2006) Using pH as a real-time control parameter for wastewater treatment and sludge digestion processes. Water Science & Technology 30(4):159-168
[9] Vasanth Kumar K., Porkodi K., Avila Rondon R.L., Rocha F. (2008) Neural network modeling and simulation of the solid/liquid activated carbon adsorption process. Industrial & Engineering Chemistry Research 47(2):486-490
[10] Xu S.H., He X.G. (2003) Process neural network with time-varied input and output functions and its applications. Journal of Software 14(4):764-769 (in Chinese)
[11] Xu S.H., He X.G., Li P.C. (2004) A process neural network for continuous process approximation and its application. Information and Control 33(1):116-119 (in Chinese)
[12] Xu S.H., He X.G., Liu K., Wang B. (2006) Some theoretical issues on continuous process neural networks. ACTA ELECTRONICA SINICA 34(10):1838-1841 (in Chinese)
5 Learning Algorithms for Process Neural Networks
There are already many mature learning algorithms for the training of traditional neural networks, e.g., the Back Propagation algorithm (BP algorithm) [1], the Particle Swarm Optimization algorithm (PSO algorithm) [2], the Genetic Algorithm (GA) [3], the GA-PSO algorithm [4], the Quantum Genetic algorithm (QG algorithm) [5], etc. Amongst these algorithms, the most broadly applied and most effective one is the error back propagation algorithm (BP algorithm) based on gradient descent and its various improved forms. For the training of process neural networks, the inputs and the connection weights of the network can be time-varying functions, the process neuron includes spatial aggregation operators and temporal accumulation operators, and the network can include different types of neurons with different operation rules, i.e. each neuron processes the input information according to its own algorithm. All of these make the mapping mechanism and learning course of the process neural network quite different from those of the traditional neural network. Furthermore, because of the arbitrariness of the form and the parameter positions of the network connection weight functions, if the form of the function class is not restricted or set in advance to belong to some function class, it is difficult to determine these complex parameters by learning from practical samples through network training. In mathematical terms, there is a variety of basis function systems in continuous function space, so that the functions in the function space can be expressed as finite-item expansions of the basis functions with a certain degree of precision under certain conditions. Therefore, for research into general learning algorithms of the process neural network, it is instructive and important to probe into the learning algorithm of the process neural network based on weight function basis expansion. In the space C[0, T], the commonly used basis function systems are trigonometric function systems, polynomial function systems, Walsh basis function systems, and various wavelet basis function systems. The effect of the basis functions, in a sense, is to map the time-dependent connection weight functions into familiar smooth functions, and as long as the number of items of the basis function expansion is large enough, the basis function expansions of the input/output functions and connection weight functions of the network can sufficiently approximate the functions themselves.
In fact, as the training or learning of the process neural network is essentially a function approximation problem, various function approximation algorithms besides gradient descent, such as the Newton method [6], the descent method [7], etc., can all be modified into learning algorithms for the neural network. As the process neural network embodies a mapping relationship from function to function, its training or learning corresponds to functional approximation; therefore, it is necessary to find a new path with reference to function approximation methods. Seven effective learning algorithms for the process neural network will be discussed in this chapter: a learning algorithm based on gradient descent, a learning algorithm based on function orthogonal basis expansion, a learning algorithm based on the Fourier function transform, a learning algorithm based on the Walsh function transform, a learning algorithm based on spline function piecewise fitting, a learning algorithm based on function rational square approximation, and a learning algorithm based on optimal piecewise approximation.
5.1 Learning Algorithms Based on the Gradient Descent Method and Newton Descent Method [8]
Gradient descent is a minimization algorithm in common use. Because the algorithm is simple in design, involves little calculation, and is easy to implement, it is broadly applied in the training of connection weights and activation thresholds of neural networks. The BP algorithm in standard form is just the gradient descent method. In this section, a general learning algorithm based on gradient descent for the process neural network and a combination learning algorithm of gradient descent and other optimization algorithms are explained.
5.1.1 A General Learning Algorithm Based on Gradient Descent
From Eq. (4.6), a process neural network which has one process neuron hidden layer and a linear output, and whose connection weight functions are expanded by basis functions, may be written as
y = \sum_{j=1}^{m} v_j f( \sum_{i=1}^{n} \int_0^T w_{ij}(t) x_i(t) dt - \theta_j ),   (5.1)
where w_{ij}(t) = \sum_{l=1}^{L} w_{ij}^{(l)} b_l(t), and b_1(t), b_2(t), ..., b_L(t) are a group of finite basis functions in the space C[0, T]. Give K learning sample functions
(x_{11}(t), x_{12}(t), ..., x_{1n}(t), d_1),
(x_{21}(t), x_{22}(t), ..., x_{2n}(t), d_2),
...,
(x_{K1}(t), x_{K2}(t), ..., x_{Kn}(t), d_K),
where d_k is the expected output of the kth function sample. The network training error function is defined as
E = \sum_{k=1}^{K} (y_k - d_k)^2 = \sum_{k=1}^{K} ( \sum_{j=1}^{m} v_j f( \sum_{i=1}^{n} \int_0^T ( \sum_{l=1}^{L} w_{ij}^{(l)} b_l(t) ) x_{ki}(t) dt - \theta_j ) - d_k )^2
= \sum_{k=1}^{K} ( \sum_{j=1}^{m} v_j f( \sum_{i=1}^{n} \sum_{l=1}^{L} w_{ij}^{(l)} \int_0^T b_l(t) x_{ki}(t) dt - \theta_j ) - d_k )^2,   (5.2)
where y_k is the actual output of the network corresponding to the kth input function sample. According to the gradient descent algorithm, the learning rules for the connection weights and activation thresholds of the network are
v_j = v_j + α Δv_j,  j = 1, 2, ..., m,   (5.3)
w_{ij}^{(l)} = w_{ij}^{(l)} + β Δw_{ij}^{(l)},  i = 1, 2, ..., n; j = 1, 2, ..., m; l = 1, 2, ..., L,   (5.4)
θ_j = θ_j + γ Δθ_j,  j = 1, 2, ..., m,   (5.5)
where α, β, γ are the learning rate constants. For ease of description, denote
u_{kj} = \sum_{i=1}^{n} \sum_{l=1}^{L} w_{ij}^{(l)} \int_0^T b_l(t) x_{ki}(t) dt - \theta_j,   (5.6)
then
Δv_j = -∂E/∂v_j = -2 \sum_{k=1}^{K} (y_k - d_k) f(u_{kj}),   (5.7)
Δw_{ij}^{(l)} = -∂E/∂w_{ij}^{(l)} = -2 \sum_{k=1}^{K} (y_k - d_k) v_j f'(u_{kj}) \int_0^T b_l(t) x_{ki}(t) dt,   (5.8)
Δθ_j = -∂E/∂θ_j = 2 \sum_{k=1}^{K} (y_k - d_k) v_j f'(u_{kj}).   (5.9)
If we let f(u) = (1 + e^{-u})^{-1}, then f'(u) = f(u)(1 - f(u)). In Eq. (5.6), because the basis functions b_l(t) and the input functions x_{ki}(t) are
known, \int_0^T b_l(t) x_{ki}(t) dt is a computable specific value and can be figured out before network training by pretreatment. u_{kj} is a linear function only with respect to the coefficients w_{ij}^{(l)} of the connection weight function basis expansion and the activation threshold θ_j. The algorithm is described as follows.
Step 1  Set the error precision ε, the accumulative learning iteration counter s = 0, and the maximal learning iteration times M, and select the basis functions b_1(t), b_2(t), ..., b_L(t).
Step 2  Initialize the connection weights and the activation thresholds v_j, w_{ij}^{(l)}, θ_j (i = 1, 2, ..., n; j = 1, 2, ..., m; l = 1, 2, ..., L).
Step 3  Calculate the error function E according to Eq. (5.2). If E < ε or s ≥ M, go to Step 5.
Step 4  Modify the connection weights and the activation thresholds according to Eqs. (5.3)-(5.9); s + 1 → s; go to Step 3.
Step 5  Output the learning result and stop.
For the situation in which the system inputs are analytic functions, the value of \int_0^T b_l(t) x_{ki}(t) dt can be figured out first by adopting an analytic or numerical integration method, and then the training of the process neural network is executed. For the situation in which only discrete time sample data can be obtained by experiment (sometimes the distribution of these data is not equidistant in [0, T]), the fitting curve x_i(t) can be constructed from the original sample data, and then the same steps as for analytic functions are followed.
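As a concrete illustration of the algorithm in this subsection, the sketch below trains a basis-expanded network of the assumed form of Eq. (5.1) by plain gradient descent. It is not the authors' code: the pretreatment step precomputes the integrals of b_l(t)x_ki(t) numerically, the activation is a sigmoid, the learning rates are constant, and the gradient expressions follow from differentiating the squared error of Eq. (5.2); all names are the sketch's own.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_basis_pnn(X, d, basis, t, m=5, lr=0.05, epochs=2000, eps=1e-6, seed=0):
    """Gradient-descent training sketch for the basis-expanded network of Eq. (5.1).

    X: (K, n, T) sampled input functions; d: (K,) expected outputs;
    basis: (L, T) sampled basis functions b_l(t); t: (T,) time grid.
    """
    K, n, _ = X.shape
    L = basis.shape[0]
    # pretreatment: B[k, i, l] = integral of b_l(t) * x_ki(t) dt  (Step 1)
    B = np.trapz(X[:, :, None, :] * basis[None, None, :, :], t, axis=-1)
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=(n, m, L))   # w_ij^(l)
    v = rng.normal(scale=0.1, size=m)           # v_j
    theta = np.zeros(m)                         # theta_j
    E = np.inf
    for _ in range(epochs):
        u = np.einsum('kil,iml->km', B, w) - theta   # u_kj, cf. Eq. (5.6)
        h = sigmoid(u)                                # f(u_kj)
        y = h @ v                                     # network outputs y_k
        err = y - d
        E = float(np.sum(err ** 2))                   # Eq. (5.2)
        if E < eps:
            break
        g = err[:, None] * v * h * (1.0 - h)          # (K, m) backprop factor
        v -= lr * 2.0 * h.T @ err
        theta -= lr * 2.0 * (-g).sum(axis=0)
        w -= lr * 2.0 * np.einsum('km,kil->iml', g, B)
    return w, v, theta, E

# toy usage: learn to map two input processes to the integral of their sum
t = np.linspace(0.0, 1.0, 64)
basis = np.vstack([np.ones_like(t), np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2, 64))
d = np.trapz(X.sum(axis=1), t, axis=-1)
print(train_basis_pnn(X, d, basis, t)[-1])   # final training error
```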
5.1.2 Learning Algorithm Based on Gradient-Newton Combination
Learning algorithms based on gradient descent for process neural networks come down to the minimization of nonlinear multidimensional functions. There are many minimization algorithms in optimization methods, of which gradient descent is the simplest. Because the objective function is defined in a multidimensional space during the training of process neural networks, gradient descent easily falls into a local minimum during learning. Moreover, an oscillation phenomenon may appear during the gradient descent search: the network training error swings back and forth and does not converge. Particularly in the late stage of network learning, the convergence speed can become very slow. The Newton method is another optimization algorithm in common use and has second-order convergence speed in the convergence region. However, it needs to compute the second derivatives of the objective function, and consequently its computational complexity is quite large. The gradient-Newton combination algorithm will now be introduced. The algorithm uses the gradient descent method in the initial stage of learning, and switches to the Newton iteration method for the objective function in the convergence region. This algorithm only needs to compute
the first derivatives of the function, and converts the problem into solving a nonlinear equation system that does not need a one-dimensional search; accordingly its computational complexity is reduced and its convergence speed is accelerated. In the output error function of the process neural network defined by Eq. (5.2), denote
W = ( w_{11}^{(1)}, ..., w_{n1}^{(1)}, w_{12}^{(1)}, ..., w_{n2}^{(1)}, ..., w_{1m}^{(L)}, ..., w_{nm}^{(L)}, v_1, ..., v_m, θ_1, ..., θ_m );
then E is a function of W, i.e. E = E(W). The gradient descent algorithm is defined as
W(k+1) = W(k) - α(k) ∇E(W(k)).   (5.10)
Denote
b_{jk} = f( \sum_{l=1}^{L} \sum_{i=1}^{n} w_{ij}^{(l)} \int_0^T b_l(t) x_{ki}(t) dt - \theta_j ),
f_k = \sum_{j=1}^{m} v_j f( \sum_{l=1}^{L} \sum_{i=1}^{n} w_{ij}^{(l)} \int_0^T b_l(t) x_{ki}(t) dt - \theta_j ) - d_k,
F(W) = (f_1, f_2, ..., f_K).
Then solving the process neural network is equivalent to solving
F(W) = 0.   (5.11)
The nonlinear equation system (5.11) includes K equations and n×L×m + 2m unknown parameters. When K ≤ n×L×m + 2m, the equation system generally has solutions, which can be obtained by the Newton method. The Newton iteration formula is
F(W(s)) = -F'(W(s)) ΔW(s),   (5.12)
where ΔW(s) = W(s+1) - W(s), and
F'(W(s)) = ( ∂f_k / ∂ω_p )_{K×(nmL+2m)},   (5.13)
where the parameters w_{11}^{(1)}, ..., w_{n1}^{(1)}, w_{12}^{(1)}, ..., w_{n2}^{(1)}, ..., w_{1m}^{(L)}, ..., w_{nm}^{(L)}, v_1, ..., v_m, θ_1, ..., θ_m are denoted uniformly by ω. The learning algorithm for the process neural network based on the gradient-Newton combination is described as follows.
Step 1  Set the error precision ε > 0 and the maximal learning times M; set the
learning times counter s = 0. Initialize the network connection weight vector W(0) using the result of the preceding phase, which is obtained by the gradient descent algorithm.
Step 2  Calculate the error function value E according to Eq. (5.2). If E < ε or s ≥ M, go to Step 4, or else go to Step 3.
Step 3  Calculate W(s+1) according to Eqs. (5.12)-(5.13); set s + 1 → s; go to Step 2.
Step 4  Output the calculation result and stop.
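The Newton phase of the combination algorithm can be sketched generically as follows. This is an illustrative refinement routine, not the book's implementation: it treats the trained parameters as a flat vector, forms the Jacobian of Eq. (5.13) by finite differences for brevity, and solves the linearized system of Eq. (5.12) in the least-squares sense because the Jacobian is generally rectangular; all names are assumptions of the sketch.

```python
import numpy as np

def newton_refine(F, W0, max_iter=20, tol=1e-8, fd_eps=1e-6):
    """Newton-type refinement for the residual system F(W) = 0 of Eq. (5.11).

    F : callable mapping a parameter vector W (p,) to residuals (K,)
        (one residual f_k = y_k - d_k per training sample).
    W0: warm start, e.g. the result of the gradient-descent phase.
    """
    W = W0.astype(float).copy()
    for _ in range(max_iter):
        r = F(W)
        if np.linalg.norm(r) < tol:
            break
        # numerical Jacobian F'(W): J[k, p] = d f_k / d W_p, cf. Eq. (5.13)
        J = np.empty((r.size, W.size))
        for p in range(W.size):
            Wp = W.copy()
            Wp[p] += fd_eps
            J[:, p] = (F(Wp) - r) / fd_eps
        # solve F'(W) dW = -F(W) in the least-squares sense, cf. Eq. (5.12)
        dW = np.linalg.lstsq(J, -r, rcond=None)[0]
        W += dW
    return W

# toy residual system with an exact root at W = (1, 2)
F = lambda W: np.array([W[0] + W[1] - 3.0, W[0] * W[1] - 2.0, W[0] - 1.0])
print(newton_refine(F, np.array([0.5, 0.5])))
```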
5.1.3 Learning Algorithm Based on the Newton Descent Method
In this section, we give a learning algorithm based on the Newton descent method. It has been mentioned that the Newton method can also be designed as a learning algorithm, but, as is known, it is quite sensitive to the initial values of the network parameters. Therefore we may add a control parameter λ (i.e. a "descent factor", 0 < λ ≤ 1) to the Newton iteration, giving
W(s+1) = W(s) + λ ΔW(s),  where F'(W(s)) ΔW(s) = -F(W(s)).   (5.14)
The choice of λ should satisfy the descent principle |F(W(s+1))| < |F(W(s))|.
It can be proved theoretically that, when certain conditions are satisfied, descending along the above Newton direction with a small enough step always decreases the value of the function F a little. Consequently, iterating according to Eq. (5.14), although the convergence speed may become slower, a strictly monotonically decreasing sequence with lower bound zero can be obtained. When the required error control precision is satisfied, the solution obtained is just that of the process neural network.
5.2 Learning Algorithm Based on Orthogonal Basis Expansion [9]
If a standard orthogonal function system is chosen as the basis functions used for the expansion of the connection weight functions, and the input functions can also be expanded by the same basis functions, then, based on Eq. (5.1), the accumulative operation of the process neuron over time (e.g. the integral operation) can be greatly simplified because of the orthogonality of the basis functions. Therefore, a learning algorithm in which the input functions and the connection weight functions are both expanded on the orthogonal basis can be considered.
Suppose the input function space of the process neural network is a subset of (C[0, T])^n. In C[0, T], a group of standard orthogonal basis functions is introduced, and the system input functions are expanded into finite-series form according to the given fitting accuracy; the connection weight functions are also expressed as expansions of this group of basis functions. Then, based on the orthogonality of the basis functions, the integral operation in Eq. (5.1) can be evaluated directly. Via this method, the training of the process neural network (determining the connection weight functions and activation thresholds) is converted into the learning of the basis expansion coefficients of the connection weight functions and the activation thresholds of the neurons, i.e. the optimization of a functional is converted into finding the extremum of a multivariable function. Consequently, the training of process neural networks can be implemented by means of existing learning algorithms of neural networks or by constructing new learning algorithms, and the computational complexity of the network training is greatly reduced.
5.2.1 Orthogonal Basis Expansion of Input Functions
Suppose the input space of the process neural network is (C[0, T])^n. b_1(t), b_2(t), ..., b_l(t), ... are a group of standard orthogonal basis functions in C[0, T] (e.g. trigonometric basis functions, Walsh basis functions, wavelet basis functions, etc.). X(t) = (x_1(t), x_2(t), ..., x_n(t)) is a function in the network input space, and x_i(t) can be expressed in series form of the basis function expansion under certain conditions:
x_i(t) = \sum_{l=1}^{∞} a_{il} b_l(t).   (5.15)
Therefore, for any ε > 0, there exist L_i big enough such that for any L ≥ L_i we have
\sup_{0 \le t \le T} | x_i(t) - \sum_{l=1}^{L} a_{il} b_l(t) | \le ε/n,  i = 1, 2, ..., n,
and then
That means X(t) can be expanded on the basis functions b_1(t), b_2(t), ..., b_L(t) with the given precision.
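The effect of orthonormality can be checked numerically. The sketch below uses an illustrative orthonormal trigonometric family on [0, 1] (an assumption; any orthonormal system would do), computes expansion coefficients as inner products, and shows that the temporal integral of a weight function times an input function collapses to a dot product of coefficient vectors, which is exactly the simplification exploited in this section.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)

def orthonormal_basis(L, t):
    """Illustrative orthonormal family on [0, 1]: 1, sqrt(2)cos, sqrt(2)sin, ..."""
    funcs = [np.ones_like(t)]
    k = 1
    while len(funcs) < L:
        funcs.append(np.sqrt(2.0) * np.cos(2 * np.pi * k * t))
        funcs.append(np.sqrt(2.0) * np.sin(2 * np.pi * k * t))
        k += 1
    return np.vstack(funcs[:L])

def expand(f, basis, t):
    """Expansion coefficients a_l = <f, b_l> = integral of f(t) b_l(t) dt."""
    return np.trapz(f * basis, t, axis=-1)

basis = orthonormal_basis(7, t)
x = np.sin(2 * np.pi * t) + 0.3 * np.cos(4 * np.pi * t)   # an input function x_i(t)
w = 1.0 + 0.5 * np.sin(2 * np.pi * t)                      # a weight function w_ij(t)
a, c = expand(x, basis, t), expand(w, basis, t)

# orthonormality turns the temporal integral into a coefficient dot product
print(np.trapz(w * x, t))   # direct integral
print(a @ c)                # sum_l a_l * c_l -- agrees when both lie in the basis span
```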
5.2.2 Learning Algorithm Derivation
The connection weight function w_ij(t) is expressed by the basis function expansion of b_1(t), b_2(t), ..., b_L(t), namely
w_{ij}(t) = \sum_{l=1}^{L} w_{ij}^{(l)} b_l(t),   (5.16)
where w_{ij}^{(l)} is the connection weight between the input layer node and the hidden layer node corresponding to b_l(t), and is an adjustable time-invariant parameter. By substituting the basis function expansions of x_i(t) and w_ij(t) into Eq. (4.6), the input-output transformation relationship of the network can be represented as
which is modified as (5.17)
As b_1(t), b_2(t), ..., b_L(t) are a group of standard orthogonal basis functions in the interval [0, T] and satisfy
\int_0^T b_l(t) b_s(t) dt = 1 when l = s, and 0 when l ≠ s,
Eq. (5.17) can be simplified as
(5.18)
Denote K learning sample function s
Suppose that the actual output of the network corresponding to the kth learning sample input is y_k; then the error function of the network is defined as
where a_{il}^{(k)} is the coefficient of x_{ki}(t) corresponding to b_l(t) in the basis function expansion. According to the gradient descent algorithm, the learning rules for the connection weights and the activation thresholds are:
v_j = v_j + α Δv_j,  j = 1, 2, ..., m,   (5.20)
w_{ij}^{(l)} = w_{ij}^{(l)} + β Δw_{ij}^{(l)},  i = 1, 2, ..., n; j = 1, 2, ..., m; l = 1, 2, ..., L,   (5.21)
θ_j^{(1)} = θ_j^{(1)} + γ Δθ_j^{(1)},  j = 1, 2, ..., m,   (5.22)
θ = θ + η Δθ,   (5.23)
where α, β, γ, η are learning rate constants. Denote the negative gradients of the error function with respect to v_j, w_{ij}^{(l)}, θ_j^{(1)}, and θ as the increments Δv_j, Δw_{ij}^{(l)}, Δθ_j^{(1)}, and Δθ; they are given by Eqs. (5.24)-(5.27). If the activation functions f and g are both Sigmoid functions, then f'(u) = f(u)(1 - f(u)), g'(u) = g(u)(1 - g(u)).
5.2.3 Algorithm Description and Complexity Analysis
The learning algorithm is described as follows.
Step 1  Choose standard orthogonal basis functions b_1(t), b_2(t), ..., b_L(t) in the input space; denote the input functions and the connection weight functions as expansions of the basis functions;
Step 2  Set the network learning error precision ε, the accumulative learning iteration counter s = 0, and the maximal learning iteration times M;
Step 3  Initialize the connection weights and activation thresholds v_j, w_{ij}^{(l)}, θ_j^{(1)}, θ (i = 1, 2, ..., n; j = 1, 2, ..., m; l = 1, 2, ..., L);
Step 4  Calculate the error function E by Eq. (5.19); if E < ε or s ≥ M, go to Step 6, or else go to Step 5;
Step 5  Modify the connection weights and the activation thresholds according to Eqs. (5.20)-(5.27); s + 1 → s; go to Step 4;
Step 6  Output the learning result and stop.
The inputs to the process neural networks may be analytic functions or time-dependent discrete sampling data, so the selection of basis functions and the expansion of input functions can be completed by adopting (fast) Fourier transformation, Walsh transformation, wavelet transformation, or other methods suitable for analytic functions and discrete time data. If the scale of the process neural network is n-m-1 (n input nodes, m process neuron hidden layer nodes, and 1 time-invariant neuron output node) and the number of basis function expansion items is L, then the complexity of the learning algorithm for the process neural network based on weight function basis expansion is equivalent to that of a BP network with a scale of n-(m×L)-1, and the traits of this algorithm are the same as those of the BP algorithm.
5.3 Learning Algorithm Based on the Fourier Function Transformation
The Fourier function transformation is a commonly used signal conversion and analysis method that has important applications in signal processing. The Fourier function system is a complete orthogonal function system [10]. When it is chosen as the basis function group for the connection weight function basis expansion of the process neural network, we can use its orthogonality and completeness to express the input functions and connection weight functions in finite-series form of the Fourier basis functions within a given fitting precision.
5.3.1 Fourier Orthogonal Basis Expansion of the Function in L²[0, 2π]
In the real space L²[0, 2π], the function system
M = { 1/\sqrt{2π}, (1/\sqrt{π}) \cos t, (1/\sqrt{π}) \sin t, ..., (1/\sqrt{π}) \cos nt, (1/\sqrt{π}) \sin nt, ... },  t ∈ [0, 2π],   (5.28)
is referred to as a Fourier function system in L²[0, 2π].
(a) The function system M is standard orthogonal, namely, for any e_l(t), e_s(t) ∈ M, we have
(e_l(t), e_s(t)) = 0 when l ≠ s, and 1 when l = s,   (5.29)
where (e_l(t), e_s(t)) = \int_0^{2π} e_l(t) e_s(t) dt is the inner product of any two basis functions e_l, e_s in M.
(b) The function system M is complete, i.e. for any f(t) ∈ L²[0, 2π], we have
||f(t)||^2 = \sum_{e_l ∈ M} | (f(t), e_l(t)) |^2.   (5.30)
f(t) can be expanded in the following form
f(t) = \sum_{l=0}^{∞} a_l e_l(t) = a_0/\sqrt{2π} + (a_1/\sqrt{π}) \sin t + (a_2/\sqrt{π}) \cos t + ...,  t ∈ [0, 2π],   (5.31)
where
a_l = (f(t), e_l(t)) = \int_0^{2π} f(t) e_l(t) dt.   (5.32)
t;
=~(i +0.5), 21t+ 1
i = 0,1,· .. , 2n,
the corresponding function values are j;=fiti), then the first 2n+ 1 coefficients of the Fourier orthogonal basis function expansion can be calculated as follows : 2 a/ =--(fo+ujcoskO-u z )' [=1,3,5, ..., 2n+1
hi =_2-ujsinkO, [=2,4,6, ..., 2n+1
where
u" U2 can be obtained recursively according to the following formula :
(5.33) (5.34)
Learning Algorithms for Process Neural Networks U 1n+1 = U 1n+1 =0, u j = f j +2cos(kB)u j +1 -u j + 1 ' j = 2n,2n-I,...,2,1,
{
99
(5.35)
where B=27t1(2n+ 1).
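For orientation, the sketch below computes the orthonormal Fourier coefficients of Eq. (5.32) directly by numerical integration and reconstructs the function from its truncated expansion of Eq. (5.31). It deliberately avoids the fast recursion of Eqs. (5.33)-(5.35); the basis ordering (constant, sine, cosine alternating) is the sketch's own choice.

```python
import numpy as np

def fourier_system(L, t):
    """First L functions of the orthonormal Fourier system M on [0, 2*pi]."""
    funcs = [np.full_like(t, 1.0 / np.sqrt(2.0 * np.pi))]
    k = 1
    while len(funcs) < L:
        funcs.append(np.sin(k * t) / np.sqrt(np.pi))
        funcs.append(np.cos(k * t) / np.sqrt(np.pi))
        k += 1
    return np.vstack(funcs[:L])

def fourier_coeffs(f_vals, t, L):
    """a_l = (f, e_l) = integral over [0, 2*pi] of f(t) e_l(t) dt, cf. Eq. (5.32)."""
    E = fourier_system(L, t)
    return np.trapz(f_vals * E, t, axis=-1), E

t = np.linspace(0.0, 2.0 * np.pi, 4001)
f = 1.0 + np.sin(t) + 0.5 * np.cos(3.0 * t)
a, E = fourier_coeffs(f, t, L=9)
f_rec = a @ E                       # truncated expansion, cf. Eq. (5.31)
print(np.max(np.abs(f - f_rec)))    # small reconstruction error
```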
5.3.2 Learning Algorithm Derivation
Theorem 5.1  For any continuous functions x(t), w(t) ∈ L²[0, 2π], suppose that their Fourier orthogonal basis expansions are respectively
x(t) = \sum_{l=0}^{∞} x_l e_l(t),  w(t) = \sum_{l=0}^{∞} w_l e_l(t);
then the following integral formula holds:
\int_0^{2π} x(t) w(t) dt = \sum_{l=0}^{∞} x_l w_l.   (5.36)
Proof
Thus, the proof is completed. Suppose the input process interval of the process neural network is [0, T]. Through variable substitution, all variables in the input functions and connection weight functions of the network can be transformed into variables in [0, 2π], so the orthogonality and the completeness of the Fourier function system can be used directly. Consider the process neural network shown in Fig. 5.1.
Fig. 5.1 Narrow-sense process neural networks
The input-output mapping relationship of the network is
Given K learning samples (x_1^k(t), x_2^k(t), ..., x_n^k(t), d_k), k = 1, 2, ..., K, where d_k is the expected output of the system when the inputs are x_1^k(t), x_2^k(t), ..., x_n^k(t), suppose that the actual output of the network corresponding to the kth sample input is y_k (k = 1, 2, ..., K); then the error function can be defined as
A finite Fourier orthogonal basis expansion is implemented for the sample x_1^k(t), x_2^k(t), ..., x_n^k(t) to yield
(5.38)
where L is the number of Fourier basis function items, which satisfies the precision requirement of the input functions. The network connection weight function w_ij(t) (i = 1, 2, ..., n; j = 1, 2, ..., m) is also expressed as an expansion of finite Fourier basis functions:
w_{ij}(t) = \sum_{l=0}^{L} w_{ij}^{(l)} e_l(t).   (5.39)
Substitute Eqs . (5.38) and (5.39) into Eq. (5.37), and according to the conclusion in Theorem 5.1, the error function can be simplified as:
(5.40)
The connection weight parameters of the network can be determined by adopting a learning algorithm similar to the one described in Section 5.2 .
5.4 Learning Algorithm Based on the Walsh Function Transformation
The Walsh function system is a complete, normalized orthogonal function system taking only the values ±1, and it has two forms, namely continuous transformation and discrete transformation [11]. Therefore, if the Walsh function system is selected as the basis functions, the learning algorithm introduced in Section 5.2 has good adaptability to systems whose inputs are analytic functions or discrete time sequences.
5.4.1 Learning Algorithm Based on Discrete Walsh Function Transformation
(1) Discrete Walsh transformation
When there are N discrete sample data in the interval [0, 1] (generally N = 2^p, where p is a positive integer), the discrete Walsh transformation pair is
X_i = \sum_{k=0}^{N-1} x_k wal(k, i/N),  i = 0, 1, ..., N-1,  N = 2^p,   (5.41)
x_k = (1/N) \sum_{i=0}^{N-1} X_i wal(k, i/N),  k = 0, 1, ..., N-1,  N = 2^p,   (5.42)
where wal(k, i/N) is the Walsh basis function and its value domain is {-1, +1}; k is the sequency, i is the discrete normalized time variable, x_i is the original data, and X_i is the transformed data.
Lemma 5.1  In the interval [0, 1], the inner product of two discrete Walsh functions with different sequency is 0, that is
\sum_{i=0}^{N-1} wal(j, i/N) wal(k, i/N) = 0,  j ≠ k,  N = 2^p.   (5.43)
Proof  According to the definition of the discrete Walsh function, wal(j, t) wal(k, t) = wal(j ⊕ k, t), where ⊕ is the XOR operator. Because each Walsh function can be denoted as a linear combination of finitely many Haar functions, and the sum of every Haar function except har(0, 0, t) over the N discrete points in [0, 1] is 0, the lemma holds.
Lemma 5.2  In the interval [0, 1], the inner product of two discrete Walsh functions with the same sequency is equal to N, that is
N-I
(5.44)
Proof  According to the definition, the value of any discrete Walsh function at i/N (i = 0, 1, ..., N-1) in [0, 1] is 1 or -1, and becomes 1 after being squared. Therefore, the inner product of a discrete Walsh function with itself at the N discrete points is equal to N.
Theorem 5.2  For any two continuous functions x(t), w(t), suppose that their sequence values at N = 2^p uniform discrete points in [0, 1] are respectively x_i, w_i (i = 0, 1, ..., 2^p - 1); then the following integral formula holds
\int_0^1 x(t) w(t) dt = \lim_{N→∞} \sum_{i=0}^{N-1} wal(x_i) wal(w_i),  N = 2^p,   (5.45)
where x_i = x(t_i), w_i = w(t_i); wal(x_i) and x(t_i) form the discrete Walsh transformation pair, i = 0, 1, ..., N-1.
Proof  Suppose t_i = i/N (i = 0, 1, ..., N-1) are the N = 2^p equal division points in [0, 1]. According to the definition of an integral, we have
\int_0^1 x(t) w(t) dt = \lim_{N→∞} \sum_{i=0}^{N-1} x(t_i) w(t_i) Δt_i.   (5.46)
In the following, Eq. (5.45) will be proved. According to the definition of the discrete Walsh transformation,
\sum_{i=0}^{N-1} wal(x_i) wal(w_i) = \sum_{i=0}^{N-1} wal(x(t_i)) wal(w(t_i))
= \sum_{i=0}^{N-1} ( (1/N) \sum_{j=0}^{N-1} x(t_j) wal(j, i/N) ) ( (1/N) \sum_{k=0}^{N-1} w(t_k) wal(k, i/N) )
= (1/N^2) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x(t_j) w(t_j) wal^2(j, i/N) + (1/N^2) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \sum_{k=0, k≠j}^{N-1} x(t_j) w(t_k) wal(j, i/N) wal(k, i/N).
From Lemma 5.1, we have
(1/N^2) \sum_{j=0}^{N-1} \sum_{k=0, k≠j}^{N-1} x(t_j) w(t_k) ( \sum_{i=0}^{N-1} wal(j, i/N) wal(k, i/N) ) = 0,
and also from Lemma 5.2, we have
(1/N^2) \sum_{j=0}^{N-1} x(t_j) w(t_j) ( \sum_{i=0}^{N-1} wal^2(j, i/N) ) = (1/N) \sum_{j=0}^{N-1} x(t_j) w(t_j).
Thus
\sum_{i=0}^{N-1} wal(x_i) wal(w_i) = \sum_{j=0}^{N-1} x(t_j) w(t_j) Δt_j,
where Δt_j = 1/N. When N tends to infinity, the limit is taken on both sides and, by Eq. (5.46), Eq. (5.45) holds. Thus, the proof is completed.
(2) Learning algorithm
Next, we will derive the learning algorithm based on the discrete Walsh transformation for the process neural network using the conclusion of Theorem 5.2. As the input process interval [0, T] can be converted into [0, 1] through variable substitution, we will only discuss the situation when the input process interval is [0, 1]. When the input functions of the network are analytic functions, they are discretized into sequences of length 2^p within the interpolation precision. When the input functions are discrete time data, if the length of the sequence is not 2^p, a sequence of the required length can be obtained by smooth interpolation. In the interval [0, 1], give K learning samples with sequence length 2^p:
where tl=lIN, and dk is the expected output of the system corresponding to the inputs Xkl(t/), Xk2(tD, ... , Xkn(t/) (1=0,1, . . ., 2P-I). Implementing the discrete Walsh transformation on the learning sample, we have
Corresponding to the system inputs Xkl(t/), xdt/), ... , xdtD (1=0,1 , ... , 2P-I) , the input-output relationship of the process neural network corresponding to Eq. (5.1) is
L wjjl)bl(t), L
where wij(t) =
and bl(t), b 2(t), ... , bL(t) are a group of finite basis
1=1
functions in space C[O,n Let b1(t), b2(t), ... , bL(t) be Walsh basis functions, then from Theorem 5.2, we have
104
Process Neural Networks
(5.49)
where Yk is the actual output corresponding to the kth learning sample. The error function is defined as
(5.50)
where wal(xk ;(@ (/=0,1, ... , 2P- 1) is the Walsh transformation sequence of the ith component in the kth learning sample . The learning rules for the network connection weight and the activation threshold using the gradient descent algorithm are Vj
w;j l = wij)
= vj +a0.vj ' j = 1,2,...,m;
(5.51)
+ fJ/).w~l), i = I, 2,...,n; j = I, 2,..., m; 1= 0,1' 00 " N -I ,
(5.52) (5.53)
where w~l) is the coefficient of wij (t) corresponding wal(tl )
to the basis function
in the discrete Walsh basis function expansion, and a, [3, y are learning rate
L L wal(xk;(t l ))wal(Wi) (l l)) - OJ , and then n N- I
constants . For convenience, denote ukj =
; =\ 1=0
(5.54)
(5.55)
(5.56)
The corresponding learning algorithm is described as follows. Step 1 The input functions (analytic functions or discrete sample data) are converted into discrete time sequences with length of N=2P, and the discrete Walsh transformation is implemented on the input sequences according to Eqs. (5.41) and (5.42) ; Step 2 Denote the network learning error precision bye, the accumulative learning iteration times s=O, and the maximal learning iteration times by M; Step 3 Initialize the connection weights and activation thresholds of the
Learning Algorithms for Process Neural Networks
105
network vj ' wjjl) , OJ' i=I ,2, ...,n ;j=I,2,...,m; I=O,I, ...,N-l. Step 4 Calculate the error function E according to Eq. (5.50). If E<£ or sz-M, go to Step 6; Step 5 Modify connection weights and activation thresholds according to Eqs. (5.51)-(5.56); s+1~s; go to Step 4; Step 6 Output the learning result and stop.
5.4.2 Learning Algorithm Based on Continuous Walsh Function Transformation The continuous Walsh function system is a normalized function orthogonal system in the interval [0, 1]. Any L 2 function in [0, 1] may be expanded into a Walsh series, which has mean convergence. If the given function is also a periodic function with period I, then it can be expanded into a Walsh series in (-00, +(0). (1) Continuous Walsh transformation
Continuous Walsh transformation formulas are
J(t) = La/wal(l,t),
(5.57)
1:0
al =
J: J(t)wal(l,t)dt,
(5.58)
and satisfy the following properties
! wal(O,t)dt
= 1,
l wal(l,t )dt = 0, 1= 1,2,... f1
2
(5.59)
(5.60)
!>(wal(l,t)) dt=l, 1=0,1,2, ...
(5.61)
wal(l,t)wal(s,t) =wal(l EEl s,t),
(5.62)
where lEEls denotes the Xor operation of two nonnegative integers. Theorem 5.3 For any two continuous functions x(t), wet), the following integral formula holds
!x(t)w(t)dt t !x(t)wal(l,t)dt !w(t)wal(l ,t)dt. =
1: 0
Proof Let p=IEEls, then according to the operation property of Xor, we have
(5.63)
106
Process Neural Networks
p =o, l =s; { p:t:- 0, l:t:- s. From the Walsh function integral property Eqs. (5.59) and (5.60), we have
{I
.br wal(l ED s,t)dt = '
l
=S',
0, l:t:- s.
Thus, according to the definition of a continuous Walsh transformation
£x(t)w(t)dt = £(~( (! x(t)wal(l,t)dt) wal(l,t) )(~( (£ w(t)wal(S,t)dt) Wal(S,t»)))dt =
£(~~( (£ x(t)wal(l,t)dt)( £w(t)wal(s,t)dt) wal(l,t)wal(s,t») )dt
=
!(~~( (! X(t)Wal(l,t)dt)( £W(t)Wal(l,t)dt) wal(l ED s,t))}t
=
~~( (( 1x(t)wal(l,t)dt)( 1w(t)Wal(S,t)dt)) 1wal(l ED s,t)dt)
=
t( 1x(t)wal(l,t)dt £w(t)wal(l,t)dt). 1=0
Thus, the proof is completed. (2) Learning algorithm
Next, we will derive a learning algorithm based on the continuous Walsh function transform of a process neural network using Theorem 5.3. Assume K learning samples (Xkl(t), xdt), .. ., Xkn(t), dk) where k=1,2, , K, and dk is the expected output. The input sample function Xkl(t), xdt), , Xkn(t) is transformed by a continuous Walsh transformation and gives N
N
N
1=0
1=0
1=0
La~lwal(l,t), La~lwal(l,t), ...,La:1wal(l,t),
(5.64)
where N is a positive integer satisfying the precision requirement of the continuous Walsh basis function expansion; a~ is the Walsh basis function expansion coefficient of xk;(t) determined by Eq. (5.58). Suppose the continuous Walsh transform of the connection weight function wij(t) is
Learning Algorithms for Process Neural Networks
N
(
N
N
)
~ wi?wal(l,t), ~ wijwal(l,t), ..., ~ w~?wal(l,t) , j = 1, 2,....m,
107
(5.65)
where w~) is the expansion coefficient of wit) corresponding to wal(l,t). Consider a process neural network with only one process neuron hidden layer and linear activation function in the output layer. By Theorem 5.3 and the orthogonality of Walsh basis function, when the input function is Xkl(t), xdt), ... , Xkn(t), the input-output relationship of the process neural network described by Eq. (5.1) is (5.66)
Define the error function
(5.67)
In a way similar to the training of process neural networks based on the discrete Walsh function transformation, the network connection parameters Wi)I) , Vj, and the activation threshold OJ can be determined by adopting the gradient descent algorithm. The corresponding learning algorithm is described as follows . Step 1 Determine the number N of the Walsh basis function according to the input function fitting precision requirement of a learning sample set. The input functions are transformed by continuous a Walsh transformation according to Eqs. (5.57) and (5.58). Step 2 Denote the network learning error precision by e, the accumulative learning iteration times s=O, and the maximal learning iteration times M ; Step 3 Initialize the connection weights and the activation thresholds vj ' w~ ), OJ' i=I,2, ...,n ;j=1,2, ...,m; I=O,I, ...,N-l;
Step 4 Calculate the error function E according to Eq. (5.67). If Ece or sz-M, go to Step 6; Step 5 Modify the connection weights and activation thresholds according to Eqs. (5.51)-(5.56); go to Step 4; Step 6 Output the learning result and stop.
108
Process Neural Networks
5.5 Learning Algorithm Based on Spline Function Fitting Spline function fitting is a function polynomial piecewise interpolation fitting method proposed by Schoenberg in 1946 [121. The spline funct ion has a simple structure, good flexility and lubricity , and favorable approximation properties for both analytic functions and discrete time sequence functions. Therefore, the connection weight function s of process neural networks can be represented as spline functions. During network training, by learning from time-invariant connection parameters in spline function s, the process neural network whose connection weights are denoted by spline function s can gradually approximate the input-output mapping relationship of real systems to complete the training for process neural network s.
5.5.1 Spline Function Suppose there are N+l time order partition points to, th t2, .. ., tN in the input process interval [O,n where to=O, tN=T. x(t) is a time-varying function defined in [O,n and the values at time partition points are respectively xUo), XUI ), X(t2), ... , X(tN). Then in the interpolation interval [t/-ht/], the spline function is defined as follow s. Definition of linear spline function Sl(t) =
x(tl) - X(tl _l ) h (t-tl _I)+X(tl _J ) , 1= 1,2,...,N.
(5.68)
I
Definition of quadratic spline function (5.69)
Definition of cubic spline function
(5.70)
2
where M / = s;UI),M I = s;UJ,M I = Sl(tl),h l = t l -ti-JA = x(tl) -M1hl /2. The spline functions Eqs. (5.68)-(5.70) are modified according to the power of t, and expressed in the form of a polynomial, and then the forms of the above spline polynomial functions are as follows. Form of linear spline (5.71)
Learning Algorithms for Process Neural Networks
109
Form of quadratic spline (5.72) Form of cubic spline (5.73) where the polynomial coefficients in Eqs. (5.71)-(5.72) satisfy the continuity and the lubricity to some extent of spline function s at interpolation point s (i.e. the continuity of some order derivative).
5.5.2 Learning Algorithm Derivation Consider a process neural network with only one process neuron hidden layer and linear activation function in the output layer. Suppose to'!lh,." '!N are N+ I interpolation points satisfying the precision requirement of the spline function interpolation fitting of the input function s XI(t),x2(t) ,. ",XI/(t) in the system input process interval [0,71. Because Xj(t) is known, 4t) U=I,2, ... ,n) can be denoted in the form of piecewise spline functions as Eqs. (5.68)-(5 .70) (the power of the spline function may be determined according to the complexity of the input function) by mature spline fitting method s 1131 in numerical analysis, and further be modified into the form of a piecewise interpolation polynomial as Eqs. (5.71)-(5.73). The connection weight functions in the network training are also expres sed in the form of a piecewise spline interpolation polynomial. Accordingly, when the input functions and the connection weight functions are both denoted by piecewise spline functions, the input-output relationship of the network is
Y=
fVJ((fi r w~;)(t)X~S)(t)dtJ -()j)' j= 1
1=1 ;=1
(5.74)
,-,
where w~; ) (t) and xj;S) (t) are respectively the spline functions of the network connection weight function wij(t) and the input function x;(t) in the interpolation interval [t'-I,!'], and s is the power of the spline function s. Give K learning sample functions (xlk(t),x~(t), ...,x;(t),dk) where k=I ,2,.. .,K, and d, is the expected output of the system while the inputs are The input functions
k X1
k X1
(z), x~ (t), ...,
x; (t).
(r), x~ (t), ... ,x; (t) and the connection weight funct ion of the
network wit) are denoted in the form of spline fitting functions (the form of spline polynomial interpolation) in the interpolation interval [t,-J,t,], and then in the interval [t,-J,t ,], the spline interpolation polynomial function forms of X;k (t) and wij(t) are as
110
Process Neural Networks
follows. While in linear spline fitting Xi~(t) = a~/+ a~ji' 1=1,2,..., N; k=1 ,2, , K; i=1 ,2, ,n,
(5.75)
Wjj/(t) =Wljj/t +WOiji' 1=1,2,...,N; i = 1, 2, ,n; j=1,2, .m.
(5.76)
While in quadratic spline fitting
_ Zk Z+aIiJ+a Zk Zkil ' I -- I, 2,..., N ', k = 12K . I, 2, ,n, XiiZk(t ) -azilt , , , ; 1= O Wi~/ (t ) = w;ij/
(5.77)
+ w~i + W~iji' 1= 1,2, ..., N ; i = 1,2, ,n; j = 1,2, .m.
(5.78)
While in cubic spline fitting Xj~k(t)=aii~t3+a;~tZ+a~;t+a~~ , 1=1,2,...,N; k=I,2, ...,K; i=1 ,2, ,n,
(5.79)
w~/ (t)=wiij/+w;ij/+w~j/t+w~jji' 1=1,2, ...,N; i=1,2, ...,n; j=I,2, .m. (5.80) In the above equation s, W(s-r)ij/ represents the coefficient of t'" (r=0, 1,2,3) in the sth spline interpolation polynomial of wij(t) where the superscript represents the degree of the interpolation spline, the first subscript represents the t'" term in the corresponding interpolation polynomial, the second subscript is the serial number of the network input node, the third subscript represents the serial number of the network hidden layer node, and the fourth subscript represent s the corresponding in the sth spline interpolation interval [tI- J,ttl . a(:_r)iI represents the coefficient of
r:
interpolation polynomi al of x jk (r) where the first subscript denote s the f-r term in the corresponding interpolation polynomial, the second subscript denote s the serial number of the network input node and the third subscript denote s the corresponding interpolation interval [tI- l,tI]. As xjk (t ) is known, the piecewise spline function fitting form of
k Xi
(t) is
determinate, and the connection weight functions are denoted in the form of a piecewise interpolation polynomial during the network training . The network error function is defined as follows:
(5.81)
In the following , only the situation when s=2 (that is a quadratic spline function ) is derived, and the situation that s= 1 or s=3 is similar . Here
Learning Algorithms for Process Neural Networks
111
(5.82) Denote
Then Eq. (5.82) can be reformulated as
(5.83)
It can be seen from Eq. (5.83) that the error function is the function only with respect to the network parameters
Vj,
OJ and
w(s-r)ij/ '
so the network training can be
accomplished by adopting a method such as the gradient descent algorithm. The specific algorithm steps are not repeated any more.
5.5.3 Analysis of the Adaptability and Complexity of a Learning Algorithm In the learning algorithm based on spline function piecewise fitting, as the spline function has good flexility and lubricity, it will improve the nonlinear mapping ability of the input-output relationship of a process neural network when using the piecewise spline function as the network connection weight function [14] . However,
112
Process NeuralNetworks
this learning algorithm first needs to determine the proper piecewise number of input process intervals and the power of the spline function in terms of the complexity of the input functions (or the complexity of the real systems). At the same time, the input functions need piecewise spline fitting, and this increases the pretreatment process before the network training in actual applications. In addition , the number of parameters that need adjusting in the network increases exponentially with the number of piecewise interpolation intervals and the power of the spline function . If the number of the network input nodes is n, the number of the middle hidden layer nodes is m, the number of the interpolation partition points is N, and the power of the spline function is s, then the number of parameters that need determining in the network is nxmxNx(s+ l)+2m, which makes the computation during the network training increase exponentially with n, m, Nand s. Therefore, it is important to properly choose the number of piecewise interpolation points and the power of the spline function. However, simulation experiment results show that in some special practical applications, this algorithm has universal adaptability and is an effective method for training of process neural networks.
5.6
Learning Algorithm Based on Rational Square Approximation and Optimal Piecewise Approximation
In actual signal processing , a great deal of experimental sample data usually needs handling and some specific type offunction is required to express this approximately. In system modeling based on process neural networks, the type of the system input function and connection weight function have great influence on the computational complexity and functional approximation precision in network training. Therefore, how to choose a proper approximation (or fitting) function form to express the network input function and the connection weight function has important meaning for the design of the network structure and reduction in the complexity of the learning algorithm. During previous discussion of learning algorithms, the input functions and the connection weight functions of a process neural network have used an algorithm based on basis expansion. In order to achieve high fitting precision with the original curve, especially the curves of some functions with acute change, the number of basis function terms is usually large. In this section, using the favorable approximation properties of the rational function and the optimal piecewise function, learning algorithms based on the rational square approximation [15.161 and optimal piecewise approximation [17,18] are respectively researched .
5.6.1 Learning Algorithm Based on Rational Square Approximation When the deviation is measured by the sense of Chebyshev, a rational function with
Learning Algorithms for Process Neural Networks
113
lower order has high approximation preci sion when it is used to approximate a known function (discrete or anal ytic), especially for some function s with acute change. On the other hand , when approximating by a polynomial , even by a high order polynomial, a satisfactory appro ximate expression can seldom be obt ained . Moreover, the rational function has a compact form , and there are mature implementation algorithms for the approximation process. (1) Rational square approximation of the function
Denote the rational function set by 9\ m.II' The element R(x) in 9\ m.1I has the function form (5.84)
Now consider the square deviation of the approximated function jixje C[a,b] and the rational function R(x). The following two situations are considered. (a) Continuous situation (interval approximation): Suppose C[a,b] is a set made up of a continuous real function in the interval [a,b]. 9\ /11.11 is a set con sisting of the whole ration al function s with the polynomial who se degre e <m as numerator and the polynomial whose degree
R(x) = P(x ) / Q(x ), P(x) = ~:>; Xi , Q(x ) = ; =0
n L)jXj,
(5.85)
j=O
let 2
p (f,R )=IIJ-RI1 = [U (x)-R(x )f dx .
(5.86)
p(f,R) is referred to as the square deviation of f and R. Obviously, p(f,R) is a non-n egative real number, thus let p '(f) = inf p(f ,R) , RE'll." ,
(5.87)
then p*(f) is referred to as the minimal (rational) square deviation off If (5.88)
such that p'(f)=p(f,R \ then R*(x) is referred to as the optimal square approximation rational expression offix). (b) Discrete situation (point approximation): Suppose X={ xh1h=I ,2,.. .,N } is a
114
Process Neural Networks
point set on the real axis, and the function fix) has definition in X, i.e. a string of real numbersfi,(h=1,2, .. .,N) is given such thatfixh)=fi,(l,2, .. . ,N) . For R(X)E 9\m.m let N
Px(f,R)=IIJ-RII: = L(j(xh)-R(Xh))2,
(5.89)
h=1
and p~(f)= inf Px(f,R). RE9t"',1l
(5.90)
Then p~ (f) is referred to as the minimal square deviation ofj(x) in X. If
R'(x) = P'(x)/Q'(x) =
~a>i Itb;XJ E 9\m.n'
such that p~ (f) = Px (f,R'), then R*(x) is referred to as the optimal square approximation rational expression of the function fix) in X. Theorem 5.4 (Existence Theorem 1) Suppose that fix) is continuous in [a,b], then there exists R*(X)E 9\m.n such that
p(f,R') = p'(f) = inf p(f ,R), RE9l/ll."
where p(f,R)=sup!f-RI. Theorem 5.5 (Existence Theorem 2) R*(X)E 9\m.n such that
Suppose thatfix)EL2[a,b], then there exists
p(f,R') = p'(f) = inf p(f,R),
where p(f, R) =
Ilj- RI1
2
=
RE9lm ••
r
(j(x) - R(X))2dx . Suppose thatfix)EL2 [a,b], then there exists
Theorem 5.6 (Existence Theorem 3) R' (X)E 9\m.n such that
p(f,R')=p'(f)= inf p(f ,R), RE9lm ,.
where p(f,R) is the weighted square approximate distance that can be obtained by
p(f ,R) =11W<j _R)11 = 2
r
w<x)(j(x)-R(X»)2 dx,
where w(x) is a continuous positive function . For proof of these theorems please refer to the related references.
Learning Algorithms for Process Neural Networks
115
Next, a numerical method (Newton method) is adopted to solve the rational optimal approximation of the function. The optimal rational function approximation of Eqs. (5.86) and (5.89) is equivalent to solving the minimal problem of the following formula:
p =p(r) =p(a"a" ...,am,fJ"fJ,,···,p,l =
r[
f(x)-
~a;x'/(1 + ~Pjxj)Jdx. (5.91)
The basic idea of the Newton method lies in converting the minimization into minimizing a series of
(5.92)
h were
cj
= "j
(k)
- "j
.
Because the rational function is a ratio of two linear functions for the coefficient rj (j=O, 1,... ,m+n), it is proper to approximate it by the linear terms of a Taylor series. The computational steps to solve the optimal rational expression approximation are as follows. ,,(0) • • • ,,(0) ). Step 1 Choose a group of initial values ,,(0) = (r:(0) o ' 1 ' , m+n ' Step 2 R(r,x) , as a function with regard to the parameter r, is expressed approximately with linear terms of a Taylor series expanded at r(O) R(",X )
"" R(
(0) ) ",X
~( _ + L.., "j j=O
(0») "j
aR(,,(O) ,x) = R(
a"j
and then by the least square method, obtain achieve the minimum.
t
(0)
,x
)
~ aR(,,(O) ,x) + L..,c j a ' j=O
G
"j
that makes the following formula
The necessary condition is that the partial derivative of
116
Process Neural Networks
with respect to E: is 0, that is to solve the normal equation system m+n
:L>'j j =()
{a-:'IR( T
(0 )
aTj
)
,x .
o-:'IR(T
(0)
at,
) -:'IR( (0 ) ) ,x dx= {U(x)-R(T« »,X)r T , x dr , i=O,I, ... ,m+n;
aT;
(5.93) Step 3 Modify r(O) using the above obtained E:: r (l)=r(O)+E:. Replace rIO) with r(l) , repeat the computation course from Step 2, and iterate until the modification quantity E: is small enough (according to the required precision).. It can be seen from Eq. (5.92), that if the above iteration process is convergent (i.e. E: tends to a zero vector ultimately), R(r*,x) obtained from conv ergence does satisfy the nece ssary equation
(5.94)
Furthermore from the point of view of computation, this stationary point is certainly minimal, as there are always some directions which can make p(r) descend at the maximum point or a saddle point. Due to the influence of rounding errors, etc ., in fact , it is impossible to be steady at this point. In order to make the solving of linear Eq. (5.94) in Step 2 feasible, its coefficient determinant must not be zero , so we only need to prove the linear independence among the partial derivatives of R(r,x) about ti . Rk(T,X) =aR(T,X)jaTk
,
k
=O,l,···,m+n.
Because the Gram determinant of the linear independence function group is always greater than zero , the coefficient determinant is not equal to zero. In the above algorithm, the coefficient matrix needs calculating every iteration, so the computational load is very large . In the following, we will introduce a deformation of the algorithm, a simplified Newton method. For convenience, Eq. (5.93) is rewritten as m+n
2,hij(T(O»)£j =gJT(O»), i=O,l ,.. ·,m+ n, j =O
i.e. let
h (T(O ») = {aR(T(O) ,x) . aR(T(O ),x) dr, I}
a
aT
}
ot. I
gJT (O ») = {V(X)-R(T«»,x)) aR~:) , x) dr, I
Learning Algorithms for Process Neural Networks
117
or written in matrix form (5.95) Obviously in the above algorithm, the coefficient matrix H that Eq. (5.95) solves in Step 2 changes every iteration, i.e., a Gram matrix needs recalculating each iteration, which includes a majority of the computational load of the iteration . Actually, if the initial value i O) is properly chosen, the coefficient matrix only needs calculating in the first iteration, and H is kept unchanged in the following iterations (either H is fixed when the iteration achieve s a certain extent or H is updated by stages). In this way a so-called simplified algorithm is obtained, and it is written in iteration formula as follows
where F'(r) denotes the inverse matrix of the coefficient matrix H(r) at r, and g(r) denotes the gradient direction of g(r) at r. In this notation, the iteration format of the aforementioned unsimplified algorithm may be represented as
where F'(r(k)) changes with the alteration of k. Obviously , compared with the former algorithm, the computational quantity reduces greatly every iteration after the simplification.
(2) Learning algorithm of process neural network with the input of rational expression Consider a multi-input-single-output system with a single process neuron hidden layer and linear activation function in the output layer, the network input-output mapping relationship of the network may be denoted as (5.96)
According to the complexity of signals in system input function (discrete or analytic) space, choose a proper rational function set 9tL•P, and express the input functions in the form of a rational function in the sense of an optimal square approximation with certain fitting precision . Meanwhile, the connection weight functions of the network are also denoted by rational functions. For this process neural network with rational functions as its inputs, the network training can adopt the gradient descent algorithm. Next, a specific training course is introduced. Denote K learning sample functions : (xt(t),x~(t), ...,x:(t), d k ) (k=1,2,.. . ,K), where
d,
is
the
system
expected
output
corresponding
to
the
input
Process NeuralNetworks
118
Xlk(t),X~(t) , ...,X:(t). The input functions
x:(t),x~(t), ...,x~(t) and the network
connection weight functions wij(t) are both denoted by the rational function xjk(t)
L / =L,aj~tl 1=0
P L,bj~tP , k
=1,2,...,K; i =1,2,...,n,
(5.97)
p=o (I) 1 /
-
L wij(t)-L,w j t 1=0
( p)
p
• _
•
. _
P L,u j t , l-l ,2,...,n, j - l,2, ...,m.
(5.98)
p= O
The error function of the network is defined as follows
(5.99)
where aj~
is the polynomial expansion coefficient of the rational expression
numerator part of
x:(z).
bj~ is the polynomial expansion coefficient of the rational k
expression denominator part of x j (t) . The learning rule for the connection weight and activation threshold of the network according to the gradient descent algorithm is as follow s Vj=vJ+a!lVj,
j= 1,2,...,m, 1, 2, ... ,m,. 1=1,2,...,L,
(5.100)
2, •••" n '
(5.102)
W (/ ) --
flA (/ ) , 1• -1 - , 2, ... , n,. j ' -wij(/) +PL1W j
U ij( p ) --
u(p) ij
j
+ ALl 1 ·u ( p ) ij '
I' -1 ,
j ' -12m' , , •••, ,
p=I,2, ...,P,
(5.101)
(5.103) where a, p, A, y are the learning rate constants. Denote
Then (5.104)
Learning Algorithms for Process Neural Networks
119
(5.105)
(5.106) (5.107)
The learning steps in network training are as follows : Step 1 Choose a proper rational function set 9iL ,P as the input function space for the network . The input functions and connection weight functions of the network are denoted in the form of rational function as Eqs. (5.97) and (5.98); Step 2 Denote the learning error precision of the network e. the accumulative leaning iteration times s=O, and the maximal leaning iteration times M; Step 3 Initialize the connection weights and activation thresholds of the network vj,w;j) ,u;jP) ,B j , i=I,2,..., n; j=I ,2,...,m ; 1=1 ,2,..., L; p=I ,2, ...,P; Step 4 Calculate the error function E from Eq. (5.98), and if Ece or sz-M, go to Step 6; Step 5 Modify the connection weights and activation thresholds according to Eqs. (5.100)-(5.1 07); s+ l-+s; Go to Step 4; Step 6 Output the learning result and stop. The computation of Eqs. (5.104)-(5.107) is rather complicated, it is better to write a pretreatment program (function) by adopting integral or numerical computation methods aiming at these computational formulas. After the initial values of the connection weights and the activation thresholds are given, the modification values of various parameters can be calculated first by calling this pretreatment program (function) each iteration, and then substituting into the network for training .
5.6.2
Learning Algorithm Approximation
Based
on
Optimal
Piecewise
Generally, function approximation or fitting can be described as follows : suppose that D is a point set in any dimension space,
g(P ; AI ' ~ ,..., An)
is a
parameter-varying function (when the parameter values are in a subset of n-dimensional space , it actually denotes a group of functions or function class
120
Process Neural Networks
defined in D, referred to as an approximation function) depending on a group of parameters A I,A2, ... ,An defined in D (i.e. PED) . To solve function approximation is to find out that for a function fiP) (PE D) defined in D, whether exists a group of parameter values Aj=AjO (i=1,2, .. .,n) such that the distance (or deviation) between g(P;Ao,~, ...,A~) andfiP) achieves the minimum .
At present, most researches on function approximation discuss the approximation using functions belonging to the function family defined with a unified analytic expression in the whole domain, and all the parameters in the approximation function are embodied in the approximation formula itself. However, in practical applications, the actual measured signal may change acutely under some conditions. When the unified form of function is used to approximate, there are usually many difficulties in the selection or construction of an approximation function form, and it may generate great deviation during fitting. We studied the "optimal piecewise approximation" method [17, 18J, i.e. the approximation function is defined piecewise by some analytic expressions; here some parameters in the approximation function are used to express how to divide the domain . Discussion of optimal piecewise approximation not only has special meaning for engineering technology, but also has great value from the viewpoint of reducing computational complexity. As piecewise approximation is adopted, in order to achieve the same fitting precision, the expression that defines the approximation function on every subsection may be simpler, or even may be a linear function. When calculating a function value, if variables can be determined to belong to some subinterval, it is easier to compute than using a complex formula over the whole interval uniformly. As usual, some logic operations used for judgment are much quicker than arithmetic operations in a computational program . Therefore, when the input signals are complicated, it can greatly reduce computational complexity during network learning if the optimal piecewise approximation method is applied to the construction and the training of process of the neural network. (1) Piecewise approximation of the function Suppose lJ'! (x),q)ix),... ,lJ'm(x),,,, is the continuous function sequence defined in the interval [a,b], and is linearly independent in any subinterval (a,,8)~[a,b], i.e. for any m
positive integer m, if there is Z> jlJ'/x)=O in xE(a,,8), then Cj=O (j=1,2, ... ,m) j =1
certainly holds where Cj is a real number. Any linear combination m
P(x) = :~:>j lJ'/x) (where Cj is a real number) j=!
is
referred
to
{~CjlJ'/X*j
E
as
an
m-order
generalized
R} are denoted as n;
polynomial,
and
the
whole
Learning Algorithms for Process Neural Networks
121
For a fixed positive integer n, consider the function with the following form
=P;(X)= ICijlJ.'/x), whenxE (Xi_1' x), i=I,2, ....n,
(5.108)
j =1
where Pi(X)EHm; XO<XI< ' ''<Xn-l<Xm xo=a, xn=b; and (Xi-I,xi) do not intersect with each other. The function ,!,<x) of this form is referred to as the function which can be divided into n segments by an m-order generalized polynomial defined in the interval [a,b], and the all of these functions are denoted as Hm(n). Therefore, Hm(l)=Hm. In the following, Hm(n) is especially denoted as Pm(n) while Hm=Pm(referred to as the function class which can be divided into n segments by (m-1)th polynomial to be defined). Next, we will discuss optimal approximation in the function class Hm(n). It can be seen from the definition of the function ,!,<x), it takes the interval's partition points Xi (i=1,2,... ,n-l) and Pi(x)'s coefficients cij (i=I,2, ... ,n; j=I,2, ... ,m) as parameters. So, differently from the approximation function considering the approximation in the whole interval , the parameters of the function are embodied not only in the formula (as the coefficients of Pi(x», but also on the subsection for the approximation interval. Definition 1 Suppose thatf(x) is an arbitrary function defined in [a,b], ,!,<x)EHm(n), then (5.109)
is referred to as the deviation between fix) and ,!,<x) in the sense of Chebyshev .
Definition 2 If there exists I/fe(X)E Hm(n) '1/, (x) = P;, (x) , when x E (xi _1' x)
(i = 1,2,..., n)
such that the relation Eq. (5.109) holds, then I/fe(x) is referred to as the optimal piecewise approximation of fix) in [a,b] belonging to Hm(n) (in the sense of Chebyshev). En(a,b) is referred to as the optimal piecewise approximation deviation of fix) in Hm(n) . The partition for the interval [a,b] corresponding to I/f;,(x): a=xO<xI<" .<xn-I<xn=b is referred to as the optimal partition for the interval [a,b] when fix) approximates in Hm(n). It is not difficult to prove the following theorems [171 using these definitions directly. Theorem 5.7 Suppose that Eia,b) and E/(a,b) are respectively the optimal approximation deviations of fix) in Hm(n) and Hm(l), if l$n, there certainly exists En(a,b)$Ela,b).
122
Process Neural Networks
In other words, an increase in the subsection will not make the optimal approximation deviation increase at least. Theorem 5.8 Suppose that En(a,b) has the same meaning as above, then there is lim En (a,b) = O. n-+ ~
Theorem 5.9
Suppose that E~m)(a,b) and E~I)(a ,b) are respectively the optimal
approximation deviations offix) in Hm(n) and Hln), if Ism, there certainly exists
That is, increasing the generalized polynomial order used for approximating will not make the optimal approximation deviation increase at least. Next, we will further discuss several important characteristics of optimal piecewi se approximation. First, a binary function is introduced e(a,p)= inf P( x )eHm
suplp(x)-!(x)l , a~a
(a,p)
(5.110)
Actually, e(a,j3) is just the optimal approximation deviation of fix) in (a,[J) belonging to Hm• As long as the generalized polynomial class satisfies the Harr condition, i.e. for any P(x)EHm, if it has m-1 zero values in e(a,j3) at most, then for continuous fix), there exists unique PeCx)EHm which achieves the optimal approximation deviation e(a,j3), that is the optimal approximation exists and is unique . We have proved the following theorems about optimal piecewi se approximation in reference [18]. Theorem 5.10 If (abPt)~(a2,1h)~[a,b], then
Theorem 5.11 The function et(y)=e(abY) is monotonic increasing in [abb]; the function e2(y)=e(y,pt) is monotonic decreasing in [a,pd. Theorem 5.12 (Deviation Equipartition Principle) Suppose fix) is a bounded function in [a,b], if function If/e(x)EHm(n), then ~e(X)
= P;e(x) ,
while
XE
(Xj_1'x)
(i
=1,2,...,n)
and satisfy (a) Pi,(x) (i=I ,2,... ,n) are the optimal approximation offix) in the interval (Xi- hXj) in H m , i.e.
Learning Algorithms for Process Neural Networks
123
e(xi-!'x;) = sup IF:e(X)- !(X)I· (X'_I'X,)
(b) e(xi-hxi) are equal to each other for all i=I,2, ...,n, i.e.
then Vfe(x) is the optimal piecewise approximation of fix) in the interval [a,b] in Hm(n). Theorem 5.13 Suppose that a=xo<xt< ... <Xn-I<xn=b is the optimal n-segment partition for the interval [a,b] in the piecewise approximation of fix), then for any subinterval [xbxIl, k«l (k=I,2, ...,n-l; 1=1,2, ... ,n), the former partition Xk<Xk+ I<... <Xt-t<Xt is the optimal partition for the interval [xk,xIl corresponding tofix) divided into l-k segments. Next, we will discuss how to solve the optimal Chebyshev approximation of fix) in Hm(n), i.e. how to solve Vfe(x)eHm(n) lfIe(x) = F:e(x),
when x e (x;_1' x;) , (i = 1,2,..., n)
such that suPllfle(x)- !(x)1 = inf [a ,bl
supllfl(x)- !(x)1 = En[a,b].
'I', eHm( n ) l a ,bt
Note that the parameters that need determining here are not only the coefficients to define PieCx) in each subsection, but also the partition points of the interval [a,b]:
a=x~ < x; < ...< x:_ < x: =b. t
For a binary function e(a,fJ) = inf suplp(x)- !(x)l, a ~ a < fJ ~ b, P (x )e H m (a,p)
iffix) is continuous in [a,b], and for any subinterval
when (at - a2)+(f3.t-fJ2):tO, thereis
then fix) is referred to as the function whose deviation strictly ascends with the increase of interval, and the whole is denoted as
F; (H m)'
In the following
124
Process Neural Networks
discussion we assume j(x)e F;(Hm ) . We will give the specific algorithm steps to solve the optimal piecewise approximation according to the deviation equipartition principle as follows. As shown in Theorem 5.12 (Deviation Equipartition Principle) , solving the optimal piecewise approximation of j(x) in Hm(n) is actually converted into seeking the function in H m that satisfies the conditions (i) and (ii) of Theorem 5.12. The task of solving the optimal piecewise approximation is mainly to determine the optimal partition points in the interval [a,b] (a = x~ < x; < ...< x: _1 < x: = b.) such that the deviations e(xj'_I'X;) in every subsection are equal to each other. Our algorithm is an iteration method that adjusts partition points gradually to make the total deviation max {e(xj_l' x) } descend continuously. The algorithm is as follows. )St Sn
Step I
Choose the initial partition arbitrarily
Step 2
Let
e(x;~~ , x;O») and
xri
l
)
= a, start from i= I, check stepwise whether the deviations
e(x;O), x;Z~) are equal to each other.
(a) If e(x;~~ , xj(O») = e(\(O) , x;Z~), then let X/I) = x;) and add I to i, continue to check . (b) If e(x;~~ ,x;O» = e(x;O), Xj(Z~ ), then pick "X;(I ) as the point satisfying e(x(l) x(l ») = e(x(l) x(O» ) I- I '
I
I
'
1+1
'
x(l ) < x (l) < x(O ) , -I
I
HI '
add I to i, and continue to check. =b Adjust the partition points as mentioned above until i=n-I, finally let X(I) n ' and here we get a new partition group
Step 3
Let x~1) = b, start from i=n-I , check stepwise whether the deviations
e(x(1) x(l») and e(x(l) x(l)) are equal to each other • I- I ' J I ' 1+1 x(l ») = e(x(l) X(I») then let X(I) = x(l) subtract I from i and (a) If e(x(l) 1- 1' I J ' HI ' I I '
continue to check. (b) If e("X;~~, "X;(I») '# e("X;(I) , Xj(~:)' then pick x?) as the point satisfying e(x(1) x(\») = e(x(\) x(\») x(l) < X(I) < X(I) I-I'
I
I
'
1+1'
l -1
and subtract I from i, and continue to check forwards.
J
H I'
Learning Algorithms for Process Neural Networks
Adjust the partition points as mentioned above until i=] , finally let
125
x61) = a, and
in this way, we get a new partition group
Step 4
Replace xi(O) with x?) and repeat the course from Step 2. According
to the precision demanded, iterate until 1~$~~llxi(l )
-
I
x;O) is small enough.
(2) Learning algorithm of a process neural network with piecewise functions as inputs The optimal piecewise approximation method for a process function has many useful properties in theory. It can implement fitting or function transformation based on polynomial optimal piecewise approximation for discrete process sample data with acute change and complex analytic function, and can reduce the computational complexity of spatia-temporal aggregation operations greatly and improve the adaptive capability of the network during the training for a process neural network. However, as given in the definition of optimal piecewise approximation, the partition of interval and the determination of polynomial function coefficients need considering simultaneously when optimal piecewise approximation based on a polynomial is adopted. Therefore, the optimal piecewise approximation polynomial must be determined first by a pretreatment program according to the algorithm steps given by the deviation equipartition principle. Because the optimal piecewise interval determined by each input function in the training sample set may be different, the piecewi se interval of connection weight functions may select the intersection interval of several optimal piecewise intervals. Then the network connection weight functions and activation thresholds are adjusted step by step via numerical computation methods, e.g. the gradient descent or Newton-descent method, and consequently the training of the process neural network is completed. The algorithm is as follows. Step] Choose a proper approximation function class used for optimal piecewise approximation. The input functions are pretreated through the aforementioned optimal piecewise approximation algorithm, and are denoted by an optimal piecewise approximation expression . Meanwhile, the connection weight functions of the network are expressed by the corresponding piecewise function form; Step 2 denote the network learning error precision by c, the accumulative leaming iteration times s=O, and the maximal learning iteration times M; Step 3 Initialize the connection weights and activation thresholds; Step 4 Calculate the error function E. If Ecs or sz-M, go to Step 6; Step 5 Modify the connection weights and the activation thresholds; s+ l-->s; go to Step 4; Step 6 Output the leaming result and stop.
126
Process Neural Networks
Here, the specific formulas modifying connection weight, activation threshold, calculating error function , etc. depend on the selection of the approx imation function class. It is not hard to derive them according to specific situations .
5.7 Epilogue In this chapter, according to the specific characteristics of the process neural network structure and information proces sing course, five realizable learning algorithms for process neural networks are given. The common characteristic of these algorithms is to first denote the connection weight functions of the network into finite combination form of a group of known basis functions (e.g. linear combination, rational combination, etc.), then determine the connection weight functions and activation thresholds satisfying the system input-output mapping relationship by learning the coefficients of the finite combination expres sions of the basis function . However, we should say that it is possible to propose some learning methods from the viewpoint of functional approximation and work on this aspect is expected to develop in the future. The training course and the generalizability of the process neural network is a very complicated problem and is closely related to the network model structure, the selection of the learning sample set, the design of the learning algorithm, etc. Therefore, the research to develop highly efficient learning algorithms for process neural networks and to improve generalizability are very important. References [1] Cheng H.L., Soon c.P. (2009) An efficient document classification model using an improved back propagation neural network and singular value decomposition. Expert Systems with Applications 36(2):3208-3215 [2] Meissner M., Schmuker M., Schneider G. (2006) Optimized Particle Swarm Optimization (OPSO) and its application to artificial neural network training. BMC Bioinformatics 7: 125-131 [3] Wang L. (2005) A hybrid genetic algorithm-neural network strategy for simulation optimization. Applied Mathematics and Computation 170(2):1329-1343
[4] Du S.Q., Li W.S., Cao K. (2006) A learning algorithm of artificial neural network based on GA-PSO. In: The Sixth World Congress on Intelligent Control and Automation 1:3633-3637 [5] Xu Z.F., Wang H.W., Wu G.S. (2007) Converse solution of oil recovery ratio based on process neural network and quantum genetic algorithm. Journal of China University ofPetroleum: Edition ofNatural Science 31(6) :120-126 (in Chinese)
Learning Algorithms for Process Neural Networks
127
[6] Estatico e. (2004) A two-steps inexact Newton method for atmospheric remote sensing. In: 2004 IEEE International Workshop on Imaging Systems and Techniques p.66-70 [7] Pan S.T., Chen S.e., Chiu S.H. (2003) A new learning algorithm of neural network for identific ation of chaotic systems. In: IEEE International Conference on Systems, Man and Cybernetics 2:1316-1321 [8] Battiti R. (1992) First and second order methods for learning : between steepest descent and Newton' s method . Neural Computation 4(2):141-166 [9] Xu S.H., He X.G. (2004) Learn ing algorithm s of process neural networks based on orthogonal function basis expansion . Chinese Journal of Computers 27(5) :645-649 (in Chinese) [10] Ji H., Xia S.P., Yu W.X. (2001) An outline of the Fast Fourier Transform Algorithm . Modern Electron ic Technique (8):11-14 (in Chinese) [II] Wang N.C. (1996) Algorithmic Design of Synchronic and Parallel . Science Press, Beijing (in Chinese) [12] Schoenberg, 1.1. (1946) Contribution s to the problem of approximation of equidistant data by analytic' function . Quart, Applied Mathematics 4(45-99) :112-141 [13] Li P.e., Xu S.H. (2005) Training of procedure neural network based on spline function. Computer Engineering and Design 26(4):1081-1087 (in Chinese) [14] Xu H.K. (2002) Iterative algorithms for nonlinear operators. Journal of the London Mathematical Society 66(1) :240-256 [15] He X.G. (1966) Theoretical problem of rational square approximat ion. Communication on Applied Mathematics and Computation 3( I ):31-49 (in Chinese) [16]
He X.G. (1966) Computing method of rational square approximation. Communication on Applied Mathemati cs and Computation 3(2) :90-107 (in Chinese)
[17] He X.G. (1965) The best approximation by segments. Communication on Applied Mathematics and Computation 2(1):21-38 (in Chinese) [18] He X.G. (1979) Some iterative algorithms of the best approximation by segments and their convergence. Mathematica Numerica Sinica 1(3):244-256 (in Chinese)
6 Feedback Process Neural Networks
A feedback neural network is an artificial neural network model that has been widely applied to signal processing [II, optimal computation [2J, convex nonlinear programming [31, seismic data filtering [4J, etc. A traditional feedback neural network model generally has time-invariant inputs. However, when a biological neural organization processes information, it actually feeds back time-delay information and the inputs of external signals will last for a period. Its current outputs depend not only on current inputs, but also on the accumulation of all previous inputs, i.e. a temporal accumulation effect. In practical problems, many systems also have feedback control items, e.g. in a real-time process control system, the inputs of control variables usually need adjusting according to the current output quantity of the system; in some multi-objective optimization problems, the system needs to adjust dynamically to the search strategy according to current states. The feedback process neural network is just a process neural network model with information feedback, and all its neuron nodes are connected according to the information flow direction of the system. The information can be passed back to nodes in each previous layer by certain rules, and output information can be fed back to the nodes themselves. There are many forms of the feedback process neural network model. In this chapter, we mainly introduce a three-layer network model and its learning algorithm, then analyze the stability of the model. In addition, several other forms of feedback process neural network model will be given. When the feedback process neural network transports information, there are forward flows as in a feedforward neural network and time-delay feedback information that is from the latter layer nodes to the former layer nodes. Therefore, the feedback process neural network model can be used as a functional approximator, and at the same time as an associative memory machine, or as an intelligent real-time controller. Thus, the feedback process neural network model is expected to have broad application possibilities.
FeedbackProcessNeural Networks
129
6.1 A Three-Layer Feedback Process Neural Network 6.1.1 Network Structure The model and the information transfer flow of a three-layer feedback process neural network with multi-input-single-output are shown in Fig. 6.1.
Fig. 6.1 A three-layer feedback process neural network
In Fig. 6.1, there are n node units in the input layer to complete the input of n time-varying signals to the system and the time-delay feedback of the output signals of hidden layer nodes. There are m nodes in the middle layer (the process neuron hidden layer) to complete the spatial weighted aggregation and activation output of the input signals, transfer the output signals to the output layer, and simultaneously, transfer time-delay feedback signals to the input layer. The output layer consi sts of a process neuron node to complete the spatial weighted aggregation, and temporal accumulation operation on output signal s of the hidden layer nodes and accompli shing the system output. Suppo se that the inputs to the system are X(t)=(X)(t),x2(t),... ,xnCt)), where tE [O,n [0, is the system input proces s interval. The output of the process neuron hidden layer nodes is
n
where wit) is the connection weight function of the ith node of the input layer and the jth node of the hidden layer; wjJt) is the feedback connection weight function of the jth node of the hidden layer and the ith node of the input layer; u/t) is the signal output of the jth node of the hidden layer at the time t; t is the time delay ; f is the activation function of process neuron in the hidden layer. As shown in Fig. 6.1, the output of the feedback proce ss neural network is
130
Process Neural Networks
(6.2)
where Vj(t) is the connection weight function of the jth node of the hidden layer and the output node; ()is the activation threshold of process neuron of the output layer ; g is the activation function of the output node and y is the output of the network.
6.1.2 Learning Algorithm [5] Assume P learning samples (Xpl(t),Xp2(t),... ,xpn(t),dp) for p=I,2, ... ,P, where the first subscript of Xpi(t) denotes the serial number of the learning sample and the second subscript denotes the serial number of input function vector component; dp is the expected output of the system when the inputs are (XI'I (t),xpz(t) ,.. .,xpn(t),dp). Suppose that when the inputs of the system are (XI'I (t),xp2(t) ,... ,xpn(t),dp), the real output of the network is Ypo The network error function is defined as follows
t.(y,-d,)' t.H r(~ "/t)U;(I)}t )-d,J =t[ r[ ~ ' j(t)f[ t(W ij(l{ x,(I)+ t, WjJt)U;(I-T»)))Jdt-8 ]-d,J
E=
-8
=
g[
(6.3)
Without loss of generality , suppose that the input space of the feedback process neural network is (C[O,nr and b\(t),bz(t), ... ,bL(t ) are a group of finite basis function s in C[O,n which can satisfy the input function fitting accuracy demand. The network connection weight functions Vj(t) , Wij(t) , Wji (t) are all denoted in the expansion form of these basis functions L
v/t}= Lvjl)b/(t),
(6.4)
I; )
L
wi/t} = L wjj 'b/(t),
(6.5)
J; )
L
Wjj(t) = Lwj:'b/(t),
(6.6)
I; )
where vjl), wijl) and wj:> are respectively the expansion coefficients of vit), wij(t), and wji (t ) about bLCt) .
Feedback Process NeuralNetworks
131
The input process interval [O,T] is divided into K equal parts (time granularity) properly. Suppose that the time partition points are to,t),... ,tK, the time interval between adjacent partition points is 1; then Eq. (6.2) can be rewritten as
(6.7)
where to=O, tK=T. Eqs. (6.4)-(6.6) can be rewritten as L
V/tk) = Lvjl)bl(tk),
(6.8)
1=1
L
Wij(tk) = Lwijl)bl(tk)'
(6.9)
1= 1
L
Wji(tk)
=Lwj:)b/(t k) ,
(6.10)
1=1
Substitute Eqs. (6.8)-(6.10) into Eq. (6.3), and then the error function of the network becomes p
E = L(Yp _d p )2
#g[~t(t'\"b,(I+[t,[(tW::'b,(I,l] p=1
=
{x"
(1,)+
(6.11)
t(t W;'b,(I,l}:(I,_,))}I, -1,+0)-d' J
According to the gradient descent algorithm, the modification rule of the network connection weights and activation thresholds is v j(/) =v j(/) wij(/) -- wij(/)
. 12 + tmvj(/),J= , ,....m,
RA (I ) , + puwij
I 2, .. . ,n,. J' -- I, 2,...,m,
•1,
- (/) - (/) ,A-(/) '- 12 . '- 12 w ji - wji + I,-,W ji ' J - , ,...,m, 1 - , , •. . ,n, () =() + Al!.(),
where a, /3, y, A, are the learning rate constants. Denote
(6.12) (6.13) (6.14) (6.15)
132
ProcessNeuralNetworks
Z~I) = t~(tVj')bl(tk») Xf[
t((t W~'b,(I,)HX,,(I,)+ ~(t w;:'b, 0+ 0,_, ») ]0, -1,_+0,
o~ = t((t W,~'b,(I,»)(x"u.)+ t( then I1vjl)
t
=- a~y) =-2 (g(Z~I)
t
w;:'b,
(I,)}:(I,,»)).
- d p)l (z~» tbl (t, )f(o~~~ )(tk - tk_I),
(6.16)
I1w(l) =- aE =-2f.(g(Z(!»-d )g'(Z(!») I) aw(l) L.. p P P _ P- I
I)
. t vj (tk)/
(O~~~)hl (tk{X
p;
(tk) +
~ wji (tk )uf (tk_I») (tk - tk_I),
-(/) aE 2f.( ( I ) ) ' ( (!))~ I1w j ; =- awj? =- ;:: g(zp )-d p g z; f:tV/tk)
. J'(o~~~ )w;j (tk )bl
I1B =-
~~ =
-2t( g(z~l)
u, )u;' (tk -I)(tk - tk_I),
-d p
)g'( Z~I»)( -I).
(6.17)
(6.18)
(6.19)
The learning algorithm is described as follows. Step I Give the error precision e. accumulative learning times s=O, the maximal learning times M. Select basis functions b,(t),b 2(t), .,bL(t) in input space . Divide the interval [0,71 properly and equally , and determine the partition points t,,[z,oo .,tK. Step 2 Initialize the connection weights and the activation thresholds of the -(/; ) , () (1 '-12 '- 1,2,oo .,m,' [ -- 1" 2 00 " L) • network v(I) ' wij(/) , w - , ,oo.,n,. Joo
j
Step 3
j
Calculate the error function E according to Eq. (6.11), and if E<£ or
sz-M, go to Step 5. Step 4 Modify the connection weights and the activation thresholds according to Eqs. (6.12)-(6.19); s+1->s; Go to Step 3. Step 5 Output the learning result and stop.
6.1.3 Stability Analysis Stability analysis is a meaningful task during research on feedback process neural networks. For the convenience of discussion, choose the activation function of
Feedback Process Neural Networks
133
process neurons in the hidden layer as a linear function in Eq . (6.1), i .e.j{u)=u and the time delay unit r-=1, and then Eq. (6.1) can be simplified as (6.20)
Eq. (6.20) is a time difference computational formula, the state of hidden layer pro cess neurons in the hidden layer after time 0 can be calculated from the input at the current time t and the states before the time t-1. Eqs. (6.20) and (6.2) can give a compound mapping relationship F:U~V where UcC[O,l1 and VcR. Next we will discu ss for given inputs Xj(t) (i= l ,2, .. . ,n) , how to choose wij(t ) and wj/t ), such that Eq . (6.1) is steady. That is, after N iterations of the network, uP) doe s not change any more or changes in a given interval. To make the integral formula Eq . (6.2) meaningful under the condition T ~oo, we can obtain luit)I~O. We will consider two situations. (a) When t~oo, there is lim x/t) = Xi" I --> ~
Under this condition, let w;/t) ~ w~ , wji(t) ~ w;;, and t~oo on both sides of Eq. (6.1), then we can obtain (6.2 1)
where u'. = limu (t ). If wij(t) ~O, then J
l -+OCI
J
limu (t) = 0, i.e. Eq. (6.1) satisfies 1-+00
J
asymptotic stability. The abo ve analysis can be described in the following theorem.
Theorem 6.1
When
x ; (z)
~ x ;' , if the selected connection weight functions satisfy
wuCt)~O and wj/t ) ~ w;;, then feedback difference Eq, (6.20) is asymptotically
steady. (b) When
t~oo,
lim r .Ir) oscillates periodically or is not convergent. I--> ~
Suppose that IXj(t)1 and IUjU)1 both have upper bounds which are respectively equal to M, and M; (i=I,2, .. . ,n ; } = 1,2,... ,m). Because
IU j (t)I=
~W/t{Xi(t)+~Wji(t)U/t-1))
s ~hj (t) i( lx/t) 1+ ~h/t)llu / t -1)1 )
~ ~IWij (t)I(M + ~IWj/t)~ .r
u ).
(6.22 )
134
ProcessNeural Networks
If wjJt) has an upper bound W, then there is
luit) I ~O
when
wit)~O.
As is
shown in Eq. (6.5), (6.23)
Thus, as long as Ibl(t)I~O, we can obtain wij(t)~O. Therefore, if the selected basis function attenuates with time and finally tends to zero, the asymptotic stability condition ofEq. (6.20) can be satisfied . The speed of basis function attenuation is the key factor affecting the integrability ofEq. (6.2). Only when (6.24) the integral in Eq. (6.2) is integrable. In order to get the integrability condition about Eq. (6.2), suppose that
lu/t)I:::; 1/ t", j
= 1,2, ....m,
(6.25)
or
IIWij(t)I(M x +mwMu ) :::; uv
(6.26)
;=]
Here, as long as wit) satisfies (6.27) Eq. (6.2) is integrable. Eq. (6.27) is rewritten as (6.28)
When btCt) satisfies (6.29) Eq. (6.2) will also be integrable. Therefore, the above conclusion can be described into a theorem . Theorem 6.2 Suppose that 1x;(t)1 and IuP) I both have upper bounds which are respectively M, and M u • p>1 is an arbitrary positive number, and then when the selection of connection weight basis function satisfies Eq. (6.29), Eq. (6.2) is integrable. At the same time, Eq. (6.20) is asymptotically steady, that is,
FeedbackProcessNeural Networks
135
limu .(t ) = O. t ~ oo
J
In Theorem 6.2, it is easy to understand that IXj(t) I must be bounded, but the assumption that luit)1 is bounded depends on the condition of the system. In fact, this assumption is generally satisfiable in practical application.
6.2 Other Feedback Process Neural Networks According to the spatio-temporal aggregation mechanism of process neurons and their structures and information transfer modes of feedback process neural networks, we can construct different feedback process neural network models to satisfy different demands.
6.2.1 Feedback Process Neural Network with Time-varying Functions as Inputs and Outputs [6] The structure, the spatio-temporal aggregation mechanism and the information transfer flow of this feedback process neural network are shown in Fig. 6.2.
XI (t) - - - - ; j H - 1 - - + (
. . 0'.--_-
k-""---....
y(t )
Fig. 6.2 Feedback process neural network with time-varying functions as inputs and an output
The hidden-layer process neurons in Fig. 6.2 process the input signals by weighted aggregation, integration from 0 to t, activation output, etc. The output neuron is a process neuron that performs only spatial weighted aggregation. The transfer relationship between the input and output signals of the nodes in each layer of the network is as follows. The input to the system is X(t) = (x_1(t), x_2(t), ..., x_n(t)) for t ∈ [0,T], where [0,T] is the input process interval of the system. The output of a hidden-layer process neuron node is

u_j(t) = f\Big(\int_0^{t}\sum_{i=1}^{n} w_{ij}(\tau)\Big(x_i(\tau)+\sum_{j=1}^{m} w_{ji}(\tau-\tau_0)\,u_j(\tau-\tau_0)\Big)\,d\tau - \theta_j(t)\Big),\quad j = 1,2,\dots,m,   (6.30)

where w_{ij}(t) is the connection weight function between the ith node in the input layer and the jth node in the hidden layer; w_{ji}(t) is the feedback connection weight function between the jth node in the hidden layer and the ith node in the input layer; u_j(t) is the signal output of the jth node in the hidden layer at time t; τ_0 is the time delay unit; θ_j(t) is the activation threshold function of the hidden-layer node; and f is the activation function of the process neurons in the hidden layer. The output of the feedback process neural network shown in Fig. 6.2 is

y(t) = g\Big(\sum_{j=1}^{m} v_j(t)\,u_j(t) - \theta(t)\Big),   (6.31)

where v_j(t) is the connection weight function from the jth node in the hidden layer to the output node; θ(t) is the activation threshold function of the process neuron in the output layer; g is the activation function of the output node; and y(t) is the network output. The feedback process neural network model defined by Fig. 6.2 can be applied to problems such as dynamic system process control, nonlinear signal prediction, etc.
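To make the information flow concrete, the sketch below evaluates Eqs. (6.30)-(6.31) on a uniform time grid with a rectangle rule for the integral and a grid-step time delay. The weight and threshold functions, network sizes and sigmoid/identity activations are illustrative assumptions, not values prescribed by the text.

```python
import numpy as np

# A minimal sketch of evaluating Eqs. (6.30)-(6.31) on a time grid; all concrete
# functions and sizes below are assumptions for illustration.
n, m, steps, dt, tau0 = 2, 3, 100, 0.01, 0.05
delay = int(round(tau0 / dt))              # time delay unit in grid steps

f = lambda z: 1.0 / (1.0 + np.exp(-z))     # hidden activation (assumed sigmoid)
g = lambda z: z                            # output activation (assumed identity)

w_ij = lambda t: 0.4 * np.cos(np.pi * t) * np.ones((n, m))   # w_ij(t)
w_ji = lambda t: 0.1 * np.exp(-t) * np.ones((m, n))          # feedback w_ji(t)
v_j = lambda t: 0.5 * np.ones(m)                              # v_j(t)
theta_j = lambda t: 0.1 * np.ones(m)                          # theta_j(t)
x = lambda t: np.array([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])

u_hist = np.zeros((steps + 1, m))          # u_j at grid points, u_j(t<=0) = 0
acc = np.zeros(m)                          # running value of the integral in Eq. (6.30)
y = np.zeros(steps + 1)
for k in range(1, steps + 1):
    t = k * dt
    u_delayed = u_hist[k - delay] if k - delay >= 0 else np.zeros(m)
    fb = w_ji(t - tau0).T @ u_delayed                  # sum_j w_ji(t-tau0) u_j(t-tau0), per i
    acc += (w_ij(t).T @ (x(t) + fb)) * dt              # rectangle rule for the integral
    u_hist[k] = f(acc - theta_j(t))                    # Eq. (6.30)
    y[k] = g(v_j(t) @ u_hist[k])                       # Eq. (6.31), output threshold omitted

print("y(T) =", y[-1])
```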
6.2.2 Feedback Process Neural Network for Pattern Classification

Consider the feedback process neural network model shown in Fig. 6.3.

Fig. 6.3 Feedback process neural network for pattern classification

In Fig. 6.3, the hidden-layer process neurons process the input signals by weighted aggregation, integration over the whole input process interval, activation output, etc. The output neuron is a time-invariant neuron. The input-output relationship between the signals in each layer of the network shown in Fig. 6.3 is as follows. The inputs of the system are X(t) = (x_1(t), x_2(t), ..., x_n(t)) for t ∈ [0,T], where [0,T] is the system input process interval. The output of a hidden-layer process neuron node is given by Eq. (6.32), where w_{ij}(t) is the connection weight function between the ith node in the input layer and the jth node in the hidden layer; w_{ji}(t) is the feedback connection weight between the jth node in the hidden layer and the ith node in the input layer; and u_j(t) is the output of the jth node in the hidden layer. The output of the feedback process neural network is given by Eq. (6.33), where v_j(t) is the connection weight from the jth node in the hidden layer to the output node; θ is the activation threshold of the output neuron; g is the activation function of the output neuron; and y is the output of the network. This feedback process neural network model can be used in applications such as aircraft engine condition monitoring [6], determination of the molecular mass of polyacrylamide [7], etc.
6.2.3 Feedback Process Neural Network for Associative Memory Storage

For brevity, Fig. 6.4 shows a feedback process neural network model with only 4 process neuron nodes; the following discussion applies equally to a network with any number of nodes. The structure is similar to that of a traditional Hopfield network [8,9]. In Fig. 6.4, each process neuron node is both an input node and an output node. The inputs of the system are X(t) = (x_1(t), x_2(t), ..., x_n(t)) for t ∈ [0,T], where [0,T] is the input process interval.

Fig. 6.4 A feedback process neural network for associative memory storage

In the network output expression, τ is the time delay unit; w_{ij}(t) is the connection weight function for the forward transfer of the time-varying signal; w_{ji}(t) is the feedback connection weight function of the signal; i, j = 1,2,3,4; u_i(t) is the signal output of the ith node at time t; θ_i(t) is the activation threshold function of the process neuron; and f is the activation function of the process neurons. This feedback process neural network model can be used for associative memory, functional optimization, etc. At present, many feedback process neural network models have been proposed. For example, He X.G. and Xu S.H. propose a feedback process neural network model based on weight function basis expansion [10]; Xu Z.F. et al. present a feedback procedure neural network model based on Walsh conversion [11]; Ding G. and Zhong S.S. propose a feedback process neural network model with time-varying input and output functions [6]; and Xu S.H. and Li P.C. propose a three-layer feedback procedure neural network whose hidden layer and output layer are both composed of procedure neurons [12].
6.3 Application Examples

Next, two application examples using the feedback process neural networks defined by Eqs. (6.1) and (6.2) are given.

Example 6.1 A function sample classification problem

Construct 9 input sample functions belonging to 3 classes with process input interval [0,1]. The first class has 3 sample functions, sin(2π(t-0.5)), sin(2.1π(t-0.5)), sin(2.2π(t-0.5)), with corresponding expected output 0.3333. The second class has 3 sample functions, 1.2sin(3π(t-0.667)), 1.2sin(3.2π(t-0.667)), 1.2sin(3.4π(t-0.667)), with corresponding expected output 0.6667. The third class has 3 sample functions, 1.4sin(4π(t-0.25)), 1.4sin(4.3π(t-0.25)), 1.4sin(4.6π(t-0.25)), with corresponding expected output 1.0000. The structure parameters of the feedback process neural network shown in Fig. 6.1 are chosen as follows: 1 input node, 10 process neuron hidden-layer nodes, 1 process neuron output node, with the Walsh orthogonal function system as the basis functions. A continuous Walsh transform pair is

f(t) = \sum_{l=0}^{L} F_l\,\mathrm{wal}(l,t),   (6.35)

F_l = \int_0^1 f(t)\,\mathrm{wal}(l,t)\,dt,   (6.36)

where wal(l,t) is the continuous Walsh basis function.
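A discrete approximation of the Walsh pair (6.35)-(6.36) is easy to compute. The sketch below uses SciPy's Hadamard matrix, whose rows are Walsh functions in natural (Hadamard) rather than sequency order; the sample function and L = 32 follow the example, while the ordering and quadrature are assumptions of this illustration.

```python
import numpy as np
from scipy.linalg import hadamard

# Discrete approximation of the Walsh expansion pair (6.35)-(6.36); hadamard()
# gives Walsh functions in natural order, and the midpoint grid is assumed.
L = 32
t = (np.arange(L) + 0.5) / L                 # midpoints of L sub-intervals of [0,1]
wal = hadamard(L)                            # wal[l, k] = lth Walsh function at t[k]

f = np.sin(2 * np.pi * (t - 0.5))            # first sample function of class 1

F = wal @ f / L                              # F_l ~ integral_0^1 f(t) wal(l,t) dt
f_rec = F @ wal                              # f(t) ~ sum_l F_l wal(l,t)

print("max reconstruction error:", np.abs(f - f_rec).max())
```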
The number L of basis functions is taken as 32, the learning rate constants are α=0.45, β=0.30, γ=0.35 and λ=0.55, the time delay is τ=0.05, the maximal number of learning iterations is M=5,000, and the learning accuracy is ε=0.05. The network training converges after 971 iterations. The training result is shown in Table 6.1 and the learning error curve is shown in Fig. 6.5.

Table 6.1 The training result of the feedback process neural network

Serial number | Expected output | Actual output | Absolute error
1 | 0.3333 | 0.3625 | 0.0292
2 | 0.3333 | 0.3344 | 0.0011
3 | 0.3333 | 0.3064 | 0.0270
4 | 0.6667 | 0.6701 | 0.0035
5 | 0.6667 | 0.6674 | 0.0007
6 | 0.6667 | 0.6612 | 0.0055
7 | 1.0000 | 0.9612 | 0.0388
8 | 1.0000 | 0.9509 | 0.0491
9 | 1.0000 | 0.9605 | 0.0395
Fig. 6.5 Learning error curve (learning error vs. iterations)
Example 6.2 Rotating machine failure diagnosis

A rotating machine is a mechanical device whose main components are a rotor and other gyration components. For rotating machine failure diagnosis, different time-domain waveform signals reflect different failure types, so failure diagnosis can be implemented by recognizing the waveform of a continuous signal within a sample interval. The rotating machine moves periodically, so one rotation cycle of the machine can be taken as the sample interval and the continuous change process of the signal within one cycle can be taken as a sample. Typical failure modes of a rotating machine are mainly divided into 4 types [13,14], namely eccentricity, axis misalignment, abrasion, and normal. Typical signal curves are shown in Fig. 6.6.

Fig. 6.6 Four typical curves of rotating machine movement: (a) Normal; (b) Axis misalignment; (c) Eccentricity; (d) Abrasion
The feedback process neural network shown in Fig. 6.1 is used as the automatic failure diagnosis apparatus, the continuous change signals within one cycle are used as the inputs to the network, and the output is the machine working state (the failure mode). From the actual signal measurements, 5 axis misalignment curves, 6 eccentricity curves, 5 abrasion curves and 3 normal curves, i.e. 19 signal curves in all, are chosen to constitute the learning sample set, and the test set is made up of 10 samples. The network structure parameters are chosen as follows: 1 input node, 10 process neuron hidden-layer nodes, and 1 output node. The continuous Walsh orthogonal function system is selected for the basis functions. The feedback process neural network is trained with the samples in the learning sample set. The failure mode output is 0.25 for axis misalignment, 0.50 for eccentricity, 0.75 for abrasion, and 1.00 for the normal state. The number of basis functions L is 32, the learning rate constants are α=0.55, β=0.35, γ=0.60 and λ=0.55, the maximal number of learning iterations is M=5,000, and the learning accuracy is ε=0.05. The network training converges after 1,277 iterations. After training, the 10 test samples are recognized, of which 9 are correctly judged, which is a good result. The learning error curve is shown in Fig. 6.7.

Fig. 6.7 Learning error curve

Several simple feedback process neural network models are constructed in this chapter, mainly to illustrate how to design learning algorithms and analyze stability. In practical applications, we usually need to build a feedback process neural network model with a more complex structure according to the specific problem. Because feedback process neural networks add the feedback of neuron output information during the learning of input samples, the learning efficiency and the stability of the network can be improved. However, during information processing, feedback process neural networks must consider simultaneously the combination of forward transfer and feedback of time-varying information, and therefore both the structure and the learning algorithms are comparatively complex. Analyzing thoroughly the information processing mechanism and flow of feedback process neural networks and designing stable, highly efficient learning algorithms are important topics in research on feedback process neural networks.
References

[1] Alderighi M., Crosetto D., d'Ovidio E.M., Gummati E.L., Sechi G.R. (1995) A feedback neural network for signal processing and event recognition. In: IEEE First International Conference on Algorithms and Architectures for Parallel Processing 2:788-791
[2] Yang Y.Q., Cao J.D. (2008) A feedback neural network for solving convex constraint optimization problems. Applied Mathematics and Computation 201(1-2):340-350
[3] Yee L., Chen K.Z., Gao X.B. (2003) A high-performance feedback neural network for solving convex nonlinear programming problems. IEEE Transactions on Neural Networks 14(6):1469-1477
[4] Noureddine D., Tabar A., Kamel B., Abdelhafid M., Jalal E. (2008) Application of feedback connection artificial neural network to seismic data filtering. Comptes Rendus Geoscience 340(6):335-344
[5] He X.G., Xu S.H. (2004) A feedback neural network model and learning algorithm. Acta Automatica Sinica 30(6):801-806 (in Chinese)
[6] Ding G., Zhong S.S. (2007) Feedback process neural network with time-varying input and output functions and its applications. Control and Decision 22(1):91-94 (in Chinese)
[7] Zhu C.L. (2005) Determination of molecular mass of polyacrylamide by feedback procedure neural networks. Chemical Engineer 112(1):20-22 (in Chinese)
[8] Hopfield J.J. (1982) Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. 79(8):2554-2558
[9] Hopfield J.J. (1984) Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. 81(10):3088-3092
[10] He X.G., Xu S.H. (2004) A feedback process neuron network model and its learning algorithm. Acta Automatica Sinica 30(6):801-806 (in Chinese)
[11] Xu Z.F., Liang J.G., Li P.C., Xu S.H. (2004) A feedback procedure neural network model based on Walsh conversion and its learning algorithm. Information and Control 33(4):404-407 (in Chinese)
[12] Li P.C., Xu S.H. (2005) Feedback procedure neural network and its algorithm. Computer Engineering and Design 26(2):459-464 (in Chinese)
[13] Wang F. (2006) Faults Diagnosis of Rotating Machinery. China Machine Press, Beijing (in Chinese)
[14] Xu S.H., He X.G. (2004) Research and applications of radial basis process neural networks. Journal of Beijing University of Aeronautics and Astronautics 30(1):14-17 (in Chinese)
7 Multi-aggregation Process Neural Networks
The inputs to the process neural networks introduced in previous chapters are only time-dependent unary functions. In fact, the input/output functions of process neural networks need not depend only on time or on one variable; they may depend on multiple process factors, i.e. they may be arbitrary multivariate functions. For example, the output of a practical system whose inputs are related to both a space position (x,y,z) and time t is the joint action result of several inputs depending on these process factors, as in debris flow formation [1], crop growth prediction [2], earthquake magnitude prediction [3], chemical action in a chemical reaction tower [4], etc. The input functions of these systems (such as rainfall, degree of erosion, pressure, temperature, etc.) have the form u_i(x,y,z,t) (i=1,2,...,n), which are all multivariate functions (or processes). If neural networks are used to simulate and build models for these dynamic systems, then multi-factor aggregation and accumulation must be considered when the neurons process the input information. Therefore, process neural networks with only a time dimension can be extended into multi-aggregation process neural networks that can process several multivariate functions (or processes). In this chapter, we give several models of multi-aggregation process neural networks and derive a gradient descent learning algorithm based on the expansion of multivariate functions.

7.1 Multi-aggregation Process Neuron

The inputs and the connection weights of a multi-aggregation process neuron can both be multivariate functions (or processes), and the processing of the input signals includes weighting, spatial aggregation, multivariate process effect accumulation, activation output, etc. When there is only one process factor, the multi-aggregation process neuron reduces to a common process neuron. Suppose that the general form of an input function of a multi-aggregation process neuron is an arbitrary multivariate function x_i(t_1,t_2,...,t_P) for i=1,2,...,n, where t_p ∈ [0,T_p] for p=1,2,...,P, in which some T_p may be 0, i.e. some arguments may not appear. The connection weight function of the input channel is a multivariate process function w_i(t_1,t_2,...,t_P). An aggregation operator can be a general linear or nonlinear functional operator; for example, a spatial aggregation operator can adopt a spatially weighted summation and a multivariate process accumulation operator can adopt forms such as multiple integration and other multivariate algebraic operations. As a special case, a multi-aggregation process neuron may have only process aggregation or only spatial aggregation. A general model of a multi-aggregation process neuron is shown in Fig. 7.1.
Fig. 7.1 A general model of a multi-aggregation process neuron

In Fig. 7.1, "⊕" is the spatial aggregation operator of the n multivariate process input functions; "⊗" is the accumulation operator of the multivariate process; K(·) is the multivariate aggregation kernel function; and f(·) is the activation function, which may be a Sigmoid function, a Gauss function, etc., or an arbitrary bounded function. The input-output mapping relationship of the multi-aggregation process neuron shown in Fig. 7.1 is

y = f\big((W(t_1,t_2,\dots,t_P)\oplus X(t_1,t_2,\dots,t_P))\otimes K(\cdot) - \theta\big),   (7.1)
where X(t_1,t_2,...,t_P) is the input function vector, W(t_1,t_2,...,t_P) is the connection weight function vector, and θ is the activation threshold. If "⊕" adopts spatial weighted summation and "⊗" adopts multiple integration over the multidimensional process interval [0,T_1]×[0,T_2]×...×[0,T_P], then the input-output mapping relationship of the multi-aggregation process neuron is

y = f\Big(\int_0^{T_1}\!\cdots\!\int_0^{T_P} K\Big(\sum_{i=1}^{n} w_i(t_1,\dots,t_P)\,x_i(t_1,\dots,t_P)\Big)\,dt_1\cdots dt_P - \theta\Big).
In particular, when the kernel function is the identical function, the input-output mapping relationship of the multi-aggregation process neuron is

y = f\Big(\int_0^{T_1}\!\cdots\!\int_0^{T_P} \sum_{i=1}^{n} w_i(t_1,\dots,t_P)\,x_i(t_1,\dots,t_P)\,dt_1\cdots dt_P - \theta\Big).   (7.2)
If the multiple integral operation in Eq. (7.2) adopts multiple integration with varying parameters (i.e. varying integral upper limits), and the activation threshold of the multi-aggregation process neuron is also a multivariate process function, then Eq. (7.2) becomes

y(t_1,t_2,\dots,t_P) = f\Big(\int_0^{t_1}\!\int_0^{t_2}\!\cdots\!\int_0^{t_P} \sum_{i=1}^{n} x_i(\tau_1,\tau_2,\dots,\tau_P)\,w_i(\tau_1,\tau_2,\dots,\tau_P)\,d\tau_1 d\tau_2\cdots d\tau_P - \theta(t_1,t_2,\dots,t_P)\Big),   (7.3)

where t_p ∈ [0,T_p] for p=1,2,...,P, and θ(t_1,t_2,...,t_P) is the activation threshold function of the multi-aggregation process neuron. As shown in Eq. (7.3), the multi-aggregation process neuron model, whose inputs and outputs are both process functions, simultaneously represents the spatial weighted aggregation of several multivariate process input signals and the accumulated effect of the already input process signals at time (t_1,t_2,...,t_P), and can implement a synchronous mapping relationship between the inputs and the outputs.
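To make the aggregation concrete, the sketch below numerically evaluates the identical-kernel neuron of Eq. (7.2) for P = 2 process factors by approximating the double integral on a grid; the particular input and weight functions, grid resolution and sigmoid activation are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of the identical-kernel multi-aggregation process neuron of
# Eq. (7.2) with P = 2; functions, grid size and activation are assumptions.
T1, T2, N = 1.0, 1.0, 200
t1 = (np.arange(N) + 0.5) * T1 / N          # midpoint grid on [0, T1]
t2 = (np.arange(N) + 0.5) * T2 / N          # midpoint grid on [0, T2]
G1, G2 = np.meshgrid(t1, t2, indexing="ij")

# Two input functions x_i(t1,t2) and their connection weight functions w_i(t1,t2)
x = [np.sin(np.pi * (G1 + G2)), G1 * G2]
w = [np.exp(-G1), 0.5 * np.cos(np.pi * G2)]
theta = 0.1
f = lambda z: 1.0 / (1.0 + np.exp(-z))      # sigmoid activation

# Spatial aggregation sum_i w_i x_i, then the double process-accumulation integral
integrand = sum(wi * xi for wi, xi in zip(w, x))
acc = integrand.sum() * (T1 / N) * (T2 / N)  # midpoint rule for the double integral

y = f(acc - theta)
print("multi-aggregation process neuron output y =", y)
```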
7.2 Multi-aggregation Process Neural Network Model A neural network, which consists of a multi-aggregation process neuron and other types of neuron s according to a certain structure and information transfer flow, is called a multi-aggregation proce ss neural network. Neuron s of the same type have the same structure, share theories and learning algorithms, and implement the same aggregation operation in the network. At the same time , the information transfer between various types of neuron hidden layers should satisfy the definition of each type of neuron input/output signals in network model s in multi-aggregation proce ss neural networks. Multi-aggregation process neural networks have parallelity in nature, are extensions of process neural networks with only a time dimension, and have quite a broad application in practical problems.
7.2.1 A General Model of Multi-aggregation Process Neural Network The information transformation mechanism and the training course of a multiaggregation process neural network are comparatively much more complex than those of a process neural network with only a time dimension, and need spatio-temporal aggregation operation of multiple variables and multiple hierarchies. However, from a functional viewpoint, this sort of network does not have too many particularitie s. For briefness, consider a multi-input-single-output feedback multi-aggregation process neural network with a single multi-aggregation process neuron hidden layer defined by Eq. (7.1) 151. In fact, the processing method for a network with more hidden layers and more outputs is similar. The network structure to be discussed is shown in Fig. 7.2.
Fig. 7.2 A multi-aggregation process neural network
In Fig. 7.2, x_i(t_1,t_2,...,t_P) (i=1,2,...,n; t_p ∈ [0,T_p]) is a P-variable process function; w_{ij}(t_1,t_2,...,t_P) is the P-variable connection weight function between the nodes in the input layer and the nodes in the hidden layer; "⊕" is the spatial aggregation operator of the n-dimensional P-variable process input; "⊗" is the P-variable process accumulation operator; f is the activation function of the multi-aggregation process neurons; (Σ, g) is a time-invariant neuron in which g is its activation function; and v_j is the connection weight from the nodes in the hidden layer to the node in the output layer. The input-output relationship of the multi-aggregation process neural network represented by Fig. 7.2 is

y = g\Big(\sum_{j=1}^{m} v_j\, f\big((W_j(t_1,t_2,\dots,t_P)\oplus X(t_1,t_2,\dots,t_P))\otimes K(\cdot) - \theta_j\big) - \theta\Big).   (7.4)
If "EEl" is the spatial weighted summation, "(8)" is the multiple integration in [O,Ttlx[O,Tz]x... x[O,Tp ] , and K is a kind of kernel function , then the input-output relationship of a multi-aggregation process neural network is
In particular, when the kernel function K is the identical function, the mapping relationship becomes

y = g\Big(\sum_{j=1}^{m} v_j\, f\Big(\int_0^{T_1}\!\cdots\!\int_0^{T_P} \sum_{i=1}^{n} w_{ij}(t_1,\dots,t_P)\,x_i(t_1,\dots,t_P)\,dt_1\cdots dt_P - \theta_j\Big) - \theta\Big).   (7.5)
We can see from Eqs. (7.4) and (7.5) that multi-aggregation process neural networks, whose aggregation operations and mapping mechanism can simultaneously reflect the joint influence of several multivariate process input signals in multidimensional space and the accumulated result of a multi-factor (multi-variable) process effect, can be used directly to build models and implement signal processing for complex nonlinear systems affected by multiple process factors. This enhances the capability of artificial neural networks to solve practical application problems, and the information processing mechanism adapts well to complex practical problems such as environmental prediction from satellite data, aircraft system diagnosis, oilfield exploitation process simulation, etc. For a complex dynamic system with multi-factor joint action, multi-aggregation process neural networks are efficient models for direct modeling and information processing.

7.2.2 Multi-aggregation Process Neural Network Model with Multivariate Process Functions as Inputs and Outputs

Multi-aggregation process neurons whose inputs and outputs are both multivariate process functions, as defined by Eq. (7.3), together with other types of neurons connected according to certain structures and information mapping relationships, constitute multi-aggregation process neural networks whose inputs and outputs are both process functions. For brevity, consider a multi-aggregation process neural network with a single hidden layer of such neurons, multivariate process functions as its inputs and outputs, and a linear function as the activation function of the output-layer node. Its topological structure is shown in Fig. 7.3.

Fig. 7.3 A multi-aggregation process neural network with multivariate process functions as inputs and outputs
The input-output mapping relationship of the multi-aggregation process neural network represented by Fig. 7.3 is

y(t_1,t_2,\dots,t_P) = \sum_{j=1}^{m} v_j(t_1,t_2,\dots,t_P)\, f\Big(\int_0^{t_1}\!\int_0^{t_2}\!\cdots\!\int_0^{t_P} \sum_{i=1}^{n} x_i(\tau_1,\tau_2,\dots,\tau_P)\,w_{ij}(\tau_1,\tau_2,\dots,\tau_P)\,d\tau_1 d\tau_2\cdots d\tau_P - \theta_j(t_1,t_2,\dots,t_P)\Big),\quad t_p \in [0,T_p],\; p = 1,2,\dots,P,   (7.6)

where y(t_1,t_2,...,t_P) is the output function of the multi-aggregation process neural network; v_j(t_1,t_2,...,t_P) is the connection weight function from the nodes in the hidden layer to the output node, which can also be a time-invariant adjustable parameter; and θ_j(t_1,t_2,...,t_P) is the activation threshold function of a multi-aggregation process neuron node. We can see from Eq. (7.6) that a multi-aggregation process neural network model whose inputs and outputs are both multivariate process functions adapts well to modeling many practical problems whose inputs and outputs are both multivariate processes.
7.3 Learning Algorithm

As multi-aggregation process neural networks include aggregation and accumulation over multi-input, multi-hierarchy space and processes, their information transformation mechanism and the computational steps of the aggregation/accumulation operations are both quite complicated. Meanwhile, because the form of the connection weight functions in the network is arbitrary, if the function type is not restricted it is very difficult to give an efficient learning algorithm that determines the system model directly by training on a learning sample set. Therefore, a group of proper multivariate basis functions (for example, multivariate polynomial basis functions) can be introduced into the input function space, and the input functions are expanded with finitely many terms of this group of basis functions within the fitting precision. At the same time, the network connection weight functions and the activation threshold functions are also expressed as expansions in this group of basis functions. In this way, the training problem for the network connection weight functions is converted into a learning problem for the basis function coefficients in the expansions, i.e. the multivariate functional approximation problem of training the network connection weight functions is converted into solving an extremum problem of a multivariate function of the expansion coefficients. Solving the problem directly by a functional method should also be possible, but further study is needed to obtain an efficient applicable algorithm.

7.3.1 Learning Algorithm of General Models of Multi-aggregation Process Neural Networks

Suppose that the input function space H of the multi-aggregation process neural network belongs to (C([0,T_1]×[0,T_2]×...×[0,T_P]))^n, that b_1(t_1,t_2,...,t_P), b_2(t_1,t_2,...,t_P), ..., b_L(t_1,t_2,...,t_P) are a group of basis functions in C([0,T_1]×[0,T_2]×...×[0,T_P]), and that X(t_1,t_2,...,t_P) = (x_1(t_1,t_2,...,t_P), x_2(t_1,t_2,...,t_P), ..., x_n(t_1,t_2,...,t_P)) is an arbitrary function in H. For a given fitting error precision, x_i(t_1,t_2,...,t_P) (i=1,2,...,n) is expanded into the finite series form of the basis functions

x_i(t_1,t_2,\dots,t_P) = \sum_{l=1}^{L} a_{il}\, b_l(t_1,t_2,\dots,t_P),\quad i = 1,2,\dots,n,   (7.7)
where a_{il} is the expansion coefficient of x_i(t_1,t_2,...,t_P) corresponding to the basis function b_l(t_1,t_2,...,t_P), and L is the number of basis function terms satisfying the fitting error precision. At the same time, the connection weight function w_{ij}(t_1,t_2,...,t_P) in Eq. (7.5) is represented in the series form of the basis functions b_1(t_1,t_2,...,t_P), b_2(t_1,t_2,...,t_P), ..., b_L(t_1,t_2,...,t_P), i.e.

w_{ij}(t_1,t_2,\dots,t_P) = \sum_{l=1}^{L} w_{ij}^{(l)}\, b_l(t_1,t_2,\dots,t_P),\quad i = 1,2,\dots,n;\; j = 1,2,\dots,m,   (7.8)

where w_{ij}^{(l)} is the expansion coefficient of w_{ij}(t_1,t_2,...,t_P) corresponding to the basis function b_l(t_1,t_2,...,t_P), and is an adjustable non-process parameter. Substituting Eqs. (7.7) and (7.8) into Eq. (7.5), the input-output transformation relationship of the network can be expressed as

y = g\Big(\sum_{j=1}^{m} v_j\, f\Big(\int_0^{T_1}\!\cdots\!\int_0^{T_P} \sum_{i=1}^{n}\Big(\sum_{s=1}^{L} a_{is}\,b_s(t_1,\dots,t_P)\Big)\Big(\sum_{l=1}^{L} w_{ij}^{(l)}\,b_l(t_1,\dots,t_P)\Big) dt_1\cdots dt_P - \theta_j\Big) - \theta\Big),   (7.9)
which can be simplified to yield

y = g\Big(\sum_{j=1}^{m} v_j\, f\Big(\sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{is}\,w_{ij}^{(l)} \int_0^{T_1}\!\int_0^{T_2}\!\cdots\!\int_0^{T_P} b_l(t_1,t_2,\dots,t_P)\,b_s(t_1,t_2,\dots,t_P)\,dt_1 dt_2\cdots dt_P - \theta_j\Big) - \theta\Big).   (7.10)

Denote

B_{ls} = \int_0^{T_1}\!\int_0^{T_2}\!\cdots\!\int_0^{T_P} b_l(t_1,t_2,\dots,t_P)\,b_s(t_1,t_2,\dots,t_P)\,dt_1 dt_2\cdots dt_P.   (7.11)

As the basis functions b_l(t_1,t_2,...,t_P) and the integration interval [0,T_1]×[0,T_2]×...×[0,T_P] are given in advance, B_{ls} is a numerical value that can be precalculated by multivariate multiple definite integration or a numerical computation method. Thus, Eq. (7.10) can be rewritten as

y = g\Big(\sum_{j=1}^{m} v_j\, f\Big(\sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{is}\,w_{ij}^{(l)}\,B_{ls} - \theta_j\Big) - \theta\Big).   (7.12)
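As an illustration of how the quantities B_ls in Eq. (7.11) can be tabulated once and reused, the sketch below computes them on a grid for a small binary (P = 2) polynomial basis; the basis choice and grid resolution are assumptions made for the example.

```python
import numpy as np

# Precomputing B_ls of Eq. (7.11) for an assumed binary polynomial basis
# b_l(t1, t2) = t1**p * t2**q on [0,1]x[0,1]; basis and grid size are illustrative.
N = 200
t = (np.arange(N) + 0.5) / N
G1, G2 = np.meshgrid(t, t, indexing="ij")

# Enumerate basis functions of total degree <= 2: 1, t1, t2, t1^2, t1*t2, t2^2
exponents = [(p, q) for p in range(3) for q in range(3) if p + q <= 2]
basis = [G1**p * G2**q for p, q in exponents]
L = len(basis)

cell = (1.0 / N) ** 2                         # area of one grid cell
B = np.empty((L, L))
for l in range(L):
    for s in range(L):
        B[l, s] = (basis[l] * basis[s]).sum() * cell   # midpoint rule for Eq. (7.11)

print("B_ls matrix (degree <= 2 polynomial basis):")
print(np.round(B, 4))
```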
Actually, the above computational course can be simplified as follows. If the input functions x_i(t_1,t_2,...,t_P) (i=1,2,...,n) are not expanded in advance, then the input-output transformation relationship of the network can be expressed as

y = g\Big(\sum_{j=1}^{m} v_j\, f\Big(\int_0^{T_1}\!\cdots\!\int_0^{T_P} \sum_{i=1}^{n} x_i(t_1,t_2,\dots,t_P)\Big(\sum_{l=1}^{L} w_{ij}^{(l)}\,b_l(t_1,t_2,\dots,t_P)\Big) dt_1 dt_2\cdots dt_P - \theta_j\Big) - \theta\Big),

which can be simplified to yield

y = g\Big(\sum_{j=1}^{m} v_j\, f\Big(\sum_{i=1}^{n}\sum_{l=1}^{L} w_{ij}^{(l)} \int_0^{T_1}\!\cdots\!\int_0^{T_P} b_l(t_1,t_2,\dots,t_P)\,x_i(t_1,t_2,\dots,t_P)\,dt_1 dt_2\cdots dt_P - \theta_j\Big) - \theta\Big).

Denote

B_{li} = \int_0^{T_1}\!\cdots\!\int_0^{T_P} b_l(t_1,t_2,\dots,t_P)\,x_i(t_1,t_2,\dots,t_P)\,dt_1 dt_2\cdots dt_P.

Obviously, B_{li} is a numerical value that can be computed in advance by multivariate multiple definite integration or a numerical computation method. Thus, the input-output mapping relationship of the network may be expressed as

y = g\Big(\sum_{j=1}^{m} v_j\, f\Big(\sum_{i=1}^{n}\sum_{l=1}^{L} w_{ij}^{(l)}\,B_{li} - \theta_j\Big) - \theta\Big).
As can be seen, these computational steps avoid the workload of expanding the input functions and remove one summation, so the amount of computation decreases considerably. Given K learning samples (x_{k1}(t_1,t_2,...,t_P), x_{k2}(t_1,t_2,...,t_P), ..., x_{kn}(t_1,t_2,...,t_P), d_k) for k=1,2,...,K, assume that the actual output of the network corresponding to the kth learning sample is y_k. The error function of the network is defined as

E = \sum_{k=1}^{K}(y_k - d_k)^2 = \sum_{k=1}^{K}\Big(g\Big(\sum_{j=1}^{m} v_j\, f\Big(\sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{is}^{(k)}\,w_{ij}^{(l)}\,B_{ls} - \theta_j\Big) - \theta\Big) - d_k\Big)^2,   (7.13)
where a_{is}^{(k)} is the coefficient of x_{ki}(t_1,t_2,...,t_P) corresponding to b_s(t_1,t_2,...,t_P) in the basis function expansion. According to the gradient descent algorithm, the learning rule for the network connection weights and the activation thresholds is

v_j = v_j + \alpha\Delta v_j,\quad j = 1,2,\dots,m,   (7.14)

w_{ij}^{(l)} = w_{ij}^{(l)} + \beta\Delta w_{ij}^{(l)},\quad i = 1,2,\dots,n;\; j = 1,2,\dots,m;\; l = 1,2,\dots,L,   (7.15)

\theta_j = \theta_j + \gamma\Delta\theta_j,\quad j = 1,2,\dots,m,   (7.16)

\theta = \theta + \eta\Delta\theta,   (7.17)

where α, β, γ and η are learning rate constants. Denote

u_{kj} = \sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{is}^{(k)}\,w_{ij}^{(l)}\,B_{ls} - \theta_j,\qquad z_k = \sum_{j=1}^{m} v_j\, f(u_{kj}) - \theta;

then

\Delta v_j = -\frac{\partial E}{\partial v_j} = -2\sum_{k=1}^{K}\big(g(z_k) - d_k\big)\,g'(z_k)\,f(u_{kj}),   (7.18)

\Delta w_{ij}^{(l)} = -\frac{\partial E}{\partial w_{ij}^{(l)}} = -2\sum_{k=1}^{K}\big(g(z_k) - d_k\big)\,g'(z_k)\,v_j\,f'(u_{kj})\sum_{s=1}^{L} a_{is}^{(k)}\,B_{ls},   (7.19)

\Delta\theta_j = -\frac{\partial E}{\partial \theta_j} = -2\sum_{k=1}^{K}\big(g(z_k) - d_k\big)\,g'(z_k)\,v_j\,f'(u_{kj})(-1),   (7.20)

\Delta\theta = -\frac{\partial E}{\partial \theta} = -2\sum_{k=1}^{K}\big(g(z_k) - d_k\big)\,g'(z_k)(-1).   (7.21)

If the activation functions f and g are both Sigmoid functions, then f'(u) = f(u)(1 - f(u)).
The specific learning algorithm is described as follows.

Step 1 Choose basis functions b_1(t_1,t_2,...,t_P), b_2(t_1,t_2,...,t_P), ..., b_L(t_1,t_2,...,t_P) in the input function space; the input functions and the network connection weight functions are expressed as expansions in this group of basis functions;
Step 2 Give the error precision ε, set the accumulated number of learning iterations s=0, and give the maximal number of learning iterations M;
Step 3 Initialize the network connection weights and the activation thresholds v_j, w_{ij}^{(l)}, θ_j, θ (i=1,2,...,n; j=1,2,...,m; l=1,2,...,L);
Step 4 Calculate the error function E according to Eq. (7.13); if E < ε or s > M, go to Step 6;
Step 5 Modify the connection weights and the activation thresholds according to Eqs. (7.14)-(7.21); s+1 -> s; go to Step 4;
Step 6 Output the learning result and stop.

The input of a multi-aggregation process neural network may be a multivariate analytic function or discrete sample data that depend on the multidimensional process. Therefore, the basis functions for a multivariate input function can adopt functions, such as multivariate polynomials, that are suitable both for analytic function expansion and for fitting discrete process data.
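The following sketch implements the steps above for the simplified coefficient form of Eq. (7.12): inputs, weight functions and thresholds are reduced to basis coefficients, B_ls is precomputed, and the coefficients are updated by gradient descent. The data, network sizes, basis choice and learning rates are illustrative assumptions, and sigmoid activations are used for both f and g.

```python
import numpy as np

# Gradient-descent training of the general multi-aggregation process neural
# network in the coefficient form of Eq. (7.12); all concrete values are assumed.
rng = np.random.default_rng(0)
n, m, L, K = 2, 5, 6, 30                 # inputs, hidden nodes, basis terms, samples
alpha, beta, gamma, eta = 0.1, 0.1, 0.1, 0.1
eps, max_iter = 1e-2, 5000

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Assumed binary polynomial basis (degree <= 2) on [0,1]^2 and its B_ls (Eq. (7.11))
N = 100
t = (np.arange(N) + 0.5) / N
G1, G2 = np.meshgrid(t, t, indexing="ij")
exps = [(p, q) for p in range(3) for q in range(3) if p + q <= 2]
basis = np.stack([G1**p * G2**q for p, q in exps])           # (L, N, N)
B = np.einsum("lab,sab->ls", basis, basis) / N**2            # midpoint-rule integrals

# Synthetic samples: a[k,i,s] = expansion coefficients of x_ki, d[k] = desired output
a = rng.uniform(-1.0, 1.0, (K, n, L))
d = sigmoid(a[:, 0, :].sum(axis=1) - 0.5 * a[:, 1, :].sum(axis=1))

w = rng.uniform(-0.5, 0.5, (n, m, L))    # w_ij^(l)
v = rng.uniform(-0.5, 0.5, m)            # v_j
theta_j, theta = np.zeros(m), 0.0

for it in range(max_iter):
    aB = np.einsum("kis,ls->kil", a, B)              # sum_s a_is^(k) B_ls
    u = np.einsum("kil,ijl->kj", aB, w) - theta_j    # u_kj
    fu = sigmoid(u)
    z = fu @ v - theta
    y = sigmoid(z)
    E = np.sum((y - d) ** 2)                         # Eq. (7.13)
    if E < eps:
        break
    gz = (y - d) * y * (1.0 - y)                     # (g(z_k)-d_k) g'(z_k)
    hj = gz[:, None] * v[None, :] * fu * (1.0 - fu)  # ... v_j f'(u_kj)
    v       += alpha * (-2.0) * (gz @ fu)                          # Eq. (7.18)
    w       += beta  * (-2.0) * np.einsum("kj,kil->ijl", hj, aB)   # Eq. (7.19)
    theta_j += gamma * ( 2.0) * hj.sum(axis=0)                     # Eq. (7.20)
    theta   += eta   * ( 2.0) * gz.sum()                           # Eq. (7.21)

print(f"stopped after {it + 1} iterations with E = {E:.4f}")
```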
7.3.2 Learning Algorithm of Multi-aggregation Process Neural Networks with Multivariate Functions as Inputs and Outputs

In Eq. (7.6), the multivariate connection weight functions v_j(t_1,t_2,...,t_P) and the activation threshold functions θ_j(t_1,t_2,...,t_P) are expressed as expansions in the basis functions b_1(t_1,t_2,...,t_P), b_2(t_1,t_2,...,t_P), ..., b_L(t_1,t_2,...,t_P):

v_j(t_1,t_2,\dots,t_P) = \sum_{l=1}^{L} v_j^{(l)}\, b_l(t_1,t_2,\dots,t_P),   (7.22)

\theta_j(t_1,t_2,\dots,t_P) = \sum_{l=1}^{L} \theta_j^{(l)}\, b_l(t_1,t_2,\dots,t_P).   (7.23)
Substituting the basis function expansions of x_i(t_1,t_2,...,t_P), w_{ij}(t_1,t_2,...,t_P), v_j(t_1,t_2,...,t_P) and θ_j(t_1,t_2,...,t_P) into Eq. (7.6), the input-output mapping relationship of the network can be expressed as

y(t_1,t_2,\dots,t_P) = \sum_{j=1}^{m}\Big(\sum_{l=1}^{L} v_j^{(l)} b_l(t_1,t_2,\dots,t_P)\Big) f\Big(\int_0^{t_1}\!\int_0^{t_2}\!\cdots\!\int_0^{t_P} \sum_{i=1}^{n}\Big(\sum_{l=1}^{L} a_{il}\,b_l(\tau_1,\tau_2,\dots,\tau_P)\Big)\Big(\sum_{l=1}^{L} w_{ij}^{(l)}\,b_l(\tau_1,\tau_2,\dots,\tau_P)\Big) d\tau_1 d\tau_2\cdots d\tau_P - \sum_{l=1}^{L}\theta_j^{(l)} b_l(t_1,t_2,\dots,t_P)\Big),   (7.24)

which can be simplified to yield
y(t_1,t_2,\dots,t_P) = \sum_{j=1}^{m}\Big(\sum_{l=1}^{L} v_j^{(l)} b_l(t_1,t_2,\dots,t_P)\Big) f\Big(\sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{is}\,w_{ij}^{(l)} \int_0^{t_1}\!\int_0^{t_2}\!\cdots\!\int_0^{t_P} b_l(\tau_1,\tau_2,\dots,\tau_P)\,b_s(\tau_1,\tau_2,\dots,\tau_P)\,d\tau_1 d\tau_2\cdots d\tau_P - \sum_{l=1}^{L}\theta_j^{(l)} b_l(t_1,t_2,\dots,t_P)\Big).   (7.25)
then Eq. (7.25) can be simplified as
(7.27)
Xkn(tlh,.oo,tp) , Denote K learning samples (Xkl(tlh,oo.,tp), Xdtlh ,oo .,tp), dk(tJ,tz, oo .,tp)) for k=1,2,oo .,K. Suppose that the real output of the network corresponding to the kth learning sample of the system is Yk(tlh, ... ,tp), and the error function of the network is defined as 00"
1 K E = - :LIIYk(t"t2, ...,tp)-dk(t"t2, ...,tp)11
K
k =l
t.[ r 1' ·J'(t,( t v)"b, "" ,',))r(to t.t.a~"w¥' -t. o;"b, ,',»)-d. J r'
=~
(I,
B, (I,,I, ,. .,1,)
1
(1,,1,....
(I,,I, ,...,1,)
dr.dr,...dr,
(7.28)
where aj\k) is the coefficient of Xk;(tlh, oo .,tp) corresponding to brt.tlh, oo .,tp) in the basis function expansion.
Divide the input process interval [O,Tp] into Kp equal parts, and denote respectively the interval division points as t~,t~, ...,t; (p=1,2, ...,P). Choose an arbitrary P-variable division point O~lpg(p.
(t:' ,t~2 ,...,tj) in [O,Tdx[O,Tz]x ...x[O,Tp ] where
Reformulate Eqs. (7.7), (7.8), (7.22) and (7.23) as
154
Process Neural Networks
Xi (t:' ,t~
L
,...i;) = LaA (t:' ,t~ ,...,t~) , i = 1,2, ...,n,
(7.29)
1=1 L
W ij
" W ij(I )b1 (till ' t2l2, ••. , tip). P - L..J P , I - 1, 2, . .. ,n .,}. -- 1, 2 ,...,m, (tIil' tl22 , •.• , tip)
(7.30)
1=1 L
Vj
" (I)bI (tilI ' t2lz , .. . , tip) 2 , .. . , tip)P - L..Vj P , ), --1, 2, ... .m, (tlil' tl2
(7.31)
1=1
L
OJ(t:' ,t~ ,..., t~ ) =Lff;l)bl (t:l ,t~ ,...,t~), j
=1,2, ...,m.
(7.32)
1=1
Substituting Eqs. (7.29)-(7.32) into Eq. (7.28), the error function of the network becomes

E = \frac{1}{K}\sum_{k=1}^{K}\sum_{l_1=1}^{K_1}\sum_{l_2=1}^{K_2}\cdots\sum_{l_P=1}^{K_P}\Big[\sum_{j=1}^{m}\Big(\sum_{l=1}^{L} v_j^{(l)} b_l(t_1^{l_1},\dots,t_P^{l_P})\Big) f\Big(\sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{is}^{(k)}\,w_{ij}^{(l)}\,B_{ls}(t_1^{l_1},\dots,t_P^{l_P}) - \sum_{l=1}^{L}\theta_j^{(l)} b_l(t_1^{l_1},\dots,t_P^{l_P})\Big) - d_k(t_1^{l_1},\dots,t_P^{l_P})\Big]^2 \prod_{p=1}^{P}\frac{T_p}{K_p}.   (7.33)
The basis functions b_l(t_1,t_2,...,t_P) are known functions on the multidimensional process interval [0,T_1]×[0,T_2]×...×[0,T_P], so the value of B_{ls}(t_1,t_2,...,t_P) at each multidimensional division point in [0,T_1]×[0,T_2]×...×[0,T_P] can be computed by integration or by a numerical computation method. Under the gradient descent algorithm, the modification rule for the network connection weights and the activation thresholds is the same as in Eqs. (7.14)-(7.16). For convenience of illustration, denote

u_{kj}(t_1^{l_1},t_2^{l_2},\dots,t_P^{l_P}) = \sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{is}^{(k)}\,w_{ij}^{(l)}\,B_{ls}(t_1^{l_1},t_2^{l_2},\dots,t_P^{l_P}) - \sum_{l=1}^{L}\theta_j^{(l)}\, b_l(t_1^{l_1},t_2^{l_2},\dots,t_P^{l_P}).

From Eq. (7.33), the gradient formulas for Δv_j^{(l)}, Δw_{ij}^{(l)} and Δθ_j^{(l)} then follow as Eqs. (7.34)-(7.36), derived in the same way as Eqs. (7.18)-(7.21).
During the training of multi-aggregation process neural networks whose inputs and outputs are both multivariate process functions, the computational course is very complex. Once the division points of the multivariate process interval are determined, the values of b_l(t_1^{l_1},t_2^{l_2},...,t_P^{l_P}), B_{ls}(t_1^{l_1},t_2^{l_2},...,t_P^{l_P}), ∏_{p=1}^{P} T_p/K_p, etc. can be predetermined by integration or numerical computation. The network is then trained according to Eqs. (7.14)-(7.16) and Eqs. (7.34)-(7.36).
7.4 Application Examples

Example 7.1 Classification of binary process signals

Consider the following classification problem for three classes of binary (2-dimensional input) process signals. The first class of process signal is (t_1 sin((a_1 t_1 + a_2 t_2)π), t_2 cos(a_3 t_1 t_2 π)), with a_1, a_2 ∈ [0.5,0.7], a_3 ∈ [1.0,1.2]. The second class of process signal is (t_1 sin((b_1 t_1 + b_2 t_2)π), t_2 cos(b_3 t_1 t_2 π)), with b_1, b_2 ∈ [0.75,0.95], b_3 ∈ [1.2,1.3]. The third class of process signal is (t_1 sin((c_1 t_1 + c_2 t_2)π), t_2 cos(c_3 t_1 t_2 π)), with c_1, c_2 ∈ [1.0,1.2], c_3 ∈ [1.3,1.5]. Here the binary process variables (t_1,t_2) ∈ [0,1]×[0,1], and a_i, b_i, c_i (i=1,2,3) are the signal parameters. From the first class of signals, 15 parameter triples with a_1, a_2 in [0.5,0.7] and a_3 in [1.0,1.2] are chosen arbitrarily to constitute 15 sample functions, of which 10 are used as training samples and 5 as test samples. Similarly, 10 training sample functions and 5 test sample functions are generated for each of the second and third classes of signals. Thirty binary-function samples constitute the network training set and 15 constitute the test set. The expected output is 0.33 for the first class of signals, 0.67 for the second and 1.0 for the third. The multi-aggregation process neural network denoted by Eq. (7.5) is used for the binary process signal classification. The structure of the network is 2-5-1, i.e. there are 2 input nodes, 5 multi-aggregation process neuron hidden nodes and 1 output node. The basis functions are 5th-order binary polynomial functions. The learning rate constants α, β, γ and η of the network are respectively 0.50, 0.63, 0.60 and 0.50. The learning error precision is 0.05, and the maximal number of iterations is 5,000. The network is trained 20 times, and in each training the initial values are initialized anew. The network converges after 359 iterations on average, and Fig. 7.4 shows the iteration error curve of one learning course. The 15 function samples of the test set are classified and recognized by each of the 20 training results (models): 5 models judge all test samples correctly, 5 models have 14 correct results, 7 models have 13 and 3 models have 12. The mean correct recognition rate is 90.67%. The experimental results show that multi-aggregation process neural networks are well suited to the classification of multivariate process signals.
Fig. 7.4 Iteration error function curve
Example 7.2 Dynamic process simulation of primary oil recovery in oilfield exploitation

In the course of oilfield exploitation, oil production during the phase of primary oil recovery (natural exploitation depending on the original reservoir energy) rests with the dynamic distribution p(x,y,z,t) of the reservoir pressure around the well bore. Here, (x,y,z) is the space coordinate of an arbitrary point in the reservoir, whose coordinate origin is the midpoint of the thickness of the reservoir, and t is the reservoir exploitation (perforation oil recovery) time. Oilfield development is a nonlinear dynamic system that obeys non-Newtonian fluid percolation laws. After the oil well is perforated to produce oil, a pressure funnel forms in the reservoir centered on the well: the nearer to the well bore, the smaller the reservoir pressure; the further from the well bore, the higher the reservoir pressure and the closer to the original reservoir pressure. The fluid (oil, gas and water) in the reservoir pores flows into the well bore and is recovered under the effect of the pressure difference. As the exploitation time extends, the reservoir producing region (the oil drainage radius) enlarges gradually and the reservoir pressure falls continuously. For a homogeneous reservoir (where geology, physical properties, oil-bearing characteristics and thickness distribution change little), the reservoir pressure is distributed radially with the well bore as the center [6,7]. Here, p(x,y,z,t) can be written as p(t,r), where p(t,r) is the reservoir pressure at a point a distance r from the well bore at time t, and it changes dynamically with the exploitation time and the well spacing r. The North Saertu development experimental area of the Daqing oilfield is a large channel sand body sediment, and the PH reservoir in its western area is an approximately homogeneous reservoir. In the period of early oilfield development, 13 wells with 2,500 m well spacing were placed in the PH reservoir; the wells did not affect one another during the primary oil recovery stage. Because this region was developed as an experimental area, the changes in reservoir pressure around every well and the oil production were tested and recorded in detail by the well test method during exploitation. The cumulative oil production and pressure variations over a 15-day period recorded from one of the oil wells are shown in Table 7.1. The actual measured data of the above 13 oil wells are fitted with a precision of 0.01 by adopting 3rd-order binary polynomial functions. The measured data of 11 wells constitute the training set and the measured data of the other 2 wells constitute the test set. A multi-aggregation process neural network with multivariate process functions as inputs and outputs, represented by Eq. (7.6), is used to simulate the primary oil recovery dynamic process of the oil well. The topological structure of the network is 1-5-1, i.e. 1 input node, 5 multi-aggregation process neuron hidden-layer nodes and 1 output node. The basis functions are chosen as 3rd-order binary polynomial functions. The input of the network is the reservoir pressure p(t,r) and the output is the cumulative oil production Q(t) changing with time. The oil drainage radius R can be calculated in the laboratory from core analysis data and reservoir parameters such as porosity, permeability, pore structure, etc. according to theoretical formulas; the oil drainage radius of the PH reservoir in the western area is 750 meters.
Table 7.1 Variations of formation pressure (MPa) around the oil well and oil production over time

Time (d) | 0* | 50 | 100 | 150 | 200 | 250 | 300 | 350 | 400 | 450 | 500 | 550 | 600 | 650 | 700 | 750 | Q(t) (m³)
0 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 0.00
1 | 10.26 | 10.97 | 11.33 | 11.56 | 11.71 | 11.83 | 12.13 | 12.32 | 12.45 | 12.61 | 12.71 | 12.75 | 12.75 | 12.75 | 12.75 | 12.75 | 31.20
2 | 9.32 | 10.14 | 10.42 | 11.13 | 11.32 | 11.66 | 11.87 | 12.05 | 12.31 | 12.46 | 12.65 | 12.72 | 12.75 | 12.75 | 12.75 | 12.75 | 57.62
3 | 8.75 | 9.55 | 10.13 | 10.65 | 11.00 | 11.23 | 11.47 | 11.71 | 11.97 | 12.23 | 12.46 | 12.60 | 12.70 | 12.75 | 12.75 | 12.75 | 80.70
4 | 8.21 | 9.07 | 9.89 | 10.23 | 10.68 | 11.01 | 11.20 | 11.43 | 11.61 | 11.82 | 12.01 | 12.27 | 12.38 | 12.55 | 12.61 | 12.75 | 101.46
5 | 7.79 | 8.67 | 9.33 | 9.89 | 10.27 | 10.65 | 10.85 | 11.21 | 11.49 | 11.67 | 11.86 | 12.03 | 12.22 | 12.41 | 12.58 | 12.67 | 116.35
6 | 7.45 | 8.28 | 9.05 | 9.77 | 10.03 | 10.26 | 10.51 | 10.73 | 10.94 | 11.15 | 11.37 | 11.51 | 12.00 | 12.21 | 12.36 | 12.58 | 131.60
7 | 7.21 | 8.13 | 8.73 | 9.38 | 9.87 | 10.16 | 10.32 | 10.51 | 10.70 | 10.92 | 11.21 | 11.32 | 11.73 | 11.89 | 12.27 | 12.51 | 145.78
8 | 6.87 | 7.79 | 8.65 | 9.17 | 9.64 | 10.00 | 10.25 | 10.43 | 10.57 | 10.71 | 10.93 | 11.21 | 11.50 | 11.67 | 12.12 | 12.48 | 159.26
9 | 6.49 | 7.55 | 8.31 | 8.75 | 9.25 | 9.70 | 10.00 | 10.24 | 10.47 | 10.63 | 10.82 | 11.05 | 11.39 | 11.62 | 12.05 | 12.43 | 172.10
10 | 6.23 | 7.27 | 8.10 | 8.58 | 9.10 | 9.54 | 9.71 | 10.15 | 10.32 | 10.50 | 10.63 | 10.81 | 11.27 | 11.53 | 11.90 | 12.36 | 184.36
11 | 6.05 | 7.00 | 7.75 | 8.37 | 8.95 | 9.30 | 9.55 | 9.80 | 10.15 | 10.35 | 10.52 | 10.70 | 11.03 | 11.45 | 11.80 | 12.30 | 196.85
12 | 5.81 | 6.75 | 7.60 | 8.21 | 8.75 | 9.12 | 9.47 | 9.72 | 10.00 | 10.20 | 10.35 | 10.56 | 11.00 | 11.32 | 11.67 | 12.25 | 208.30
13 | 5.63 | 6.59 | 7.41 | 8.05 | 8.60 | 9.07 | 9.23 | 9.47 | 9.73 | 10.03 | 10.21 | 10.42 | 10.78 | 11.25 | 11.76 | 12.21 | 219.06
14 | 5.47 | 6.50 | 7.35 | 7.75 | 8.70 | 8.87 | 9.15 | 9.31 | 9.65 | 9.90 | 10.15 | 10.35 | 10.70 | 11.19 | 11.70 | 12.17 | 229.97
15 | 5.33 | 6.43 | 7.31 | 7.52 | 8.61 | 8.65 | 8.97 | 9.25 | 9.40 | 9.71 | 9.90 | 10.18 | 10.34 | 11.03 | 11.53 | 12.15 | 240.35

* Spacing (m); Q(t) is the cumulative oil production of the oil well
Eq. (7.6) can then be specifically reformulated as

Q(t) = \sum_{j=1}^{5} v_j(t)\, f\Big(\int_0^{t}\!\int_0^{750} w_j(\tau,r)\,p(\tau,r)\,dr\,d\tau - \theta_j\Big),\quad t \in [0,15].   (7.37)
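As a rough illustration of how Eq. (7.37) can be evaluated from gridded pressure data p(t,r), the sketch below performs the double integration numerically; the weight functions, thresholds and the synthetic pressure field are illustrative assumptions, not the fitted model of the example.

```python
import numpy as np

# Numerical evaluation of Eq. (7.37) on a grid; all concrete functions below
# (pressure field, weights, thresholds) are assumptions for illustration.
T, R, m = 15.0, 750.0, 5
nt, nr = 16, 76                          # 1-day and 10-m grid resolution
t = np.linspace(0.0, T, nt)
r = np.linspace(0.0, R, nr)
Gt, Gr = np.meshgrid(t, r, indexing="ij")

p = 12.75 - 3.0 * np.exp(-Gr / 200.0) * (1.0 - np.exp(-Gt / 5.0))   # assumed p(t, r)

f = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(0)
w = [lambda tt, rr, c=c: c * np.exp(-rr / R) for c in rng.uniform(0.001, 0.003, m)]
v = [lambda tt, c=c: c * (1.0 + tt) for c in rng.uniform(0.1, 0.3, m)]
theta = rng.uniform(0.0, 0.5, m)

dt, dr = t[1] - t[0], r[1] - r[0]
Q = np.zeros(nt)
for k in range(nt):                      # cumulative double integral up to time t_k
    for j in range(m):
        acc = (w[j](Gt[: k + 1], Gr[: k + 1]) * p[: k + 1]).sum() * dt * dr
        Q[k] += v[j](t[k]) * f(acc - theta[j])

print("predicted cumulative production Q(t):")
print(np.round(Q, 2))
```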
The learning rate constants α, β, γ of the network are respectively 0.45, 0.53 and 0.50. The learning error precision is 0.05, and the maximal iteration number is 3,000. For the 11 learning samples, the network converges after 273 iterations. The oil production of the 2 test wells is then predicted from their reservoir pressures; Table 7.2 shows the 15-day prediction results, which satisfy the analysis requirements of the actual problem. The dynamic model of primary oil recovery in oilfield development established by the multi-aggregation process neural network retains the various geological properties and percolation rules of the reservoir in the structure and property parameters of the multi-aggregation process neural network. It can be applied to other oilfields of the same type and can guide oilfield production and the formulation of oilfield development schemes.

Table 7.2 Oil production prediction result of oil well (unit: m³)

Time (d) | Test well 1 real value | Test well 1 prediction | Test well 1 absolute error | Test well 2 real value | Test well 2 prediction | Test well 2 absolute error
1 | 35.5 | 34.1 | 1.4 | 32.6 | 31.1 | 1.5
2 | 61.3 | 59.5 | 1.8 | 60.3 | 59.5 | 0.8
3 | 83.7 | 81.6 | 2.1 | 82.1 | 81.6 | 0.5
4 | 103.3 | 101.7 | 1.6 | 103.2 | 101.7 | 1.5
5 | 119.2 | 121.0 | 2.2 | 117.9 | 121.0 | 3.1
6 | 134.5 | 135.3 | 1.2 | 135.4 | 138.3 | 2.9
7 | 157.3 | 158.4 | 1.1 | 158.2 | 159.4 | 1.2
8 | 169.1 | 172.3 | 3.2 | 170.3 | 172.4 | 2.1
9 | 181.4 | 183.9 | 2.6 | 184.6 | 183.9 | 0.7
10 | 193.1 | 194.6 | 1.5 | 196.2 | 195.6 | 0.6
11 | 203.8 | 204.7 | 0.9 | 207.9 | 206.7 | 1.2
12 | 214.3 | 214.8 | 0.5 | 218.7 | 216.8 | 1.9
13 | 225.2 | 224.5 | 0.7 | 230.5 | 228.7 | 1.8
14 | 235.5 | 235.1 | 0.4 | 241.3 | 239.1 | 2.2
15 | 246.3 | 245.7 | 0.6 | 252.0 | 249.7 | 2.3
7.5 Epilogue

In this chapter, aiming at information processing problems where the system inputs are multivariate process functions, i.e. multidimensional process signals, we established a general model of multi-aggregation process neural networks and a multi-aggregation process neural network model whose inputs and outputs are both multivariate process functions. Multi-aggregation process neural networks can consider simultaneously the influence of the joint action and the multivariate process effect accumulation of multiple process factors for complex systems, and have direct modeling and information processing ability for complex systems in multidimensional process space. Therefore, multi-aggregation process neural networks adapt well to much actual signal processing related to multivariate process factors and to nonlinear system modeling. The computational results for actual application problems confirm these conclusions. The information mapping mechanism and the training process of multi-aggregation process neural networks are complex, so it is necessary to continue to study them and to develop highly efficient and stable learning algorithms. In addition, the author believes that, as for univariate process neural networks, theoretical results for multi-aggregation process neural networks on functional approximation ability, computing capability, continuity, etc. should hold, and this should support, in theory, the effectiveness of multi-aggregation process neural networks in practical applications.
References

[1] Chang T.C., Chao R.J. (2007) Application of back-propagation networks in debris flow prediction. Engineering Geology 85(3-4):270-280
[2] Mo X., Liu S., Lin Z., Xu Y., Xiang Y., McVicar T.R. (2004) Prediction of crop yield, water consumption and water use efficiency with a SVAT-crop growth model using remotely sensed data on the North China Plain. Ecological Modeling 183(2-3):301-322
[3] Panakkat A., Adeli H. (2007) Neural network models for earthquake magnitude prediction using multiple seismicity indicators. International Journal of Neural Systems 17(1):13-33
[4] Peng X., Wang W.C., Huang S.P. (2005) Monte Carlo simulation for chemical reaction equilibrium of ammonia synthesis in MCM-41 pores and pillared clays. Fluid Phase Equilibria 231(2):138-149
[5] Xu S.H., He X.G. (2007) The multi-aggregation process neural networks and learning algorithm. Chinese Journal of Computers 30(1):48-56 (in Chinese)
[6] Song K.P. (1996) Oil Reservoir Numerical Simulation. Petroleum Industry Press, Beijing (in Chinese)
[7] Wang K.J., He B., Chen R.L. (2007) Predicting parameters of nature oil reservoir using general regression neural network. In: International Conference on Mechatronics and Automation pp.822-826
8 Design and Construction of Process Neural Networks
As a kind of functional approximator, process pattern associative memory machine and time-varying signal classifier, process neural networks have broad applications in modeling and solving various practical problems related to time processes or multivariate processes. For example, Ding and Zhong used a wavelet process neural network to solve time series prediction problems [1,2], and used a parallel process neural network to solve the problem of aircraft engine health condition monitoring [3]. Zhong et al. used a continuous wavelet process neural network for monitoring an aero-engine lubricating oil system [4]. Xu et al. used a process neural network and a quantum genetic algorithm to determine oil recovery ratios [5]; Song et al. used a mixed process neural network to predict churn in mobile communications [6]. In order to solve practical application problems, we must design and construct corresponding process neural networks in terms of the concrete problems, including the choice of the network model, the determination of the number of hidden layers and hidden nodes, the selection or design of the neuron type in each layer (including the choice of activation function, etc.), and the design of corresponding learning algorithms and parameters. In this chapter, according to the application background and demands of different practical problems, some practical process neural network models with different mapping mechanisms are constructed, such as process neural networks with double hidden layers, discrete process neural networks, cascade process neural networks, feedback process neural networks, self-organizing process neural networks, etc. The learning algorithms and corresponding application examples for these models are also provided in this chapter.
8.1 Process Neural Networks with Double Hidden Layers

Considering the adaptability and nonlinear transformation capability of process neural networks for handling complex time-varying systems, we can construct, for many practical application problems, a process neural network with double hidden layers which combines process neurons with common time-invariant neurons [7]. The model consists of four layers, i.e. an input layer, a process neuron hidden layer, a time-invariant neuron hidden layer, and an output layer. The process neuron hidden layer accomplishes the extraction of the procedural pattern characteristics of the time-varying input signals, the spatio-temporal two-dimensional aggregation, etc. The time-invariant neuron hidden layer is mainly used to improve the mapping ability for the complex input-output relationship of the system and to enhance the flexibility and knowledge memory capability of the network.

8.1.1 Network Structure

For convenience of discussion, suppose that the process neural network with double hidden layers is a multi-input-single-output system with topological structure n-m-K-1; it can easily be extended to the multi-input-multi-output situation. The input layer of the network consists of n node units for inputting the n time-varying functions x_1(t), x_2(t), ..., x_n(t) to the network. The first hidden layer consists of m process neuron nodes, which complete the spatial weighted aggregation and the time-process effect accumulation of the n input functions and the extraction of the procedural pattern features and transformation relationships of the function samples. The second hidden layer consists of K common time-invariant neuron nodes, which improve the nonlinear mapping capability for the complex input-output relationship of the system. The fourth layer is an output layer containing one time-invariant neuron node that completes the system output. The topological structure of the network is shown in Fig. 8.1.
Fig. 8.1 Process neural network with double hidden layers

The input-output relationship among the layers of the network is as follows. The system input is X(t) = (x_1(t), x_2(t), ..., x_n(t)), t ∈ [0,T].
The output of the first hidden layer is

y_j^{(1)} = f\Big(\int_0^{T}\sum_{i=1}^{n} w_{ij}(t)\,x_i(t)\,dt - \theta_j^{(1)}\Big),\quad j = 1,2,\dots,m,   (8.1)

where y_j^{(1)} is the output of the jth process neuron in the first hidden layer; w_{ij}(t) is the connection weight function between the ith input node and the jth first-hidden-layer node; θ_j^{(1)} is the output activation threshold of the jth process neuron in the first hidden layer; [0,T] is the system input process interval; and f is the activation function in the first hidden layer. The output of the second hidden layer is

y_k^{(2)} = g\Big(\sum_{j=1}^{m} v_{jk}\, y_j^{(1)} - \theta_k^{(2)}\Big),\quad k = 1,2,\dots,K,   (8.2)

where y_k^{(2)} is the output of the kth neuron in the second hidden layer; v_{jk} is the connection weight between the first hidden layer and the second hidden layer; θ_k^{(2)} is the output activation threshold of the kth neuron in the second hidden layer; and g is the activation function in the second hidden layer. The system output of the output layer is

y = \sum_{k=1}^{K} \mu_k\, y_k^{(2)},   (8.3)

where y is the output of the process neural network and μ_k is the connection weight from the second hidden layer to the output layer. Combining Eqs. (8.1)-(8.3), the input-output relationship of the system is

y = \sum_{k=1}^{K} \mu_k\, g\Big(\sum_{j=1}^{m} v_{jk}\, f\Big(\int_0^{T}\sum_{i=1}^{n} w_{ij}(t)\,x_i(t)\,dt - \theta_j^{(1)}\Big) - \theta_k^{(2)}\Big).   (8.4)
164
Process Neural Networks L
Xi(t) = L,aab,(t), [= 1
L
wi/t) = L,wy)b[(t), 1=1
Substitute these functions into Eq. (8.4), then the input-output relationship of the system can be expressed as
According to the orthogonality of basis functions, the above equation can be simplified to
(8.5)
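A small numerical check illustrates why orthonormality allows this simplification: the process integral in Eq. (8.1) reduces to the inner product of the coefficient vectors used in Eq. (8.5). The orthonormal trigonometric basis on [0,1] and the test functions below are illustrative assumptions.

```python
import numpy as np

# Check that int_0^1 w_ij(t) x_i(t) dt equals sum_l w_ij^(l) a_il for an
# orthonormal basis; the basis and random coefficients are assumptions.
L, N = 9, 2000
t = (np.arange(N) + 0.5) / N

def basis(l, t):
    if l == 0:
        return np.ones_like(t)
    k = (l + 1) // 2
    return np.sqrt(2.0) * (np.sin(2 * np.pi * k * t) if l % 2 else np.cos(2 * np.pi * k * t))

B = np.stack([basis(l, t) for l in range(L)])        # (L, N)

rng = np.random.default_rng(3)
a = rng.normal(size=L)                               # coefficients of x_i(t)
w = rng.normal(size=L)                               # coefficients of w_ij(t)
x_t, w_t = a @ B, w @ B                              # the functions on the grid

integral = (w_t * x_t).sum() / N                     # int_0^1 w_ij(t) x_i(t) dt
print("integral   :", round(integral, 6))
print("coef. inner:", round(float(w @ a), 6))        # sum_l w_ij^(l) a_il
```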
For convenience, suppose that the activation functions of the various layers are all Sigmoid functions, i.e. f(u) = g(u) = 1/(1+e^{-u}). Given P learning sample functions (x_{p1}(t), x_{p2}(t), ..., x_{pn}(t), d_p) for p=1,2,...,P, in which the first subscript of x_{pi}(t) denotes the serial number of the learning sample and the second denotes the serial number of the input function vector component, d_p is the expected output of the network corresponding to the input x_{p1}(t), x_{p2}(t), ..., x_{pn}(t). Suppose that y_p is the actual output of the network corresponding to this input; the learning error function is defined as

E = \sum_{p=1}^{P}(y_p - d_p)^2 = \sum_{p=1}^{P}\Big(\sum_{k=1}^{K}\mu_k\, g\Big(\sum_{j=1}^{m} v_{jk}\, f\Big(\sum_{i=1}^{n}\sum_{l=1}^{L} w_{ij}^{(l)}\, a_{il}^{(p)} - \theta_j^{(1)}\Big) - \theta_k^{(2)}\Big) - d_p\Big)^2,   (8.6)
where a_{il}^{(p)} is the coefficient in the expansion of the function x_{pi}(t) corresponding to the basis function b_l(t). By adopting the gradient descent algorithm, the modification formulas for the connection weights and activation thresholds of the process neural network are

\mu_k = \mu_k + \alpha\Delta\mu_k,   (8.7)

v_{jk} = v_{jk} + \beta\Delta v_{jk},   (8.8)

w_{ij}^{(l)} = w_{ij}^{(l)} + \gamma\Delta w_{ij}^{(l)},   (8.9)

\theta_j^{(1)} = \theta_j^{(1)} + \eta\Delta\theta_j^{(1)},   (8.10)

\theta_k^{(2)} = \theta_k^{(2)} + \lambda\Delta\theta_k^{(2)},   (8.11)
where α, β, γ, η and λ are the learning rate constants of the network. Denote

u_{jp} = \sum_{i=1}^{n}\sum_{l=1}^{L} w_{ij}^{(l)}\, a_{il}^{(p)} - \theta_j^{(1)},\qquad z_{kp} = \sum_{j=1}^{m} v_{jk}\, f(u_{jp}) - \theta_k^{(2)};

then

\Delta\mu_k = -\frac{\partial E}{\partial \mu_k} = -2\sum_{p=1}^{P}\big(\mu_k g(z_{kp}) - d_p\big)\, g(z_{kp}),   (8.12)

\Delta v_{jk} = -\frac{\partial E}{\partial v_{jk}} = -2\sum_{p=1}^{P}\big(\mu_k g(z_{kp}) - d_p\big)\,\mu_k\, g'(z_{kp})\, f(u_{jp}),   (8.13)

\Delta w_{ij}^{(l)} = -\frac{\partial E}{\partial w_{ij}^{(l)}} = -2\sum_{p=1}^{P}\sum_{k=1}^{K}\big(\mu_k g(z_{kp}) - d_p\big)\,\mu_k\, g'(z_{kp})\, v_{jk}\, f'(u_{jp})\, a_{il}^{(p)},   (8.14)

\Delta\theta_j^{(1)} = -\frac{\partial E}{\partial \theta_j^{(1)}} = -2\sum_{p=1}^{P}\sum_{k=1}^{K}\big(\mu_k g(z_{kp}) - d_p\big)\,\mu_k\, g'(z_{kp})\, v_{jk}\, f'(u_{jp})(-1),   (8.15)

\Delta\theta_k^{(2)} = -\frac{\partial E}{\partial \theta_k^{(2)}} = -2\sum_{p=1}^{P}\big(\mu_k g(z_{kp}) - d_p\big)\,\mu_k\, g'(z_{kp})(-1).   (8.16)
The training course for the network is described as follows.

Step 1 Choose standard orthonormal basis functions b_1(t), b_2(t), ..., b_L(t) in the input space; the number of basis functions L should be chosen so that the basis expansion satisfies the required precision. The input functions and the connection weight functions are expressed as expansions in these basis functions;
Step 2 Give the learning error precision ε of the network, set the accumulated number of learning iterations s=0 and the maximal number of learning iterations M;
Step 3 Initialize the network connection weights and activation thresholds μ_k, v_{jk}, w_{ij}^{(l)}, θ_j^{(1)}, θ_k^{(2)} (i=1,2,...,n; j=1,2,...,m; l=1,2,...,L; k=1,2,...,K);
Step 4 Calculate the error function E according to Eq. (8.6); if E < ε or s > M, go to Step 6;
Step 5 Modify the connection weights and activation thresholds according to Eqs. (8.7)-(8.16); s+1 -> s; go to Step 4;
Step 6 Output the learning result and stop.
8.1.3 Application Examples

Example 8.1 Application in rotating machine failure diagnosis [8]
Looking at the problem of rotating machine failure diagnosis described in Example 6.2 in Chapter 6, we now adopt a process neural network with double hidden layers as the recognizer of a rotating machine failure automatic diagnosis system. Process neural networks can automatically extract a variety of process pattern characteristics
from continuous input signals and generate memory through learning. Aiming at the four failure modes stated in Example 6.2 in Chapter 6, 22 curves of real measured signals in all are chosen, i.e. 5 of axis misalignment, 6 of eccentricity, 7 of abrasion, and 4 normal, to constitute the learning sample set, and the test set is made up of another 8 samples. The structure parameters of the network are chosen as follows: 1 input node, 20 process neuron nodes in the first hidden layer, 10 time-invariant neuron nodes in the second hidden layer, and 1 output node. A Sigmoid function is adopted as the activation function. Because the network input signals change periodically, a trigonometric function system is chosen as the orthogonal basis, and the number of basis functions is 50 (determined experimentally from the 22 learning samples and the 8 test samples with fitting precision 0.001). The learning rate constants are α=0.50, β=0.45, γ=0.65, η=0.55 and λ=0.50, the maximal number of learning iterations is 10,000, and the learning precision is ε=0.05. The network converges after 5,937 iterations. The error function curve during the iterations is shown in Fig. 8.2. The 8 test samples are recognized and 7 of them are correctly judged, a correct recognition rate of 87.5%, which is a better result than some existing methods for automatic rotating machine failure diagnosis.
Fig. 8.2 Iteration error function curve
8.2 Discrete Process Neural Network
When process neural networks are adopted to solve practical problems related to a time process, such as system simulation and modeling, signal processing, and comparative analysis of time series, the situation where the system inputs are discrete-time sample data sequences is often encountered, i.e. information processing for a discrete time process. To solve this problem, we can construct a discrete process neural network model that directly processes discrete time series. Discrete process neural networks can be regarded as a special case of process neural networks whose inputs are continuous-time processes.
8.2.1 Discrete Process Neuron
A discrete process neuron is also made up of input signal weighting, spatio-temporal two-dimensional aggregation, activation output, etc. Differing from a continuous process neuron, its inputs and connection weights are both discrete time sequences. If the spatial aggregation of the input signals of a discrete process neuron is a weighted sum, and the accumulation operation for the time process is an accumulation of the time effect of the input time series, then the structure of a discrete process neuron is shown in Fig. 8.3.
Fig. 8.3 Discrete process neuron model
In Fig. 8.3, x_1(t_l), x_2(t_l), ..., x_n(t_l) for l=1,2,... are the n discrete time input sequences of a discrete process neuron; w_1(t_l), w_2(t_l), ..., w_n(t_l) for l=1,2,... are the corresponding connection weight sequences; "⊕" is the spatial aggregation operator for the discrete input signals, "⊗" is a discrete time (process) accumulation operator, and f(·) is the activation function. The input-output relationship of a discrete process neuron can be expressed as
$y = f\big((W(t)\oplus X(t))\otimes K(\cdot) - \theta\big) ,$  (8.17)
where X(t) is the input matrix of the discrete process neuron; W(t) is the corresponding connection weight matrix; K(t) is the time accumulation kernel function of the discrete process neuron; θ is the activation threshold of the neuron. If "⊕" and "⊗" respectively adopt the spatial weighted sum and the accumulation of the temporal effect of the discrete time input signals, and the kernel function K(t)=1, then Eq. (8.17) can be rewritten as
$y = f\Big(\sum_{l}\sum_{i=1}^{n} w_i(t_l)\, x_i(t_l)\,\Delta t_l - \theta\Big) ,$  (8.18)
where Δt_l = t_{l+1} − t_l. For a finite time series (as in most practical applications), Eq. (8.18) can be written as
$y = f\Big(\sum_{l=1}^{T}\sum_{i=1}^{n} w_i(t_l)\, x_i(t_l)\,\Delta t_l - \theta\Big) ,$  (8.19)
where T is the length of the time series .
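As a small illustration (not from the original text), the output of a single discrete process neuron according to Eq. (8.19) can be computed as follows; a sigmoid activation and the convention of repeating the last time increment are assumptions.

```python
import numpy as np

def discrete_process_neuron(x, w, t, theta, f=lambda v: 1.0 / (1.0 + np.exp(-v))):
    """Output of one discrete process neuron, cf. Eq. (8.19).

    x : (n, T) array, n discrete input sequences sampled at the instants t
    w : (n, T) array, corresponding connection weight sequences
    t : (T,) array of sampling instants, used for the time increments dt_l
    theta : activation threshold
    """
    dt = np.diff(t, append=t[-1] + (t[-1] - t[-2]))   # dt_l = t_{l+1} - t_l (last one repeated)
    # spatial weighted sum at each instant, then temporal accumulation
    aggregated = np.sum(np.sum(w * x, axis=0) * dt)
    return f(aggregated - theta)

# illustrative call with a 2-input neuron over 8 time steps
t = np.linspace(0.0, 1.0, 8)
x = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
w = np.ones_like(x) * 0.1
print(discrete_process_neuron(x, w, t, theta=0.05))
```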
8.2.2 Discrete Process Neural Network
Consider a multi-input-single-output system with only one discrete process neuron hidden layer, whose topological structure is shown in Fig. 8.4.
Fig. 8.4 Discrete process neural network
In Fig. 8.4, "Σ" denotes the spatial weighted summation operator for the input time series, and "Σt" denotes the accumulation operator for the time effect of the discrete input signals. The input layer has n nodes and the inputs are discrete time sequences. The hidden layer of the discrete process neural network has m nodes, which complete the spatio-temporal weighted aggregation and accumulation of the discrete input signals as well as the feature extraction and memory for the discrete time signals. The output node is a time-invariant neuron that completes the system output. The input-output mapping relationship of the discrete process neural network is
$y = g\Big(\sum_{j=1}^{m} v_j f\Big(\sum_{l}\sum_{i=1}^{n} w_{ij}(t_l)\, x_i(t_l)\,\Delta t_l - \theta_j^{(1)}\Big) - \theta\Big) ,$  (8.20)
where w_ij(t_l) (l=1,2,...) is the connection weight from the input node i to the hidden node j at time t_l; v_j is the connection weight from the hidden node j to the output node; θ_j^(1) is the activation threshold of the hidden node j; f is the activation function of the hidden neurons; g is the activation function of the output node; θ is the activation threshold of the output node. If the inputs of the discrete process neural network are finite time series of length T, then Eq. (8.20) can be reformulated as
$y = g\Big(\sum_{j=1}^{m} v_j f\Big(\sum_{l=1}^{T}\sum_{i=1}^{n} w_{ij}(t_l)\, x_i(t_l)\,\Delta t_l - \theta_j^{(1)}\Big) - \theta\Big) .$  (8.21)
8.2.3 Learning Algorithm
For the training of the discrete process neural network denoted by Eq. (8.21), what needs to be determined are the connection weights w_ij(t_l) (i=1,2,...,n; j=1,2,...,m; l=1,2,...,T), v_1, v_2, ..., v_m and the activation thresholds. In the following, a training method based on a gradient descent algorithm is presented. Given K learning samples
(x_k1(t_l), x_k2(t_l), ..., x_kn(t_l), d_k), k=1,2,...,K,
where d_k is the expected output of the kth sample. Suppose that the actual output corresponding to the kth learning sample input is y_k; the error function of the network can be defined as
$E = \sum_{k=1}^{K} (y_k - d_k)^2 .$  (8.22)
According to the gradient descent algorithm, the learning rules for the connection weights and activation thresholds of the network are
$v_j = v_j + \alpha\Delta v_j ,\quad j=1,2,...,m,$  (8.23)
$w_{ij}(t_l) = w_{ij}(t_l) + \beta\Delta w_{ij}(t_l) ,\quad i=1,2,...,n;\ j=1,2,...,m;\ l=1,2,...,T,$  (8.24)
$\theta_j^{(1)} = \theta_j^{(1)} + \gamma\Delta\theta_j^{(1)} ,\quad j=1,2,...,m,$  (8.25)
$\theta = \theta + \eta\Delta\theta ,$  (8.26)
where α, β, γ and η are the learning rate constants. Denote
$\sum_{i=1}^{n}\sum_{l=1}^{T} w_{ij}(t_l)\, x_{ki}(t_l)\,\Delta t_l - \theta_j^{(1)} = u_{kj} ,$  (8.27)
$\sum_{j=1}^{m} v_j f(u_{kj}) - \theta = z_k ,$  (8.28)
then
$\Delta v_j = -\frac{\partial E}{\partial v_j} = -2\sum_{k=1}^{K}\big(g(z_k) - d_k\big)\, g'(z_k) f(u_{kj}) ,$  (8.29)
$\Delta w_{ij}(t_l) = -\frac{\partial E}{\partial w_{ij}(t_l)} = -2\sum_{k=1}^{K}\big(g(z_k) - d_k\big)\, g'(z_k)\, v_j f'(u_{kj})\, x_{ki}(t_l)\,\Delta t_l ,$  (8.30)
$\Delta\theta_j^{(1)} = -\frac{\partial E}{\partial \theta_j^{(1)}} = -2\sum_{k=1}^{K}\big(g(z_k) - d_k\big)\, g'(z_k)\, v_j f'(u_{kj})(-1) ,$  (8.31)
$\Delta\theta = -\frac{\partial E}{\partial \theta} = -2\sum_{k=1}^{K}\big(g(z_k) - d_k\big)\, g'(z_k)(-1) .$  (8.32)
The learning algorithm is described as follows.
Step 1 Give the error precision ε>0, the accumulative learning iteration count s=0, and the maximal learning iteration count M;
Step 2 Initialize the connection weights and the activation thresholds v_j, w_ij(t_l), θ_j^(1), θ (i=1,2,...,n; j=1,2,...,m; l=1,2,...,T);
Step 3 Calculate the error function E according to Eq. (8.22). If E<ε or s≥M, go to Step 5;
Step 4 Modify the connection weights and the activation thresholds according to Eqs. (8.23)-(8.32); s+1→s; go to Step 3;
Step 5 Output the learning result and stop.
8.2.4 Application Examples
Example 8.2 Water-flooded layer identification in an oil reservoir
Identifying the water-flooded status of an oil reservoir is an important as well as complex job in oilfield development, especially when an oilfield is in its late development stage. Based on log data, a water-flooded layer is mainly identified from the morphological and amplitude characteristics, and the combined relationships, of various well-log curves that change with depth and reflect the geophysical properties of the formation. The output is a water-flooded level. Therefore, the key to automatically recognizing a water-flooded layer is to build an identification model that truly reflects the corresponding relationship between the reservoir water-flooded status and the well-log response curves in the research region and that extracts the morphological characteristics of the well-log curves integrally and objectively. During actual measuring, a logging tool samples multi-parameter data at 8 points per meter in the well, i.e. the sampled information is discrete data changing with depth, and well-log curves are obtained by fitting the discrete well-log data. Therefore, discrete process neural networks have good adaptability for solving the problem of identifying reservoir water-flooded status based on well-log data. In actual data processing, the water-flooding level is divided into strong water flooding, middle water flooding, weak water flooding, and no water flooding according to the degree of reservoir water flooding in an oil layer.
Five variables are chosen as the parameters of water-flooded layer identification, i.e. spontaneous potential SP, interval transit time AC, deep lateral resistivity RLLD, shallow lateral resistivity RLLS, and reservoir effective thickness h. As different reservoirs have different thicknesses, the input process intervals may differ when reservoir log data are fed into the network, so the reservoir thickness should be normalized in advance. By a rarefying or densifying method, the thickness of the reservoir is normalized onto the interval [0,1]. Sixteen sample points are chosen in the interval [0,1] from each reservoir, and the change in the effective thickness of the reservoir is denoted by h. At the same time, because each log variable has a different dimension and the log data of different log variables differ greatly in magnitude, the feature parameters should also be standardized. Suppose that x_ijl is the lth original measured value of the jth log parameter of the ith formation and that the standardized value is x'_ijl; then
X "I V
=
X "I 'J
-minx'i 'I IJ I,
•
max X "I - mm X "I i, l IJ i.l IJ
(8.33)
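The preprocessing just described can be sketched as follows (illustrative code, not from the original text): one well-log curve is resampled onto 16 points of the unit interval and then min-max scaled. For simplicity the scaling here uses the extrema of the single curve, whereas Eq. (8.33) takes the extrema over all formations of the same log parameter.

```python
import numpy as np

def normalize_reservoir_curve(depth, values, n_points=16):
    """Resample one well-log curve onto n_points in [0, 1] and min-max
    standardize it, along the lines of the preprocessing described above
    and Eq. (8.33). Linear interpolation and per-curve scaling are
    illustrative choices, not taken from the original text."""
    # map the reservoir thickness onto the unit interval [0, 1]
    s = (depth - depth.min()) / (depth.max() - depth.min())
    grid = np.linspace(0.0, 1.0, n_points)
    resampled = np.interp(grid, s, values)            # rarefying / densifying
    # min-max standardization, cf. Eq. (8.33)
    return (resampled - resampled.min()) / (resampled.max() - resampled.min())

# illustrative call on a short spontaneous-potential segment
depth = np.array([1502.0, 1502.4, 1502.9, 1503.5, 1504.0])
sp = np.array([62.1, 58.4, 55.0, 57.3, 61.0])
print(normalize_reservoir_curve(depth, sp))
```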
Discrete process neural networks described by Eq. (8.21) are used to identify the water-flooded status according to discrete log data changing with depth. Through experimental comparative analysis, the topological structure of the network is chosen as 5-30-1, i.e. 5 input nodes, 30 discrete process neuron hidden nodes, and 1 time-invariant neuron output node. The learning sample set is made up of 10 typical reservoir samples apiece belonging to strong water flooding, middle water flooding, weak water flooding, and no water flooding. The test set consists of 15 water-flooded reservoir samples. The expected network output corresponding to no water flooding is 0.25, to weak water flooding 0.50, to middle water flooding 0.75, and to strong water flooding 1.0. The learning error precision is 0.05. The network converges after 5371 iterations. Fifteen samples are tested and 11 of them are well-judged; the correct recognition rate is 73.3%, which is a good result. The error function curve during the iteration is shown in Fig. 8.5.
Fig. 8.5 Iteration error function curve
Example 8.3 Approximation of discrete trigonometric function samples
In this example, discrete process neural networks are used to approximate 10 groups of discrete time function sample pairs. Suppose that the input interval of the discrete process is [0,1] and that the input sample functions are {sin(2kπt), cos(2kπt), k} for k=1,2,...,10. The sample functions are discretized as {sin(2kπt_i), cos(2kπt_i), k} where t_i=i/128 for i=0,1,...,127. The network structure and parameters are chosen as follows: 2 input nodes, 15 hidden nodes, and 1 output node. The error precision is ε=0.001; the learning rate constants are α=0.5, β=0.80, γ=0.65; the maximal learning iteration count is M=5000. A Walsh transform is applied to the discrete data and the transformed data are submitted to the network for training. The network converges after 283 iterations. The approximation error is 0.0009. The approximation result is shown in Table 8.1.
Table 8.1 Approximation result of 10 groups of discrete trigonometric function samples
Expected output    Actual output    Absolute error
0.1000             0.0997           0.0003
0.2000             0.1996           0.0004
0.3000             0.3004           0.0004
0.4000             0.4008           0.0008
0.5000             0.5006           0.0006
0.6000             0.5995           0.0005
0.7000             0.6991           0.0009
0.8000             0.8002           0.0002
0.9000             0.9003           0.0003
1.0000             0.9992           0.0008
The experimental results show that discrete process neural networks, together with the training algorithm based on the discrete Walsh transform, have a powerful ability to approximate discrete function sequences. This indicates that process neural networks are well suited to modeling real-time discrete systems.
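The text does not spell out the Walsh preprocessing; the following sketch of a standard fast Walsh-Hadamard transform (for sequences whose length is a power of two, such as the 128-point samples above) indicates what such a step could look like. The 1/N normalization is an assumed convention.

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform of a 1-D array whose length is a power
    of two (e.g. the 128-point discretized samples above). Sketch of a
    standard butterfly algorithm; the 1/len(a) normalization is a choice."""
    a = np.asarray(a, dtype=float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a / len(a)

t = np.arange(128) / 128.0
coeffs = fwht(np.sin(2 * np.pi * t))   # transformed data fed to the network
print(coeffs[:8])
```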
8.3 Cascade Process Neural Network [9]
In practical problems, the inputs of some time-varying systems may be divided into several time phases, and the system in each phase may have its own special changing rules and characteristics. For instance, a crop planting cycle can be divided into sowing, seedling growth, blossoming and fructification, growing, ripening, etc. Although each growth phase is related to external environmental factors such as temperature, humidity, fertilizer, illumination, etc., the crop has its own growth rule in each phase, and the influence of the various environmental factors on the crop in each
growth phase is different. At the same time, the growth state of the crop in each phase also affects the growth state in the next phase. With this problem in mind, a process neural network model with continuous phase inputs and discrete process outputs is considered. The model can take a cascade structure; the input-output relationship of the system in different phases can be described by different models. As the complexity of the input-output relationship may differ from phase to phase, the structures of the process neural sub-networks realizing the mapping relationships can be the same or different. The input of each sub-network consists of the state output of the system in the previous phase and the time-varying input signal in the current time phase, and its output is the aggregation and accumulation result of the system in that phase.
8.3.1 Network Structure
Suppose that the system input process interval [0,T] can be divided into N time phases, and that the time intervals of the phases are denoted as [T_0,T_1], [T_1,T_2], ..., [T_{N-1},T_N], where 0=T_0<T_1<T_2<...<T_{N-1}<T_N=T. The topological structure of the cascade process neural network is shown in Fig. 8.6.
Fig. 8.6 Cascade process neural networks
In Fig. 8.6, X(t)=(x_1(t), x_2(t), ..., x_n(t)) (t∈[0,T]) is the time-varying input of the system; Y=(y(T_1), y(T_2), ..., y(T_N)) is the discrete output of the system at the time partition points.
The process neural network P shown in Fig. 8.6 has a cascade structure that consists of N cascading process neural sub-networks P_k. The topological structure of each P_k may be different. The inputs of P_k are y(T_{k−1}) and the process input signals on the interval [T_{k−1},T_k]. The input function of P_k is denoted as X^k(t), where
X^1(t) = (x_1^1(t), x_2^1(t), ..., x_n^1(t)),
X^k(t) = (x_0^k(t), x_1^k(t), x_2^k(t), ..., x_n^k(t)), k=2,3,...,N.
Here
$x_0^k(t) = y(T_{k-1}),\quad t\in[T_{k-1},T_k),\ k=2,3,...,N;$
$x_i^k(t) = \begin{cases} x_i(t), & t\in[T_{k-1},T_k); \\ 0, & \text{else}, \end{cases}\quad i=1,2,...,n;\ k=1,2,...,N.$  (8.34)
Suppose that b_1(t), b_2(t), ..., b_L(t) are a group of finite standard orthogonal basis functions satisfying the fitting precision requirement of the input functions. For convenience of discussion, suppose that each sub-network of the cascade process neural network is a multi-input-single-output system with only one process neuron hidden layer. x_i^k(t) and w_ij^(k)(t) are expressed as expansions in the basis functions:
$x_i^k(t) = \sum_{l=1}^{L} a_{il}^{(k)} b_l(t),\quad i=1,2,...,n;\ k=1,2,...,N,$  (8.35)
$w_{ij}^{(k)}(t) = \sum_{l=1}^{L} w_{ijl}^{(k)} b_l(t),\quad i=1,2,...,n;\ j=1,2,...,m_k;\ k=1,2,...,N,$  (8.36)
where a_il^(k) is the coefficient of x_i^k(t) corresponding to the basis function b_l(t); w_ij^(k)(t) is the connection weight function of the kth sub-network; w_ijl^(k) is the connection weight between input node and hidden node corresponding to the basis function b_l(t); and m_k is the number of hidden nodes of P_k. The input-output mapping relationship of the sub-network P_k based on the basis function expansion can be expressed as follows.
P_1:
(8.37)
The other N−1 sub-networks P_k (k=2,3,...,N):
(8.38)
Rearranging Eqs. (8.37)-(8.38), we obtain
(8.39)
(8.40)
where v_j^k (j=1,2,...,m_k) is the connection weight between the hidden node j and the output node of the sub-network P_k. Due to the orthogonality of the basis functions, Eqs. (8.39) and (8.40) can be respectively simplified as
(8.41)
(8.42)
8.3.2 Learning Algorithm
Suppose that the input function of the system is X(t)=(x_1(t), x_2(t), ..., x_n(t)) (t∈[0,T]). First, X(t) is divided into phases according to Eq. (8.34) so as to determine the input function X^k(t) of each sub-network P_k (k=1,2,...,N). Then each P_k is trained according to a given precision and its expected output. The specific learning steps are as follows.
Step 1 The sample functions x_p1(t), x_p2(t), ..., x_pn(t) are divided into phases to determine the input functions X_p^k(t) (p=1,2,...,P) of each sub-network P_k (k=1,2,...,N);
Step 2 Choose standard orthogonal basis functions b_l(t) (l=1,2,...,L) in the input space; x_pi^k(t) (i=1,2,...,n; p=1,2,...,P; k=1,2,...,N) is expanded in these basis functions;
Step 3 Give the learning error precision ε of the network, the accumulative learning iteration count s=0, and the maximal learning iteration count M;
Step 4 Initialize the connection weights and the activation thresholds v_j^k, w_ijl^(k), θ_j^(k) of the network for k=1;
Step 5 Calculate the error function E (the sum of squared deviations between the actual outputs and the expected outputs) of P_k; if s≥M, go to Step 8; if E<ε, go to Step 7, or else go to Step 6;
Step 6 Modify the connection weights and activation thresholds v_j^k, w_ijl^(k), θ_j^(k); s+1→s; go to Step 5;
Step 7 k+1→k; if k>N, go to Step 8, or else go to Step 5;
Step 8 Output the learning result and stop.
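The phase-wise flow of Steps 1-8 can be outlined as follows (illustrative only). The input process is cut into phases according to Eq. (8.34), and each sub-network is trained in turn with the previous phase's output appended as the extra input x_0^k(t); the class MeanSubnet is merely a stand-in so that the flow is executable, not a process neural sub-network.

```python
import numpy as np

class MeanSubnet:
    """Trivial stand-in for a process neural sub-network: linear regression on
    the time-averaged inputs. Only used to make the cascade flow executable."""
    def fit(self, inputs, outputs):
        A = np.array([x.mean(axis=1) for x in inputs])
        A = np.hstack([A, np.ones((len(A), 1))])
        self.coef, *_ = np.linalg.lstsq(A, outputs, rcond=None)
    def predict(self, x):
        return np.append(x.mean(axis=1), 1.0) @ self.coef

def train_cascade(samples, targets, boundaries, make_subnet=MeanSubnet):
    """Phase-wise cascade training flow of Steps 1-8 (illustrative).
    samples: list of (t, X) with X of shape (n, len(t)); targets: (P, N)."""
    N = len(boundaries) - 1
    subnets, prev = [], np.zeros(len(samples))
    for k in range(N):
        phase_inputs = []
        for p, (t, X) in enumerate(samples):
            mask = (t >= boundaries[k]) & (t < boundaries[k + 1])
            Xk = X[:, mask]
            if k > 0:   # y(T_{k-1}) enters as the extra input x_0^k(t), cf. Eq. (8.34)
                Xk = np.vstack([np.full((1, Xk.shape[1]), prev[p]), Xk])
            phase_inputs.append(Xk)
        net = make_subnet()
        net.fit(phase_inputs, targets[:, k])
        subnets.append(net)
        prev = np.array([net.predict(x) for x in phase_inputs])
    return subnets

# illustrative use: 4 samples, 2 input processes, 3 phases over [0, 8.5]
t = np.linspace(0, 8.4, 85)
samples = [(t, np.vstack([np.sin(t / (i + 1)), np.cos(t / (i + 2))])) for i in range(4)]
targets = np.cumsum(np.random.default_rng(0).uniform(10, 25, (4, 3)), axis=1)
nets = train_cascade(samples, targets, [0.0, 3.5, 6.0, 8.5])
```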
8.3.3 Application Examples
Example 8.4 Simulation of tertiary oil recovery in oilfield exploitation
Oil exploitation in a reservoir is a complex non-Newtonian fluid movement course, and the description of the underground fluid flow is an important basis of a scientifically formulated oilfield development scheme. When adopting traditional methods to solve the problem, we need to build and solve a complex hydromechanics equation, a non-Newtonian fluid percolation equation, and an injection-withdrawal balance equation, but each equation has many parameters, the boundary conditions are complex, and all these are hard to determine and solve correctly in practical applications. Process neural networks used for describing a time-varying system do not require a model to be built in advance; the relations among the various influential factors and variables are retained in the network structure parameters and performance parameters. Therefore, a process neural network has good adaptability for solving these problems. Oil exploitation is mainly divided into three phases, i.e. natural exploitation by primordial formation energy (primary oil recovery), water flooding exploitation (secondary oil recovery), and polymer flooding exploitation (tertiary oil recovery). Reservoir recovery efficiency (degree of reserve recovery) is an important index for measuring the development level and the economic benefit of an oilfield. Different development modes have different final recovery efficiencies. A displacement production experiment under laboratory simulation conditions is a significant basis for formulating a development scheme for tertiary oil recovery. The experimental course is divided into three stages: water flooding under initial oil saturation, polymer flooding after water flooding becomes ineffective, and water flooding at the last stage. Because different displacement materials (water or polymer) are injected at the three different exploitation stages, and the viscosity and chemical properties of water and polymers are quite different, the vadose property of the underground fluid is quite different at the three stages and should be described by different mathematical models. The injection-production system has two input variables: the injection volume and the incremental injection pressure of the displacement material, both of which are functions varying with time. The system output is the phase recovery efficiency. Therefore, good adaptability in simulating the tertiary recovery process is achieved by adopting cascade process neural networks. Eleven artificial core samples that are very similar in volume dimension, lithology, physical properties, and oil-bearing characteristics are chosen, and an experiment is carried out at different injection speeds and incremental injection
pressures (i.e. different development modes). The sampling interval is 30 minutes, and we obtain 17 records in all for the whole experiment. The experimental data of one core sample are shown in Table 8.2.
Table 8.2 Experimental records of a core sample
Series number   Injection volume (PV)   Incremental injection pressure (MPa)   Phase recovery efficiency (%)
1               0.17                    0.015                                  (beginning of water flooding)
2               0.32                    0.033
3               0.44                    0.055
4               0.88                    0.120
5               1.50                    0.075
6               1.80                    0.035
7               2.08                    0.018                                  37.96 (end of water flooding and beginning of polymer flooding)
8               2.16                    0.061
9               2.23                    0.097
10              2.29                    0.153
11              2.35                    0.205
12              2.38                    0.233                                  58.52 (end of polymer flooding and beginning of following water flooding)
13              2.45                    0.170
14              2.52                    0.115
15              2.58                    0.045
16              2.64                    0.035
17              2.78                    0.035                                  61.26 (end of following water flooding)
The experimental results of core samples 1-8 form the network training set and those of core samples 9-11 form the test set. The process neural network adopts a 3-tier cascade structure, and the 3 sub-networks are respectively denoted as P1, P2, P3; their structures are all single hidden-layer process neural networks with linear output as shown in Fig. 4.2 in Chapter 4. The learning algorithm adopts the gradient descent algorithm. The three input process subintervals are [0,3.5], [3.5,6], and [6,8.5]. The topological structure of P1 is 2-15-1 and those of P2 and P3 are 3-20-1. A trigonometric function system is selected as the orthogonal basis, and the number of basis functions of each sub-network is 50. The learning rate constants are α1=0.60, β1=0.50, γ1=0.65; α2=0.55, β2=0.50, γ2=0.60; α3=0.50, β3=0.70, γ3=0.55. The maximal learning iteration count is M=20,000 and the precision is ε=0.10. The network converges after 9375 iterations. The phase recovery efficiency is predicted for the test sample set and the result is shown in Table 8.3. The prediction result can meet the demands of practical applications.
Table 8.3 The prediction result of test samples
Sample series number of test set   Phase   Experimental recovery efficiency (%)   Predicted recovery efficiency (%)   Absolute error   Relative error (%)
Sample 9     1    35.21    33.65    1.56    4.43
Sample 9     2    53.76    53.89    0.13    0.24
Sample 9     3    55.07    56.43    1.36    2.47
Sample 10    1    37.27    35.10    2.17    5.82
Sample 10    2    56.19    54.75    1.44    2.56
Sample 10    3    58.11    60.23    2.12    3.65
Sample 11    1    37.54    35.47    2.07    5.51
Sample 11    2    59.01    60.21    1.20    2.03
Sample 11    3    60.97    62.15    1.18    1.93
8.4 Self-organizing Process Neural Network [10]
The concept of self-organizing competitive neural networks was proposed by Kohonen to simulate the phenomenon that brain cells in different areas have different sensitivities to a certain aspect of a stimulus signal, and that the specific reflective ability of an individual cell for a specific signal can be formed through subsequent experience and training. This kind of network is significant in pattern classification and optimal combination. Traditional self-organizing neural networks can be extended to self-organizing process neural networks that directly deal with time-varying process signals and can consequently simulate the behavior of biological neural networks more faithfully. Self-organizing process neural networks are process neural network models with a two-layer structure; they extract meaningful features or internal laws from a group of time-varying data by adopting a self-organizing competitive learning algorithm. They are suitable for process pattern classification, optimal combination, etc.
8.4.1 Network Structure
Self-organizing process neural networks have a two-layer structure consisting of the input layer and the competitive layer composed of process neurons. All nodes in the input layer and the competitive layer are fully connected with one another, and the input signals and the connection weights of the network may be time-dependent functions. The network adaptively extracts the connotative pattern characteristics of the input functions, carries out self-organization, and exhibits the result in the competitive layer. Without loss of generality, suppose that the input space of the network is (C[0,T])^n, where [0,T] is the signal input process interval. Suppose that the input
function of the system is X(t)=(x_1(t), x_2(t), ..., x_n(t)), and the output is a static vector representing a pattern class. The topological structure of the network is shown in Fig. 8.7. In Fig. 8.7, w_ij(t) (i=1,2,...,n; j=1,2,...,m) is the connection weight function between the input node i and the competitive process neuron node j; y_j (j=1,2,...,m) is the output of the process neuron j. In practice, the outputs represent a group of similarity degrees.
Fig. 8.7 Self-organizing process neural network
8.4.2 Learning Algorithm
Suppose that the training sample set for the network is {X^1(t), X^2(t), ..., X^K(t)}, where X^k(t)∈(C[0,T])^n. All these samples belong to one of m given pattern classes according to some criterion. The output of a competitive layer node represents a pattern class, and the weight functions connected with the node carry the basic characteristic information of that pattern class.
(1) Competitive learning algorithm
(a) Competitive learning rule
When the network is trained, the training samples X^1(t), X^2(t), ..., X^K(t) are presented at the input terminal in a determinate or random order. The total weighted input signal from the nodes of the input layer to the process neuron node j in the competitive layer is
$S_j(t) = \sum_{i=1}^{n} w_{ij}(t)\, x_i(t),\quad j=1,2,...,m.$  (8.43)
Definition 8.1 Suppose that X(t), Y(t)∈(C[0,T])^n; the inner product of X(t) and Y(t) on [0,T] is defined as
$\langle X(t), Y(t)\rangle = \int_0^T X(t)\,(Y(t))^{\mathrm T}\,\mathrm dt ,$  (8.44)
where (Y(t))^T denotes the transpose of Y(t).
Definition 8.2 Suppose that the function X(t)∈(C[0,T])^n; the norm of X(t) is defined as
$\|X(t)\| = \langle X(t), X(t)\rangle^{\frac{1}{2}} = \Big(\int_0^T X(t)\,(X(t))^{\mathrm T}\,\mathrm dt\Big)^{\frac{1}{2}} .$  (8.45)
Definition 8.3 Suppose that the functions X(t), Y(t)∈(C[0,T])^n; the similarity coefficient of X(t) and Y(t) on [0,T] is defined as
$r = \frac{\langle X(t), Y(t)\rangle}{\|X(t)\|\cdot\|Y(t)\|} .$  (8.46)
From Eq. (8.46), the similarity coefficient of the kth input sample vector X^k(t) and the connection weight function vector W_j(t) of the neuron node j in the competitive layer is
$r_j^k = \frac{\langle X^k(t), W_j(t)\rangle}{\|X^k(t)\|\cdot\|W_j(t)\|} ,$  (8.47)
where W_j(t)=(w_1j(t), w_2j(t), ..., w_nj(t)) for j=1,2,...,m. The node j* with the maximal similarity coefficient wins the competition, i.e. j* satisfies
$r_{j^*}^k = \max_{j\in\{1,2,...,m\}}\{r_j^k\} .$  (8.48)
For the input sample vector X^k(t), if the node j* wins the competition, then the weights are adjusted according to the following rule: when the network again encounters the input X^k(t) or an input sample vector similar to X^k(t), the winning probability of the node j* is increased, i.e. w_ij*(t) (i=1,2,...,n) is adjusted so that the weight function vector W_j*(t) moves toward the sample X^k(t), finally making the output of the winning neuron j* represent the pattern class that X^k(t) represents.
(b) Function orthogonal basis expansion
The computation and training of self-organizing process neural networks include the accumulation operation (for instance an integral operation) of process neurons over
time, so the learning algorithm based on orthogonal basis function expansion can be adopted. Suppose that b_1(t), b_2(t), ..., b_L(t) are a group of standard orthogonal basis functions in C[0,T] and that X(t)=(x_1(t), x_2(t), ..., x_n(t)) is a function in the input space. Under the given fitting precision, x_i(t) is expressed in the finite expansion form of the basis functions, i.e.
$x_i(t) = \sum_{l=1}^{L} a_{il} b_l(t),\quad i=1,2,...,n.$  (8.49)
In addition, the weight function w_ij(t) is also expanded in b_1(t), b_2(t), ..., b_L(t):
$w_{ij}(t) = \sum_{l=1}^{L} w_{ij}^{(l)} b_l(t),\quad i=1,2,...,n;\ j=1,2,...,m,$  (8.50)
where w_ij^(l) is the coefficient of w_ij(t) corresponding to b_l(t), i.e. the connection weight between the input layer and the competitive layer. Substituting Eqs. (8.49) and (8.50) into Eq. (8.47), according to the orthogonality of the basis functions, we have
$r_j^k = \frac{\sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}^k\, w_{ij}^{(l)}}{\Big(\sum_{i=1}^{n}\sum_{l=1}^{L}(a_{il}^k)^2\Big)^{\frac{1}{2}}\cdot\Big(\sum_{i=1}^{n}\sum_{l=1}^{L}(w_{ij}^{(l)})^2\Big)^{\frac{1}{2}}} .$  (8.51)
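Under an orthonormal basis, Eq. (8.51) is simply the cosine similarity between coefficient arrays, which the following sketch (names illustrative, not from the original text) computes together with the winner selection of Eq. (8.48).

```python
import numpy as np

def similarity(a_k, w_j):
    """Similarity coefficient of Eq. (8.51): cosine similarity between the
    basis coefficients a_k (shape (n, L)) of an input sample and the
    coefficients w_j (shape (n, L)) of one competitive neuron's weights."""
    return np.sum(a_k * w_j) / (np.linalg.norm(a_k) * np.linalg.norm(w_j))

def winner(a_k, W):
    """Index j* of the winning neuron, cf. Eq. (8.48); W has shape (m, n, L)."""
    return int(np.argmax([similarity(a_k, W[j]) for j in range(len(W))]))

# illustrative call with 4 competitive neurons, 3 input functions, 50 basis functions
rng = np.random.default_rng(0)
a_k, W = rng.normal(size=(3, 50)), rng.normal(size=(4, 3, 50))
print(winner(a_k, W))
```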
(c) Description of the algorithm
Step 1 Randomly generate initial values of w_ij^(l) (j=1,2,...,m; i=1,2,...,n; l=1,2,...,L) in the interval [0,1];
Step 2 The input training sample X^k(t)=(x_1^k(t), x_2^k(t), ..., x_n^k(t)) is expanded in the orthogonal basis functions b_1(t), b_2(t), ..., b_L(t) according to Eq. (8.49): $x_i^k(t)=\sum_{l=1}^{L} a_{il}^k b_l(t)$;
Step 3 Calculate r_j^k according to Eq. (8.51);
Step 4 Determine the winning process neuron j* according to Eq. (8.48);
Step 5 The connection weights linked with the winning process neuron j* are modified as follows, while the other weight functions remain unchanged:
$\Delta w_{ij}^{(l)} = \eta\big(a_{il}^k - w_{ij}^{(l)}\big),\quad j=j^*;\ i=1,2,...,n;\ l=1,2,...,L,$  (8.52)
where η is the learning rate constant.
When the maximum Δw_ij^(l) is small enough (less than a preset small value), the training ends; otherwise let
$w_{ij}^{(l)} = w_{ij}^{(l)} + \Delta w_{ij}^{(l)},\quad j=1,2,...,m;\ i=1,2,...,n;\ l=1,2,...,L;$  (8.53)
Step 6 Choose another training sample, return to Step 2 and continue with the modified connection weights.
After the network training finishes, for any pattern sample X(t) waiting for identification, calculate $r_j = \frac{\langle X(t), W_j(t)\rangle}{\|X(t)\|\cdot\|W_j(t)\|}$ for j=1,2,...,m. If j* satisfies $r_{j^*} = \max_{j\in\{1,2,...,m\}}\{r_j\}$, then the pattern class that the process neuron j* stands for is the pattern class of the sample X(t). It is obvious that the above algorithm clusters the input samples: a group of samples is clustered into several sub-classes through the algorithm.
r
according to Eq. (8.48). If the winning process neuron is the proper classification of )(t), then the connection weight of the corresponding proces s neuron is adjusted in the direction to )(t). Its modification formula is A (I) - , 1 2, ..., L . uWij - T/ ( ailk - wij(I) ) , J• -- J" ,. 1• -- 1, 2, .. . , n,. I -
(8.54)
When/ is an improper classification of )(t), its modification formula is A (I) --T/ail-W (k (I» ,J-J, . - " . 1• - I, 2 , ... ,n,. uW jj ij
I- , I 2, ..., L .
(8.55)
As seen from the above algorithm, it is a classification algorithm with teacher demonstration.
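One supervised training pass over the samples, combining Eqs. (8.51), (8.48), (8.54) and (8.55), might look as follows (illustrative sketch; identifying the neuron index with the class label is an assumed convention).

```python
import numpy as np

def supervised_step(A, labels, W, eta=0.1):
    """One pass of the supervised competitive learning algorithm (illustrative).

    A      : (K, n, L) basis coefficients of the K training samples
    labels : (K,) known pattern class of each sample (values in 0..m-1)
    W      : (m, n, L) weight coefficients w_ij^(l) of the m competitive neurons
    """
    for a_k, c in zip(A, labels):
        sims = [np.sum(a_k * w) / (np.linalg.norm(a_k) * np.linalg.norm(w))
                for w in W]                          # Eq. (8.51)
        j_star = int(np.argmax(sims))                # Eq. (8.48)
        if j_star == c:                              # proper classification: Eq. (8.54)
            W[j_star] += eta * (a_k - W[j_star])
        else:                                        # improper: push away, Eq. (8.55)
            W[j_star] -= eta * (a_k - W[j_star])
    return W
```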
8.4.3 Application Examples
Example 8.5 Sedimentary microfacies recognition in oil geology research
Sedimentary microfacies recognition is an important as well as complex basic job in oil geology research. Subzone sedimentary microfacies are identified by the morphological and amplitude characteristics of a group of continuous well-log curves that change with depth and can reflect stratigraphic and geophysical properties.
Traditional methods are carried out manually under the guidance of facies model and facies sequence laws. Some researchers have already addressed this problem using neural network methods. However, the existing methods need to convert the well-log curves into a group of discrete vectors capable of embodying the morphologically changing characteristics of the continuous curves. In fact, well-log curves can be considered as continuous process signals changing with depth. When process neural networks are used for sedimentary microfacies recognition, the networks can directly capture the morphological and amplitude changing characteristics of the subzone continuous well-log curves. In oil geology research, sedimentary microfacies recognition mainly relies on four variables, i.e. three well-log curves (spontaneous potential, electric resistivity, micronormal) and the subzone thickness. The microfacies types are divided into channel sand, abandoned channel sand, interchannel lamellate sand, and interchannel mud. Typical sample well-log curves of the sedimentary microfacies are shown in Fig. 8.8. According to data from core wells and expert interpretation, 23 subzone microfacies samples from the four classes are chosen in the experiment to form the training set, which includes typical samples of the four different microfacies types. After the sample set is classified via the self-organizing process neural network, the class of a typical sample stands for the class of its group. If a resulting class includes two or more typical samples, the classification fails.
Fig. 8.8 Typical samples of sedimentary microfacies (a) Channel sand (b) Abandoned channel sand (c) Interchannel lamellate sand (d) Interstream mud
During the processing of real data, as different subzones have different thicknesses and the input process interval is not unified, the subzone thickness should be normalized to a unified interval in advance. The maximum thickness of all subzones is
rounded off and 1 is added so as to construct a unified process interval. A baseline value of 0.2 is taken for the part where a layer falls short of the unified thickness. In this way, the subzone thickness variable has been folded into the other input functions, so the subzone thickness parameter can be dropped. For the competitive learning algorithm without teacher demonstration, the structure parameters of the network are chosen as follows: the input layer has 3 nodes and the competitive layer has 4 nodes; a trigonometric function system is selected as the orthogonal basis; the number of basis functions is 50; the maximal number of cycles is N=5000. The classification result and the network weight coefficients no longer change after the network has learned for 375 iterations. Of the 23 samples, 20 are classified correctly; the correct classification rate is 86.96%. For the situation with teacher demonstration, 16 subzone samples belonging to the 4 classes are chosen to form the learning sample set and the test set has 9 samples. The network converges after 563 iterations. The samples of the test set are recognized and all are well-judged.
8.5 Counter Propagation Process Neural Network [11]
Counter propagation neural networks are a kind of three-layer heterogeneous feedforward neural network model proposed by Robert Hecht-Nielsen in 1987. The model consists of the input layer, the competitive layer, and the output layer, in which the competitive layer and the output layer respectively carry out a self-organizing mapping algorithm and the Grossberg learning rule. Compared with homogeneous networks, the heterogeneity of counter propagation neural networks makes them closer to the information processing mechanism of the biological cerebral nervous system. They have significant applications in pattern recognition, pattern completion, signal enhancement, etc., and show high learning efficiency and adaptive capability. Counter propagation process neural networks can be constructed by extending traditional counter propagation neural networks in the time domain. The competitive layer of counter propagation process neural networks consists of process neurons. The system inputs and the connection weights between the input layer nodes and the competitive layer nodes may be time-varying functions; the output layer consists of common time-invariant neurons; the connection weights between the competitive layer and the output layer are time-invariant adjustable parameters. The competitive layer performs a generalized self-organizing mapping algorithm (a self-organizing mapping algorithm for time-varying functions) that includes a comparison mechanism. In this layer the network implements adaptive classification of the process input signals, and simultaneously the connection weight functions complete the extraction of the pattern classification information included in the time-varying input signals. The output layer performs the Grossberg learning rule, implements the classification representation, and gives the expected outputs according to the system requirements.
Counter propagation process neural networks have good adaptability to practical problems such as pattern classification of time-varying signals, continuous system signal processing, aircraft engine rotor simulated fault diagnosis [12], etc.
8.5.1 Network Structure
Counter propagation process neural networks are a feedforward network model with a three-layer structure consisting of the input layer, the competitive layer, and the output layer. Neuron nodes in adjacent layers are fully connected to each other. Suppose that the input layer has n nodes that feed the n time-varying input functions into the network; the competitive layer has H nodes composed of process neurons, and this layer carries out a generalized self-organizing mapping algorithm to complete adaptive competitive classification of the input patterns. The output layer is made up of m common time-invariant neuron nodes, carries out the Grossberg learning rule, and gives the expected outputs according to the system requirements. If the spatial aggregation operation adopts a weighted sum and the temporal accumulation operation adopts an integral, the topological structure of the network is shown in Fig. 8.9.
Fig. 8.9 Counter propagation process neural network model
In Fig. 8.9, x_1(t), x_2(t), ..., x_n(t) (t∈[0,T]) are the input functions of the network; w_ij(t) (i=1,2,...,n; j=1,2,...,H) is the connection weight function from the input layer node i to the competitive layer node j; v_jk (j=1,2,...,H; k=1,2,...,m) is the connection weight between the competitive layer and the output layer; y_k (k=1,2,...,m) is the output of the network; [0,T] is the input process interval; f is the activation function of a process neuron.
8.5.2 Learning Algorithm
The learning course of counter propagation process neural networks involves two algorithms. The self-organizing mapping algorithm is used between the input layer and the competitive layer so as to complete the training of w_ij(t) and the adaptive pattern classification of the input functions. The Grossberg learning rule is used between the competitive layer and the output layer to adjust the time-invariant connection parameters v_jk and give the system outputs according to the requirements. The algorithm is briefly described as follows: first, determine the winning process neuron j* in the competitive layer according to the self-organizing competitive mapping algorithm described in Section 8.4; second, adjust the connection weight functions from the nodes of the input layer to the node j* according to Eqs. (8.54) and (8.55) while the other weight functions remain unchanged; then compute the outputs of the network, compare them with the expected outputs, and adjust the connection weights between the competitive layer and the output layer according to the Grossberg learning rule. The modification formula is
(8.56)
where y_j^(1) is the output of the process neuron j in the competitive layer (a similarity degree); y_k is the actual output of the output layer node k, and d_k is the expected output. Repeat the above steps until the error precision requirement is satisfied, completing the network training.
8.5.3 Determination of the Number of Pattern Classifications
In the structure design of counter propagation process neural networks, it is important to determine the number of nodes in the competitive layer. The outputs of the nodes in the competitive layer represent the pattern classes of the input function samples, so whether the number of nodes in the competitive layer is chosen properly directly affects the execution efficiency of the network and the correctness with which practical problems are solved. If the real number of pattern classes of the samples is known in advance, then the number of nodes in the competitive layer can be given directly; if it is unknown, the number can be determined by the following dynamic clustering method. Suppose that the real problem domain includes K function samples {X^1(t), X^2(t), ..., X^K(t); X^k(t)∈(C[0,T])^n}, and that these K samples already include all the modes of the real problem. Three clustering parameters are set: the number of initial classifications H_0, the similarity coefficient threshold θ (the bigger the similarity coefficient, the more similar the samples), and the between-class distance threshold R. The reciprocal of the similarity coefficient defined by Eq. (8.47) is selected as the distance between two input function samples, and the minimal distance between pairs of input function samples taken from the two respective classes is taken as the between-class distance. The steps of the dynamic classification are as follows.
Step 1 In the input function sample set, choose H_0 (H_0≤K) samples as the delegates of H_0 pattern classes and construct H_0 classes.
Step 2 Compute the similarity coefficients between each of the remaining function samples and every existing pattern class delegate in turn. If the maximal similarity coefficient is less than θ, this function sample forms a new class and becomes the delegate of the new pattern class, H_0+1→H_0. If the maximal similarity coefficient is bigger than θ, then this function sample is attributed to the class with the maximal similarity coefficient, and the mean of this function sample and the delegate function sample of the original class is used as the delegate of the merged class.
Step 3 Compute the between-class distance between each pair of the H_0 classes. If the between-class distance between two classes is less than R, the two classes are merged and the mean of the class delegate function samples of the two classes is used as the delegate of the new class; if the between-class distance is bigger than R, then the two input function sample classes remain unchanged.
Step 4 After Step 3 is carried out, the number of classifications may change. Replace H_0 with the number of new classifications. If the classification result (including the number of classifications and the specific classification of the function samples) changes, go to Step 3; if the classification result no longer changes, the classification finishes. The final number of classifications H_0 can then be used as the reference number of nodes in the competitive layer of the counter propagation process neural network.
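A simplified sketch of this dynamic clustering procedure is given below (illustrative only; samples are represented by their basis coefficients, class delegates are approximated by class means, and corner cases such as non-positive similarities are handled with a simple guard).

```python
import numpy as np

def cos_sim(a, b):
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

def dynamic_cluster(samples, H0, theta_sim, R, max_rounds=100):
    """Estimate the number of competitive-layer nodes (illustrative sketch).

    samples   : (K, n, L) basis coefficients of the K function samples
    H0        : initial number of classes; theta_sim, R : thresholds
    Returns the final number of classes.
    """
    classes = [[samples[i]] for i in range(H0)]            # Step 1: H0 delegates
    for x in samples[H0:]:                                  # Step 2: assign or create
        sims = [cos_sim(x, np.mean(c, axis=0)) for c in classes]
        j = int(np.argmax(sims))
        if sims[j] < theta_sim:
            classes.append([x])
        else:
            classes[j].append(x)
    for _ in range(max_rounds):                             # Steps 3-4: merge until stable
        merged = False
        for a in range(len(classes)):
            for b in range(a + 1, len(classes)):
                # between-class distance: reciprocal of the largest pairwise similarity
                best = max(cos_sim(u, v) for u in classes[a] for v in classes[b])
                if best > 0 and 1.0 / best < R:
                    classes[a] += classes.pop(b)
                    merged = True
                    break
            if merged:
                break
        if not merged:
            break
    return len(classes)

# illustrative call on random coefficient data
rng = np.random.default_rng(0)
print(dynamic_cluster(rng.normal(size=(23, 3, 50)), H0=4, theta_sim=0.2, R=2.0))
```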
8.5.4 Application Examples
Example 8.6 Water-flooding status identification of an oil layer in oil exploitation
The problem of water-flooding status identification of an oil layer has been described in Section 8.2 of this chapter. Here, we adopt counter propagation process neural networks to recognize a water-flooded layer from well logs. Using abundant water-flooded oil layer analysis data from core wells, 80 representative water-flooded oil layer samples are chosen to constitute the training set and 40 oil layer samples are chosen to constitute the test set. The water-flooding level of the oil layer is divided into strong water flooding, middle water flooding, weak water flooding, and no water flooding. The topological structure of the counter propagation process neural network is determined as 5-8-1. The 80 training samples (which are not classified in advance) are fed into the network for training. The learning precision is 0.05, and the maximal number of learning iterations is 5000. In the experiment, the network converges after 1319 iterations. The well-trained network is used to identify the samples of the training sample set; 74 of them are well-judged, and the correct recognition rate is 92.5%. The 40 samples of the test set are identified; 31 of them are well-judged, and the correct recognition rate is 77.5%. This is a better result compared with the correct rate of 67% obtained by the current automatic water-flooded layer identification method.
8.6 Radial-Basis Function Process Neural Network
In 1985, Powell proposed a radial-basis function (RBF) method for multivariate interpolation. In 1988, Broomhead and Lowe first applied RBFs to the design of neural networks, and RBF neural networks were accordingly built. This network is a three-layer feedforward neural network model that realizes nonlinear mapping by changing the parameters of the nonlinear transform functions of the neurons and improves the learning rate through the linearization of the connection weight adjustment. In recent years, it has been broadly applied to problems such as pattern recognition, association rule mining, signal processing, etc. Traditional radial-basis function neural networks can be extended in the time domain to construct a radial-basis function process neural network model [13]. The model has a three-layer feedforward structure: the first layer is the input layer composed of signal source (time-dependent process function) nodes; the middle hidden layer units are radial-basis process neurons whose transformation function is the radial-basis kernel function, and the center of the kernel function can also be a time-dependent function; the third layer is the output layer that responds to the network input pattern.
8.6.1 Radial-Basis Process Neuron
A radial-basis process neuron is composed of spatio-temporal two-dimensional aggregation, radial-basis kernel function transformation, etc., and its structure is shown in Fig. 8.10.
Fig. 8.10 Radial-basis process neuron
In Fig. 8.10, x_1(t), x_2(t), ..., x_n(t) are the input functions of the radial-basis process neuron on the time interval [0,T]; "∫" is a temporal accumulation operator (for example, an integral with respect to time); K(·) is the kernel function of the radial-basis process neuron. The input-output relationship of the radial-basis process neuron is
$o_j = K\Big(\int_0^T \|X(t)-X^j(t)\|\,\mathrm dt\Big) ,$  (8.57)
or
$o_j = \int_0^T K\big(\|X(t)-X^j(t)\|\big)\,\mathrm dt ,$  (8.58)
where X(t)=(x_1(t), x_2(t), ..., x_n(t)) is the input of the network; X^j(t)=(x_1^j(t), x_2^j(t), ..., x_n^j(t)) is the kernel center function of the radial-basis process neuron; ‖·‖ is a norm; o_j is the output of the radial-basis process neuron.
8.6.2 Network Structure
Suppose that the input layer of the radial-basis process neural network has n nodes that feed the time-varying input functions into the network; the middle hidden layer has m radial-basis process neuron nodes, and the transformation function of each unit is the radial-basis kernel function; the network output is a linear weighted sum of the output signals of the hidden layer nodes. The topological structure of the network is shown in Fig. 8.11. In Fig. 8.11, w_j (j=1,2,...,m) is the weight coefficient of the output layer and is an adjustable parameter of the network. Suppose that X(t)=(x_1(t), x_2(t), ..., x_n(t)) is the input function of the network, X^j(t) is the kernel center function of the jth radial-basis process neuron with t∈[0,T], and "∫" is the integral on [0,T]; then the input-output relationship of the radial-basis process neural network is
$F(X(t)) = \sum_{j=1}^{m} w_j K\Big(\int_0^T \|X(t)-X^j(t)\|\,\mathrm dt\Big) ,$  (8.59)
or
$F(X(t)) = \sum_{j=1}^{m} w_j \int_0^T K\big(\|X(t)-X^j(t)\|\big)\,\mathrm dt .$  (8.60)
Fig. 8.11 RBF process neural network
8.6.3 Learning Algorithm
Suppose that the input functions and the kernel center functions of the RBF process neural network belong to (C[0,T])^n. The network training mainly includes the adjustment of the property parameters of the radial-basis kernel function K(·), the determination of the radial-basis kernel center functions X^j(t), and the iterative modification of the weight coefficients of the output layer. The network is trained with teacher demonstration so as to satisfy the input-output mapping relationship of the training samples. For X(t)∈(C[0,T])^n, define
$\|X(t)-X^j(t)\| = \Big(\int_0^T \big(X(t)-X^j(t)\big)\big(X(t)-X^j(t)\big)^{\mathrm T}\,\mathrm dt\Big)^{\frac{1}{2}} .$  (8.61)
In the following we first consider the training of RBF process neural networks denoted by Eq. (8.59). Suppose that b_1(t), b_2(t), ..., b_L(t) are a group of standard orthogonal basis functions satisfying the fitting precision requirement of the input functions and kernel center functions, and that X(t)=(x_1(t), x_2(t), ..., x_n(t)) is an arbitrary function in the input space. Suppose that x_i(t) can be expressed as
$x_i(t) = \sum_{l=1}^{L} a_{il} b_l(t),\quad i=1,2,...,n,$  (8.62)
and suppose that the kernel center function of the jth neuron in the hidden layer can be denoted in the basis expansion form
$x_i^j(t) = \sum_{l=1}^{L} a_{il}^j b_l(t),\quad i=1,2,...,n.$  (8.63)
According to Eqs. (8.61)-(8.63) and the standard orthogonality of the basis functions, we have
$\|X(t)-X^j(t)\| = \Big(\sum_{i=1}^{n}\sum_{l=1}^{L}\big(a_{il}-a_{il}^j\big)^2\Big)^{\frac{1}{2}} .$  (8.64)
From Eq. (8.57), the output of the jth neuron in the hidden layer is
(8.65)
Give K learning samples (x_k1(t), x_k2(t), ..., x_kn(t), d_k) for k=1,2,...,K, where d_k is the expected output of the kth sample on the process interval [0,T]. Suppose that the actual output of the kth sample is y_k; the network error function is defined as
$E = \sum_{k=1}^{K} (y_k - d_k)^2 .$  (8.66)
If the radial-basis kernel function is chosen as a Gauss function, that is,
$K(v) = \exp\big(-v^2/(2\sigma^2)\big) ,$  (8.67)
where σ is called the mean deviation of the m kernel center functions, which can be determined by network training on the learning sample set or by the following formulas:
$d = \Big(\sum_{i=1}^{m}\int_0^T \Big[X^i(t)-\frac{1}{m}\sum_{q=1}^{m}X^q(t)\Big]^2 \mathrm dt\Big)^{\frac{1}{2}} ,$  (8.68)
(8.69)
The training of RBF process neural networks can borrow from the traditional gradient descent algorithm. The learning iteration formulas for the expansion coefficients of the kernel center functions and the weight coefficients of the output layer are respectively
$a_{il}^j(s+1) = a_{il}^j(s) - \alpha\frac{\partial E(s)}{\partial a_{il}^j},\quad i=1,2,...,n;\ j=1,2,...,m;\ l=1,2,...,L,$  (8.70)
$w_j(s+1) = w_j(s) - \eta\frac{\partial E(s)}{\partial w_j},\quad j=1,2,...,m,$  (8.71)
where s is the learning iteration index, and α and η are the network learning rate constants. The specific learning algorithm is described as follows.
Step 1 Give the learning precision ε, set the accumulative learning iteration count s=0 and the maximum learning iteration count N, and choose basis functions b_l(t) (l=1,2,...,L) in the input space;
Step 2 According to Eqs. (8.62)-(8.63), expand the input function X(t) and the kernel center functions X^j(t) in the basis functions b_l(t) (l=1,2,...,L);
Step 3 Initialize the network parameters w_j, a_il^j (i=1,2,...,n; j=1,2,...,m; l=1,2,...,L);
Step 4 Calculate σ according to Eqs. (8.68)-(8.69);
Step 5 Calculate ‖X(t)−X^j(t)‖ according to Eq. (8.61); calculate the output o_j of the jth neuron in the hidden layer; then calculate y_k;
Step 6 Calculate the error function E according to Eq. (8.66); if E<ε or s≥N, go to Step 8;
Step 7 Modify the expansion coefficients and the weight coefficients according to Eqs. (8.70)-(8.71); s+1→s; go to Step 4;
Step 8 Output the learning result and stop.
For the training of RBF process neural networks denoted by Eq. (8.60), because the radial-basis process neuron performs the kernel function transformation first and then integrates over the input process interval, the orthogonality of the basis functions cannot be used during training. The following method can be adopted.
r
K(IIX(t)-X j(t)II~t
'can be figured out by adopting methods
such as integral operation or numerical computation. Thus, the gradient descent algorithm can be adopted to complete the training of the network parameters wj,a~ (i=1,2, ...,n; j = 1,2,...,m; 1=1,2, ...,L ) .
8.6.4 Application Examples
Example 8.7 Rotating machine failure diagnosis
For the problem of rotating machine failure diagnosis discussed in Example 6.2 in Chapter 6, we can also adopt RBF process neural networks denoted by Eq. (8.59) as the failure type identifier. The structure of the RBF process neural network is chosen as 1-4-1, the kernel center functions of the process neurons are chosen as the typical curves (determinate functions) of the four running states of the rotating machine, a trigonometric function system is chosen as the orthogonal basis, the number of basis functions is 50, and the radial-basis kernel function is chosen as a Gauss function. As stated above, the kernel center functions are determinate in this problem. Moreover, after the input sample functions and the kernel center functions are determined, σ can be figured out from Eqs. (8.68)-(8.69), so in the training of the RBF process neural network we only need to determine the connection weight coefficients w_j (j=1,2,...,m) between the hidden layer and the output layer. According to the above algorithm, the network error function is defined by Eq. (8.66), and the weights are modified iteratively according to Eq. (8.71) with error precision ε=0.05 and learning rate constant η=0.45. The network converges after 371 iterations. Finally, the 8 test samples are recognized and all are well-judged.
8.7 Epilogue
In this chapter, against the background of the applications of process neural networks, many kinds of process neural network models have been constructed, and the corresponding learning algorithms and application examples have been provided. They include process neural networks with double hidden layers, discrete process neural networks, cascade process neural networks, self-organizing process neural networks, counter propagation process neural networks, RBF process neural networks, etc. In practice, because the spatial aggregation operator and the temporal accumulation operator can be chosen in various forms, and because the topological structure of process neural networks is flexible, more network models can be designed to meet real demands. In the practical application of process neural networks, the key lies in constructing a network model capable of correctly describing the information flow and the mapping relationships among the variables of the actual system. At the same time, the learning algorithm should have high computational efficiency, stability, and convergence. These are all important factors affecting the practical application of process neural networks. In particular, current research on the computational efficiency, stability, and convergence of process neural network learning algorithms is still insufficient; more research, in both breadth and depth, is needed. In addition, effective learning algorithms starting directly from functional computation are worth studying.
References
[1] Ding G., Zhong S.S. (2008) Time series prediction using wavelet process neural network. Chinese Physics B 17(6):1998-2003
[2] Zhong S.S., Ding G. (2007) Time series prediction based on Elman process neural network and its application. Journal of Information and Computational Science 4(1):405-411
[3] Ding G., Zhong S.S. (2007) Approximation capability analysis of parallel process neural network with application to aircraft engine health condition monitoring. Lecture Notes in Computer Science 4493(3):66-72
[4] Zhong S.S., Li Y., Ding G., Lin L. (2007) Continuous wavelet process neural network and its application. Neural Network World 17(5):483-495
[5] Xu Z.F., Wang H.W., Wu G.S. (2007) Converse solution of oil recovery ratio based on process neural network and quantum genetic algorithm. Journal of China University of Petroleum 31(6):120-126 (in Chinese)
[6] Song G.J., Yang D.Q., Wu L., Wang T.J., Tang S.W. (2006) A mixed process neural network and its application to churn prediction in mobile communications. In: Sixth IEEE International Conference on Data Mining, pp.798-802
[7] Xu S.H., He X.G., Shang F.R. (2004) Research and application of process neural network with two hidden-layer based on expansion of basis function. Control and Decision 19(1):36-48 (in Chinese)
[8] Li Y., Zhong S.S. (2006) Failure detection of aero-engine based on process neural network with double hidden-layers. Journal of Propulsion Technology 27(6):559-562
[9] Xu S.H., He X.G. (2004) Research and applications in cascade process neural networks. PR&AI 17(2):207-211 (in Chinese)
[10] Xu S.H., He X.G. (2003) Research and applications of self-organization process neural networks. Journal of Computer Research and Development 40(11):1612-1615 (in Chinese)
[11] Xu S.H., He X.G. (2003) A counter propagation process neural networks and its applications. Proceedings of the 13th China Neural Networks Council (CNNC'2003), pp.241-244 (in Chinese)
[12] Ding G., Bian X., Hou L.G., Zhong S.S. (2008) Aircraft engine rotor simulated fault diagnosis using counter propagation process neural network. Aviation Precision Manufacturing 44(5):17-20 (in Chinese)
[13] Xu S.H., He X.G. (2004) Research and applications of radial basis process neural networks. Journal of Beijing University of Aeronautics and Astronautics 30(1):14-17 (in Chinese)
9 Application of Process Neural Networks
Process neural networks have broad applications in practical problems relating to time-varying processes. Some examples have already been given in previous chapters when introducing specific process neural networks, such as time series prediction [1-3], soft sensing in sewage disposal systems [4], simulation and forecasting of the reservoir exploitation process, sedimentary facies and reservoir water-flooding identification [5,6], rotating machinery fault diagnosis [7,8], etc. In this chapter, we describe more applications of process neural networks in various fields, for example in process modeling, nonlinear system identification, process control, classification and clustering, process optimization, forecasting and prediction, evaluation and decision-making, macroscopic control, etc. Besides, we discuss possible applications of process neural networks in further fields.
9.1 Application in Process Modeling
Acrylamide homogeneous polymerization is an important chemical reaction in the chemical industry. Temperature is an important factor that influences the concentration of the resultant and the reaction time. In order to obtain the functional relationship between temperature changes and the concentration of the resultant, traditional methods adopt differential dynamics and thermodynamic equations. However, it is difficult to find the actual relationship because of the complexity of the physical and chemical models. The above problem can be solved precisely by process neural networks [9], and the following shows how to solve the problem according to the experimental data of acrylamide homogeneous polymerization.
Example 9.1 Modeling of the acrylamide homogeneous polymerization process
Some experimental results of acrylamide homogeneous polymerization are listed in Table 9.1. In the table, t denotes the cumulative time in minutes starting from 0; T_i (i = 1, 2, ..., 9) are the temperature (°C) measurements in the ith group of experiments;
Nm is the molecule number of the resultant (×10⁶ cm⁻³) produced in each group of experiments and is referred to as the concentration of the acrylamide homogeneous polymerization resultant.
Table 9.1 The experimental results of nine groups of the polymerization chemical reaction (experimental temperatures in °C; Nm in ×10⁶ cm⁻³)

t (min) |  T1  |  T2  |  T3  |  T4  |  T5  |  T6  |  T7  |  T8  |  T9
0       |  17  |  18  |  18  |  18  |  14  |  15  |  15  |  14  |  13
10      |  19  |  16  |  21  |  18  |  15  |  16  |  16  |  15  |  15
20      |  23  |  16  |  23  | 19.5 |  17  |  16  |  18  |  16  |  16
30      |  27  |  17  |  26  | 21.5 |  20  | 16.5 |  21  | 17.5 |  18
40      |  30  |  18  |  31  | 24.5 |  22  |  18  |  25  | 19.5 |  19
50      |  34  |  19  |  36  | 29.5 |  25  |  20  |  30  |  22  |  21
60      |  39  |  20  | 43.5 |  34  |  28  |  22  |  36  |  24  |  23
70      |  45  |  21  | 52.5 |  38  |  33  |  24  |  40  | 26.5 |  25
80      |  52  | 22.5 |  59  |  46  |  38  |  29  |  54  | 30.5 |  27
90      |  64  |  24  |  68  |  56  |  46  |  32  |  64  | 35.5 |  30
100     |  73  |  26  |  73  |  63  |  52  |  36  |  73  |  43  |  34
110     |  81  |  28  |  77  |  74  |  60  |  40  |  76  | 54.5 |  41
120     |  82  |  32  |  79  |  78  |  66  | 44.5 |  77  |  67  |  44
130     |  84  |  36  |  83  |  93  |  72  | 50.5 |  78  | 79.5 |  49
140     |  88  |  45  |  87  |  95  |  76  | 57.5 |  79  |  83  |  54
150     |  92  |  54  |  96  |  97  |  81  | 66.5 |  84  |  87  |  65
160     | 101  |  64  |  98  | 100  |  83  |  67  |  86  |  93  |  77
170     | 113  |  68  | 102  | 110  |  94  |  69  |  89  |  98  |  80
Nm      | 1811 | 1838 | 1894 | 1853 | 1496 | 1701 | 1754 | 1915 | 1644
In Fig. 9.1, we give nine curves that describe the change of the acrylamide homogeneous polymerization experimental temperature (°C) with time (min). Each curve corresponds to the molecule number of the resultant of one group of experiments. For the process neural network, each curve corresponds to a time-dependent input sequence, and the molecule number of the resultant is the output of the process neural network.
Fig. 9.1 Curves of temperature vs. time for the nine experiment groups (temperature (°C), time (min))
In this example, a process neural network with a single hidden layer is adopted for modeling, and the network is trained with the learning algorithm based on gradient descent. The parameters of the network are selected as follows: one input node; the basis function used for weight function expansion is the Walsh function; the number of basis functions is 32; 22 hidden layer nodes; one output node; the learning rate constant is 0.5; the maximum number of learning iterations is 5000; the learning error precision is 0.001. In order to test the generalization of the process neural network after learning, we use 30 different learning results of the PNN to test 5 non-training samples respectively, and compare them with the actual values. One of the testing results is shown in Table 9.2.

Table 9.2 The identifying results of test samples

Test sample | Molecule number predicted (×10⁶ cm⁻³) | Actual molecule number (×10⁶ cm⁻³) | Average absolute error (×10⁶ cm⁻³) | Average relative error (%)
1 | 1360 | 1561 | 201 | 12.9
2 | 1963 | 1574 | 289 | 19.4
3 | 1394 | 1653 | 259 | 15.7
4 | 1476 | 1542 |  66 |  4.3
5 | 1685 | 1922 | 137 |  7.5
Fig. 9.2 gives the error variation curve for the training of the process neural network with 38 samples; the PNN reaches the given learning error precision (0.001) after 573 iterations. In order to test the generalization capability of the process neural network after learning, we test the network with five non-training samples and compare the results with the actual values. The test results are shown in Table 9.2.
Fig. 9.2 Learning error curve (error vs. number of iterations)
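To make the weight-function expansion used in this example concrete, the sketch below shows how one sampled temperature-time curve from Table 9.1 can be expanded on a discrete orthonormal Walsh-type (Hadamard-ordered) basis and reconstructed from its coefficients. This is only an illustrative sketch: the resampling to 32 points, the Hadamard ordering, and all variable names are assumptions, not the learning procedure used in the book.

```python
import numpy as np
from scipy.linalg import hadamard

# One temperature-time curve from Table 9.1 (column T1, 18 samples over 0-170 min).
t = np.arange(0, 180, 10)
temp = np.array([17, 19, 23, 27, 30, 34, 39, 45, 52, 64,
                 73, 81, 82, 84, 88, 92, 101, 113], dtype=float)

# Resample the curve onto 32 evenly spaced points so it matches a 32-function basis,
# which is the number of basis functions quoted in the text.
n_basis = 32
t_dense = np.linspace(t[0], t[-1], n_basis)
x = np.interp(t_dense, t, temp)

# Rows of a normalized Hadamard matrix form a discrete orthonormal Walsh-type basis.
B = hadamard(n_basis) / np.sqrt(n_basis)          # B @ B.T = I
coeffs = B @ x                                    # expansion coefficients
x_rec = B.T @ coeffs                              # reconstruction from the coefficients

print("max reconstruction error:", np.max(np.abs(x - x_rec)))  # ~0 for an orthonormal basis
```

In the same way, every input curve can be reduced to a fixed-length coefficient vector before being presented to the process neurons.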
9.2 Application in Nonlinear System Identification
Nonlinear system identification and process control is an important research task in automatic control. The main object of automatic control research is a dynamic system; the task is to design a controller, using the information acquired about the given controlled object, that makes the control system satisfy certain properties such as stability, precision, robustness, and real-time performance, and that is simple and easy to realize. When solving actual system control problems, the traditional control method is mainly based on a precise mathematical model: first a precise mathematical model of the controlled object is constructed, and then the controller is designed according to the analysis of that model. At present, the theory and methods of linear system control are well developed, and some results for the control of complex nonlinear systems have been obtained. However, because of the complexity and time-varying properties of actual systems, it is difficult to construct a precise mathematical description, especially for processes with uncertain properties. Therefore, some theoretical and technical problems still need solving. For example, general analysis and design methods for nonlinear systems are unavailable, many controlled objects cannot be modeled sufficiently precisely by mechanism analysis or system identification methods, and it is difficult to satisfy real-time requirements. The nonlinear transformation characteristics and the great parallel computing capability of neural networks provide an effective method for nonlinear system identification and control. At present, many researchers have applied artificial neural network techniques to control research in complex nonlinear systems; for example, Zhang et al. applied a delayed dynamic neural network to on-line identification [10], Rubio Avila et al. applied a feedforward neural network to nonlinear system identification [11], Wang et al. applied a Hopfield neural network to discrete-time nonlinear system identification [12], Gao and Wang applied an Elman neural network to identification of a nonlinear dynamic system [13], etc. When handling process inputs and time-sequence dependence, these models usually realize the input/output time delay by an external delay, i.e. by constructing a time-discretized recurrent network. However, this complicates the system structure and introduces many hidden problems, which are difficult to foresee, into the structure of the learning algorithm, its convergence, its stability, etc. Process neural networks have a nonlinear mapping capability for the relationship between the inputs and the outputs of a time-varying system, i.e. they have adaptive learning capability and nonlinear modeling capability in a time-varying environment. Therefore, they have better adaptability when applied to nonlinear system identification and process control. The identification of a nonlinear system requires a model that is equivalent to the identified system, constructed from the time-varying input/output data of the nonlinear dynamic system, such that the actual system and the identification model produce the same output for the same initial condition and the same input, within a definite identification
precision, so that the model can be used to forecast and predict actual system parameters. The system identification method based on a process neural network solves the identification of a nonlinear system, according to the field knowledge of the actual problem and the existing system test data, by using the process neural network as the system identifier.
9.2.1 Principle of Nonlinear System Identification
Suppose P is a nonlinear system to be identified, and U, Y are the input and output function spaces of the system, respectively. Then we can take P as a mapping operator from the input space to the output space, and define the system in the form of input/output pairs {U, Y}. A typical system identification model is shown in Fig. 9.3.
Fig. 9.3 System identification model
In Fig. 9.3, P denotes the actual nonlinear system which needs identifying; P̂ is the identification model to be determined (called the identifying match of P); u(t) is the input function of the actual system; y(t) is the output function of the actual system without noise; z(t) is the output function of the actual system containing noise; ŷ(t) is the output function of the identification model; v(t) is the noise which affects the output terminal of the actual system. The identification algorithm mainly includes the mechanism and algorithm for adjusting the structure parameters and property parameters of the identification model. It adjusts the parameters of the identification model in an adaptive way according to the deviation e(t) between the actual output data of the system and the output data of the identification model P̂. Nonlinear system identification based on a process neural network ascertains the identifier model P̂ of P, i.e. it ascertains parameters such as the connection weight functions and the activation thresholds of the process neural network by directly learning the time-varying input/output data of the system, so that the model expresses the forward dynamic characteristics of the system,
making the criterion function (i.e. the output deviation between P and P̂ for the same input) a minimum:

$$e(t) = \| y(t) - \hat{y}(t) \| < \varepsilon, \qquad (9.1)$$

where ε is the control precision of the system identification.
9.2.2 Process Neural Network for System Identification
The process neural network model used for nonlinear system identification can adopt the process neural network whose inputs and output are all time-varying functions, as defined in Section 4.6. Suppose the output functions of the system satisfy y(t) ∈ C[0,T], and b₁(t), b₂(t), ..., b_L(t) form a group of standard orthogonal basis functions that satisfy the fitting precision of y(t) in C[0,T]; then y(t) can be denoted as

$$y(t) = \sum_{l=1}^{L} c_l\, b_l(t). \qquad (9.2)$$
Suppose the nonlinear system to be identified is a multi-input single-output (MISO) system. The input layer of the process neural network used for system identification has n nodes; the first hidden layer, i.e. the process neuron hidden layer, has m nodes; the second hidden layer, i.e. the time-invariant neuron hidden layer, has L nodes; and the output layer has one node. The structure of the network is shown in Fig. 9.4.
Fig. 9.4 The process neural network used for system identification
In Fig. 9.4, the transform relationship between the inputs and the output of the network is

$$y(t) = \sum_{l=1}^{L}\left[\sum_{j=1}^{m} v_{jl}\, f\!\left(\int_{0}^{T}\sum_{i=1}^{n} w_{ij}(t)\, u_i(t)\,\mathrm{d}t - \theta_j^{(1)}\right)\right] b_l(t), \qquad (9.3)$$
where u_i(t) is the ith input function of the system; w_ij(t) is the connection weight function between input node i and node j of the first hidden layer; θ_j^(1) is the activation threshold of the jth neuron of the first hidden layer; f is the activation function of the process neuron; and v_jl is the connection weight from neuron j in the first hidden layer to neuron l in the second hidden layer. If the transformation process of the internal variables of the nonlinear system to be identified is complex, a further hidden layer can be added to the above process neural network model.
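As an illustration of the structure just described, the following sketch evaluates a network of the form of Eq. (9.3) when the input functions and the weight functions are expanded on the same orthonormal basis, so that each integral reduces to a dot product of coefficient vectors. The layer sizes, the random coefficients, and the sigmoid activation are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, L = 2, 4, 8                  # input functions, process neurons, basis functions (illustrative)

def f(s):                          # process-neuron activation (a common sigmoid choice; assumed)
    return 1.0 / (1.0 + np.exp(-s))

A = rng.normal(size=(n, L))        # basis-expansion coefficients of the input functions u_i(t)
W = rng.normal(size=(n, m, L))     # coefficients of the weight functions w_ij(t) on the same basis
theta1 = rng.normal(size=m)        # first-hidden-layer thresholds theta_j^(1)
V = rng.normal(size=(m, L))        # weights v_jl to the time-invariant hidden layer

# With an orthonormal basis, int_0^T w_ij(t) u_i(t) dt = sum_l W[i, j, l] * A[i, l].
s = np.einsum('ijl,il->j', W, A) - theta1     # net inputs of the m process neurons
h = f(s)                                      # first hidden layer outputs
c = h @ V                                     # coefficients c_l of the output, shape (L,)

# The network output function is then y(t) = sum_l c[l] * b_l(t) for the chosen basis b_l.
print("output expansion coefficients:", c)
```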
9.2.3 Nonlinear System Identification Process
Suppose the learning sample set of the system identification model is {U^(k)(t), y^(k)(t)}, k = 1, 2, ..., K, where K is the size of the sample set. Now we discuss the identification methods for two common structures of identification models.
(1) Parallel model
The parallel model of nonlinear system identification is shown in Fig. 9.5.
Fig. 9.5 Parallel model
As shown in Fig. 9.5, the identification model based on a process neural network is connected in parallel with the actual system to be identified; thus we can directly use Eq. (9.3) to express the process neural network acting as the system identifier. The output deviation between P and P̂, i.e. the identification error e(t), can be used as the adjustment signal for network training. This is a supervised learning problem (learning with a teacher): the actual system acts as the teacher, providing the expected output needed by the process neural network. The identification algorithm module encapsulates the specific learning algorithm and the training mechanism of the process neural network identification model, and the model can be trained by adopting the algorithm described in Section 5.2.
(2) Parallel-serial model
The parallel-serial structure of the identification system is shown in Fig. 9.6. The actual system P and the identification model P̂ are connected in parallel with respect to the system input u(t). The output of P is fed back to the input terminal of P̂ through a series delay, and the identification model is ascertained and modified by the identification algorithm. In fact, this identification structure ascertains P̂ by simultaneously using the input of system P and the delayed output information, which is beneficial to the stability of the identification model. According to the information flow direction of the parallel-serial identification model, the designed process neural network model is shown in Fig. 9.7.
Fig. 9.6 The parallel-serial model
Fig. 9.7 The process neural network used for parallel-serial model identification
In Fig. 9.7, the input layer of the process neural network has n+1 nodes; the inputs are u₁(t), u₂(t), ..., u_n(t) and the delayed output y(t−τ) of the system P, where τ is the time delay unit of the system, introduced mainly to account for the influence of the state of the system at time t−τ on the output of the system at time t. The first hidden layer, i.e. the process neuron hidden layer, has m nodes, which implement the temporal accumulation and spatial aggregation of the system process inputs. The second hidden layer, i.e. the time-invariant neuron hidden layer, has L nodes, which implement learning of the
coefficients of the basis function expansion of the output y(t) of system P. The output node of the fourth layer forms the linear combination of the outputs of the second hidden layer with the basis functions, and produces the system output. In fact, the above model is a kind of process neural network used for the parallel-serial identification model. If the spatial aggregation operation of a process neuron is taken as a weighted sum, and the temporal (process) accumulation operation is taken as an integral, then the relationship between the inputs and the output of the parallel-serial identification model is

$$y(t) = \sum_{l=1}^{L}\left[\sum_{j=1}^{m} v_{jl}\, f\!\left(\int_{0}^{T}\left(\sum_{i=1}^{n} w_{ij}(t)\, u_i(t) + w_{0j}(t)\, y(t-\tau)\right)\mathrm{d}t - \theta_j^{(1)}\right)\right] b_l(t), \qquad (9.4)$$

where w_0j(t) is the connection weight function between the input node receiving y(t−τ) and the jth node of the first hidden layer. The learning process of the network is divided into two steps: first the network is trained, from the input layer through the first hidden layer to the second hidden layer, with the targets (c₁, c₂, ..., c_L), which are the basis function expansion coefficients of the expected output y(t) of the system; then the system output is formed by linearly combining the output values of the second hidden layer, taken as the connection weight coefficients of the output layer, with the basis functions b₁(t), b₂(t), ..., b_L(t). The detailed process can be completed by the algorithms described in Section 5.2.
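The sketch below illustrates one practical detail of the parallel-serial scheme: augmenting each sampled input record with the τ-step delayed system output before it is presented to the network. The array layout, the padding of the first τ samples, and the stand-in system output are assumptions made purely for illustration.

```python
import numpy as np

def make_parallel_serial_sample(u, y, tau):
    """Augment an input record with the tau-step delayed system output.

    u : array (n, T)  sampled input functions u_1(t)..u_n(t)
    y : array (T,)    sampled system output y(t)
    tau : int         delay in sampling steps
    """
    y_delayed = np.concatenate([np.full(tau, y[0]), y[:-tau]])   # y(t - tau), padded at the start
    return np.vstack([u, y_delayed])                             # (n + 1, T) network input

# Toy usage: 2 input channels, 100 time steps, delay of 5 steps.
T = 100
u = np.random.default_rng(1).normal(size=(2, T))
y = np.cumsum(u.sum(axis=0)) / T                                 # a stand-in for the system output
x_aug = make_parallel_serial_sample(u, y, tau=5)
print(x_aug.shape)   # (3, 100): the extra row is the delayed-output channel
```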
9.3 Application in Process Control
At present, traditional neural networks have been widely applied to system process control; e.g., Venkateswarlu and Venkat propose dynamically recurrent radial basis function networks for single-input single-output (SISO) control of uncertain nonlinear processes [14], Hayakawa et al. develop neuroadaptive controllers for a class of nonlinear uncertain dynamical systems [15], Balasubramaniam et al. use neural networks to realize optimal control for nonlinear singular systems [16], and Na et al. present a new adaptive time-delay positive feedback controller (ATPFC) for a class of nonlinear time-delay systems [17], etc. System process control based on a process neural network means designing the system controller on the basis of a process neural network, using the adaptive learning capability of the process neural network in a time-varying environment and its nonlinear modeling capability for time-varying systems, to complete the process control of the actual system on the basis of the model identification of the nonlinear system. In this section, we mainly study the implementation methods of a typical system process control problem based on a process neural network.
9.3.1 Process Control of Nonlinear System
The process control of a nonlinear system has many structural models; a typical control model is shown in Fig. 9.8. In Fig. 9.8, r(t) is the tracking signal of system P; u(t) is the process control signal output by the system controller; y(t) is the output of the actual system; ŷ(t) is the output of the system identifier; e(t), e₁(t), e₂(t) are the deviations, where e(t) is also the input of the controller. The models and parameters of the system identifier and the controller are modified by the identifying and modifying algorithm.
Fig. 9.8 A nonlinear system control structure
In the process control of a nonlinear system, the design of the process controller is a key problem. In a situation where we have some prior knowledge of the control system, i.e. several groups of input and output signals, the forward input/output relationship (system forward model) of the actual system can be identified by adopting the process neural network as the system identifier, as discussed in Section 9.2. The design of the process controller can also adopt the process neural network model. Because the system controller is equivalent to the inverse model of the nonlinear system, the transfer function P⁻¹ of the control system and that of the actual system satisfy P⁻¹P = I. The system output y(t) should track the change of r(t), and the control precision relies on the precision of the inverse model (controller) of the system.
9.3.2 Designing and Solving of the Process Controller
There are two methods for designing and solving the process controller. One is to adopt the direct inverse model method, i.e. we regard the output signals of the actual system as the input signals of the controller and the corresponding input signals of the actual system as the outputs of the controller, and then carry out the modeling and
solving of the controller by directly using the output/input signals of the actual system. Because the process controller and the actual system are two converse processes, their transfer functions should satisfy P⁻¹P = I. If the process controller ascertained by the direct inverse model method satisfies this transfer property between the forward model and the inverse model of the system, it is undoubtedly a direct and simple method for the design of the process controller. However, there are uncertain factors in the actual nonlinear system, this kind of one-to-one transform relationship cannot always be realized, and sometimes it may result in an incorrect system inverse model. Next, we construct a system controller modeling and solving method by adopting an indirect inverse model. The solving of the system control signals is implemented by taking the process neural network used for system identification shown in Fig. 9.4 as the forward model of the actual system from input to output, and using this forward model to replace the actual system. To solve this problem, the process neural network shown in Fig. 9.9 is constructed.
Fig. 9.9 Inverse solving of process neural network
This network model adds an input layer with n nodes in front of the system forward model shown in Fig. 9.4. The input layer of the forward model is taken as the first hidden layer, and the network structure shown in Fig. 9.4 does not change. The input of the current model is an n-dimensional identity function vector X₀(t) = 1 (t ∈ [0,T]), and the process output of the actual system is taken as the expected output of this model. When the network connection weights are adjusted according to the output error signal e(t), the connection weight functions of the forward model remain unchanged, and only the connection weight function W₀(t) = (w₁(t), w₂(t), ..., w_n(t))ᵀ between the added input layer and the input layer of the forward model is adjusted. From Fig. 9.9, the input of the first hidden layer of the model, i.e. the input layer of the forward model, is X₀(t) * W₀(t); that is, w₁(t), w₂(t), ..., w_n(t) are at the same time the input functions of the forward model, where "*" denotes the inner product of two vector functions. Because the inputs of the forward model are the output signals of the controller, when the weight function W₀(t) is adjusted to satisfy the input/output relationship of the system forward model, the ascertained W₀(t) is the process input signal corresponding to the expected output y(t) of the system forward model, i.e. the output signal of the process controller, and W₀(t) satisfies the mutually converse relationship between the system controller and the system forward model, consequently completing the solving of the system process control signals. The solving of the system controller based on a process neural network is thus divided into two stages: the first stage is to ascertain the forward model of the controlled system, i.e. to ascertain the process neural network identification model satisfying the input/output relation of the actual system denoted by Eq. (9.3); the second stage is to solve the system inverse model to satisfy the system control demand. The system forward model can be ascertained by the identification method described in Section 9.2. Now consider the solving of the system inverse model, i.e. ascertaining the process control signals u₁(t), u₂(t), ..., u_n(t) corresponding to the expected output y(t) of the system. From the above analysis, the connection weight functions between the input layer and the first hidden layer in Fig. 9.9 are the object of the inverse solving of the control system. In the process of solving, the network connection weight functions and parameter values of the system forward model, i.e. from the first hidden layer to the output layer in Fig. 9.9, remain unchanged, and the network training only needs to adjust the connection weights w₁(t), w₂(t), ..., w_n(t) between the input layer and the first hidden layer. If we adopt the gradient descent algorithm to train the network, we generally need to adjust the connection weights of all layers in Fig. 9.9 at the same time. If we only adjust the connection weights of the first layer of the network in Fig. 9.9 according to the demand, the gradient descent algorithm in fact only acquires a small part of the contribution to the whole error signal of the network. The training period of this adjusting strategy is long, and the convergence and the stability of the algorithm are difficult to predict. Therefore, we consider a training method which combines the gradient descent algorithm with a genetic algorithm. The weight functions w₁(t), w₂(t), ..., w_n(t) are expressed as expansions on the orthogonal function basis introduced by the training of the system forward model. The genetic algorithm is adopted to obtain the expansion coefficients, which are then combined with the basis functions to acquire the system control signals. In the design of the genetic algorithm, the fitness function of the chromosome is constructed according to the error function e(t) of the control system. Firstly, expand w₁(t), w₂(t), ..., w_n(t) with the same orthogonal basis functions as used when training the forward model:
$$w_i(t) = \sum_{l=1}^{L} w_i^{(l)}\, b_l(t), \quad i = 1, 2, \ldots, n, \qquad (9.5)$$

where $w_i^{(l)}$ are the basis expansion coefficients of $w_i(t)$ corresponding to $b_l(t)$.
The chromosomes are directly decimal-coded; the genetic operations are described as follows:
(a) Determine the initial population size N.
(b) Initialization of the chromosome population. Directly use the solution vectors of the problem space as chromosomes, i.e. directly code $w_i^{(1)}, w_i^{(2)}, \ldots, w_i^{(L)}$ (i = 1, 2, ..., n) as decimal real numbers; the gene number of each chromosome is nL, and a random number in the interval [-1,1] is used to initialize each chromosome of the population. We can also select N samples from the learning sample set and construct the chromosomes directly from the ordered arrangement of their basis function expansion coefficients.
(c) Construction of the fitness function of the chromosome. The fitness function of the chromosome can be constructed according to the output error function E = ||e(t)|| of the system described in Fig. 9.6. Considering the actual significance of fitness, we can select the reciprocal of E + 1, i.e.

$$g(W) = \frac{1}{1+E}. \qquad (9.6)$$
(d) Selection. Adopt fitness-proportional selection: for chromosome k with fitness $g_k$ (k = 1, 2, ..., N), the selection probability is computed by

$$p_k = g_k \Big/ \sum_{j=1}^{N} g_j .$$

Construct a roulette wheel according to these probabilities, rotate the wheel N times, and each time select a chromosome to add to the new population.
(e) Arithmetic crossover. The arithmetic crossover of two chromosomes $W_1$ and $W_2$ is defined as

$$W_1' = \lambda W_1 + (1-\lambda) W_2, \qquad W_2' = \lambda W_2 + (1-\lambda) W_1,$$

where λ ∈ [0,1] is a random number.
(f) Non-symmetrical mutation. For a given ancestor $W = (w_1, \ldots, w_i, \ldots, w_{nL})$, if its element $w_i$ is selected to mutate, the generated progeny is denoted as $W' = (w_1, \ldots, w_i', \ldots, w_{nL})$, and $w_i'$ may change randomly according to one of the following two possibilities:

$$w_i' = w_i + \Delta(s,\, w_i^{U} - w_i) \qquad \text{or} \qquad w_i' = w_i - \Delta(s,\, w_i - w_i^{L}),$$

where s is the generation number, and $w_i^{U}$ and $w_i^{L}$ are the upper and lower bounds of $w_i$. The function Δ(s, z) tends to zero as s increases; for example, Δ(s, z) = z·r·(1 − s/epoch)^d, where r ∈ [0,1] is a random number, epoch is the maximum generation number, and d is a parameter which defines the degree of non-symmetry.
(g) Loop control. Compute the fitness value of each chromosome according to the fitness function; the loop termination condition can be the minimum precision control of the objective function or an objective value control of the fitness function. According to the above algorithm, w₁(t), w₂(t), ..., w_n(t) can be determined, i.e. the process control signals of the system u₁(t), u₂(t), ..., u_n(t).
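A minimal sketch of steps (a)-(g) is given below: decimal-coded chromosomes hold the nL basis-expansion coefficients, fitness is 1/(1+E) as in Eq. (9.6), and selection, arithmetic crossover, and non-symmetrical mutation follow the operations above. The error function here is a simple quadratic stand-in for the forward-model control error, and the sizes, bounds, and rates are assumptions made in the spirit of Example 9.2, not the book's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, L = 2, 8                       # number of control signals and basis functions (illustrative)
N, epochs = 6, 200                # population size and maximum number of generations
lo, hi = -2.5, 5.0                # gene bounds (the limits quoted in Example 9.2)

def control_error(chrom):
    """Stand-in for E = ||e(t)||: a real run would drive the fixed forward model with the
    control signal encoded by `chrom` and compare its output with the expected output;
    a quadratic surrogate keeps the sketch self-contained."""
    target = np.linspace(lo, hi, n * L)
    return float(np.sum((chrom - target) ** 2))

def fitness(chrom):               # Eq. (9.6): g(W) = 1 / (1 + E)
    return 1.0 / (1.0 + control_error(chrom))

pop = rng.uniform(lo, hi, size=(N, n * L))               # (a)-(b) decimal-coded initial population

for s in range(1, epochs + 1):
    fit = np.array([fitness(c) for c in pop])
    pop = pop[rng.choice(N, size=N, p=fit / fit.sum())]  # (d) roulette-wheel selection
    for i in range(0, N - 1, 2):                         # (e) arithmetic crossover of pairs
        lam = rng.uniform()
        a, b = pop[i].copy(), pop[i + 1].copy()
        pop[i], pop[i + 1] = lam * a + (1 - lam) * b, lam * b + (1 - lam) * a
    d_exp, p_mut = 2.0, 0.1                              # (f) non-symmetrical mutation
    for c in pop:
        for g_idx in range(c.size):
            if rng.uniform() < p_mut:
                r = rng.uniform()
                if rng.uniform() < 0.5:
                    c[g_idx] += (hi - c[g_idx]) * r * (1 - s / epochs) ** d_exp
                else:
                    c[g_idx] -= (c[g_idx] - lo) * r * (1 - s / epochs) ** d_exp
    best = pop[int(np.argmax([fitness(c) for c in pop]))]
    if control_error(best) < 0.08:                       # (g) loop control
        break

# `best` holds the basis-expansion coefficients of the control signals w_1(t)...w_n(t).
print("best control error:", control_error(best))
```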
9.3.3 Simulation Experiment
Example 9.2 Identification and process control of a nonlinear system with continuous input/output function sample pairs
Consider the identification and the process control of the nonlinear system P with 16 continuous input/output function sample pairs. The 16 input/output function sample pairs are (t·sin(απt)+2.0, t·cos(απt)+2.0) for α = 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, where t ∈ [−2,2].
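For reference, the following sketch generates the sixteen sampled input/output function pairs above on a discrete time grid; the grid density is an assumption made purely for illustration.

```python
import numpy as np

t = np.linspace(-2.0, 2.0, 101)            # sampling grid on [-2, 2] (grid size is assumed)
alphas = np.arange(0.5, 2.01, 0.1)         # the 16 frequency factors 0.5, 0.6, ..., 2.0

# Each pair is (input function, expected output function) sampled on the grid.
samples = [(t * np.sin(a * np.pi * t) + 2.0,
            t * np.cos(a * np.pi * t) + 2.0) for a in alphas]

print(len(samples), samples[0][0].shape)   # 16 pairs, each curve sampled at 101 points
```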
Consider a process neural network identification model, i.e. the forward model of the system, based on a continuous Walsh transform learning algorithm [18]. Its network topological structure is selected as 1-50-50-32-1, i.e. one input node, 50 process neuron nodes in the first hidden layer, 50 general time-invariant neuron nodes in the second hidden layer, 32 time-invariant neuron nodes in the third hidden layer (in the experiment, the number of continuous Walsh basis functions is selected as 32), and one output node. Compared with the network model denoted by Eq. (9.3), this model adds a time-invariant neuron hidden layer to improve the nonlinear mapping capability of the network. All the learning rate constants of the network are selected as 0.3, the finite error precision is 0.05, and the maximum number of iterations is 10,000. For the 16 learning samples, the network converges after 938 iterations, and the approximation error precision is 0.04950. The output function curves of the network and the actual expected output function curves for two samples are shown in Fig. 9.10 and Fig. 9.11 respectively, in which the solid line is the expected output function of the system and the dashed line is the output function of the system identification model.
Fig. 9.10 Output function curve of the model with the output tcos(0.9πt)+2.0
Fig. 9.11 Output function curve of the model with the output tcos(1.5πt)+2.0
Now we construct the control model of the system, i.e. the inverse model of the system, by using the forward model of the nonlinear system. The process control signals of the systems whose outputs are tcos(0.85πt)+2.0 and tcos(1.65πt)+2.0 are solved by using the genetic algorithm. The population size is N = 6, the initial chromosomes are the continuous Walsh basis expansion coefficients of the output functions of six learning samples in the training set, the crossover probability is 0.5, the mutation probability is 0.1, the upper limit of the genes is 5.0, and the lower limit of the genes is -2.5. The inverse model satisfies the control error precision 0.08 after 273 iterations. When the solved control signals are applied to the system, the actual output signal curves of the network and the expected output function curves are shown in Figs. 9.12 and 9.13, respectively, in which the solid line is the expected output function of the system and the dashed line is the actual output of the system.

Fig. 9.12 Two output curves of tcos(0.85πt)+2.0
Fig. 9.13 Two output curves of tcos(1.65πt)+2.0

From the results of the simulation experiment, we can see that the two curves coincide in the middle part of the control process interval, while there are some deviations at the two ends of the interval. We can improve the control quality of the system by methods such as increasing the number of learning samples in the experiment, tightening the network training control error precision, etc.
9.4 Application in Clustering and Classification
In practice, there is much classification and clustering of the process signals of nonlinear dynamic systems, e.g. for complex equipment, the monitoring and control of equipment and systems [19], and fault detection [20]. Monitoring and control keeps the outputs normal when the inputs are normal; detection finds deviant outputs when the inputs are normal. In applications, first take a group of correct inputs/outputs of the equipment as the learning samples and construct an equipment-running model based on a process neural network. When the system output corresponding to a correct input deviates from the value predicted by the model, the equipment may have a certain fault. Alternatively, take the input/output data of the correct mode and of various fault modes as the learning samples, and use the process neural network to solve the classification problem; then we can not only check whether there is a fault, but also diagnose which fault it is. The rotating machinery fault diagnosis described in Example 6.2 is a typical example. There are also actual problems for which we need not know the classification structure of the research objects in advance. We can carry out classification only according to a group of similarity indexes among the research objects, and then carry out clustering by adopting the self-organizing competitive process neural network without a teacher, i.e. without the limitation of prior knowledge of the research objects; see, for example, the sedimentary facies differentiation problem described in Example 8.6. Next, we give another example that carries out classification of time-varying signals with singular values by adopting rational-expression process neural networks.
Example 9.3 Application of rational-expression process neural networks to time-varying signal processing with singular values
Consider the rational process neural network model described in Section 4.5.2; its learning algorithm is as follows. Suppose the input function space of the process neural network is (C[0,T])ⁿ, and b₁(t), b₂(t), ..., b_L(t) are a group of standard orthogonal basis functions that satisfy the fitting precision of the input functions in C[0,T] (such as the trigonometric basis functions or the Walsh basis functions). Suppose X(t) = (x₁(t), x₂(t), ..., x_n(t)) are the functions of the input space of the network. The basis function expansion of x_i(t) is denoted as

$$x_i(t) = \sum_{l=1}^{L} a_{il}\, b_l(t). \qquad (9.7)$$
The numerator-unit connection weight function w_ij(t) and the denominator-unit connection weight function v_ij(t) of the rational process neuron are also expanded on the basis functions:

$$w_{ij}(t) = \sum_{l=1}^{L} w_{ij}^{(l)}\, b_l(t), \qquad (9.8)$$

$$v_{ij}(t) = \sum_{l=1}^{L} v_{ij}^{(l)}\, b_l(t), \qquad (9.9)$$

where $w_{ij}^{(l)}$, $v_{ij}^{(l)}$ are respectively the coefficients of the basis function expansions of w_ij(t), v_ij(t) corresponding to b_l(t), and are adjustable time-invariant parameters. Suppose K learning samples (x_{k1}(t), x_{k2}(t), ..., x_{kn}(t), d_k) (k = 1, 2, ..., K) are given. Using the
standard orthogonal basis functions, the output of the numerator part of the dual-unit rational process neuron corresponding to the input of the kth sample is

$$h_{uj}^{(k)} = f\!\left(\int_{0}^{T}\sum_{i=1}^{n} w_{ij}(t)\, x_{ki}(t)\,\mathrm{d}t - \theta_j^{(u)}\right)
= f\!\left(\int_{0}^{T}\sum_{i=1}^{n}\left(\sum_{l=1}^{L} w_{ij}^{(l)} b_l(t)\right)\left(\sum_{s=1}^{L} a_{is}^{(k)} b_s(t)\right)\mathrm{d}t - \theta_j^{(u)}\right)
= f\!\left(\sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} w_{ij}^{(l)} a_{is}^{(k)} \int_{0}^{T} b_l(t)\, b_s(t)\,\mathrm{d}t - \theta_j^{(u)}\right)
= f\!\left(\sum_{i=1}^{n}\sum_{l=1}^{L} w_{ij}^{(l)} a_{il}^{(k)} - \theta_j^{(u)}\right), \quad j = 1, 2, \ldots, m, \qquad (9.10)$$

where $a_{il}^{(k)}$ are the coefficients of the basis function expansion of $x_{ki}(t)$
corresponding to $b_l(t)$. Similarly, the output of the denominator part is

$$h_{dj}^{(k)} = f\!\left(\sum_{i=1}^{n}\sum_{l=1}^{L} v_{ij}^{(l)} a_{il}^{(k)} - \theta_j^{(d)}\right), \quad j = 1, 2, \ldots, m. \qquad (9.11)$$
Suppose the actual network output for the kth learning sample is $y_k$; the error function is defined as

$$E = \sum_{k=1}^{K}(y_k - d_k)^2
= \sum_{k=1}^{K}\left(g\!\left(\sum_{j=1}^{m} u_j\, f\!\left(\sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}^{(k)} w_{ij}^{(l)} - \theta_j^{(u)}\right)\Big/ f\!\left(\sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}^{(k)} v_{ij}^{(l)} - \theta_j^{(d)}\right) - \theta\right) - d_k\right)^2, \qquad (9.12)$$

where $u_j$ is the connection weight from the jth rational unit to the output node, θ is the output threshold, and g is the activation function of the output node.
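The sketch below evaluates Eqs. (9.10)-(9.12) for a small, randomly initialized rational-expression network: with an orthonormal basis, each integral collapses to a sum of products of expansion coefficients. All sizes, the logistic activations, and the random coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, L, K = 1, 3, 16, 5           # inputs, rational units, basis functions, samples (illustrative)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))
f = g = sigmoid                    # unit and output activations (assumed choices)

A = rng.normal(size=(K, n, L))     # a_il^(k): basis coefficients of the input functions
W = rng.normal(size=(n, m, L))     # w_ij^(l): numerator weight coefficients
V = rng.normal(size=(n, m, L))     # v_ij^(l): denominator weight coefficients
theta_u = rng.normal(size=m)       # theta_j^(u)
theta_d = rng.normal(size=m)       # theta_j^(d)
u = rng.normal(size=m)             # output-layer weights u_j
theta = 0.0                        # output threshold
d = rng.integers(0, 2, size=K)     # class labels d_k

# Eqs. (9.10)-(9.11): the double sums over i and l replace the integrals.
h_u = f(np.einsum('kil,ijl->kj', A, W) - theta_u)   # numerator outputs, shape (K, m)
h_d = f(np.einsum('kil,ijl->kj', A, V) - theta_d)   # denominator outputs, shape (K, m)

z = (h_u / h_d) @ u - theta        # z_k for every sample
y = g(z)                           # network outputs y_k
E = float(np.sum((y - d) ** 2))    # error function, Eq. (9.12)
print("network error E =", E)
```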
According to the gradient descent method, the update formulas for the network connection weights and activation thresholds are

$$w_{ij}^{(l)} = w_{ij}^{(l)} - \alpha \frac{\partial E}{\partial w_{ij}^{(l)}}, \quad i = 1,\ldots,n;\ j = 1,\ldots,m;\ l = 1,\ldots,L, \qquad (9.13)$$

$$v_{ij}^{(l)} = v_{ij}^{(l)} - \beta \frac{\partial E}{\partial v_{ij}^{(l)}}, \quad i = 1,\ldots,n;\ j = 1,\ldots,m;\ l = 1,\ldots,L, \qquad (9.14)$$

$$u_j = u_j - \lambda \frac{\partial E}{\partial u_j}, \quad j = 1,\ldots,m, \qquad (9.15)$$

$$\theta_j^{(u)} = \theta_j^{(u)} - \gamma \frac{\partial E}{\partial \theta_j^{(u)}}, \quad j = 1,\ldots,m, \qquad (9.16)$$

$$\theta_j^{(d)} = \theta_j^{(d)} - \eta \frac{\partial E}{\partial \theta_j^{(d)}}, \quad j = 1,\ldots,m, \qquad (9.17)$$

$$\theta = \theta - \mu \frac{\partial E}{\partial \theta}, \qquad (9.18)$$

where α, β, λ, γ, η, μ are the learning rate constants. Denote

$$z_k = \sum_{j=1}^{m} u_j\, h_{uj}^{(k)} \big/ h_{dj}^{(k)} - \theta;$$

then
$$\frac{\partial E}{\partial w_{ij}^{(l)}} = 2\sum_{k=1}^{K}\big(g(z_k)-d_k\big)\, g'(z_k)\, u_j\, a_{il}^{(k)}\, f'\!\left(\sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}^{(k)} w_{ij}^{(l)} - \theta_j^{(u)}\right)\Big/ h_{dj}^{(k)}, \qquad (9.19)$$

$$\frac{\partial E}{\partial v_{ij}^{(l)}} = -2\sum_{k=1}^{K}\big(g(z_k)-d_k\big)\, g'(z_k)\, u_j\, a_{il}^{(k)}\, h_{uj}^{(k)}\, f'\!\left(\sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}^{(k)} v_{ij}^{(l)} - \theta_j^{(d)}\right)\Big/ \big(h_{dj}^{(k)}\big)^2, \qquad (9.20)$$

$$\frac{\partial E}{\partial u_j} = 2\sum_{k=1}^{K}\big(g(z_k)-d_k\big)\, g'(z_k)\, \frac{h_{uj}^{(k)}}{h_{dj}^{(k)}}, \qquad (9.21)$$

$$\frac{\partial E}{\partial \theta_j^{(u)}} = 2\sum_{k=1}^{K}\big(g(z_k)-d_k\big)\, g'(z_k)\, u_j\, f'\!\left(\sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}^{(k)} w_{ij}^{(l)} - \theta_j^{(u)}\right)(-1)\Big/ h_{dj}^{(k)}, \qquad (9.22)$$

$$\frac{\partial E}{\partial \theta_j^{(d)}} = 2\sum_{k=1}^{K}\big(g(z_k)-d_k\big)\, g'(z_k)\, u_j\, h_{uj}^{(k)}\, f'\!\left(\sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}^{(k)} v_{ij}^{(l)} - \theta_j^{(d)}\right)(-1)\Big/ \big(h_{dj}^{(k)}\big)^2, \qquad (9.23)$$

$$\frac{\partial E}{\partial \theta} = 2\sum_{k=1}^{K}\big(g(z_k)-d_k\big)\, g'(z_k)\, (-1). \qquad (9.24)$$
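As a brief illustration of how the update rules (9.13)-(9.18) are applied, the sketch below performs plain gradient-descent steps on a packed parameter vector; for compactness the partial derivatives are estimated by central finite differences instead of the closed forms (9.19)-(9.24), and the toy loss stands in for E. All names and rates are assumptions.

```python
import numpy as np

def finite_diff_grad(loss, params, eps=1e-6):
    """Numerically estimate dE/dp for every entry of a flat parameter vector."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus[i] += eps
        p_minus[i] -= eps
        grad[i] = (loss(p_plus) - loss(p_minus)) / (2 * eps)
    return grad

def train_step(loss, params, lr=0.45):
    """One iteration of the updates (9.13)-(9.18): p <- p - lr * dE/dp."""
    return params - lr * finite_diff_grad(loss, params)

# Toy usage: `params` would pack w_ij^(l), v_ij^(l), u_j and the thresholds.
params = np.zeros(4)
loss = lambda p: np.sum((p - np.array([1.0, -0.5, 0.3, 2.0])) ** 2)
for _ in range(200):
    params = train_step(loss, params)
print(params)   # converges toward the minimizer of the toy loss
```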
The learning algorithm is described as follows.
Step 1 Select an appropriate standard orthogonal basis of the input space, determine the number of basis functions L according to the fitting precision demanded of the training sample functions, and expand the input functions as finite series of the basis functions.
Step 2 Give the learning error precision ε > 0, set the learning iteration counter s = 1 and the maximum number of learning iterations M; initialize the parameters of the network (connection weights and activation thresholds) $w_{ij}^{(l)}, v_{ij}^{(l)}, u_j, \theta_j^{(u)}, \theta_j^{(d)}, \theta$ (i = 1, ..., n; j = 1, ..., m; l = 1, ..., L).
Step 3 Compute the network error E according to Eq. (9.12); if E < ε or s ≥ M, go to Step 5.
Step 4 Update the connection weights and activation thresholds according to Eqs. (9.13)-(9.24); set s = s + 1; go to Step 3.
Step 5 Output the learning results and stop.
Three similar classes of signals, 60 piecewise-exponential time-varying waveform signals in total, are generated by adding noise to a signal generator with an anomalous-function inner template. The power exponent of the piecewise exponential function and the length of each piece are set randomly within certain ranges. The power exponent of the first class of signals is randomly selected in the interval [1,1.5], and the length of the piecewise interval is randomly set in the interval [0.9,1.0]. The power exponent of the second class of signals is randomly selected in the interval [1.9,2.2], and the length of the piecewise interval is randomly set in the interval [0.8,1.1]. The corresponding intervals for the parameters of the third class of signals are [2.9,3.3] and [0.7,1.2], respectively. After normalization, the signals are composed on a uniform process interval [0,1]. Typical signal curves are shown in Fig. 9.14. A learning sample set is constructed from 40 of the time-varying signal functions, and the testing set is composed of the other 20 time-varying signal function samples. After comparing test results, the structure of the rational-expression process neural network is selected as 1-3-3-1, i.e. 1 continuous signal input node, 3 dual-unit rational process neurons, 3 rational-expression combination units, and 1 output node.
The basis functions are selected as the Walsh standard orthogonal function system. The 60 time-varying signal samples are expanded on the Walsh basis functions with a fitting precision of 0.01; 64 basis functions satisfy this fitting precision. The learning rate constant is selected as 0.45, and the training error precision is 0.05. After the training of the network is repeated 20 times, with the network parameters re-initialized each time, 20 classification models are obtained, and the network converges after 152 iterations on average. Fig. 9.15 shows the error curve of one of these models. We carry out classification identification of the 20 test samples using the 20 different classification models respectively. Four of the 20 models make one classification error in identifying the 20 test samples, 7 models make 2 classification errors, 6 models make 3 classification errors, 2 models make 4 classification errors, and the worst model makes 5 classification errors. The average correct classification rate is 87.75%. Because the curve structure and the conformation of the three classes of signal curves are very similar, and the difference in process pattern characteristics is small, this is a good classification identification result.
Fig. 9.14 Typical signal curves. (a) Typical signal curve of the first class; (b) Typical signal curve of the second class; (c) Typical signal curve of the third class
Fig. 9.15 The learning error curve
The learning error curve shows a certain surge at the initial stage of the training of the rational-expression process neural network. The error curve gradually descends as the number of learning iterations increases; the 20 repeated training runs show similar regularities.
9.5 Application in Process Optimization
Process optimization is a typical problem in time-varying system process control; it has important significance in production processes such as agriculture, chemical engineering, machine manufacturing, etc. For example, in modern agricultural production such as crop greenhouse cultivation, we can control the conditions of various crops during their growth processes: the inputs of the system are process control parameters such as temperature, humidity, illumination, fertilizer, the concentration of CO2, etc., and the outputs are the various time-dependent crop growth and quality curves. Take the test sampling data of the growth process of crops under various production conditions to form the learning sample set, and construct a crop growth process model based on a process neural network. After modeling, take the quality and the yield as the control objectives, seek the optimal production process, and ascertain the environment variable process functions (temperature, humidity, illumination, etc.) that are optimal for each growth phase of the crop, to realize "precision agriculture" production. Besides, in the chemical reaction processes used in the production of various chemicals, take the various time-varying control parameters, including the concentration and the proportion of the chemical reagents and the temperature and the pressure of the container, as the inputs of the chemical production process, with the outputs being indices such as the quantity and the quality of the chemical reaction product. After modeling, search for the optimum and ascertain the optimal input curves influencing these indices, to complete the optimal control of the chemical reaction process. In the economic field, we can apply the process neural network to various input/output systems: first construct a model for the system, and then search for optimal and satisfactory control parameters; this can yield meaningful results. Therefore, if there are enough data for developing research in this area, the acquisition of beneficial results is possible. Moreover, in industrial production, attempts to optimize the production process flow, reduce consumption, save energy, and improve the quality and the yield of products by applying the process optimization technology of process neural networks are worth researching. In theory and algorithms, we need to study the optimization theory of functionals when solving these process optimization problems and find the corresponding functional optimization calculation methods. At present, research in this aspect is
insufficient, research findings are few, and extensive and deep research work is needed to provide service for various real applications.
9.6 Applications in Forecast and Prediction
Process neural networks have favorable process statistical analysis characteristics for time-varying signals that exhibit randomness and regularity at the same time, and they have favorable adaptability for constructing forecast and prediction models of nonlinear dynamic systems. Next, we will explain the application of process neural networks to the forecast and prediction of a nonlinear dynamic system using the examples of traffic flux forecasting and congestion prediction for a mobile telecommunication network.
Example 9.4 Flux forecast and congestion prediction for a mobile telecommunication network
Dynamic telecommunication network equipment generates a large quantity of alarm data every day. The alarm data are used for reporting detected exceptional conditions, and they contain valuable implicit knowledge about the state of the network. The traditional method forecasts network faults only by analyzing the incidence relations of the alarm data. In fact, the performance of the network equipment has a close relationship with the generation of the alarm data. In the following, we analyze a test data set which is collected from the network management center of Sichuan Province in China and comes from BSCs (Base Station Controllers, a type of network equipment) to find out the interrelations in the data. The network is trained using data recorded over 15 days, with sampling data obtained from 25 different BSC devices. Each set of data has the following 11 attributes: (a) BSC ID, (b) BSC name, (c) vendor name, (d) wireless switching-on rate, (e) switching-off rate, (f) throughput, (g) switching-off success rate, (h) switching-on success rate, (i) SDCCH congestion rate, (j) TCH congestion rate, (k) alarm. Among these attributes, (a) and (b) contain basic information about the BSC and the vendor and have no process features. Attributes (d)-(j) are attributes of the performance of the equipment and have process features. The value of the alarm attribute is 1 or 0, according to whether an alarm exists or not. We construct a forecast and prediction model by using a four-layer process neural network. In the network, the first hidden layer is a process neuron hidden layer with 15 nodes, and the second hidden layer is a general time-invariant neuron hidden layer with 15 nodes. The network parameters are set as follows: seven input nodes (or eight input nodes when the vendor attribute is included); Fourier functions are chosen as the basis functions, whose number is 32; both the inputs and the outputs are communication flows; the learning rate constant is 0.05; the maximum number of learning iterations is 10,000; the learning precision is 0.001. The experiments are divided into four groups, and each group is further divided
into two parts: one includes the vendor attribute, and the other does not. In the first group of experiments, the relation between training error and the number of iterations is examined. Figs. 9.16 and 9.17 show the training curves of the experiments. When the iterations reach 10,000, the training error is low enough.
Fig. 9.16 Training error (not including vendor attribute)
Fig. 9.17 Training error (including vendor attribute)

In the second group of experiments, we analyze the prediction precision and the prediction error (prediction error = 1 − prediction precision) of the model. From the prediction error, we can observe the stability of the model. Test samples (not included in the training sample set) are used for testing the trained prediction model. Figs. 9.18 and 9.19 show the prediction precision of the process neural network prediction model for different network equipment. Fig. 9.19 shows that the process neural network has high precision whether or not the vendor attribute is included in the inputs. In Figs. 9.20 and 9.21, we can see that the prediction error for different equipment is stable. In the third and fourth groups of experiments, we compare the performance of the process neural network with that of a BP algorithm and an RBF algorithm. In these experiments, the BP algorithm and the RBF algorithm from the SAS statistical software are used. Figs. 9.22 and 9.23 show the experimental results when the network does not include the vendor attribute in the inputs. The BP algorithm and the RBF algorithm
Fig. 9.18 Prediction precision for different network equipment (not including vendor attribute)

Fig. 9.19 Prediction precision for different network equipment (including vendor attribute)

Fig. 9.20 Prediction error for different network equipment (not including vendor attribute)
Fig. 9.21 Prediction error (including vendor attribute)
Fig. 9.22 Comparison of prediction precision (not including vendor attribute); legend: PNN 0.988237, BP 0.925788, RBF 0.948211
Fig. 9.23 Comparison of prediction error (not including vendor attribute)
are accurate for some network equipment, but they have low precision for several equipment types. This shows that the precision and the stability of the process neural network are higher than those of the traditional networks. When the network includes the vendor attribute in the inputs (Figs. 9.24 and 9.25), we find that the precision of the PNN is a little lower than that of BP and RBF, because the vendor attribute has no process feature. However, the process neural network has high stability and still has high accuracy in this case.

Fig. 9.24 Comparison of prediction precision (including vendor attribute); legend: BP 0.991250, RBF 0.994731
Fig. 9.25 Comparison of prediction error (including vendor attribute); legend: PNN 0.019212, BP 0.008750, RBF 0.005269
In this example, we use the communication flux from a group of historical time segments as the time-varying process input samples, and use the communication flux at a subsequent time point of the corresponding time segment as the corresponding output samples. After learning and modeling, the process neural network completes the generalized forecast satisfactorily.
Example 9.5 Application in PID control prediction of chemical production
In chemical production, the process prediction control of a fractionating tower reactor is very important for improving the yield and the quality of chemical products. In the test data for the reactor, the input variables can be chosen from two groups of values: (1) the set values of the input variables, namely the set values of the PID controller, which comprise five process variables: 2FC1302_SP, 2FC1308_SP, 2FC1306_SP, 2FC1215_SP and 2FC1218_SP; (2) the real (measured) values of the input variables: 2FC1302, 2FC1308, 2FC1306, 2FC1215, 2FC1218. In the test data for the reactor, there are three output variables: 2TI1302_CV, 2QI1306_CV and 2QI1304_CV. In one chemical production run, 4887 input and output process measurement values are measured and recorded. The sampling interval is 1 s, and part of the sampling data is shown in columns 1-8 of Table 9.3. A multi-input multi-output process neural network with a single hidden layer is used for process prediction of the output values of the reactor, and the topological structure of the network is chosen as 5-21-4. Starting from the chemical production data, eight successive measured values of the five real input variables are used in order to predict four successive output values (four subsequent values of the three output variables). For the three output variables of the reactor, 2TI1302_CV, 2QI1306_CV and 2QI1304_CV, the corresponding output variables (prediction results) of the process neural network are denoted as 2TI1302_CV_REAL, 2QI1306_CV_REAL and 2QI1304_CV_REAL. By adopting a process neural network training algorithm based on the Walsh transform, part of the prediction results are shown in columns 9-11 of Table 9.3. Part of the process prediction results and the real measured values of the three output variables are shown in Figs. 9.26-9.28, respectively, where the light curves are the prediction curves of the process neural network and the dark curves are the expected output curves, i.e. the fitting curves of the reactor test data. As shown in Table 9.3 and Figs. 9.26-9.28, the process prediction model, which uses the process neural network as a model of the reactor, achieves high prediction precision. In practice, many other problems can adopt the process neural network to build a prediction model. For example, in weather forecasting, according to weather conditions over the years and the data reflecting climate change laws, we can build a prediction model based on process neural networks to predict climate change conditions in the near future and in the middle to long term with different probabilities. In another example, to forecast the macro control and economic situation of a national economy, according to objective data and the regularity of economic operation, we first sum up all kinds of factors affecting economic development, and then, under the condition that the general goal of national economic development is determined, predict the effect of comprehensive macro-controls based on the various factors. We believe that as long as the sampled economic operation data are adequate, on the one hand an economic operation decision model can be built by adopting a specially designed process neural network (e.g. a process neural network including fuzzy values) combined with the objective laws of economic development, and the model can, for the determined goal, inversely seek rational process control indexes
Table 9.3 Part of the test data and prediction results of the reactor

2FC1302 | 2FC1308 | 2FC1306 | 2FC1215 | 2FC1218 | 2TI1302_CV | 2QI1306_CV | 2QI1304_CV | 2TI1302_CV_REAL | 2QI1306_CV_REAL | 2QI1304_CV_REAL
51.703 | 12625 | 9.958 | 0.1989 | 22.654 | 197.3569 | 3.842752 | 1.720410 | 197.3226 | 3.847345 | 1.719169
51.611 | 12625 | 9.949 | 0.1985 | 22.185 | 197.3627 | 3.833413 | 1.722878 | 197.2940 | 3.842506 | 1.720708
52.496 | 12625 | 9.930 | 0.1993 | 22.084 | 197.2416 | 3.791197 | 1.721718 | 197.2623 | 3.798063 | 1.719753
51.375 | 12625 | 9.820 | 0.1993 | 22.218 | 197.0514 | 3.819826 | 1.716119 | 197.1374 | 3.795799 | 1.721422
51.324 | 12625 | 9.769 | 0.1980 | 22.523 | 196.8669 | 3.772746 | 1.721066 | 196.8737 | 3.775503 | 1.721050
51.724 | 12625 | 9.772 | 0.1980 | 22.320 | 197.0514 | 3.812689 | 1.719888 | 197.0325 | 3.816259 | 1.720573
51.423 | 12625 | 9.772 | 0.1985 | 22.202 | 197.4607 | 3.812768 | 1.724559 | 197.4881 | 3.811220 | 1.724457
51.632 | 12625 | 9.785 | 0.1985 | 22.298 | 197.4204 | 3.751421 | 1.730540 | 197.4055 | 3.738469 | 1.728948
52.378 | 12625 | 9.758 | 0.1980 | 22.058 | 197.3281 | 3.718828 | 1.728733 | 197.2353 | 3.734223 | 1.728826
52.259 | 12875 | 9.790 | 0.1980 | 22.489 | 196.9187 | 3.729149 | 1.727716 | 197.0095 | 3.725853 | 1.728824
50.893 | 12875 | 9.792 | 0.1990 | 22.554 | 196.8957 | 3.745962 | 1.719202 | 196.8948 | 3.745363 | 1.719254
52.053 | 12875 | 9.801 | 0.1983 | 22.263 | 196.7400 | 3.827659 | 1.711979 | 196.7389 | 3.827601 | 1.711786
51.216 | 12875 | 9.855 | 0.1979 | 22.188 | 196.8092 | 3.842712 | 1.713227 | 196.8095 | 3.842876 | 1.713309
51.094 | 12875 | 9.813 | 0.1998 | 22.632 | 196.9880 | 3.936727 | 1.713239 | 196.9877 | 3.93687 | 1.713229
51.440 | 12875 | 9.780 | 0.2006 | 22.595 | 197.0456 | 3.919298 | 1.710837 | 197.0461 | 3.919102 | 1.710668
51.680 | 12875 | 9.812 | 0.2002 | 22.298 | 196.8381 | 3.997875 | 1.712953 | 196.8056 | 4.002178 | 1.712016
51.508 | 12875 | 9.804 | 0.1998 | 22.557 | 196.9015 | 4.031108 | 1.713683 | 196.8500 | 4.037784 | 1.712030
51.484 | 12875 | 9.816 | 0.2006 | 22.313 | 196.6882 | 4.134916 | 1.715818 | 196.6438 | 4.140297 | 1.714054
52.274 | 12875 | 9.824 | 0.1998 | 22.224 | 196.5096 | 4.131458 | 1.711781 | 196.5479 | 4.137560 | 1.711203
52.058 | 12875 | 9.808 | 0.1996 | 22.286 | 196.5787 | 4.160254 | 1.704013 | 196.6682 | 4.137916 | 1.709078
52.754 | 12875 | 9.823 | 0.2001 | 22.344 | 196.5267 | 4.183951 | 1.704395 | 196.5138 | 4.183955 | 1.704763
51.816 | 12625 | 9.805 | 0.2001 | 22.309 | 196.4288 | 4.241944 | 1.698021 | 196.4883 | 4.233400 | 1.695516
52.584 | 12625 | 9.783 | 0.1998 | 22.428 | 196.4922 | 4.216187 | 1.689842 | 196.4455 | 4.227890 | 1.690182
51.343 | 12625 | 9.764 | 0.1993 | 22.549 | 196.4979 | 4.2592 | 1.680353 | 196.4655 | 4.258337 | 1.680526
50.966 | 12625 | 9.788 | 0.1988 | 22.597 | 196.4057 | 4.210317 | 1.681163 | 196.4732 | 4.209069 | 1.681546
51.400 | 12625 | 9.781 | 0.1985 | 22.316 | 196.2674 | 4.126871 | 1.680918 | 196.3352 | 4.125143 | 1.681713
50.993 | 12625 | 9.763 | 0.1990 | 22.264 | 196.2097 | 4.097279 | 1.680659 | 196.1657 | 4.094817 | 1.683129
52.487 | 12625 | 9.763 | 0.1993 | 22.347 | 196.2963 | 3.996106 | 1.684845 | 196.2998 | 3.998278 | 1.684304
51.600 | 12625 | 9.789 | 0.2006 | 22.391 | 196.5844 | 4.070610 | 1.680332 | 196.5606 | 4.066148 | 1.682184
51.676 | 12625 | 9.754 | 0.2001 | 22.476 | 196.4807 | 4.035893 | 1.683330 | 196.5291 | 4.031475 | 1.682652
so as to achieve the preset goal. On the other hand, we can also build a prediction model according to practical economic operation data and predict the future economic development situation. This is just the control and prediction model required for the "accurate economy" that some economists dream about. We believe that as long as economists cooperate intimately with mathematicians and IT experts, the above goals are achievable.
Fig. 9.26 The curves of 2TI1302_CV and 2TI1302_CV_REAL (unit: 10 s)

Fig. 9.27 The curves of 2QI1306_CV and 2QI1306_CV_REAL (unit: 10 s)

Fig. 9.28 The curves of 2QI1304_CV and 2QI1304_CV_REAL (unit: 10 s)
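To make the sample construction in Example 9.5 concrete, the following is a minimal sketch of how the sliding-window training pairs might be assembled from the recorded reactor data. It is illustrative only: the function make_process_samples, the choice of predicting the subsequent values of a single output variable at a time (to match the four network outputs), and the stand-in random arrays are assumptions, not the authors' implementation.

```python
import numpy as np

def make_process_samples(X, Y, in_len=8, out_len=4, out_var=0):
    """Build sliding-window training pairs: each input is `in_len` successive
    rows of the measured input variables (a discretely sampled time-varying
    function), and each target is the next `out_len` values of one chosen
    output variable. Predicting one output variable at a time is an
    assumption made for illustration."""
    samples = []
    for t in range(len(X) - in_len - out_len + 1):
        u = X[t:t + in_len]                                  # shape (8, 5)
        d = Y[t + in_len:t + in_len + out_len, out_var]      # next 4 output values
        samples.append((u, d))
    return samples

# Stand-in arrays shaped like the recorded reactor data (4887 samples at 1 s):
# X columns ~ 2FC1302, 2FC1308, 2FC1306, 2FC1215, 2FC1218;
# Y columns ~ 2TI1302_CV, 2QI1306_CV, 2QI1304_CV.
X = np.random.rand(4887, 5)
Y = np.random.rand(4887, 3)
train_set = make_process_samples(X, Y)
print(len(train_set), train_set[0][0].shape, train_set[0][1].shape)
```

Each input window can then be expanded in a Walsh or other orthogonal basis before being fed to the process neural network, in the spirit of the Walsh-transform-based training algorithm mentioned above.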
Example 9.6 Prediction of sunspot activity

As is well known, sunspot activity has a major influence on the human living environment, so people have long paid attention to it. Because sunspot activity shows certain regularities, physicists have already built some prediction models and obtained
some results in the past. In fact, human beings have accumulated a great deal of observational data about sunspot activity during the long process of observing the sun, and these are typical time-varying process records. Using these observed data, we can conveniently build a prediction model of sunspot activity by adopting a process neural network. First, time is divided into several sections, e.g. one year per section (the sections may or may not overlap). The observed results in a time section are then regarded as a time-varying function: the observed results in the previous time section are the network inputs, and those in the next time section are the expected output of the network. Accordingly, we can construct a training sample set and train the network so as to build a prediction model. A research group at Harbin Institute of Technology has done much work on this, and their results indicate that the prediction performance is good.

In addition, the generalization and extension of applications involving various expensive experimental processes are also typical process prediction problems. For example, generalizing and utilizing data from space experiments (satellites, aircraft, etc.) allows us to make sufficiently accurate predictions through environmental factor analysis based on a limited amount of data from fewer experiments.
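The section-by-section construction just described might be sketched as follows; the trigonometric least-squares fit, the basis size, and the function names are illustrative assumptions rather than the procedure actually used by the Harbin group.

```python
import numpy as np

def fit_section(t, y, n_basis=5):
    """Represent one time section's observations as a time-varying function:
    a least-squares fit to a small trigonometric basis on normalized time.
    The basis and its size are illustrative choices."""
    t = np.asarray(t, float)
    t = (t - t.min()) / (t.max() - t.min())
    cols = [np.ones_like(t)]
    for k in range(1, n_basis):
        cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), np.asarray(y, float), rcond=None)
    return coef

def make_section_pairs(sections):
    """Pair each section's fitted function (network input) with the next
    section's fitted function (expected output), as described above."""
    coefs = [fit_section(t, y) for t, y in sections]
    return list(zip(coefs[:-1], coefs[1:]))

# Toy usage: three "yearly" sections of monthly observations (synthetic numbers).
sections = [(np.arange(12), 50 + 10 * np.sin(0.5 * np.arange(12) + s)) for s in range(3)]
pairs = make_section_pairs(sections)
print(len(pairs), pairs[0][0].shape)
```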
9.7 Application in Evaluation and Decision

Evaluation and decision-making are important for production management and economic operation, and their influence on the development of enterprises cannot be underestimated. Take the simulation and control of the oil and natural gas exploitation process during oilfield development as an example. When the well pattern of an oilfield, the development mode, the geological conditions, the oil reserves, etc. are determinate, we can take time-varying factors such as the separated-layer water intake per unit thickness, the production differential pressure, etc. as the inputs, and the oil and gas yield, the water ratio, the degree of reserve recovery, etc. as the outputs. We can then build a production and operation model based on process neural networks, simulate and monitor the oilfield development process, and evaluate the oilfield development effect. At the same time, the model can also be used for phase prediction of the next stage of oilfield development.

In addition, in the course of new projects or the technological transformation of enterprises, the evaluation and optimization of the input/output benefit must be considered before investment. Meanwhile, forecasting and prediction for the projects, process optimization, input timing, the control of input quantity, etc. also need to be solved. Generally speaking, the output should be regarded as an accumulative result with respect to time rather than a one-to-one functional relationship, so the process neural network has an apparent advantage in solving these problems. Building a rational evaluation and optimization model for the input/output benefit with process neural networks is therefore of great significance for scientific decision-making in enterprise management.
The requirements for evaluation and decision are also embodied in many other aspects. In the following, we take the evaluation of "on-condition maintenance" of an aircraft engine as an example.

Example 9.7 Evaluation of "on-condition maintenance" of an aircraft engine

In aviation, safety is the first thing that must be considered, and real-time condition monitoring of an aircraft engine is necessary to ensure flight safety. At the same time, in order to reduce expensive maintenance costs, every large airline in the world adopts a so-called "on-condition maintenance" program: before determining whether an engine needs shop maintenance, the condition of the engine is first estimated accurately according to its various states, and then the concrete handling of the engine is determined. In this way, under the precondition of ensuring safety, the service life of the engine is extended greatly and the operation cost is reduced.

The exhaust gas temperature (EGT) of an aircraft engine is one of the important parameters reflecting its working state. Constrained by the material temperature of the hot components of the engine, the value of EGT must be controlled. By monitoring the change in EGT, the performance and degradation of the aircraft engine can be judged. While cruising, EGT is a process function varying with time. Because EGT is affected by many complex factors, at present we cannot use a precise mathematical model to predict its changing rules. In the following, a modeling method using process neural networks is adopted to solve this problem.

This is an actual example. The EGT data are taken from an aircraft of a famous airline company in China, maintained by an aircraft maintenance engineering company in Beijing. The fleet type is 767-ER, the engine model is JT9D-7R4E, the engine number is xxxxxx, and the scheduled flight number is Bxxxx. The sampling period is from Jan. 4, 2000 to Dec. 18, 2000, and the sampling interval is generally about one week, which can be regarded approximately as equal-interval sampling. We obtained 44 discrete data points altogether. Eight successive data points are fitted to construct a time-varying function as the input function of the process neural network, and the 9th data point is taken as the output. The topological structure of the process neural network is 1-80-1. The input functions and the connection weight functions of the process neural network are expanded in a trigonometric function basis, and the number of basis functions is eight (a sketch of this expansion follows Table 9.4). In this way, we obtained 36 groups of samples. The first 30 groups are taken as the training samples of the process neural network, and the other 6 groups as the test samples. The error precision of the network is set to 0.01 and the maximum number of iterations to 10,000. The network converged after 6711 iterations. The test results are shown in Table 9.4.

As can be seen from Table 9.4, the average relative error of the EGT test results predicted by the process neural network is 2.82%, and the maximum relative error is not more than 5.00%. This satisfies the requirements of actual engineering analysis, realizes real-time monitoring of the condition of the aircraft engine, and thus allows us to choose the best time to service the engine.
Table 9.4 EGT test results predicted by a process neural network

Sample No. | Real value (°C) | Prediction value (°C) | Absolute error (°C) | Relative error (%)
1 | 35.4000 | 36.8805 | 1.4805 | 4.18
2 | 35.9000 | 37.2637 | 1.3637 | 3.80
3 | 35.7000 | 35.6048 | 0.0952 | 0.27
4 | 37.1000 | 35.8792 | 1.2208 | 3.29
5 | 36.6000 | 37.7090 | 1.1090 | 3.03
6 | 39.7000 | 37.7973 | 0.9027 | 2.33
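As a rough illustration of the basis-expansion step in Example 9.7, the sketch below expands a sampled input function and a connection weight function in the same trigonometric basis, so that the process neuron's time-aggregation integral reduces (approximately) to a dot product of coefficient vectors. The function names, the least-squares fitting, and the tanh activation are assumptions for illustration; only the basis size of eight comes from the example.

```python
import numpy as np

def trig_basis(t, T, n_basis=8):
    """Trigonometric basis functions evaluated at sample times t on [0, T]."""
    cols, k = [np.ones_like(t) / np.sqrt(T)], 1
    while len(cols) < n_basis:
        cols.append(np.sqrt(2.0 / T) * np.sin(2 * np.pi * k * t / T))
        if len(cols) < n_basis:
            cols.append(np.sqrt(2.0 / T) * np.cos(2 * np.pi * k * t / T))
        k += 1
    return np.column_stack(cols)              # shape (len(t), n_basis)

def expand(y, t, T, n_basis=8):
    """Least-squares coefficients of the sampled function y(t) in the basis."""
    coef, *_ = np.linalg.lstsq(trig_basis(t, T, n_basis), y, rcond=None)
    return coef

def process_neuron(x_coef, w_coef, theta=0.0, f=np.tanh):
    """f( integral of w(t)x(t) dt - theta ), with the integral approximated by
    the dot product of the two coefficient vectors in the shared basis."""
    return f(np.dot(w_coef, x_coef) - theta)

# Toy usage: eight weekly EGT-like samples (synthetic values) fed to one neuron.
T, t = 8.0, np.arange(8, dtype=float)
x_coef = expand(np.linspace(35.0, 39.0, 8), t, T)
w_coef = 0.1 * np.ones(8)                      # illustrative weight-function coefficients
print(process_neuron(x_coef, w_coef))
```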
9.8 Application in Macro Control

Process neural networks are suitable not only for solving problems in natural science research, engineering computation, etc., but can also be applied to research on the macro control of a national economy and so on. We have previously illustrated the application of process neural networks in national economic development prediction; here we take "accurate economics" as an example in analyzing possible further applications of process neural networks in national economic macro control.

The general development of a national economy depends on the harmonious development of each trade and sector, i.e. the development of, and mutual equilibrium between, various trades such as manufacturing, the energy industry, the auto industry, agriculture, science and technology, finance, telecoms, etc. When the total development goal is determinate, a macro control and prediction model can be built. With the macro goal fixed, the development of the national economy requires inversely derived, rational development speeds for the various trades and related fields. Process neural networks can be used, according to historical data and the objective rules of economic development, to build a process model for the dependence of the development speed of each trade on its various influencing factors. On this basis, a process model for the relationship between the general development course of the national economy and the development of the various trades can be built. Here, the development indexes of each trade and the development indexes of the whole national economy are all regarded as time-dependent functions (or discrete sampling data at time nodes), and thus process neural networks can be used first to build a multi-valued functional model for this dependence, after which functional optimization is carried out to find the optimal combination of development speeds of the various trades.

Generally, there is more than one economic index, i.e. the number of outputs of the process neural network is more than one and there are several set decision
objectives. In order to judge whether the required objectives are achieved, the achieved outputs and the objectives can be considered as points in n-dimensional space, and the distance between these points is calculated and minimized. When the number of objectives becomes continuous (for instance, an index ranging over a real interval), the objective is a function on that interval; we can then build a model using a process neural network with functions as its outputs and solve the inverse functional problem. This is a problem of process optimization and prediction: optimization means seeking and determining rational inputs such that the objective achieved is the best; prediction means computing the value of the functional under different inputs with the model built after learning. A national economic development model based on process neural networks built in this way can, according to the general development goal, control the development speed of every trade and field macroscopically, and implement the development mode of "accurate economics".
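As a small concrete reading of the "distance in n-dimensional space" criterion above, the following sketch treats the model outputs and the decision objectives as vectors and scores how close the outputs come to the goals; the weighted Euclidean metric and the names are illustrative assumptions.

```python
import numpy as np

def goal_distance(outputs, objectives, weights=None):
    """Distance in n-dimensional objective space between achieved outputs and
    the set decision objectives (weighted Euclidean norm, chosen for illustration)."""
    outputs = np.asarray(outputs, float)
    objectives = np.asarray(objectives, float)
    w = np.ones_like(outputs) if weights is None else np.asarray(weights, float)
    return float(np.sqrt(np.sum(w * (outputs - objectives) ** 2)))

# Toy usage: two economic indexes compared against their targets.
print(goal_distance([2.1, 3.0], [2.0, 3.5]))
```

A functional optimization over the network inputs would then seek input processes that drive this distance toward zero, which is essentially the inverse functional problem discussed again in the Postscript.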
9.9 Other Applications

As can be seen from the information processing mechanism and the functional approximation ability of the process neural network model, process neural networks adapt well to modeling and solving many time-related processes and problems dependent on multivariate functions. The previous chapters proposed possible applications of process neural networks in some fields. In principle, any actual problem with the following characteristics can be modeled and solved by process neural networks: (1) the inputs are a group of processes, i.e. time-dependent functions or even arbitrary multivariate functions; (2) the outputs are numerical values or process functions. Next, we briefly introduce several further possible applications of process neural networks.

(1) Health process control

Health process control is a complex system problem related to physiology, health nutrition, exercise physiology, and personal physiological indexes. According to process data on various physiological indexes and professional knowledge of health nutrition, exercise physiology, etc., we can build a health process control model based on process neural networks. The outputs are various ideal physical examination and biochemical examination indexes, and the inputs are health care elements such as nutrient intake, exercise schedules, etc.

(2) More accurate simulation of biological neural networks

All kinds of artificial neural network models are information processing models built by simulating the information processing mechanism of the biological nervous system. They can be combined with brain science research, with different brain functions denoted by different neural networks or integrated models of several
neural networks, while a neural network or an integrated model of several neural networks is regarded as a generalized neuron. Imitating the information processing course of biological neural networks in this way, we can construct a generalized neural network model and accordingly simulate more precisely the network structure and the information processing course of biological neural systems.

(3) Optimization of various functionals and process control

In many nonlinear time-varying systems, because many inputs are "processes", running optimization comes down to functional optimization based on process neural networks: after a functional model is built for the system, we must find proper inputs such that the corresponding outputs satisfy some optimality goal. Similarly, the process control of a nonlinear time-varying system often comes down to solving an inverse functional to achieve the expected outputs (the control goal); that is, after building the functional model for the system, inversely solving for the process input functions whose corresponding system outputs are just the expected outputs. Such problems are common in practical applications. For example, in continuous industrial production, technological process design and optimization for improving output and quality or for saving energy and reducing loss are just problems of this kind.

Before finishing this chapter, we introduce another possible application of process neural networks. This problem is very complex and is difficult to solve satisfactorily. However, it is of critical importance for China and is worth working on unremittingly.

(4) Simulation of soil erosion and the desertification process

Soil erosion and desertification is the erosion and exfoliation of soil under the action of wind, water, human factors, etc., and it is a ubiquitous natural phenomenon. It not only results in soil degeneration, but also creates sediment deposits, water pollution, and ecological damage to the environment, and even causes sand storms. Preventing soil erosion and desertification is an important issue in ecological environmental protection today.

Soil erosion and desertification is an extremely complex process. It is due to many natural elements such as climate, landform, geology, soil, vegetation, etc., and to human activities (uniformly called the erosion factors), and its mechanism involves physical and even chemical and biological actions. The interaction of these factors constitutes the complex soil erosion and desertification process. Quantitative description is the essential object of soil erosion and desertification research: we must evaluate and predict quantities such as the erosion in a region, i.e. annual erosion quantities, rainstorm erosion quantities, or flood-season erosion quantities. The objective of modeling the soil erosion system
is to describe, with mathematical methods, the basic factors leading to soil erosion and the process of their interaction, and then to predict soil loss quantities.

At present, the main modeling method for soil erosion and desertification is to establish models in different spaces and at different scales using test data obtained by observing various test environments (for instance, indoor and open-air plots, runoff plots, and small watersheds), and to simulate soil erosion with a mathematical model so as to estimate and predict erosion quantity and sediment production quantity. The models built so far are generally divided into empirical statistical models and physically based (genetic) models. The former use observed test data and mathematical statistical methods and derive an equation for computing soil loss by choosing sensitive factors affecting soil erosion. The latter apply fundamental principles of hydrology, hydraulics, soil science, river load dynamics, and other related disciplines to describe soil erosion and sediment production based on the physical process of soil erosion, according to known rainfall and runoff conditions, and to predict soil erosion quantities within a given time interval. At present, there are mainly the following three problems in soil erosion modeling.

(a) Lack of a description of the soil erosion process

Existing soil erosion modeling methods usually ignore the description of the energy process of the soil erosion dynamic factors. For example, as the dynamic source of raindrop splash and runoff action, the rainfall erosivity factor is a function of the particular rainfall process. Even the most widely applied USLE model only uses a few static variables (such as the rainfall, the kinetic energy of the rainfall, the maximum 15 min rainfall intensity EI15 and the maximum 30 min rainfall intensity EI30) and the soil loss to make a multivariate regression and correlation analysis, and selects one of them, e.g. EI30, as the index of rainfall erosivity. Because different regions have different rainfall processes, and rainfall processes within the same region also differ, the established model is difficult to generalize effectively.

(b) Space and scale effects

Existing methods of soil erosion research mainly build models based on test observations of plots. Such experimental plots may be subminiature (a slope length of only a few meters, or a sand bath), ordinary (a slope length of tens of meters), or large-scale (hundreds of meters or even a valley). When the erosion model built on this basis is generalized, a scale effect problem arises in model transformation. As the erosion factors and the sediment production process (splash, denudation, removal, and pileup) have large inhomogeneity and variability in space, this increases the complexity of simulating erosion and sediment production and of model transformation across scales.

(c) Applicability

Statistical modeling and genetic modeling of soil erosion are two completely different methods. The former describes the complex interactions and causality of various factors by simple mathematical fitting and obtains varied empirical equation
forms. The latter simplifies the erosion factors and idealizes the erosion process. This makes existing erosion models unsuitable for other spatio-temporal conditions and larger dynamic ranges.

Many factors affecting soil erosion (such as rainfall, landform, and vegetation, and even the erodibility of the soil itself) are functions of time, and soil erosion, sediment removal, and deposition together form a complex spatio-temporal change process. Therefore, constructing an effective soil erosion model requires new methods and measures to describe this spatio-temporal change process. When process neural networks (especially multi-aggregation process neural network models) are used to simulate and model the soil erosion process, the nonlinear transformation mechanism and the process description ability of multi-aggregation process neural networks for multivariate and multidimensional spatio-temporal data can be used to delineate the time-varying process of energy distribution. They can also overcome the absence of a process description in empirical soil erosion models, simulate faithfully the evolution rule of soil erosion and the regional spatio-temporal change rule of the model parameters in the soil erosion system, and provide a system modeling method with universality and practicability for analyzing the soil erosion process. In addition, a fuzzy process neural network can be adopted to build a soil erosion process analysis and prediction model; such networks can combine existing objective knowledge of soil erosion with the information processing mechanism of process neural networks and improve the dependability and generalizability of the soil erosion prediction model. In variational problems, the domain of the functional's independent variables is generally a process interval related to time, so the scale effect problem of multi-scale model transformation in a soil erosion model can be transformed into a variational problem of process neural networks. A new type of artificial neural network model with spatio-temporal constraints can be established to simulate the complex spatio-temporal change process of soil erosion.

China is one of the nations suffering the most severe soil erosion in the world, and a series of environmental protection problems results from it. Possible flood disasters in the Changjiang valley and the above-ground 'suspended river' of the lower Yellow River threaten lives and property; the soil and water loss of the northwest Loess plateau and the severe degeneration of northeast black soil are both closely related to soil erosion and desertification. Theoretical and methodological research on modeling the soil erosion system using process neural networks can make full use of the existing long-term soil erosion observation data to build a new type of soil erosion model that describes the complex spatio-temporal evolution process and the information processing mechanism of multi-factor comprehensive action, and it has important theoretical meaning and broad application value.

In this chapter, some possible applications of process neural networks in many fields have been listed. In principle, systems with procedural information (accurate or fuzzy), whose inputs are process functions or whose inputs and outputs are both (univariate or multivariate) process functions, can be modeled and solved using process neural networks. Therefore, process neural networks have broad adaptability and scope for application in solving practical problems.
References

[1] Ding G., Zhong S.S. (2008) Time series prediction using wavelet process neural network. Chinese Physics B 17(6):1998-2003
[2] Guan S.P., Lü X., Zhang Y.R. (2007) Short-term load forecasting based on process neural network. Journal of Northeastern University (Natural Science) 28(10):1450-1453 (in Chinese)
[3] Zhong S.S., Ding G. (2007) Time series prediction based on Elman process neural network and its application. Journal of Information and Computational Science 4(1):405-411
[4] Liu Z.W., Xue H., Wang X.Y., Yang B., Lu S.Y. (2006) Study on algorithm of process neural network for soft sensing in sewage disposal system. In: Sixth International Symposium on Instrumentation and Control Technology: Sensors, Automatic Measurement, Control, and Computer Simulation 6358(2)
[5] Xu S.H., Liu Y., He X.G. (2004) Automatic identification of water-flooded formation based on process neural network. Acta Petrolei Sinica 25(4):54-57 (in Chinese)
[6] Zhang Z., Zheng M. (2008) Parameter identification method of time-varying system based on process neural network. Journal of Jiang Nan University (Natural Science Edition) 7(6):637-640 (in Chinese)
[7] Li Y., Zhong S.S. (2006) Failure detection of aero-engine based on process neural network with double hidden-layers. Journal of Propulsion Technology 27(6):559-562
[8] Ding G., Bian X., Hou L.G., et al. (2008) Aircraft engine rotor simulated fault diagnosis using counterpropagation process neural network. Aviation Precision Manufacturing Technology 44(5):17-20 (in Chinese)
[9] Zhao X.F., Zhu C.L., Liu L.X., et al. (2002) Prediction of molecular mass of polyacrylamide with process neural networks. Chinese Journal of Applied Chemistry 19(7):637-640 (in Chinese)
[10] Zhang J.H., Li H.G., Wu X.L., Guan X.P. (2008) Nonlinear system identification with delayed neural network. In: 27th Chinese Control Conference, pp. 717-720
[11] Rubio Avila J.D.J., Ferreyra Ramirez A., Aviles-Cruz C. (2008) Nonlinear system identification with a feedforward neural network and an optimal bounded ellipsoid algorithm. WSEAS Transactions on Computers 7(5):542-551
[12] Wang W.Y., Li L.H., Wang W.M., Su S.F., Wang N.J. (2005) A new convergence condition for discrete-time nonlinear system identification using a Hopfield neural network. In: 2005 IEEE International Conference on Systems, Man and Cybernetics 1:685-689
[13] Gao Q.H., Wang S.A. (2007) Identification of nonlinear dynamic system based on Elman neural network. Computer Engineering and Applications 42(31):87-89 (in Chinese)
[14] Venkateswarlu C., Venkat Rao K. (2005) Dynamic recurrent radial basis function network model predictive control of unstable nonlinear processes. Chemical Engineering Science 60(23):6718-6732
[15] Tomohisa H., Wassim M.H., et al. (2008) Neural network adaptive control for a class of nonlinear uncertain dynamical systems with asymptotic stability guarantees. IEEE Transactions on Neural Networks 19(1):80-89
[16] Balasubramaniam P., Abdul S.J., Kumaresan N. (2007) Optimal control for nonlinear singular systems with quadratic performance using neural networks. Applied Mathematics and Computation 187(2):1535-1543
[17] Na J., Ren X.M., Huang H. (2008) Time-delay positive feedback control for nonlinear time-delay systems with neural network compensation. Acta Automatica Sinica 34(9):1196-1203 (in Chinese)
[18] Li P.C., Xu S.H. (2005) Training of procedure neural network based on continuous Walsh conversion. Computer Engineering and Design 26(3):702-703 (in Chinese)
[19] Li Y., Zhong S.S. (2006) Failure detection of aero-engine based on process neural network with double hidden-layers. Journal of Propulsion Technology 27(6):559-562
[20] Ding G., Bian X., Hou L.G., Zhong S.S. (2008) Aircraft engine rotor simulated fault diagnosis using counterpropagation process neural network. Aviation Precision Manufacturing Technology 44(5):17-20 (in Chinese)
Postscript
Process neural networks extend traditional neural networks in the time domain and, from the viewpoint of neurobiology, are well suited to modeling the information processing mechanisms of the biological nervous system. They can perform well in solving many practical problems related to (time) processes. The research content of process neural networks is abundant and the theories and models are new, so many problems still need to be studied and perfected. On finishing this book, the authors put forward some theoretical and practical problems worth studying further, which may provide a reference for readers.

(1) Problem of functional approximation or fitting

The learning problem for parameters such as the connection weight functions, activation thresholds, and so on of process neural networks can actually be viewed as functional approximation or fitting. We need to study further such problems as the functional approximation ability (also called the density of process neural networks in the corresponding functional space), the approximation method (i.e. the learning algorithm), the approximation error estimate, the generalization error prediction, etc. These theories and methods need to be studied from the functional analysis point of view and with the theory and methods of functional analysis, which underpin the above theories and calculation methods of process neural networks.

(2) Continuity of the process neural network model

In previous chapters, the continuity of some specific process neural network models was studied and it was proven that process neural networks can approximate an arbitrary continuous functional. The expected models for most problems in nature are continuous (discrete process neural networks may be regarded as a special case of the continuous situation), so from the viewpoint of functional analysis it is necessary to study the continuity of process neural networks from a unified point of view. On the other hand, biological neural networks involve much non-continuity in their outputs, so to meet the demands of research applications and to develop the theory, research into
the properties of non-continuous process neural networks is worth pursuing.
(3) Extremum-solving problem of a functional

After learning and modeling, a process neural network defines a functional mapping relationship between the inputs and the outputs (including continuous input/discrete output, continuous input/continuous output, discrete input/discrete output, discrete input/continuous output, etc.). The optimization of process neural networks involves finding inputs such that the corresponding output achieves a maximum or minimum. From the functional viewpoint, this is just the extremum-solving problem of a functional (functional optimization). Therefore, it is of great importance for many practical applications to study functional optimization and to give specific optimization computation methods.
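A minimal numerical sketch of this extremum-solving problem is given below: a trained network is treated as a black-box functional F on the basis coefficients of an input function, and the coefficients are adjusted by gradient ascent using a finite-difference gradient. F, the step size, and the iteration count are illustrative assumptions, not a method prescribed in this book; item (4) below reduces to the same recipe with F replaced by the (negated) distance between the network output and the expected result.

```python
import numpy as np

def optimize_input(F, c0, lr=0.05, steps=200, eps=1e-5):
    """Seek basis coefficients of the input function that (locally) maximize
    the scalar functional F, using a central-difference gradient and gradient
    ascent. All numerical choices here are illustrative."""
    c = np.array(c0, dtype=float)
    for _ in range(steps):
        g = np.zeros_like(c)
        for i in range(len(c)):
            d = np.zeros_like(c)
            d[i] = eps
            g[i] = (F(c + d) - F(c - d)) / (2 * eps)   # central difference
        c += lr * g
    return c

# Toy usage with a stand-in "trained" functional model whose maximum is at 0.3:
F = lambda c: -float(np.sum((c - 0.3) ** 2))
print(optimize_input(F, np.zeros(8)).round(3))
```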
(4) Solving inversely for the independent variables of a functional from the functional value (inverse functional problem)

After learning and modeling, the input-output mapping relationship of a process neural network in fact defines a functional. Many practical applications come down to choosing proper inputs so that some distance between the output and the expected result achieves a minimum. Obviously, this problem can also be reduced to the optimization of process neural networks. From the viewpoint of functional analysis, the problem is, for a determinate functional, to solve inversely for the independent variables corresponding to a given functional value, namely an inverse functional problem (analogous to the inverse function problem in mathematical analysis). For process neural networks, especially network models with continuous inputs and continuous outputs, it is important to study general methods for solving the inverse functional problem, in order to realize process optimization control, dynamic system process forecasting and prediction, and so on.
(5) Approximation of the compound function

Compound function approximation mainly involves the approximation ability for the compound function, the design of the approximation algorithm (the learning algorithm), the estimate of the approximation error, the generalization error estimate, etc. Suppose the approximated function (the expected output function of the network) G(t) is a compound function of the principal component functions hs(x1(t), x2(t), ..., xn(t)) (s = 1, 2, ..., S) of the input functions x1(t), x2(t), ..., xn(t). The general form of the process neural network function class used to approximate the compound function G(t) can be denoted as F(x1(t), x2(t), ..., xn(t); w1, w2, ..., wm),
where F denotes the input-output mapping relationship of the process neural network. The approximation of the compound function for the input samples x1k(t), x2k(t), ..., xnk(t) (k = 1, 2, ..., K) is to choose parameters w1, w2, ..., wm (which may be time-varying) such that

||F(x1k(t), x2k(t), ..., xnk(t); w1, w2, ..., wm) - Gk(t)||

achieves its minimum, where Gk(t) is the value of G(t) when the independent variables are x1k(t), x2k(t), ..., xnk(t), and ||·|| denotes some norm. Obviously, general process neural networks are all special cases of the above process neural network function class F.

(6) Further research on models and structures of process neural networks
Process neural networks have good adaptability for modeling dynamic systems whose inputs and outputs are process functions or carry process information (precise or fuzzy). Process neural network models and structures can be studied further from the following viewpoints, with reference to practical applications:

(a) Research on the mathematical theory, information processing mechanisms, and learning algorithms (including algorithm design and the estimation of approximation and generalization errors, etc.) of process neural networks with feedback;

(b) Research on the selection of the process neuron kernel function, aggregation operators, and so on (combined with specific applications);

(c) Research on process neural networks whose connection weights are easy to interpret; for example, the weights in a reasoning network based on weighted fuzzy logic have quite unambiguous semantics;

(d) The relationship between network scale and the various information quantities involved (the structure and scale of the hidden layer, the scale of the learning sample set, the complexity of the system, the error control precision, etc.).

(7) Research on the information processing mechanism and learning algorithm of multi-aggregation process neural networks
Multi-aggregation process neural networks are an extension of process neural networks, which have only a time dimension. They enhance the ability to model complex systems with both time and space dimensions and to describe system states, and they enlarge the applied value for tasks such as the recognition and classification of multi-dimensional images. But multi-aggregation process neural networks involve multi-dimensional multivariate process aggregation operations, and their information flow, aggregation mechanism, and mapping relationships are all complex. At the same time, the complexity of the learning algorithm and the computational complexity of the network both increase
greatly. Thus, in-depth research into the multi-aggregation information processing mechanism and learning algorithms is of great importance for the development of practical applications of multi-aggregation process neural networks.

(8) Research on problems related to support vector machines (SVM) for process neural networks

The support vector machine is a pattern classification method built on the structural risk minimization principle of statistical learning theory, and at present it is applied successfully in time-invariant spaces. Useful research goals are: to extend support vector machines into time-varying function spaces; referring to the construction ideas and methods of support vector machines in time-invariant spaces, to establish a support vector machine model in process function space; and to study the selection mechanism and learning algorithm of learning samples and the control of generalization errors for pattern classification problems of time-varying objects.

(9) Research on the complexity of various algorithms (learning, optimization, etc.)

The nonlinear transformation mechanism of process neural networks is quite different from that of traditional neural networks, and the learning algorithms are much more complex than those of traditional neural networks, so it is of practical significance to study the computational complexity of the various algorithms of process neural networks (learning, optimization, etc.). On this basis, research seeking simplified replacement algorithms is also very meaningful.

(10) Other related theoretical problems

Process neural networks are a general form of traditional neural networks, and we might as well consider combining process neural networks with other intelligent algorithms:

(a) Fuzzy neural networks regarded as process neural networks (the input/output domains may be different);

(b) The combination of process neural networks and evolutionary computation;

(c) The combination of process neural networks and fuzzy computation;

(d) The approximation (approximability, approximation algorithms, error estimates, generalization ability, etc.) of logic functions (or Boolean functions). The purpose is to try to translate some "logical problems" into something that can be "computed" by process neural networks, in order to reveal the possible mechanism by which the human brain adopts complex neural networks for logical reasoning.

Epilogue

This book proposed the process neural network, in which the inputs to the neurons may be processes or time-varying functions. Its learning comes down to functional approximation. Some basic theory and algorithms have been established, and some
successful elementary applications have been given. In closing, according to related theoretical and practical needs, we have proposed some theoretical and practical problems worth studying further.

Research on fuzzy logic, neural networks, and evolutionary computation has flourished over the past 40 years, and these are the three most active and successful fields of current artificial intelligence research. The authors believe that an important goal for future intelligent system research is to adopt various means to combine fuzzy logic, neural networks, and evolutionary computation organically. Generalized process neural networks, whose inputs and outputs are all points of a functional space (for example, a metric space, normed linear space, Banach space, or Hilbert space), with corresponding activation functionals and multiple aggregation operators, are actually a mathematical abstraction of various existing neural network models; from a mathematical viewpoint, they subsume the current variety of network models. Obviously, research on this is presently insufficient: many theoretical and practical problems await further thorough study. At the end of this book, we should say that present research is still preliminary: research on process neural networks is just beginning. Those working on both theoretical research and practical applications shoulder heavy responsibilities.
Index

A: Activation Threshold, Activation Function, Adaptive Theory, Adjustable Parameters, Aggregation Mechanism, Analytic Function, Approximation Capability, Approximation Theorem, Artificial Neural Network, Artificial Intelligence, Association Analysis, Automata, Axon Hillock

B: Back-Propagation, Boolean Logic, Bounded Function, Brain, Brain-State-In-Box Model

C: Cascade Process Neural Network, Classification, Cluster, Coefficient Matrix, Competitive Learning, Competitive Learning Algorithm, Competitive Learning Rule, Compound Function Approximation, Computational Complexity, Computational Intelligence, Computing Capability, Conclusion, Connection Weight Coefficients, Connectionism, Continuity, Counter Propagation Process Neural Network, Convergence, Convolution

D: Data Mining, Damped Newton Method, Deviation, Diagnosis, Digital or Analog, Discrete Process Neural Network, Discrete Process Neuron, Discrete Time Sequences, Dynamic System Simulation

E: Error, Error Function, Error Precision, Expert System, Evolutionary Computation, Evolutionary Programming, Evolutionary Strategy

F: Feedback, Feedback Neural Network, Feedforward, Feedforward Neural Network, Finite Basis Function, Finite Terms, Fitting Curve, Fitting Precision, Fourier Transformation, Funahashi Approximation Theorem, Function Orthogonal Basis Expansion, Function Space, Functional, Functional Approximation Capability, Functional Approximator, Functional Neuron, Functional Space, Fuzzy Computing

G: Gauss Function, Generalization, Generalized Error, Generalized Error Prediction, Generalized Polynomial, Genetic Algorithms (GA), Global Optimal Solution, Gradient Descent Algorithm, Gradient Vector, Gram Matrix

H: Hebb Rule, Hecht-Nielson Approximation Theorem, Hessian Matrix, Hidden Layer, High Order, Hopfield Network

I: Image Recognition, Inductive Principle, Infinity, Inner Product, Input/Output, Integrable Function, Integral Formula, Integral Mean Value Theorem, Intelligent Behavior, Intelligent Decision Support, Intelligent System, Interval, Iteration Formula

J: Jacobi Matrix

K: Knowledge Engineering

L: Learning Algorithm, Learning Sample, Learning Rate, Learning Rule, Least Square Method, Lemma, Limit, Linear Element, Linear Threshold, Lipschitz Condition, Local Minimum Value, Logical Predication, Logical Reasoning

M: Machine Learning, Mapping Mechanism, Membership Function, Monotone Increasing, Multi-Aggregation, Multi-Aggregation Process Neural Network, Multi-Aggregation Process Neuron, Multiple Integral

N: Neural Computing, Neuron, Network Structure, Newton Iteration Formula, Newton Method, Nonlinear Mapping, Norm, Normalize, Numerical Analysis

O: One-Dimensional Search, Optimal Partition, Optimal Piecewise Approximation, Optimization Algorithm, Orthogonal Basis, Orthogonal Function System, Orthogonality

P: Parallel-Serial, Partial Derivative, Parallel Distributed Processing (PDP), Perceptron, Periodic Function, Piecewise Interpolation, Power, Process Neural Network, Property Parameter

Q: Quantization

R: Radial-Basis Function Process Neural Network, Radial-Basis Process Neuron, Range, Rational Expression, Rational Square Approximation, Real Number, Recursive Function, Regression, Robustness

S: Sample Point, Self-Organizing Competitive Neural Network, Self-Organizing Process Neural Network, Sensitivity, Sequence, Series, Sigmoid Function, Signal Conversion, Simulation Experiment, Signal Processing, Spatial Weighted Aggregation, Spline Function, Stability, Standard Orthogonal Basis, Step Function, Structural Formula Process Neuron, Support Vector Machine (SVM), Synapses, System Simulation

T: Taylor Series, Temporal Accumulation Operation, Time Granularity, Time-Delay Feedback, Time-Varying Function, Topological Structure, Topology, Transformation Mechanism, Trigonometric Basis Function, Truth Degree, Turing

U: Upper Bound

W: Walsh Basis Function, Wavelet Basis Function, Wavelet Transform, Weighted Fuzzy Logic