Wiley Encyclopedia of Electrical and Electronics Engineering
Fuzzy Control
Standard Article
Rainer Palm, Siemens AG, Munich, Germany
Copyright © 1999 by John Wiley & Sons, Inc. All rights reserved.
DOI: 10.1002/047134608X.W3504
Article Online Posting Date: December 27, 1999
The sections in this article are: Fuzzy Control Techniques; The Fuzzy Controller as a Nonlinear Transfer Element; Heuristic Control and Model-Based Control; Cell Mapping; Supervisory Control; Adaptive Control.
FUZZY CONTROL
Fuzzy control is a control approach based on the concepts of fuzzy sets and fuzzy logic introduced by Lotfi Zadeh in 1965 (1). Fuzzy sets are noncrisp (nonsharp) sets or numbers, and fuzzy logic is a logic that deals with implications, or IF THEN statements, using noncrisp truth values. Fuzzy control likewise uses IF THEN rules, but in the sense of control commands such as "IF temperature is LOW THEN change current of heater by a POSITIVE HIGH value." In this rule LOW and POSITIVE HIGH are fuzzy terms that are not sharply defined. With such rules one can formulate the knowledge of an operator of a complex plant in order to introduce automatic control of the plant or of parts of it. Another option is to build an advisory system from a set of fuzzy rules that supports the human operator in making decisions.

Fuzzy control is useful not only where human operators come into play but also in existing automatic control loops. Here, the fuzzy controller is a nonlinear control element that can improve the control performance and robustness of a plant. Automatic control often requires a process model for compensating a system's nonlinear behavior and for a corresponding feedforward control. For complex systems or plants it is therefore advantageous to use fuzzy plant models in order to simplify both the identification and the control task. The following article deals with
common fuzzy control techniques seen from both the system's and the controller's point of view. Special attention is given to the nonlinear nature of fuzzy control. Aspects of heuristic and model-based fuzzy control are dealt with, and the main points of supervisory and adaptive control are discussed.

Fuzzy control in the form of a set of IF-THEN fuzzy rules was initiated by E. H. Mamdani when he started an investigation of fuzzy set theory-based algorithms for the control of a simple dynamic plant (2). Østergaard reported a fuzzy control application to a heat exchanger (3), and in 1982 Holmblad and Østergaard presented a cement kiln fuzzy controller (4). However, it was not until the late 1980s, mainly due to the attention that Japan's industry paid to the new control technology, that fuzzy control became widely accepted in industry.

The technique most commonly used in industrial process control is the proportional-integral-derivative (PID) controller, which appears in a variety of control schemes (e.g., adaptive, gain scheduling, and supervisory control architectures). Today, processes and plants under control are often so complex that PID controllers are not sufficient, even when augmented with additional adaptive, gain scheduling, and supervisory algorithms. Although there is a large number of methods and theories (5) to cope with complex control problems in the automation, robotics, consumer and industrial electronics, car, aircraft, and ship-building industries, the restrictions on applying these methods are often too strong, or the methods too complicated, to be applied in a practically efficient and inexpensive manner. Therefore, control engineers are in need of simpler process and plant models and controller design methods, far removed from the sophisticated mathematical models available and their underlying rigorous assumptions.
J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering. Copyright © 1999 John Wiley & Sons, Inc.

These simpler design methods should provide good performance characteristics, and they should be robust enough with regard to disturbances, parameter uncertainties, and unmodeled structural properties of the process under control. In connection with traditional control techniques, fuzzy control provides a variety of design methods that can cope with modern control problems. There are three main aspects of fuzzy controllers that go beyond conventional controllers designed via traditional control methods:

1. The use of IF-THEN rules.
2. The universal approximation property.
3. The property of dealing with vague (fuzzy) values.

The first aspect concerns the human operator's knowledge and heuristic experience in controlling a plant. This knowledge is formulated in terms of IF-THEN fuzzy rules. In the same way, the plant's behavior can also be expressed by a set of IF-THEN fuzzy rules. The major problem is to identify the fuzzy rules and the associated parameters such that the operator's control actions and the system's response are described sufficiently well (6–8). Identification of this type of fuzzy rules can be done in two ways:

1. Knowledge acquisition via interviewing techniques from the area of knowledge-based expert systems. This type of identification has been applied successfully to the control of single-input/single-output (SISO) plants and processes, but it is difficult to apply and verify for multi-input/multi-output (MIMO) control problems.
2. Black-box identification via clustering, neural nets, and genetic algorithm-based techniques.

In the latter approach one distinguishes between structure identification and parameter identification. Structure identification requires structural a priori knowledge about the system to be controlled (e.g., whether the system is assumed to be linear, and what the order of the system might be). If one has to identify a plant with only little structural knowledge, one has to use algorithms that learn from data. The result of structure identification is a set of fuzzy rules. Parameter identification deals with a proper parametrization, scaling, and normalization of physical signals. Parameter identification is a comparatively simple task and can be done by classical methods (e.g., Linear Quadratic (LQ) methods and related techniques).

The second aspect, the universal approximation property, means that a fuzzy system with product-based rule firing, centroid defuzzification, and Gaussian membership functions can approximate any real continuous function on a compact set to arbitrary accuracy (9–11). However, in most cases a finite state space has to be approximated by a finite number of fuzzy rules using triangular or trapezoidal membership functions. In this case certain approximation errors must be accepted. The approximation property is due to the overlap of the membership functions in the IF parts of the set of fuzzy rules. Because of this overlap, every rule is influenced by its neighboring rules. The result is that every point in state space is approximated by a subset of fuzzy rules.

The third aspect considers control tasks where the controller inputs are fuzzy values instead of crisp variables.
In contrast to classical controllers, fuzzy controllers (FC) can also deal with fuzzy values, and even a mixture of crisp and fuzzy values is possible. Fuzzy values are qualitative "numbers" obtained from different sources. One particular source is a qualitative statement of a human operator controlling a plant, like "temperature is high." Another source may be a sensor that provides information about the intensity of a physical signal within a certain interval. Here, the intensity or distribution of the signal with respect to this interval is expressed by a membership function.

This article is arranged as follows: The following section deals with fuzzy control techniques, including the design goal, the definition of a fuzzy region, and the most important FC techniques for systems and controllers. Then the article deals with the fuzzy controller as a nonlinear transfer element; the computational structure of a fuzzy controller, its transfer characteristics, and its nonlinearity are discussed. Different heuristic and model-based control strategies, such as the Mamdani controller, the sliding mode fuzzy controller, the cell mapping control strategy, and the Takagi Sugeno control strategy, are discussed; a short overview of supervisory control is provided; and finally the main aspects of adaptive fuzzy control are discussed.

FUZZY CONTROL TECHNIQUES

FC techniques can be divided into experiential (heuristic) and model-based techniques. The choice of a particular FC technique depends on how the system to be controlled is described. Figure 1 shows the most important FC techniques dealing with systems and controllers. The following subsection deals with the design goal of fuzzy control. In a subsequent subsection the fuzzy region is defined. Finally, the individual FC techniques for systems and controllers are outlined.

The Design Goal

The objective of the design in fuzzy control can be stated as follows:

1. Stabilization. In stabilization control problems, a fuzzy controller, called a stabilizer or regulator, is to be designed so that the state vector of the closed-loop system is stabilized around a point (operating point, or setpoint) of the state space. The asymptotic stabilization control problem is to find a control law in terms of a set of fuzzy rules such that, starting anywhere in a region around the setpoint $x_d$, the state vector $x$ of the closed-loop system goes to the setpoint $x_d$ as $t$ goes to infinity.
2. Tracking. In tracking control problems, a fuzzy controller is to be designed so that the closed-loop system output follows a given time-varying trajectory. The asymptotic tracking problem is to find a control law in terms of a set of fuzzy rules such that, starting from any initial state $x_0$ in a region around $x_d(t)$, the tracking error $x(t) - x_d(t)$ tends to 0 while the whole state vector remains bounded.

Let us stress here that perfect tracking (i.e., when the initial states imply zero tracking error) is not always possible, and for some systems the design objective of asymptotic tracking cannot be achieved either. In this case, one should aim at bounded-error tracking, with small tracking errors for trajectories of particular interest.

From a theoretical point of view, there is a relationship between the stabilization and the tracking control problems. Stabilization can be regarded as a special case of tracking where the desired trajectory is a constant.
On the other hand, if, for example, we have to design a tracker for the open-loop system

$$\ddot{y} + f(\dot{y}, y, u) = 0 \tag{1}$$

so that $e(t) = y(t) - y_d(t)$ tends to zero, the problem is equivalent to the asymptotic stabilization of the system

$$\ddot{e} + f(\dot{e}, e, u, y_d, \dot{y}_d, \ddot{y}_d) = 0 \tag{2}$$
its state vector components being $e$ and $\dot{e}$. Thus the tracker design problem can be solved if one designs a regulator for the latter nonautonomous open-loop system.

Performance. In linear control, the desired behavior of the closed-loop system can be systematically specified in exact quantitative terms. For example, the specifications of the desired behavior can be formulated in the time domain in terms of rise time and settling time, overshoot and undershoot, etc. Thus, for this type of control, one first postulates the quantitative specifications of the desired behavior of the closed-loop system and then designs a controller that meets these specifications (for example, by choosing the poles of the closed-loop system appropriately). As observed in Ref. 12, such systematic specifications of the desired behavior of nonlinear closed-loop systems, except for those that can be approximated by linear systems, are not obvious at all, because the response of a nonlinear system (open or closed loop) to one input vector does not reflect its response to another input vector. Furthermore, a frequency domain description of the behavior of the system is not possible either. The consequence is that in specifying the desired behavior of a nonlinear closed-loop system, one employs some qualitative specifications of performance, including stability, accuracy and response speed, and robustness.

Figure 1. FC techniques. Collection of fuzzy control techniques for systems and controllers. Mamdani controllers can, e.g., be applied to systems described by differential equations. [Diagram labels. Systems: differential equations, Mamdani rules, relational equations, TS rules. Controllers: Mamdani controllers (PD, PID, SMFC), TS controllers, relational controllers, predictive controllers, hybrid controllers.]

Stability. Stability must be guaranteed for the model used for design (the nominal model) either in a local or in a global sense. The regions of stability and convergence are also of interest. One should, however, keep in mind that stability does not imply the ability to withstand persistent disturbances of even small magnitude. This is so because the stability of a nonlinear system is defined with respect to initial conditions, and only temporary disturbances may be translated into initial conditions. Thus stability of a nonlinear system is different from stability of a linear system. In the case of a linear system, stability always implies the ability to withstand bounded disturbances, provided, of course, that the system stays in its linear range of operation. The effects of persistent disturbances on the behavior of a nonlinear system are addressed by the notion of robustness.

Accuracy and Response Speed. Accuracy and response speed must be considered for some desired trajectories in the region of operation. For some classes of systems, appropriate design methods can guarantee consistent tracking accuracy independent of the desired trajectory, as is the case in sliding mode control and related control methods.

Robustness. Robustness reflects the sensitivity of the closed-loop system to effects that are neglected in the nominal model used for design. These effects can be disturbances, measurement noise, unmodeled dynamics, etc. The closed-loop system should be insensitive to these neglected effects in the sense that they should not negatively affect its stability.
We want to stress here that the aforementioned specifications of desired behavior are in conflict with each other to some extent, and a good control system can be designed only on the basis of tradeoffs in terms of robustness versus performance, cost versus performance, etc.

Fuzzy Regions

In fuzzy control, a crisp state vector $x = (x_1, \ldots, x_n)^T$ is a state vector the values of which are defined on the closed interval (the domain) $X$ of reals. A crisp control input vector $u = (u_1, \ldots, u_n)^T$ is a control input vector the values of which are defined on the closed interval (the domain) $U$ of reals. The set of fuzzy values of a component $x_i$ is called the term set of $x_i$, denoted as $TX_i = \{LX_{i1}, \ldots, LX_{im_i}\}$ (e.g., NB, NM, NS, Z, PS, PM, PB with N negative, P positive, S small, M medium, B big). $LX_{ij}$ is defined by a membership function $\int_X \mu_{X_{ij}}(x)/x$. The term set of $u_i$ is likewise denoted as $TU_i = \{LU_{i1}, \ldots, LU_{ik_i}\}$. $LU_{ij}$ is defined by a membership function $\int_U \mu_{U_{ij}}(u)/u$. An arbitrary fuzzy value from $TX_i$ is denoted as $LX_i$ and can be any one of $LX_{i1}, \ldots, LX_{im_i}$. An arbitrary fuzzy value from $TU_i$ is denoted as $LU_i$ and can be any one of $LU_{i1}, \ldots, LU_{ik_i}$.

A fuzzy state vector $LX = (LX_1, \ldots, LX_n)^T$ denotes a vector of fuzzy values. Each component $x_1, \ldots, x_n$ of the state vector $x$ takes a corresponding fuzzy value $LX_1, \ldots, LX_n$, where $LX_i \in TX_i$. A fuzzy region $LX^i = (LX_1^i, \ldots, LX_n^i)^T$ is defined as a fuzzy state vector for which there exists a contiguous set of crisp state vectors $\{x^*\}$, each crisp state vector satisfying the given fuzzy state vector $LX^i$ to a certain degree different from 0. The fuzzy state space is defined as the set of all fuzzy regions $LX^i$.

Example. Let $x = (x_1, x_2)^T$, $TX_1 = \{LX_{11}, LX_{12}, LX_{13}\}$, and $TX_2 = \{LX_{21}, LX_{22}, LX_{23}\}$. Then the total number of different fuzzy state vectors is $M = 9$, and the corresponding fuzzy state vectors are

1. $LX^1 = (LX_{11}, LX_{21})^T$
2. $LX^2 = (LX_{11}, LX_{22})^T$
3. $LX^3 = (LX_{11}, LX_{23})^T$
4. $LX^4 = (LX_{12}, LX_{21})^T$
5. $LX^5 = (LX_{12}, LX_{22})^T$
6. $LX^6 = (LX_{12}, LX_{23})^T$
7. $LX^7 = (LX_{13}, LX_{21})^T$
8. $LX^8 = (LX_{13}, LX_{22})^T$
9. $LX^9 = (LX_{13}, LX_{23})^T$

FC Techniques for Systems and Controllers

In this subsection we deal with systems and controllers according to the scheme shown in Fig. 1.

Given a model (heuristic or analytical) of the physical system to be controlled and the specifications of its desired behavior, design a feedback control law in the form of a set of fuzzy rules such that the closed-loop system exhibits the desired behavior. The general control scheme is shown in Fig. 2. Here, we have the following notations:

$x$ is the state vector (also controller input)
$x_d$ is the desired state vector
$u$ is the control input vector (also controller output)
where the vectors $x$, $x_d$, and $u$ are continuous functions of time. For simplicity, the output vector $y$ is set equal to the state vector $x$:

$$y = x$$

Figure 2. General control structure. [Block diagram labels: $x_d$, $x$, fuzzy controller FC, $u$, fuzzy or crisp system S or FS.]

In the following we define two basic types of nonlinear control problems: namely, nonlinear regulation (stabilization) and nonlinear tracking (12). Then we will briefly discuss the specifications of desired behavior, such as performance, stability, and robustness, in the context of nonlinear control.

Stabilization and Tracking. In general, the tasks of a control system can be divided into two basic categories: stabilization and tracking.

Heuristic System Models. When an analytical model of the plant is not available, the control design has to be carried out on the basis of qualitative modeling. This can be done in terms of either a set of Mamdani fuzzy rules or a fuzzy relation (6,13). A typical Mamdani rule of a continuous first-order system is

$$R_S^i:\ \text{IF } x \text{ is PS AND } u \text{ is NB THEN } \dot{x} \text{ is NS} \tag{3}$$

and for a discrete system

$$R_S^i:\ \text{IF } x(k) \text{ is PS AND } u(k) \text{ is NB THEN } x(k+1) \text{ is NS}$$

A typical fuzzy relational equation of a discrete first-order system is

$$X(k+1) = X(k) \circ U(k) \circ S \tag{4}$$

relating the state at time $k+1$ to the state and control input at time $k$. $S$ is the fuzzy relation, and $\circ$ denotes the relational composition (e.g., max-min composition). A fuzzy relation is another representation of a Mamdani fuzzy system.

Analytical System Models. If an analytical model of the plant is available, then the system's behavior can be described by a set of differential equations or by a set of so-called Takagi Sugeno fuzzy rules (TS rules) (8). A typical differential equation of an open-loop system is

$$\dot{x} = A \cdot x + B \cdot u \tag{5}$$

On the other hand, a TS fuzzy rule consists of a fuzzy antecedent part and a consequent part consisting of an analytical equation. A typical TS rule for a first-order system is

$$R_S^i:\ \text{IF } x = LX^i \text{ THEN } \dot{x} = A_i \cdot x + B_i \cdot u \tag{6}$$

where $LX^i$ is the ith fuzzy region for $x$, and $A_i$ and $B_i$ are parameters corresponding to that region.

Fuzzy controllers can be classified as follows:

Mamdani Controller. A Mamdani controller works in the following way:

1. A crisp value is scaled into a normalized domain.
2. The normalized value is fuzzified with respect to the input fuzzy sets.
3. By means of a set of fuzzy rules, a fuzzy output value is provided.
4. The fuzzy output is defuzzified with the help of an appropriate defuzzification method (center of gravity, height method, etc.).
5. The defuzzified value is denormalized into a physical domain.

A typical Mamdani controller is

$$R_C^i:\ \text{IF } x = LX^i \text{ THEN } u = LU^i \tag{7}$$

where $LU^i$ is the corresponding fuzzy value for the control variable.

Relational Controller. According to the description of the system in terms of a relational equation, a typical discrete fuzzy relational equation for a controller is

$$U(k) = X(k) \circ C \tag{8}$$

where $X$ is the fuzzy state, $U$ is the fuzzy control variable, and $C$ is the fuzzy relation. A relational controller is another representation of a Mamdani controller.

Takagi Sugeno Controller. A typical TS controller is

$$R_C^i:\ \text{IF } x = LX^i \text{ THEN } u = K_i \cdot x \tag{9}$$

where $LX^i$ is the ith fuzzy region for $x$, and $K_i$ is the gain corresponding to that region.

Predictive Controller. A special form of predictive fuzzy control was introduced by Yasunobu (14) for automatic train operation. It includes control rules for the time $k$ that predict the behavior of the system for the next timestep $k+1$. By means of a performance index $J(k)$, which results from a specific control action $u(k)$, different features like velocity, riding comfort, energy saving, and accuracy of a stop gap are evaluated. By going through the whole range of possible control actions $u(k)$, one obtains a range of corresponding performance indices $J(k)$, from which the control action $u(k+1)$ with the highest performance index $J(k)$ is applied to the plant. A typical predictive control rule is: IF the performance index $J(k) = LJ^i$ is obtained AND a control value $u(k)$ is chosen to be $LU^i$, THEN the control value to be applied to the plant for the next timestep $k+1$ is chosen to be $u(k+1) = LU^i$. A formal description is

$$\text{IF } J(k) = LJ^i \text{ AND } u(k) = LU^i \text{ THEN } u(k+1) = LU^i \tag{10}$$
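The selection step of the predictive rule can be illustrated with a small sketch. It is a crisp simplification, not the scheme of Ref. 14: the article's rule works with fuzzy values $LJ^i$ and $LU^i$, whereas here the index is a plain number. The one-step plant model `predict_next_velocity`, the index `performance_index`, its weight, and the candidate list are all hypothetical and chosen only for illustration.

```python
# Sketch of predictive selection of a control action (crisp simplification).
# The plant model, the index terms and their weights are illustrative assumptions.

def predict_next_velocity(velocity, u, dt=1.0):
    """Hypothetical one-step plant model: u acts as an acceleration."""
    return velocity + dt * u


def performance_index(v_next, v_target, u, effort_weight=0.1):
    """Hypothetical index J(k); higher is better.

    Combines an accuracy term (speed error after one step) with a
    comfort/energy term (magnitude of the control action).
    """
    return -abs(v_target - v_next) - effort_weight * abs(u)


def select_control(velocity, v_target, candidates):
    """Scan all candidate actions u(k) and return the one with the highest J(k)."""
    best_u, best_j = None, float("-inf")
    for u in candidates:
        j = performance_index(predict_next_velocity(velocity, u), v_target, u)
        if j > best_j:
            best_u, best_j = u, j
    return best_u, best_j


candidates = [-2.0, -1.0, 0.0, 1.0, 2.0]
u_next, j_best = select_control(velocity=10.0, v_target=11.0, candidates=candidates)
print(u_next, j_best)
```

With these assumed numbers the action that reaches the target speed in one step wins, because any larger action pays both an accuracy and an effort penalty.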
A further relationship to model predictive control (15) can be found in Refs. 16 and 17.

Hybrid Controller. A hybrid controller is a mixture of a fuzzy controller and a conventional controller. Fuzzy hybrid controllers are applied, e.g., for tuning conventional controllers and in adaptation schemes. Another application is the use of a nonlinear fuzzy mapping in nonlinear control tasks. A typical hybrid controller appears if the control law consists of a Mamdani controller $C_{fuzz}$ and an analytical feedforward term $C_{comp}$ that compensates, e.g., static or dynamic forces in a mechanical system:

$$u = C_{fuzz}(x, x_d) + C_{comp}(x) \tag{11}$$
where $x_d$ is the desired vector. Further information can be found in Ref. 18.

THE FUZZY CONTROLLER AS A NONLINEAR TRANSFER ELEMENT

A fuzzy logic controller defines a control law in the form of a static nonlinear transfer element (TE) due to the nonlinear nature of the computations performed by a fuzzy controller. However, the control law of a fuzzy controller is not represented in an analytic form, but by a set of fuzzy rules. The antecedent of a fuzzy rule (IF part) describes a fuzzy region in the state space. Thus one effectively partitions an otherwise continuous state space by covering it with a finite number of fuzzy regions and, consequently, fuzzy rules. The consequent of a fuzzy rule (THEN part) specifies a control law applicable within the fuzzy region from the IF part of the same fuzzy rule. During control with a fuzzy controller, a point in the state space is affected to a different extent by the control laws associated with all the fuzzy regions to which this particular point belongs. By using the operations of aggregation and defuzzification, a specific control law for this particular point is determined. As the point moves in the state space, the control law changes smoothly. This implies that a fuzzy controller yields a smooth nonlinear control law despite the quantization of the state space into a finite number of fuzzy regions.

One goal of this section is to describe computation with a fuzzy controller and its formal description as a static nonlinear transfer element, and thus to provide the background knowledge needed for understanding control with a fuzzy controller. Furthermore, we show the relationship between conventional and rule-based transfer elements, thus establishing the compatibility between these two types of transfer elements, which differ conceptually in their representation.

The Computational Structure of a Fuzzy Controller

A control law represented in the form of a fuzzy controller directly depends on the measurements of signals and is thus a static control law. This means that the fuzzy rule-based representation of a fuzzy controller does not include any dynamics, which makes a fuzzy controller a static transfer element, like a state controller. Furthermore, a fuzzy controller is, in general, a nonlinear static transfer element; the nonlinearity is due to those steps of its computational structure that have nonlinear properties. In what follows we describe the computational structure of a fuzzy controller by presenting the computational steps that it involves.

The computational structure of a fuzzy controller consists of a number of computational steps and is illustrated in Fig. 3:

1. Input scaling (normalization)
2. Fuzzification of controller-input variables
3. Inference (rule firing)
4. Defuzzification of controller-output variables
5. Output scaling (denormalization)

Figure 3. The computational structure of a fuzzy controller. The arrangement of the blocks corresponds to the sequence of computation. [Block diagram labels: $x_d$, $x$, $e$, scaling, fuzzification, rule firing, defuzzification, denormalization, $u$.]

The state variables $x_1, x_2, \ldots, x_n$ (or $e, \dot{e}, \ldots, e^{(n-1)}$) that appear in the IF part of the fuzzy rules of a fuzzy controller are also called controller inputs. The control input variables $u_1, u_2, \ldots, u_m$ that appear in the THEN part of the fuzzy rules of a fuzzy controller are also called controller outputs. We will now consider each of the computational steps for the case of a multiple-input/single-output (MISO) fuzzy controller. The generalization to the case of a multiple-input/multiple-output fuzzy controller, where there are $m$ controller outputs $u_1, u_2, \ldots, u_m$ instead of a single controller output $u$, can easily be done.
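The five computational steps listed above can be sketched for a two-input, single-output controller acting on $e$ and $\dot{e}$. This is a minimal illustration under stated assumptions, not the article's design: the three triangular fuzzy values N, Z, P, the rule table, the use of min for the AND in the IF part, the saturation of inputs at the domain bounds, and the scaling factors `n_e`, `n_de`, `n_u` are all hypothetical. Defuzzification uses the height method, which the article names as one possible choice.

```python
# Minimal two-input, single-output fuzzy controller following the five steps.
# Term sets, rule table and scaling factors are illustrative assumptions.

def tri(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Three fuzzy values (N, Z, P) on the normalized domain [-1, 1].
TERMS = {"N": (-2.0, -1.0, 0.0), "Z": (-1.0, 0.0, 1.0), "P": (0.0, 1.0, 2.0)}
# Peak positions of the output fuzzy values, used by the height method.
OUTPUT_PEAKS = {"N": -1.0, "Z": 0.0, "P": 1.0}
# Rule table: (fuzzy value of e, fuzzy value of de) -> fuzzy value of u.
RULES = {
    ("N", "N"): "P", ("N", "Z"): "P", ("N", "P"): "Z",
    ("Z", "N"): "P", ("Z", "Z"): "Z", ("Z", "P"): "N",
    ("P", "N"): "Z", ("P", "Z"): "N", ("P", "P"): "N",
}


def clip(value, low=-1.0, high=1.0):
    """Saturate a value at the bounds of the normalized domain."""
    return max(low, min(high, value))


def fuzzify(value):
    """Membership degree of a normalized crisp value in every fuzzy value."""
    return {name: tri(value, *params) for name, params in TERMS.items()}


def fuzzy_controller(e, de, n_e=1.0, n_de=1.0, n_u=1.0):
    # 1. Input scaling (normalization); saturation is an assumption.
    e_n, de_n = clip(n_e * e), clip(n_de * de)
    # 2. Fuzzification of both controller inputs.
    mu_e, mu_de = fuzzify(e_n), fuzzify(de_n)
    # 3. Rule firing: the AND of the IF part is taken as min.
    weighted_sum, weight_total = 0.0, 0.0
    for (term_e, term_de), term_u in RULES.items():
        w = min(mu_e[term_e], mu_de[term_de])
        weighted_sum += w * OUTPUT_PEAKS[term_u]
        weight_total += w
    # 4. Defuzzification (height method: weighted average of output peaks).
    u_n = weighted_sum / weight_total
    # 5. Output scaling (denormalization).
    return n_u * u_n


print(fuzzy_controller(0.5, 0.0, n_u=10.0))
```

Because neighboring membership functions overlap, two rules fire for `e = 0.5`, and the output is a blend of their consequents. This is the smooth interpolation between fuzzy regions described in the text.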
Input Scaling. There are two principal cases in the context of input scaling:

1. The membership functions defining the fuzzy values of the controller inputs and controller outputs are defined off-line on their actual physical domains. In this case the controller inputs and controller outputs are processed using only fuzzification, rule firing, and defuzzification. This is the case, for example, for a Takagi-Sugeno fuzzy controller.
2. The membership functions defining the fuzzy values of the controller inputs and controller outputs are defined off-line on a common normalized domain. This means that the actual, crisp physical values of the controller inputs and controller outputs are mapped onto the same predetermined normalized domain. This mapping, called normalization, is done by appropriate normalization factors. Input scaling is then the multiplication of a physical, crisp controller input by a normalization factor so that it is mapped onto the normalized domain. Output scaling is the multiplication of a normalized controller output by a denormalization factor so that it is mapped back onto the physical domain of the controller outputs.

The advantage of the second case is that fuzzification, rule firing, and defuzzification can be designed independently of the actual physical domains of the controller inputs and controller outputs.

To illustrate the notion of input scaling, let us consider, for example, the state vector $e = (e_1, e_2, \ldots, e_n)^T = (e, \dot{e}, \ldots, e^{(n-1)})^T$, where for each $i$, $e_i = x_i - x_{di}$. This vector of physical controller inputs is normalized with the help of a matrix $N_e$ containing predetermined normalization factors for each component of $e$. The normalization is done as

$$e_N = N_e \cdot e \tag{12}$$

with

$$N_e = \begin{pmatrix} N_{e_1} & 0 & \cdots & 0 \\ 0 & N_{e_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & N_{e_n} \end{pmatrix} \tag{13}$$

where $N_{e_i}$ are real numbers and the normalized domain for $e$ is, say, $E_N = [-a, +a]$.
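Since $N_e$ is diagonal, the normalization amounts to a component-wise multiplication. The sketch below shows this for two controller inputs. The physical ranges and the rule for choosing the factors (`a` divided by the assumed maximum magnitude of each input) are illustrative assumptions; the article only states that the factors are predetermined.

```python
# Input scaling e_N = N_e · e with a diagonal N_e, written component-wise.
# The physical ranges below are illustrative assumptions.

def normalization_factors(physical_ranges, a=1.0):
    """Choose each factor so that |e_i| <= range_i maps into [-a, +a]."""
    return [a / r for r in physical_ranges]


def normalize(e, factors):
    """Multiply each physical controller input by its normalization factor."""
    if len(e) != len(factors):
        raise ValueError("one normalization factor per controller input is required")
    return [n * x for n, x in zip(factors, e)]


ranges = [0.5, 8.0]                      # assumed maximum |e| and |de/dt|
factors = normalization_factors(ranges)  # diagonal entries of N_e
e_physical = [0.2, -4.0]
e_normalized = normalize(e_physical, factors)
print(factors, e_normalized)
```

After scaling, both components lie in the same interval, so a single set of membership functions defined on $[-a, +a]$ can serve both inputs.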
Example. Let $e = (e_1, e_2)^T = (e, \dot{e})^T$ with

$$e = x - x_d \quad \text{and} \quad \dot{e} = \dot{x} - \dot{x}_d \tag{14}$$

Then input scaling of $e$ into $e_N$ and $\dot{e}$ into $\dot{e}_N$ yields

$$e_N = N_e \cdot e \quad \text{and} \quad \dot{e}_N = N_{\dot{e}} \cdot \dot{e} \tag{15}$$

where $N_e$ and $N_{\dot{e}}$ are the normalization factors for $e$ and $\dot{e}$, respectively. In the context of a phase plane representation of the dynamic behavior of the controller inputs, the input scaling affects the angle of a line that divides the phase plane into two semiplanes (see Fig. 4). Furthermore, we can see how the supports of the membership functions defining the fuzzy values of $e$ and $\dot{e}$ change because of the input scaling of these controller inputs (see Fig. 5).

Figure 4. Normalization of the phase plane. Different normalization factors $N_e$ and $N_{\dot{e}}$ correspond to different slopes of the line $N_e \cdot e + N_{\dot{e}} \cdot \dot{e} = 0$.

Figure 5. Change of the supports of the membership functions due to input scaling. Scaling normalizes different supports for $e$ and $\dot{e}$ to a common support for $e_N$ and $\dot{e}_N$.

In the next three subsections on fuzzification, rule firing, and defuzzification, we consider only the case when the fuzzy values of the controller inputs and controller outputs are defined on normalized domains (e.g., $E_N$ and $U_N$), and in this case we omit the lower index N from the notation of normalized domains and fuzzy and crisp values. In the subsection on denormalization we use the lower index N to distinguish between normalized and nonnormalized fuzzy and crisp values.

Fuzzification. During fuzzification a crisp controller input $x^*$ is assigned a degree of membership to the fuzzy region from the IF part of a fuzzy rule. Let $LE_1^i, \ldots, LE_n^i$ be some fuzzy values taken by the controller inputs $e_1, \ldots, e_n$ in the IF part of the ith fuzzy rule $R_C^i$ of a fuzzy controller; that is, these fuzzy values define the fuzzy region $LE^i = (LE_1^i, \ldots, LE_n^i)^T$. Each of the preceding fuzzy values $LE_k^i$ is defined by a membership function on the same (normalized) domain of error $E$. Thus the fuzzy value $LE_k^i$ is given by the membership function $\int_E \mu_{LE_k^i}(e_k)/e_k$. Let us consider now a particular normalized crisp controller input

$$e^* = (e_1^*, \ldots, e_n^*)^T \tag{16}$$

from the normalized domain $E$. Each $e_k^*$ is a normalized crisp
value obtained after the input scaling of the current physical controller input. The fuzzification of the crisp normalized controller input then consists of finding the membership degree of $e_k^*$ in $\int_E \mu_{LE_k^i}(e_k)/e_k$. This is done for every element of $e^*$.
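The fuzzification step just described can be sketched as a lookup of one membership degree per controller input. The triangular shapes assigned to POSITIVE SMALL and NEGATIVE MEDIUM below, and the input values, are assumptions made for illustration; they are not read from the article's figures.

```python
# Fuzzification: membership degree of each crisp input in its fuzzy value.
# The triangular shapes and the input vector are illustrative assumptions.

def tri(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify_region(e_star, region):
    """Return one membership degree per component of the crisp input vector.

    region holds one (left, peak, right) triple per controller input,
    i.e. the fuzzy values that make up the IF part of a single rule.
    """
    if len(e_star) != len(region):
        raise ValueError("one fuzzy value per controller input is required")
    return [tri(value, *params) for value, params in zip(e_star, region)]


# Assumed shapes on the normalized domain [-1, 1].
PS_E = (0.0, 0.25, 0.5)        # POSITIVE SMALL for e
NM_DE = (-0.75, -0.5, -0.25)   # NEGATIVE MEDIUM for de/dt

degrees = fuzzify_region([0.125, -0.375], [PS_E, NM_DE])
print(degrees)
```

Each entry of `degrees` states how well the corresponding crisp input matches its fuzzy value; combining these degrees into a single degree for the whole fuzzy region belongs to the rule firing step.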
Example Consider the fuzzy rule $R_C^i$ given as

$$R_C^i: \ \text{IF } \mathbf{e} = (PS_e, NM_{\dot{e}}) \ \text{ THEN } u = PM_u \tag{17}$$

where $PS_e$ is the fuzzy value POSITIVE SMALL of the controller input $e$, $NM_{\dot{e}}$ is the fuzzy value NEGATIVE MEDIUM of the second controller input $\dot{e}$, and $PM_u$ is the fuzzy value POSITIVE MEDIUM of the single controller output $u$. The membership functions representing the two input fuzzy values are given in Fig. 6. In this example we have $\mathbf{e} = (e, \dot{e})^T$ and thus the IF part of the preceding rule represents the fuzzy region $LE^i = (PS_e, NM_{\dot{e}})^T$. Furthermore, let $e^* = a_1$ and $\dot{e}^* = a_2$ be the current normalized values of the physical controller inputs $e^*$ and $\dot{e}^*$, respectively, as depicted in Fig. 6. Then from Fig. 6 we obtain the degrees of membership $\mu_{PS_e}(a_1) = 0.3$ and $\mu_{NM_{\dot{e}}}(a_2) = 0.65$.

Rule Firing. For a multi-input/single-output fuzzy controller, the ith fuzzy rule of the set of fuzzy rules has the form

$$R_C^i: \ \text{IF } \mathbf{e} = LE^i \ \text{ THEN } u = LU^i \tag{18}$$

where the fuzzy region $LE^i$ from the IF part of the preceding fuzzy rule is given as $LE^i = (LE^i_1, LE^i_2, \dots, LE^i_n)^T$. Also, $LE^i_k$ denotes the fuzzy value of the kth normalized controller input $e_k$ that belongs to the term set of $e_k$ given as $TE_k = \{LE_{k1}, LE_{k2}, \dots, LE_{kn}\}$. Furthermore, $LU^i$ denotes an arbitrary fuzzy value taken by the normalized controller output $u$, and this fuzzy value belongs to the term set $TU$ of $u$; that is, $TU = \{LU_1, LU_2, \dots, LU_n\}$. Let the membership functions defining the fuzzy values from $LE^i$ and $LU^i$ be denoted by $\int_E \mu_{LE^i_k}(e_k)/e_k$ ($k = 1, 2, \dots, n$) and $\int_U \mu_{LU^i}(u)/u$, respectively. The membership function $\int_U \mu_{LU^i}(u)/u$ is defined on the normalized domain $U$, and the membership functions $\int_E \mu_{LE^i_k}(e_k)/e_k$ are defined on the normalized domain $E$. Given a controller input vector $\mathbf{e}^*$ consisting of the normalized crisp values $e^*_1, \dots, e^*_n$, first the degree of satisfaction $\mu^i(\mathbf{e}^*)$ of the fuzzy region $LE^i$ is computed as

$$\mu^i(\mathbf{e}^*) = \min\left(\mu_{LE^i_1}(e^*_1), \mu_{LE^i_2}(e^*_2), \dots, \mu_{LE^i_n}(e^*_n)\right) \tag{19}$$

Second, given the degree of satisfaction $\mu^i(\mathbf{e}^*)$ of the fuzzy region $LE^i$, the normalized controller output of the ith fuzzy rule is computed as

$$CLU^i = \int_U \mu_{CLU^i}(u)/u = \int_U \min\left(\mu^i(\mathbf{e}^*), \mu_{LU^i}(u)\right)/u \tag{20}$$
Thus the controller output of the ith fuzzy rule is modified by the degree of satisfaction $\mu^i(\mathbf{e}^*)$ of the fuzzy region $LE^i$ and hence defined as the fuzzy subset $CLU^i = \int_U \mu_{CLU^i}(u)/u$ of $\int_U \mu_{LU^i}(u)/u$. That is,

$$\forall u: \ \mu_{CLU^i}(u) = \begin{cases} \mu_{LU^i}(u) & \text{if } \mu_{LU^i}(u) \le \mu^i(\mathbf{e}^*), \\ \mu^i(\mathbf{e}^*) & \text{otherwise} \end{cases} \tag{21}$$

The fuzzy set $CLU^i = \int_U \mu_{CLU^i}(u)/u$ is called the clipped controller output. It represents the modified version of the controller output $\int_U \mu_{LU^i}(u)/u$ from the ith fuzzy rule given certain crisp controller input $e^*_1, \dots, e^*_n$. In the final stage of rule firing, the clipped controller outputs of all fuzzy rules are combined in a global controller output via aggregation:

$$\forall u: \ \mu_{CU}(u) = \max\left(\mu_{CLU^1}(u), \dots, \mu_{CLU^M}(u)\right) \tag{22}$$

where $CU = \int_U \mu_{CU}(u)/u$ is the fuzzy set defining the fuzzy value of the global controller output. The type of rule firing described here is called max-min composition. Another type of composition can be found in Ref. 40.

Defuzzification. The result of rule firing is a fuzzy set $CU$ with a membership function $\int_U \mu_{CU}(u)/u$, as defined in Eq. (22). The purpose of defuzzification is to obtain a scalar value $u$ from $\mu_{CU}$. The scalar value $u$ is called a defuzzified controller output. This is done by the center of gravity method as follows. In the continuous case we have

$$u = \frac{\int_U \mu_{CU}(u) \cdot u \, du}{\int_U \mu_{CU}(u) \, du} \tag{23}$$

and for the discrete case

$$u = \frac{\sum_{u_j \in U} \mu_{CU}(u_j) \cdot u_j}{\sum_{u_j \in U} \mu_{CU}(u_j)} \tag{24}$$

Figure 6. Fuzzification of crisp values $e^*$ and $\dot{e}^*$. Fuzzification of $e^* = a_1$ with respect to a fuzzy set NM is obtained by finding the crosspoint between $a_1$ and the corresponding membership function NM.
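The clipping, aggregation, and discrete center of gravity steps of Eqs. (21), (22), and (24) can be sketched as follows. This is an illustrative sketch under stated assumptions: fuzzy sets are stored as dictionaries mapping a domain element to its membership degree, the function names are hypothetical, and the two output sets `lu_1`, `lu_2` as well as the second degree of satisfaction are invented for the example. Only the first degree of satisfaction, min(0.3, 0.65), is taken from the fuzzification example above.

```python
def clip(mu_lu, degree):
    """Eq. (21): clip an output membership function at the degree of satisfaction."""
    return {u: min(m, degree) for u, m in mu_lu.items()}


def aggregate(clipped_sets):
    """Eq. (22): pointwise maximum over the clipped outputs of all rules."""
    domain = set().union(*clipped_sets)
    return {u: max(s.get(u, 0.0) for s in clipped_sets) for u in sorted(domain)}


def center_of_gravity(cu):
    """Eq. (24): discrete center of gravity; None if no rule fired."""
    total = sum(cu.values())
    if total == 0:
        return None
    return sum(m * u for u, m in cu.items()) / total


# Hypothetical output fuzzy values of two rules on the domain {1, ..., 8}.
lu_1 = {3: 0.5, 4: 1.0, 5: 0.5}
lu_2 = {5: 0.5, 6: 1.0, 7: 0.5}

degree_1 = min(0.3, 0.65)  # Eq. (19) applied to the degrees from the example
degree_2 = 0.8             # assumed degree of satisfaction of a second rule

cu = aggregate([clip(lu_1, degree_1), clip(lu_2, degree_2)])
u_star = center_of_gravity(cu)
print(cu, u_star)
```

Storing the sets as dictionaries keeps the max-min composition a pair of one-line comprehensions; a continuous domain would need numerical integration in place of the two sums.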
Example Consider the normalized domain $U = \{1, 2, \dots, 8\}$ and let the fuzzy set $CU$ be given as

$$CU = \{0.5/3, \ 0.8/4, \ 1/5, \ 0.5/6, \ 0.2/7\} \tag{25}$$

Then the defuzzified controller output $u$ is computed as (see also Fig. 7)

$$u = \frac{0.5 \cdot 3 + 0.8 \cdot 4 + 1 \cdot 5 + 0.5 \cdot 6 + 0.2 \cdot 7}{0.5 + 0.8 + 1 + 0.5 + 0.2} = 4.7 \tag{26}$$

Figure 7. Defuzzification of a fuzzy controller output. Defuzzification of a fuzzy set $\mu_{CU}$ is obtained by computing the u-coordinate of the center of gravity of the membership function.

Denormalization. In the denormalization procedure the defuzzified normalized controller output $u_N$ is denormalized with the help of an off-line predetermined scalar denormalization factor $N_u^{-1}$, which is the inverse of the normalization factor $N_u$. Let the normalization of the controller output be performed as

$$u_N = N_u \cdot u \tag{27}$$

Then the denormalization of $u_N$ is

$$u = N_u^{-1} \cdot u_N \tag{28}$$

The choice of $N_u$ essentially determines, together with the rest of the scaling factors, the stability of the system to be controlled. In the case of Takagi Sugeno fuzzy controllers, the preceding computational steps are performed on the actual physical domains of the controller inputs and outputs. Thus the computational steps of normalization and denormalization are not involved in the computational structure of a Takagi Sugeno fuzzy controller, which, in turn, eliminates the need for input and output scaling factors.

The Transfer Characteristics

The following example (SISO) shows how to obtain a specific input-output transfer characteristic:

1. Suppose there is a set of rules like

R1: IF x = NB THEN y = PB
R2: IF x = NS THEN y = PS
R3: IF x = Z THEN y = Z
R4: IF x = PS THEN y = NS
R5: IF x = PB THEN y = NB

where x = input; y = output; N = negative; P = positive; Z = zero; S = small; B = big.

2. Shape and location of the corresponding membership functions are chosen so that they always overlap at the degree of membership $\mu_X = 0.5$ (see Fig. 8).

3. For the specific crisp controller input $x_{in}$ one obtains the degrees of membership $\mu_{X_{NS}}(x_{in}) > 0$ and $\mu_{X_Z}(x_{in}) > 0$, where the remaining degrees of membership $\mu_{X_{NB}}(x_{in})$, $\mu_{X_{PS}}(x_{in})$, and $\mu_{X_{PB}}(x_{in})$ are equal to zero. Hence, only rules R2 and R3 fire. The controller output set is computed by cutting the output set $\mu_{Y_{PS}}$ at the level of $\mu_{X_{NS}}(x_{in})$ and $\mu_{Y_Z}$ at $\mu_{X_Z}(x_{in})$. The resulting output membership function $\mu_Y$ takes every rule into account, performing the union of the resulting output membership functions $\mu_{Y_{R_i}}$ of each rule $R_i$ ($i = 1, \dots, 5$), which means the maximum operation between them.

4. The crisp controller output $y$ is obtained by calculating the center of gravity of the output set $LY$:

$$y = \frac{\int_{-A}^{+A} \mu_Y(y) \cdot y \, dy}{\int_{-A}^{+A} \mu_Y(y) \, dy} \tag{29}$$

Figure 8. Membership functions for input x and output y. The output membership function is obtained by clipping the output membership functions at the corresponding degrees of membership of the input.

The cut operation (min operation), the max operation over all resulting fuzzy subsets $LY_{R_i}$, and the center of gravity are nonlinear operations that cause a nonlinear operating line between x and y. This seems to make a systematic design of a desired transfer function with the help of membership functions difficult. However, in the x domain there are operating points $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$ at which only one of the five
rules fires. At these operating points the center of gravity can be calculated more easily than for the intermediate points. The operating points $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$ form points in the x-y domain (see Fig. 9). The values of the transfer characteristic between the operating points may show a slight nonlinear behavior, but from a linear approximation (interpolation) between two operating points one obtains the relation between the supports of the input and output membership functions, on the one hand, and the required slopes of the transfer characteristic, on the other hand.

The Nonlinearity of the Fuzzy Controller

In this subsection we will describe the sources of nonlinearity of the transfer characteristic of a fuzzy controller by relating them to particular computational steps. System theory distinguishes between two basic types of systems: linear and nonlinear. A system is linear if and only if it has both the additivity property and the scaling property; otherwise it is a nonlinear system.

Additivity Property (Superposition Property). Let it be the case that

$$y_1 = f(x) \quad \text{and} \quad y_2 = f(z) \tag{30}$$

Then for the additivity property to hold, it is required that

$$y_1 + y_2 = f(x + z) \tag{31}$$

Hence, we obtain

$$f(x) + f(z) = f(x + z) \tag{32}$$

Scaling Property (Homogeneity Property). Let it be the case that

$$y = f(x) \tag{33}$$

Then for the scaling property to hold, it is required that

$$\alpha \cdot y = f(\alpha \cdot x) \quad \text{and} \quad \alpha \cdot f(x) = f(\alpha \cdot x) \tag{34}$$

Because of fuzzification and defuzzification, a fuzzy controller is in fact a crisp transfer element (TE). This crisp TE has a nonlinear transfer characteristic because of the nonlinear character of fuzzification (when performed on nonlinear membership functions), rule firing, and defuzzification. The argument for this is that if one computational step within the computational structure of the TE is nonlinear, then the whole TE is nonlinear as well. Using the additivity and scaling properties of a linear system, we will now establish the linearity, or nonlinearity, of each computational step in the computational structure of a fuzzy controller with respect to these two properties. In what follows, without any loss of generality, we will use a single SISO fuzzy rule such as

$$R_C: \ \text{IF } e = LE \ \text{ THEN } u = LU \tag{35}$$

where $LE$ and $LU$ are the fuzzy values taken by the normalized, single controller input $e$ and the normalized, single controller output $u$, respectively. These two fuzzy values are determined by the membership functions $\int_E \mu_{LE}(e)/e$ and $\int_U \mu_{LU}(u)/u$ defined on the normalized domains $E$ and $U$. Here again we only consider normalized domains, fuzzy and crisp values, and thus the lower index N will be omitted from the notation unless there is a need to distinguish between normalized and actual crisp and fuzzy values used within the same expression. Furthermore, let $e^*_1$ and $e^*_2$ be two normalized crisp controller inputs and $u^*_1$ and $u^*_2$ be the defuzzified controller outputs corresponding to these normalized controller inputs.

Input Scaling and Output Scaling. Input scaling is linear because it simply multiplies each physical controller input $e^*_1$ and $e^*_2$ with a predetermined scalar $N_e$ (normalization factor) to obtain their normalized counterparts $e^*_{1N}$ and $e^*_{2N}$. Thus we have

$$N_e \cdot e^*_1 + N_e \cdot e^*_2 = N_e \cdot (e^*_1 + e^*_2) \tag{36}$$

Furthermore, for a given scalar $\alpha$ we have

$$\alpha \cdot N_e \cdot e^*_1 = N_e \cdot (\alpha \cdot e^*_1) \tag{37}$$

Thus input scaling has the properties of additivity and scaling and is thus a linear computational step. The same is valid for output scaling since it uses the denormalization factor $N_u^{-1}$ of Eq. (28) instead of $N_e$.
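The additivity and scaling requirements of Eqs. (32) and (34) can be probed numerically at sample points. The sketch below is only an illustration: the helper names, the value of `N_e`, and the triangular membership function are assumptions, and passing the check at a few points does not prove linearity, whereas a single failure does disprove it.

```python
def is_additive(f, x, z, tol=1e-9):
    """Check f(x) + f(z) == f(x + z) at one pair of sample points."""
    return abs(f(x) + f(z) - f(x + z)) <= tol


def is_homogeneous(f, x, alpha, tol=1e-9):
    """Check alpha * f(x) == f(alpha * x) at one sample point."""
    return abs(alpha * f(x) - f(alpha * x)) <= tol


N_e = 0.5  # hypothetical normalization factor


def input_scaling(e):
    """Input scaling: multiplication by the normalization factor."""
    return N_e * e


def triangular(e):
    """Hypothetical triangular membership function with support [-1, 1]."""
    return max(0.0, 1.0 - abs(e))


for name, f in [("input scaling", input_scaling), ("triangular MF", triangular)]:
    print(name, is_additive(f, 0.25, 0.5), is_homogeneous(f, 0.25, 3.0))
```

With these sample points the scaling step satisfies both properties, while the triangular membership function violates both, which is the behavior one would expect from any membership function that is not a straight line through the origin.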
Fuzzification. Let the membership function $\int_E \mu_{LE}(e)/e$ defining the normalized fuzzy value $LE$ be, in general, a nonlinear function (e.g., a triangular membership function). The fuzzification of $e^*_1$ and $e^*_2$ results in finding $\mu_{LE}(e^*_1)$ and $\mu_{LE}(e^*_2)$. Linearity requires

$$\mu_{LE}(e^*_1) + \mu_{LE}(e^*_2) = \mu_{LE}(e^*_1 + e^*_2) \tag{38}$$

The preceding equality cannot be fulfilled because the membership function $\int_E \mu_{LE}(e)/e$ is, in general, nonlinear. Thus, fuzzification in the case of nonlinear membership functions is a nonlinear computational step.

Figure 9. Transfer characteristic of a fuzzy controller. The transfer characteristic is a static input/output mapping of a fuzzy controller.

Rule Firing. Let the membership function $\int_U \mu_{LU}(u)/u$ defining the normalized fuzzy value $LU$ be, in general, a nonlinear function. Then the result of rule firing given the normalized crisp controller input $e^*_1$ will be

$$\forall u: \ \mu'_{CLU}(u) = \min\left(\mu_{LE}(e^*_1), \mu_{LU}(u)\right) \tag{39}$$

Similarly, for the normalized crisp controller input $e^*_2$ we obtain

$$\forall u: \ \mu''_{CLU}(u) = \min\left(\mu_{LE}(e^*_2), \mu_{LU}(u)\right) \tag{40}$$

Linearity requires

$$\forall u: \ \mu'_{CLU}(u) + \mu''_{CLU}(u) = \min\left(\mu_{LE}(e^*_1 + e^*_2), \mu_{LU}(u)\right) \tag{41}$$

but the preceding equality does not hold because

• $\int_U \mu_{LU}(u)/u$ is a nonlinear membership function.
• $\int_U \mu'_{CLU}(u)/u$ and $\int_U \mu''_{CLU}(u)/u$ are nonlinear membership functions (usually defined as only piecewise linear functions).
• the min-operator is nonlinear.

Thus rule firing is a nonlinear computational step within the computational structure of a fuzzy controller.

Defuzzification. Let defuzzification be performed with the center of gravity method. Furthermore, let $u_1$ and $u_2$ be the normalized defuzzified controller outputs obtained after defuzzification. That is,

$$u_1 = \frac{\int_U \mu'_{CLU}(u) \cdot u \, du}{\int_U \mu'_{CLU}(u) \, du} \tag{42}$$

$$u_2 = \frac{\int_U \mu''_{CLU}(u) \cdot u \, du}{\int_U \mu''_{CLU}(u) \, du} \tag{43}$$

Linearity requires, however,

$$u_1 + u_2 = \frac{\int_U \left(\mu'_{CLU}(u) + \mu''_{CLU}(u)\right) \cdot u \, du}{\int_U \left(\mu'_{CLU}(u) + \mu''_{CLU}(u)\right) du} \tag{44}$$

However, the preceding equality cannot be fulfilled since instead of it we have

$$u_1 + u_2 = \frac{\int_U \mu'_{CLU}(u) \cdot u \, du}{\int_U \mu'_{CLU}(u) \, du} + \frac{\int_U \mu''_{CLU}(u) \cdot u \, du}{\int_U \mu''_{CLU}(u) \, du} \tag{45}$$

This shows that the nonlinearity of the computational step of defuzzification comes from the normalization of the products $\int_U \mu'_{CLU}(u) \cdot u \, du$ and $\int_U \mu''_{CLU}(u) \cdot u \, du$. From all of the foregoing it is readily seen that a fuzzy controller is a nonlinear TE, its sources of nonlinearity being the nonlinearity of membership functions, rule firing, and defuzzification.

However, in the case of a Takagi Sugeno FC-1, each single fuzzy rule is a linear TE for all controller inputs (state vectors) that belong to the center of the fuzzy region specified by the IF part of this rule. At the same time, for controller inputs outside the center of a fuzzy region, this same fuzzy rule is a nonlinear TE. Because of the latter, the set of all fuzzy rules of a Takagi Sugeno FC-1 defines a nonlinear TE. In the case of a Takagi Sugeno gain scheduler, we have that each fuzzy rule defines a linear TE everywhere in a given fuzzy region.

HEURISTIC CONTROL AND MODEL-BASED CONTROL

Fuzzy control can be classified into two main directions: heuristic fuzzy control and model-based fuzzy control. Heuristic control deals with plants that are insufficiently described from the mathematical point of view, while model-based fuzzy control deals with plants for which a mathematical model is available. In this section we will describe the following control strategies:

• Mamdani control (MC)
• Sliding mode fuzzy control (SMFC)
• Cell mapping control (CM)
• Takagi Sugeno control (TS1)
• Takagi Sugeno control (TS2) with Lyapunov linearization

The Mamdani Controller

This type of fuzzy controller obtains its control strategy from expert knowledge. Since a model of the plant is not available, a simulation of the closed loop cannot be performed. Therefore, the control design is based on trial-and-error strategies, which makes the implementation of the fuzzy controller critical. The crucial point is that the behavior of the plant to be controlled is only reflected through the operator rules. However, from the control point of view this is not a satisfactory situation. Thus, one seeks methods to build qualitative models in terms of fuzzy rules. In the context of heuristic control, the so-called Mamdani control rules are used where both the antecedent and the consequent include fuzzy values. A typical control rule (operator rule) is

$$R_C^i: \ \text{IF } \mathbf{x} = LX^i \ \text{ THEN } \mathbf{u} = LU^i \tag{46}$$
For a system with two state variables and one control variable, we have, for example,

$$R_C^i: \ \text{IF } x = PS \ \text{ AND } \dot{x} = NB \ \text{ THEN } u = PM \tag{47}$$

which can be rewritten into

$$R_C^i: \ \text{IF } (x, \dot{x})^T = (PS, NB)^T \ \text{ THEN } u = PM \tag{48}$$

with $\mathbf{x} = (x, \dot{x})^T$, $LX^i = (PS, NB)^T$, $\mathbf{u} = u$, and $LU^i = PM$.
Even if there is only a little knowledge about the system to be controlled, one has to have some ideas about the behavior of the system state vector $\mathbf{x}$, its change with time $\dot{\mathbf{x}}$, and the control variable $\mathbf{u}$. This kind of knowledge is structural and can be formulated in terms of fuzzy rules. A typical fuzzy rule for a system is

$$R_S^i: \ \text{IF } \mathbf{x} = LX^i \ \text{ AND } \mathbf{u} = LU^i \ \text{ THEN } \dot{\mathbf{x}} = L\dot{X}^i \tag{49}$$

For the preceding system with two states and one control variable we have, for example,

$$R_S^i: \ \text{IF } (x, \dot{x})^T = (PS, NB)^T \ \text{ AND } u = PM \ \text{ THEN } (\dot{x}, \ddot{x})^T = (NM, PM)^T \tag{50}$$

with $\mathbf{x} = (x, \dot{x})^T$, $LX^i = (PS, NB)^T$, $\dot{\mathbf{x}} = (\dot{x}, \ddot{x})^T$, $L\dot{X}^i = (NM, PM)^T$, $\mathbf{u} = u$, and $LU^i = PM$.

Once the qualitative system structure is known, one has to find the corresponding quantitative knowledge. Quantitative knowledge means the following: In general, both control rules and system rules work with normalized domains. The task is to map inputs and outputs of both the controller and the system to normalized domains. For the system, this task is identical with the identification of the system parameters. For the controller, this task is identical with the controller design (namely, to find the proper control gains).

Sliding Mode Fuzzy Controller

A typical Mamdani controller is the sliding mode fuzzy controller (SMFC) (19–21). Fuzzy controllers for a large class of second-order nonlinear systems are designed by using the phase plane determined by error $e$ and change of error $\dot{e}$ (22–25). The fuzzy rules of these fuzzy controllers determine a fuzzy value for the input $u$ for each pair of fuzzy values of error and change of error (that is, for each fuzzy state vector). The usual heuristic approach to the design of these fuzzy rules is the partitioning of the phase plane into two semiplanes by means of a sliding (switching) line. This means that the fuzzy controller has a so-called diagonal form (see Fig. 10). Another possibility is, instead of using a sliding line, to use a sliding curve like a time optimal trajectory (26).

Figure 10. A fuzzy controller in a diagonal form. Diagonal form means that the same fuzzy attributes appear along a diagonal. (Legend: P = positive; N = negative; Z = zero; S = small; M = medium; B = big.)

A typical fuzzy rule for the fuzzy controller in a diagonal form is

$$\text{IF } e = PS \ \text{ AND } \dot{e} = NB \ \text{ THEN } u = PS \tag{51}$$
where PS stands for the fuzzy value of error POSITIVE SMALL, NB stands for the fuzzy value change of error NEGATIVE BIG, and PS stands for the fuzzy value POSITIVE SMALL of the input. Each semiplane is used to define only negative or positive fuzzy values of the input u. The magnitude of a specific
positive/negative fuzzy value of $u$ is determined on the basis of the distance $|s|$ between its corresponding state vector $\mathbf{e}$ and the sliding line $s = \lambda \cdot e + \dot{e} = 0$. This is normally done in such a way that the absolute value of the required input $u$ increases/decreases with the increasing/decreasing distance between the state vector $\mathbf{e}$ and the sliding line $s = 0$. It is easily observed that this design method is very similar to sliding mode control (SMC) with a boundary layer (BL), which is a robust control method (12,27). Sliding mode control is applied especially to control of nonlinear systems in the presence of model uncertainties, parameter fluctuations, and disturbances. The similarity between the diagonal form fuzzy controller and SMC enables us to redefine a diagonal form fuzzy controller in terms of an SMC with BL and then to verify its stability, robustness, and performance properties in a manner corresponding to the analysis of an SMC with BL. In the following, the diagonal fuzzy controller is therefore called sliding mode fuzzy control (SMFC).

However, one is tempted to ask here, What does one gain by introducing the SMFC type of controller? The answer is that SMC with BL is a special case of SMFC. SMC with BL provides a linear transfer characteristic with lower and upper bounds, while the transfer characteristic of an SMFC is not necessarily a straight line between these bounds, but a curve that can be adjusted to reflect given performance requirements. For example, normally a fast rise time and as little overshoot as possible are the required performance characteristics for the closed-loop system. These can be achieved by making the controller gains much larger for state space regions far from the sliding line than its gains in state space regions close to the sliding line (see Fig. 11).

In this connection it has to be emphasized that an SMFC is a state-dependent filter. The slope of its transfer characteristic decides the convergence rate to the sliding line and, at the same time, the bandwidth of the unmodeled disturbances that can be coped with. This means that far from the sliding line higher frequencies are allowed to pass through than in the neighborhood of it. The other function of this state-dependent filter is given by the sliding line itself. That is, the velocity with which the origin is approached is determined by the slope $\lambda$ of the sliding line $s = 0$. Because of the special form of the rule base of a diagonal form fuzzy controller, each fuzzy rule can be redefined in terms of the fuzzy value of the distance $|s|$ between the state vector $\mathbf{e}$ and the sliding line and the fuzzy value of the input
$u$ corresponding to this distance. This helps to reduce the number of fuzzy rules, especially in the case of higher-order systems. Namely, if the number of state variables is 2 and each state variable has $m$ fuzzy values, the number of fuzzy rules of the diagonal form fuzzy controller is $M = m^2$. For the same case, the number of fuzzy rules of an SMFC is only $m$. This is so because the fuzzy rules of the SMFC only describe the relationship between the distance to the sliding line and the input $u$ corresponding to this distance, rather than the relationship between all possible fuzzy state vectors and the input $u$ corresponding to each fuzzy state vector. Moreover, the fuzzy rules of an SMFC can be reformulated to include the distance $d$ between the state vector $\mathbf{e}$ and the vector normal to the sliding line and passing through the origin (see Fig. 12). This gives an additional opportunity to affect the rate with which the origin is approached. A fuzzy rule including this distance is of the form

$$\text{IF } s = PS \ \text{ AND } d = S \ \text{ THEN } u = NS \tag{52}$$

Figure 11. The adjustable transfer characteristic of an SMFC. The nonlinear input/output mapping of the SMFC provides a nonlinear gain for different input/output pairs.

Figure 12. The s and d parameters of an SMFC. $s$ is the distance between the state and the line $s = \lambda e + \dot{e} = 0$. $d$ is the distance between the state and the line perpendicular to $s = 0$.
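The two SMFC inputs can be computed from the state $(e, \dot{e})$ with elementary point-to-line geometry. The sketch below is an assumption-laden illustration rather than the article's algorithm: the function name is hypothetical, the slope `lam` stands for $\lambda$, and the normalization of both distances by $\sqrt{1 + \lambda^2}$ is the standard Euclidean distance formula, which may differ from the scaling used in a particular SMFC design.

```python
import math


def sliding_coordinates(e, e_dot, lam):
    """Return (s, distance to the sliding line, d) for the state (e, e_dot).

    s    : value of the sliding variable lam * e + e_dot
    dist : Euclidean distance from the state to the line s = 0
    d    : Euclidean distance from the state to the line through the
           origin that is perpendicular to s = 0 (i.e., e - lam * e_dot = 0)
    """
    norm = math.sqrt(1.0 + lam * lam)
    s = lam * e + e_dot
    dist = abs(s) / norm
    d = abs(e - lam * e_dot) / norm
    return s, dist, d


s, dist, d = sliding_coordinates(1.0, 1.0, 1.0)
print(s, dist, d)
```

Because the two lines are orthogonal, the squared distances add up to the squared length of the state vector, which gives a quick consistency check on any implementation.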
Despite the advantages of an SMFC, it poses a number of problems, the solutions of which can improve its performance and robustness. One such problem is the addition of an integrator to an SMFC in order to eliminate remaining errors in the presence of disturbances and model uncertainties. There are several ways to accomplish this. One option, for example, is to treat the integration term in the same manner as the other parameters of the IF part of the SMFC's fuzzy rules. This and other available options will be described later in this article.

Another problem is the so-called scaling of the SMFC parameters so that the domains on which their fuzzy values are defined are properly determined and optimized with respect to performance. This problem arises in the context of SMFC since the real physical domains of the SMFC parameters are normalized (i.e., their measured values are mapped on their respective normalized domains by the use of normalization factors). Thus a normalized input $u$ is the result of the computation with SMFC. The normalized $u$ is then denormalized (i.e., mapped back on its physical domain) by the use of a denormalization factor. The determination of the proper scaling factors, via which the normalization and denormalization of the SMFC parameters is performed, is not only part of the design, but is also important in the context of adaptation and on-line tuning of the SMFC. The behavior of the closed-loop system ultimately depends on the normalized transfer characteristic (control surface) of the SMFC. This control surface is mainly determined by the shape and location of the membership functions defining the fuzzy values of the SMFC's parameters. In this context one needs to pay attention to the following:

1. The denormalization factor for $u$ has the strongest influence on stability and oscillations. Because of its impact on stability, the determination of this factor has the highest priority in the design.

2. Normalization factors have the strongest influence on the SMFC sensitivity with respect to the proper choice of the operating regions for $s$ and $d$. Therefore, normalization factors are second in priority in the design.

3. The proper shape and location of the membership functions and, with this, the transfer characteristics of the SMFC can influence positively the behavior of the closed-loop system in different fuzzy regions of the fuzzy state space provided that the operating regions of $s$ and $u$ are properly chosen through well-adjusted normalization factors. Therefore, this aspect is third in priority.

A third problem is the design of SMFC for MIMO systems. The design for SISO systems can still be utilized, though some new aspects and restrictions come into play when this design is extended to the case of MIMO systems. First, we assume that the MIMO system has as many input variables $u_i$ as it has output variables $y_i$. Second, we assume that the so-called matching condition holds (12). This condition constrains the so-called parametric uncertainties. These are, for example, imprecision on the mass or inertia of a mechanical
system and inaccuracies on friction functions. Nonparametric uncertainties include unmodeled dynamics and neglected time delays. Let $\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}) + B \cdot \mathbf{u}$, $\mathbf{y} = C \cdot \mathbf{x}$ be the nonlinear open-loop system to be controlled, where $\mathbf{f}$ is a nonlinear vector function of the state vector $\mathbf{x}$, $\mathbf{u}$ is the input vector, $B$ is the input matrix, $\mathbf{y}$ is the output vector, and $C$ is the output matrix. Then the matching condition requires that the parametric uncertainties have to be within the range of the input matrix $B$.

CELL MAPPING

Cell mapping originates from a computational technique introduced by C. S. Hsu (28) that evaluates the global behavior and the stability of nonlinear systems. It is assumed that the computational (analytical) model of the system is available. Cell mapping was first applied to fuzzy systems by Chen and Tsao (29). The benefits of using cell mapping for fuzzy controlled systems are as follows:

• Supporting of self-learning FC strategies
• Creating of methodologies for the design of time optimal fuzzy controllers

The basic idea of Hsu is as follows: Let a nonlinear system be described by the point mapping

$$\mathbf{x}(t_{k+1}) = \mathbf{f}(\mathbf{x}(t_k), \mathbf{u}(t_k)) \tag{53}$$

where $t_k$ represent the discrete timesteps over which the point mapping occurs. It has to be emphasized that these timesteps need not be uniform in duration. If one wants to create a map of the state space taking into account all possible states $\mathbf{x}$ and control vectors $\mathbf{u}$, one obtains an infinite number of mappings even for finite domains for $\mathbf{x}$ and $\mathbf{u}$, respectively. To simplify this mapping, the (finite) state space is divided into a finite number of cells (see Fig. 13). Cells are formed by partitioning the domain of interest of each axis $x_i$ of the state space into intervals of size $s_i$ that are denoted by an integer valued index $z_i$. Then a cell is an n-tuple (a vector) of intervals $\mathbf{z} = (z_1, z_2, \dots, z_n)^T$. The remainder of the state space outside the finite state space of interest is lumped together into one so-called sink cell. The state of the system of Eq. (53) while in the cell $\mathbf{z}$ is represented by the center point $\mathbf{x}^c$. Now a cell mapping $C$ is defined by

$$\mathbf{z}(t_{k+1}) = C(\mathbf{z}(t_k)) \tag{54}$$

which is derived from the point mapping of Eq. (53) by computing the image of a point $\mathbf{x}(t_k)$ and then determining the cell in which the image point is located. It is clear that not all points $\mathbf{x}(t_k)$ in cell $\mathbf{z}(t_k)$ have the same image cell $\mathbf{z}(t_{k+1})$. Therefore, only the image cell of the center $\mathbf{x}^c(t_k)$ is considered. A cell that maps to itself is called an equilibrium cell. All cells in the finite state space are called regular cells.

Figure 13. Cell mapping principle. The state space is partitioned into a finite set of cells. Cell mapping deals with the transition behavior between cells.

The motivation for cell mapping is to obtain an appropriate sequence of control actions $\mathbf{u}(t_k)$ that drive the system of Eq. (53) to an equilibrium while minimizing a predefined cost function. Therefore, every cell is characterized by the following:

• The group number $G(\mathbf{z})$ that denotes cells $\mathbf{z}$ belonging to the same periodic domain or domain of attraction
• The step number $S(\mathbf{z})$ that indicates the number of transitions needed to transmit from cell $\mathbf{z}$ to a periodic cell
• The periodicity number $P(\mathbf{z})$ that indicates the number of cells contributing to the periodic motion

This characterization is introduced in order to find periodic motions and domains of attraction by a grouping algorithm. Applied to fuzzy control, it is evident that each cell describing the system's behavior belongs to a corresponding fuzzy system rule. Furthermore, each cell describing a particular control action belongs to a corresponding fuzzy control rule. Smith and Comer developed a fuzzy cell mapping algorithm the aim of which is to calibrate (tune) a fuzzy controller on the basis of the cell state space concept (30). Each cell is associated with a control action and a duration, which map the cell to a minimum cost trajectory (e.g., minimum time). With a given cost function and a plant simulation model, the cell state space algorithm generates a table of desired control actions. The mapping from cell to cell is carried out by a fuzzy controller, which smoothes out the control actions during the transitions between the cells. The cell-to-cell mapping technique has been used to fine-tune a Takagi Sugeno controller (see Fig. 14) (31).

Figure 14. Cell mapping by Smith and Comer [Redrawn from Papa et al. 1995 (31)]. Cell mapping is used to fine-tune a Takagi Sugeno controller.

Kang and Vachtsevanos developed a phase portrait assignment algorithm that is related to cell-to-cell mapping (32). In
Figure 15. Cell mapping by Kang and Vachtsevanos [redrawn from Papa et al. 1995 (31)]. Cell mapping is used for construction of an optimal rule base from data. (Block diagram: search algorithm with optimal criteria, fuzzy controller, plant, and cell spaces.)

In this approach, states and control variables are partitioned into different cell spaces. The x-cell space is recorded by applying a constant input to the system being simulated. Then, by means of a searching algorithm, the rule base of the fuzzy controller is constructed such that asymptotic stability is guaranteed. This is performed by determining the optimal control actions regardless of the cell from which the algorithm starts its search (see Fig. 15) (31). Hu, Tai, and Shenoi apply genetic algorithms to improve the searching algorithm using cell maps (33). The aim of this method is to tune a Takagi Sugeno controller.

TS Model-Based Control

Model-based fuzzy controller design starts from the mathematical knowledge of the system to be controlled (8,34). In this connection one is tempted to ask why one should use FC in this particular case when conventional control techniques work well. The reasons for applying FC to analytically known systems are as follows:

1. FC is a user-friendly and transparent control method because of its rule-based structure.
2. FC provides a nonlinear control strategy that is related to traditional nonlinear control techniques.
3. The nonlinear transfer characteristics of a fuzzy controller can be tuned by changing the shape and location of the membership functions so that adaptation procedures can be applied.
4. The approximation property of FC allows the design of a complicated control law with the help of only a few rules.
5. Gain scheduling techniques can be transferred to FC. In this connection FC is used as an approximator between linear control laws.

The description of the system starts from a fuzzy model of the system that uses both the fuzzy state space and a crisp description of the system. Let the principle of a Takagi Sugeno system be explained by the following example.

Example. Consider a TS system consisting of two rules with $x_1$ and $x_2$ as system inputs and y as the system output.

R1: if $x_1$ is BIG and $x_2$ is MEDIUM then $y_1 = x_1 - 3 \cdot x_2$.
R2: if $x_1$ is SMALL and $x_2$ is BIG then $y_2 = 4 + 2 \cdot x_1$.

Let the inputs measured be $x_1^* = 4$ and $x_2^* = 60$. From Fig. 16 we then obtain

$$\mu_{\mathrm{BIG}}(x_1^*) = 0.3, \qquad \mu_{\mathrm{BIG}}(x_2^*) = 0.35$$

and

$$\mu_{\mathrm{SMALL}}(x_1^*) = 0.7, \qquad \mu_{\mathrm{MEDIUM}}(x_2^*) = 0.75$$

For the degree of satisfaction of R1 and R2, respectively, we obtain

$$\min(0.3, 0.75) = 0.3, \qquad \min(0.7, 0.35) = 0.35$$

Furthermore, for the consequents of rules R1 and R2 we have

$$y_1 = 4 - 3 \cdot 60 = -176, \qquad y_2 = 4 + 2 \cdot 4 = 12$$

So the two pairs corresponding to each rule are (0.3, −176) and (0.35, 12). Thus, by taking the weighted normalized sum we get

$$y = \frac{0.3 \cdot (-176) + 0.35 \cdot 12}{0.3 + 0.35} = -74.77$$

Figure 16. Fuzzification procedure for a TS controller. The fuzzification procedure is the same as that for a Mamdani controller. (Membership functions SMALL and BIG over $x_1$ and MEDIUM and BIG over $x_2$, read off at $x_1 = 4$ and $x_2 = 60$.)

This can be extended to differential equations in the following way: Let a fuzzy region $LX_i$ be described by the rule

$$R_i^S:\ \text{IF } \mathbf{x} = LX_i \ \text{ THEN } \dot{\mathbf{x}} = A(\mathbf{x}_i) \cdot \mathbf{x} + B(\mathbf{x}_i) \cdot \mathbf{u} \tag{55}$$

This rule means that IF the state vector x is in fuzzy region $LX_i$ THEN the system obeys the local differential equation $\dot{\mathbf{x}} = A(\mathbf{x}_i) \cdot \mathbf{x} + B(\mathbf{x}_i) \cdot \mathbf{u}$. A summation of all contributing system rules provides the global behavior of the system. In Eq. (55), $A(\mathbf{x}_i)$ and $B(\mathbf{x}_i)$ are constant system matrices in the center of fuzzy region $LX_i$ that can be identified by classical identification procedures. The resulting system equation is

$$\dot{\mathbf{x}} = \sum_{i=1}^{n} w_i(\mathbf{x}) \cdot \left( A(\mathbf{x}_i) \cdot \mathbf{x} + B(\mathbf{x}_i) \cdot \mathbf{u} \right) \tag{56}$$

where $w_i(\mathbf{x}) \in [0, 1]$ are the normalized degrees of satisfaction of a fuzzy region $LX_i$. The corresponding control rule (Takagi Sugeno FC1) is

$$R_i^C:\ \text{IF } \mathbf{x} = LX_i \ \text{ THEN } \mathbf{u} = K(\mathbf{x}_i) \cdot \mathbf{x} \tag{57}$$
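and the control law for the whole state space is

$$\mathbf{u} = \sum_{i=1}^{n} w_i(\mathbf{x}) \cdot K(\mathbf{x}_i) \cdot \mathbf{x} \tag{58}$$

Together with Eq. (56) one obtains the closed-loop system

$$\dot{\mathbf{x}} = \sum_{i,j=1}^{n} w_i(\mathbf{x}) \cdot w_j(\mathbf{x}) \cdot \left( A(\mathbf{x}_i) + B(\mathbf{x}_i) \cdot K(\mathbf{x}_j) \right) \cdot \mathbf{x} \tag{59}$$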
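The weighted, normalized sum that appears in the two-rule example and again in Eqs. (56) and (58) can be sketched as follows. This is a minimal illustration rather than the author's implementation: the function name `ts_infer` and the hard-coded membership grades (the values read off Fig. 16) are assumptions made for the example.

```python
def ts_infer(rules, x):
    """Takagi-Sugeno inference: weighted, normalized sum of rule consequents.

    rules: list of (weight_fn, consequent_fn) pairs; weight_fn(x) returns the
    degree of satisfaction of the premise, consequent_fn(x) the crisp output.
    """
    weights = [w(x) for w, _ in rules]
    outputs = [f(x) for _, f in rules]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(w * y for w, y in zip(weights, outputs)) / total

# Membership grades for x1* = 4 and x2* = 60, hard-coded from Fig. 16.
# A real controller would evaluate membership functions here instead.
grades = {"x1_BIG": 0.3, "x1_SMALL": 0.7, "x2_MEDIUM": 0.75, "x2_BIG": 0.35}

rules = [
    # R1: if x1 is BIG and x2 is MEDIUM then y1 = x1 - 3*x2
    (lambda x: min(grades["x1_BIG"], grades["x2_MEDIUM"]), lambda x: x[0] - 3 * x[1]),
    # R2: if x1 is SMALL and x2 is BIG then y2 = 4 + 2*x1
    (lambda x: min(grades["x1_SMALL"], grades["x2_BIG"]), lambda x: 4 + 2 * x[0]),
]

y = ts_infer(rules, (4.0, 60.0))
print(round(y, 2))  # -74.77
```

Replacing each consequent by a local state feedback $K(\mathbf{x}_i) \cdot \mathbf{x}$ gives the blended control law of Eq. (58); the division by the sum of the weights plays the role of the normalization assumed for $w_i(\mathbf{x})$.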
It has to be emphasized that a system described by a set of rules like Eq. (55) is nonlinear even in the vicinity of the center of the region. This is due to the fact that $w_i(\mathbf{x})$ depends on the state vector x. Even if $w_i(\mathbf{x})$ is a piecewise linear function of x, the product $w_i(\mathbf{x}) \cdot w_j(\mathbf{x}) \cdot (A(\mathbf{x}_i) + B(\mathbf{x}_i) \cdot K(\mathbf{x}_j)) \cdot \mathbf{x}$ in Eq. (59) will always be a nonlinear function.

Model-Based Control with Lyapunov Linearization

In the following we discuss the case when a mathematical model of the system to be controlled is available and the fuzzy controller is formulated in terms of fuzzy rules (8,21,34,35). In this case system and controller are formulated on different semantic levels. Let the system analysis start from the mathematical model of the system

$$\dot{\mathbf{x}} = f(\mathbf{x}, \mathbf{u}) \tag{60}$$
and let the fuzzy controller be formulated in terms of fuzzy rules (Takagi Sugeno FC1)

$$R_i^C:\ \text{IF } \mathbf{x} = LX_i \ \text{ THEN } \mathbf{u} = LU_i \tag{61}$$

To study stability, robustness, and performance of the closed-loop system one has to bring system and controller onto the same semantic level. Thus, formally we translate the set of control rules into an analytical structure

$$\mathbf{u} = g(\mathbf{x}) \tag{62}$$

where, in general, the function g(x) is a nonlinear control surface being a static mapping of the state vector x to the control vector u (see Fig. 17).

Figure 17. Nonlinear control surface $u = g(x_1, x_2)$. Nonlinear mapping is a translation of fuzzy rules into a numerical input/output relation.

The control surface provides information about local and global properties of the controller. For example, the local gain for a specific state vector can be obtained by means of the tangential plane being attached to the corresponding point in state space. From this information one can conclude whether the controlled system is locally stable. Furthermore, one obtains a geometrical insight into how the control gain changes as the state trajectory moves in the state space.

Another aspect is the following. To study the local behavior of the system around specific points in state space, we linearize the system around them and study the closed-loop behavior in the linearized region. Let, for example, the system of Eq. (60) be linearized around a desired state $\mathbf{x}_d$ and a corresponding control vector $\mathbf{u}_d$:

$$\dot{\mathbf{x}} = f(\mathbf{x}_d, \mathbf{u}_d) + A(\mathbf{x}_d, \mathbf{u}_d) \cdot (\mathbf{x} - \mathbf{x}_d) + B(\mathbf{x}_d, \mathbf{u}_d) \cdot (\mathbf{u} - \mathbf{u}_d) \tag{63}$$

where

$$A(\mathbf{x}_d, \mathbf{u}_d) = \left. \frac{\partial f(\mathbf{x}, \mathbf{u})}{\partial \mathbf{x}} \right|_{\mathbf{x}_d, \mathbf{u}_d} \quad \text{and} \quad B(\mathbf{x}_d, \mathbf{u}_d) = \left. \frac{\partial f(\mathbf{x}, \mathbf{u})}{\partial \mathbf{u}} \right|_{\mathbf{x}_d, \mathbf{u}_d}$$

are Jacobians. An appropriate control law is

$$\mathbf{u} = \mathbf{u}_d + K(\mathbf{x}_d) \cdot (\mathbf{x} - \mathbf{x}_d) \tag{64}$$

where $K(\mathbf{x}_d)$ is the gain matrix. Since the system of Eq. (60) changes its behavior with the setpoint $\mathbf{x}_d$, the control law of Eq. (64) changes with the setpoint $\mathbf{x}_d$ as well. To design the controller for the closed-loop system at any arbitrary point $\mathbf{x}_d$ in advance, we approximate Eq. (63) by a set of TS fuzzy rules

$$R_i^S:\ \text{IF } \mathbf{x}_d = LX_i \ \text{ THEN } \dot{\mathbf{x}} = f(\mathbf{x}_d, \mathbf{u}_d) + A(\mathbf{x}_i, \mathbf{u}_i) \cdot (\mathbf{x} - \mathbf{x}_d) + B(\mathbf{x}_i, \mathbf{u}_i) \cdot (\mathbf{u} - \mathbf{u}_d) \tag{65}$$

The resulting system equation is (Takagi Sugeno FC2)

$$\dot{\mathbf{x}} = f(\mathbf{x}_d, \mathbf{u}_d) + \sum_{i=1}^{n} w_i(\mathbf{x}_d) \cdot \left( A(\mathbf{x}_i, \mathbf{u}_i) \cdot (\mathbf{x} - \mathbf{x}_d) + B(\mathbf{x}_i, \mathbf{u}_i) \cdot (\mathbf{u} - \mathbf{u}_d) \right) \tag{66}$$

This is a linear differential equation because the weights $w_i$ depend on the desired state vector $\mathbf{x}_d$ instead of on x. The corresponding set of control rules is

$$R_i^C:\ \text{IF } \mathbf{x}_d = LX_i \ \text{ THEN } \mathbf{u} = \mathbf{u}_d + K(\mathbf{x}_i) \cdot (\mathbf{x} - \mathbf{x}_d) \tag{67}$$

with the resulting control law

$$\mathbf{u} = \sum_{i=1}^{n} w_i(\mathbf{x}_d) \cdot \left( \mathbf{u}_d + K(\mathbf{x}_i) \cdot (\mathbf{x} - \mathbf{x}_d) \right) \tag{68}$$
Substituting Eq. (68) into Eq. (66), we obtain the equation for the closed-loop system
$$\dot{\mathbf{x}} = f(\mathbf{x}_d, \mathbf{u}_d) + \sum_{i,j=1}^{n} w_i(\mathbf{x}_d) \cdot w_j(\mathbf{x}_d) \cdot \left( A(\mathbf{x}_i, \mathbf{u}_i) + B(\mathbf{x}_i, \mathbf{u}_i) \cdot K(\mathbf{x}_j) \right) \cdot (\mathbf{x} - \mathbf{x}_d) \tag{69}$$

Denoting $A(\mathbf{x}_i, \mathbf{u}_i) + B(\mathbf{x}_i, \mathbf{u}_i) \cdot K(\mathbf{x}_j)$ by $A_{ij}$, asymptotic stability of $\mathbf{x} - \mathbf{x}_d$ is guaranteed if there exists a common positive definite matrix P such that the Lyapunov inequalities

$$A_{ij}^T P + P A_{ij} < 0 \tag{70}$$

hold, where $A_{ij}$ are Hurwitz matrices (34). With this result one is able to study the stability, robustness, and performance of the closed-loop system around an arbitrary setpoint $\mathbf{x}_d$ just by considering the system at predefined operating points $\mathbf{x}_i$.

SUPERVISORY CONTROL

A commonly used control technique is supervisory control, which is a method to connect conventional control methods and so-called intelligent control methods (see Fig. 18). This control technique works in such a way that one or more controllers are supervised by a control law on a higher level. Applications of supervisory control to a milling machine and a steam turbine are reported in Refs. 36 and 37. Normally, the low-level controllers perform a specific task under certain conditions. These conditions can be

• Keeping a predefined error between desired state and current state
• Performing a specific control task (e.g., approaching a solid surface by a robot arm)
• Being at a specific location of the state space

Usually, supervisors intervene only if some of the predefined conditions fail. If so, the supervisor changes the set of control parameters or switches from one control strategy to another. Often, supervisory algorithms are formulated in terms of IF-THEN rules. Fuzzy IF-THEN rules avoid hard switching between sets of parameters or between control structures. It is therefore useful to build fuzzy supervisors in the cases when ‘‘soft supervision’’ is required.

Figure 18. Supervisory control. The supervisor changes controller parameters by means of input/output data u and x and desired values $\mathbf{x}_d$. (Block diagram: supervisor FC, controller g(x, p), and system f in a feedback loop.)

A formal approach may be the following. Let

$$\dot{\mathbf{x}} = f(\mathbf{x}, \mathbf{u}) \tag{71}$$

be the model of the system and

$$\mathbf{u} = g_p(\mathbf{x}, \mathbf{p}) \tag{72}$$

be the control law, where p is a parameter vector that has to be determined by the supervisor. The subscript p means that with the change of p the structure of the control law may also change. Then the supervisory law can be written as

$$\mathbf{p} = h(\mathbf{x}, \mathbf{c}) \tag{73}$$

where c is the vector of conditions. For example,

$$\mathbf{c} = \left( |\mathbf{x} - \mathbf{x}_d| > K_1;\ |\dot{\mathbf{x}}_d| < K_2 \right)^T$$

where $K_1$ and $K_2$ are constant bounds. The corresponding supervisory fuzzy rule is

$$\text{IF } \mathbf{x} = LX_i \ \text{ AND } \mathbf{c} = LC_i \ \text{ THEN } \mathbf{p} = \mathbf{p}_i$$
with $LC_i = (|\mathbf{x} - \mathbf{x}_d| > K_{1i};\ |\dot{\mathbf{x}}_d| < K_{2i})^T$. Supervision is related to gain scheduling. The distinction between the two is that gain scheduling changes the controller gains with respect to a slowly time-varying scheduling variable while the control structure is preserved (38–40). On the other hand, supervision can change both the control gains and the control structure and can deal with fast-changing system parameters as well (41).

ADAPTIVE CONTROL

Many dynamic systems have a known structure but uncertain or slowly varying parameters. Adaptive control is an approach to the control of such systems. Adaptive controllers, whether designed for linear or nonlinear systems, are inherently nonlinear. We distinguish between direct and indirect adaptive control methods. Direct adaptive methods start with sufficient knowledge about the system structure and its parameters. Direct change of controller parameters optimizes the system’s behavior with respect to a given criterion. In contrast, the basic idea of indirect adaptive control methods is to estimate the uncertain parameters of the system under control (or, equivalently, the controller parameters) on-line, and use the estimated parameters in the computation of the control law. Thus an indirect adaptive controller can be regarded as a controller with on-line parameter estimation.

There do exist systematic methods for the design of adaptive controllers for the control of linear systems. There also exist adaptive control methods that can be applied to the control of nonlinear systems. However, the latter methods require measurable states and a linear parametrization of the dynamics of the system under control (i.e., that parametric uncertainty be expressed linearly in terms of a number of adjustable parameters). This is required in order to guarantee stability and tracking convergence.
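The idea of an indirect adaptive controller as "a controller with on-line parameter estimation" can be illustrated with a deliberately small sketch. Everything in it is an assumption chosen for illustration and not taken from the article: a scalar discrete-time plant $y_{k+1} = \theta y_k + u_k$ with unknown $\theta$ (a linear parametrization), a normalized gradient estimator, and a certainty-equivalence control law. The function name is hypothetical.

```python
def simulate_indirect_adaptive(theta_true=0.8, setpoint=1.0, steps=50, gain=1.0):
    """Certainty-equivalence control of y[k+1] = theta*y[k] + u[k], theta unknown.

    Returns the final parameter estimate and the final plant output.
    """
    y, theta_hat = 0.0, 0.0
    for _ in range(steps):
        # Control law computed from the current estimate (indirect adaptation).
        u = -theta_hat * y + setpoint
        # Plant response; theta_true is not visible to the controller.
        y_next = theta_true * y + u
        # Error between the measured output and the model prediction.
        prediction_error = y_next - (theta_hat * y + u)
        # Normalized gradient update of the estimate.
        theta_hat += gain * y * prediction_error / (1.0 + y * y)
        y = y_next
    return theta_hat, y

theta_hat, y_final = simulate_indirect_adaptive()
print(round(theta_hat, 6), round(y_final, 6))
```

In this toy setting the estimate approaches the true parameter and the output settles at the setpoint because the regressor $y_k$ stays away from zero; for richer plants, convergence of the estimate depends on sufficient excitation.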
However, when adaptive control of nonlinear systems is concerned, most of the adaptive control methods can only be applied to SISO nonlinear systems. Since robust control methods are also used to deal with parameter uncertainty, adaptive control methods can be considered as an alternative and complementary to robust control methods. In principle, adaptive control is superior to
robust control in dealing with uncertainties in uncertain or slowly varying parameters. The reason for this is the learning behavior of the adaptive controller: Such a controller improves its performance in the process of adaptation. On the other hand, a robust controller simply attempts to keep a consistent performance. Furthermore, an indirect adaptive controller requires little a priori information about the unknown parameters. A robust controller usually requires reasonable a priori estimates of the parameter bounds. Conversely, a robust controller has features that an adaptive controller does not possess, such as the ability to deal with disturbances, quickly varying parameters, and unmodeled dynamics.

In control with a fuzzy controller, there exist a number of direct adaptive control methods aimed at improving the fuzzy controller’s performance on-line. The FC’s parameters that can be altered on-line are the scaling factors for the input and output signals, the input and output membership functions, and the fuzzy IF-THEN rules. An adaptive fuzzy controller whose adjustable parameters are the fuzzy values and their membership functions is called a self-tuning fuzzy controller. An adaptive fuzzy controller that can modify its fuzzy IF-THEN rules is called a self-organizing fuzzy controller. Detailed descriptions of the design methods for these two types of direct adaptive fuzzy controllers can be found in Ref. 42. Descriptions of indirect adaptive fuzzy controllers can be found in Ref. 8. The methods for the design of a self-tuning fuzzy controller can be applied independently of whether its fuzzy IF-THEN rules are derived using model-based fuzzy control or a heuristic design approach and are thus applicable to the different types of fuzzy controllers. Since tuning and optimization of controllers are related to adaptive control, we use Fig. 19 to illustrate this relationship.
In this scheme an adaptation block is arranged above the controller to force the closed-loop system to behave according to a reference model installed in parallel. The task is to change the parameters of the controller by means of the adaptation block. Tuning or optimization is performed with the following steps:

1. Optimization criteria are needed that are sufficient for a relevant improvement of the behavior of the system under control. One criterion mostly used is the integral criterion

$$J = \int_{0}^{T} \left( \mathbf{e}^T Q \mathbf{e} + \mathbf{u}^T R \mathbf{u} \right) dt \tag{74}$$

where $\mathbf{e} = \mathbf{x} - \mathbf{x}_d$ is the error, and Q, R are weighting matrices. Another performance criterion can be formulated by fuzzy rules; for example,
IF rise time = SMALL AND settling time = MEDIUM THEN performance = HIGH

2. The next point is to choose an appropriate optimization technique (e.g., gradient descent with constant searching step width, or Rosenbrock’s method with variable searching step widths).
3. A crucial point is to choose a tuning hierarchy (43) that considers the different impacts of the control parameters on stability, performance, and robustness of the closed-loop system:
• Tune the output scaling factors.
• Tune the input scaling factors.
• Tune the membership functions.

Figure 19. Adaptive control. Using a model of the system the scheme shows an indirect adaptation strategy. Direct adaptation works without an explicit model of the plant. (Block diagram: reference model, optimization, adaptation block, controller g(x, p), and system f.)

BIBLIOGRAPHY

1. L. A. Zadeh, Fuzzy sets, Inf. Control, 8: 338–353, 1965.
2. E. H. Mamdani and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man-Machine Studies, 7 (1): 1–13, 1975.
3. J. J. Østergaard, Fuzzy logic control of a heat exchanger process. In M. M. Gupta (ed.), Fuzzy Automata and Decision Processes, Amsterdam: North-Holland, 1977, pp. 285–320.
4. L. Holmblad and J. J. Østergaard, Control of a cement kiln by fuzzy logic. In M. M. Gupta and E. E. Sanchez (eds.), Fuzzy Information and Decision Processes, Amsterdam: North-Holland, 1982, pp. 389–399.
5. M. Vidyasagar, Nonlinear Systems Analysis, Englewood Cliffs, NJ: Prentice Hall, 1993.
6. W. Pedrycz, Fuzzy Control and Fuzzy Systems, 2nd revised ed., Research Studies, 1992.
7. W. Pedrycz, Fuzzy control engineering: Reality and challenges, IEEE Int. Conf. Fuzzy Syst. 1995, Fuzz-IEEE/IFES’95, Proc., Yokohama, March 1995, pp. 437–446.
8. T. Takagi and M. Sugeno, Fuzzy identification of systems and its applications to modelling and control, IEEE Trans. Syst. Man Cybern., SMC-15 (1): 116–132, 1985.
9. L. Wang, Fuzzy systems are universal approximators, IEEE Int. Conf. Fuzzy Syst. 1992, Fuzz-IEEE’92, Proc., San Diego, March 8–12, pp. 1163–1169.
10. B. Kosko, Fuzzy systems as universal approximators, IEEE Int. Conf. Fuzzy Syst. 1992, Fuzz-IEEE’92, Proc., San Diego, March 8–12, 1992, pp. 1153–1162.
11. L. Koczy and S. Kovacs, Linearity and the cnf property in linear fuzzy rule interpolation, IEEE Int. Conf. Fuzzy Syst. 1994, Fuzz-IEEE’94, Proc., Orlando, June 26–29, 1994, pp. 870–875.
12. J-J. E. Slotine and W. Li, Applied Nonlinear Control, Englewood Cliffs, NJ: Prentice-Hall, 1991.
13. R. M. Tong, Some properties of fuzzy feedback systems, IEEE Trans. Syst. Man Cybern., SMC-10: 327–330, 1980.
14. S. Yasunobu and S. Miyamoto, Automatic train operation system by predictive fuzzy control. In M. Sugeno (ed.), Industrial Applications of Fuzzy Control, New York: Elsevier Science, 1985, pp. 1–18.
15. C. E. García, D. M. Prett, and M. Morari, Model predictive control: Theory and practice - a survey, Automatica, 25 (3): 335–348, 1989.
16. J. Valente de Oliveira, Long-range predictive adaptive fuzzy relational control, Fuzzy Sets Syst., 70: 337–357, 1995.
17. I. Scrjanc, K. Kavsek-Biasizzo, and D. Matko, Fuzzy predictive control based on fuzzy model, EUFIT ’96, Aachen, Germany, 1996, pp. 1864–1869.
18. C. W. de Silva and A. G. J. MacFarlane, Knowledge-based control with applications to robots. Lecture Notes in Control and Information Sciences 123, Berlin: Springer-Verlag, 1989.
19. G.-C. Hwang and S.-C. Li, A stability approach to fuzzy control design for nonlinear systems, Fuzzy Sets Syst., 48: 279–287, 1992.
20. S. Kawaji and N. Matsunaga, Fuzzy control of VSS type and its robustness, IFSA’91 Brussels, July 7–12, 1991, preprints vol. ‘‘Engineering,’’ pp. 81–88.
21. R. Palm, Sliding mode fuzzy control, IEEE Int. Conf. Fuzzy Syst. 1992, Fuzz-IEEE’92, Proc., San Diego, March 8–12, 1992, pp. 519–526.
22. K. S. Ray and D. D. Majumder, Application of circle criteria for stability analysis of linear SISO and MIMO systems associated with fuzzy logic controller, IEEE Trans. Syst. Man Cybern., 14: 345–349, 1984.
23. K. S. Ray, S. Ananda, and D. D. Majumder, L-stability and the related design concept for SISO linear systems associated with fuzzy logic controller, IEEE Trans. Syst. Man Cybern., 14: 932–939, 1992.
24. K. L. Tang and R. J. Mulholland, Comparing fuzzy logic with classical control designs, IEEE Trans. Syst. Man Cybern., SMC-17: 1085–1087, 1987.
25. B. A. M. Wakileh and K. F. Gill, Use of fuzzy logic in robotics, Computers in Industry, 10: 35–46, 1988.
26. S. M. Smith, A variable structure fuzzy logic controller with runtime adaptation, Proc. FUZZ-IEEE’94, Orlando, Florida, July 26–29, 1994, pp. 983–988.
27. V. J. Utkin, Variable structure systems: A survey, IEEE Trans. Autom. Control, 22: 212–222, 1977.
28. C. S. Hsu, A theory of cell-to-cell dynamical systems, J. Appl. Mech., 47: 940–948, 1980.
29. Y. Y. Chen and T. C. Tsao, A description of the dynamical behavior of fuzzy systems, IEEE Trans. Syst. Man Cybern., 19: 745–755, 1989.
30. S. M. Smith and D. J. Comer, An algorithm for automated fuzzy logic controller tuning, Proc. IEEE Int. Conf. Fuzzy Syst. 1992, pp. 615–622.
31. M. Papa, H-M. Tai, and S. Shenoi, Design and evaluation of fuzzy control systems using cell mapping, VI IFSA World Congress, Sao Paulo, Brazil, 1995, pp. 361–364.
32. H. Kang and G. Vachtsevanos, Nonlinear fuzzy control based on the vector field of the phase portrait assignment algorithm, Proc. Amer. Control Conf. 1990, pp. 1479–1484.
33. H-T. Hu, H-M. Tai, and S. Shenoi, Incorporating cell map information in fuzzy controller design, 3rd IEEE Int. Conf. Fuzzy Syst., Orlando, 1994, pp. 394–399.
34. K. Tanaka and M. Sugeno, Stability analysis and design of fuzzy control systems, Fuzzy Sets Syst., 45: 135–156, 1992.
35. U. Rehfuess and R. Palm, Design of Takagi-Sugeno controllers based on linear quadratic control, Proc. First Int. Symp. Fuzzy Logic, Zurich, Switzerland, May 26–27, 1995, pp. C10–C15.
36. R. H. Haber et al., Two approaches for a fuzzy supervisory control system of a vertical milling machine, VI IFSA Congress, Sao Paulo, Brazil, 1995, pp. 397–400.
37. V. V. Badami et al., Fuzzy logic supervisory control for steam turbine prewarming automation, 3rd IEEE Int. Conf. Fuzzy Syst., Orlando, 1994, pp. 1045–1050.
38. W. J. Rugh, Analytical framework for gain scheduling, IEEE Control Syst. Mag., 11 (1): 79–84, 1991.
39. R. A. Nichols, R. T. Reichert, and W. J. Rugh, Gain scheduling for H-infinity controllers: A flight control example, IEEE Trans. Control Syst. Technol., 1: 69–79, 1993.
40. J. S. Shamma, Analysis and design of gain scheduled control systems, Ph.D. thesis No. LIDS-TH-1770, Lab. for Information and Decision Sciences, MIT, Cambridge, MA.
41. L-X Wang, Supervisory controller for fuzzy control systems that guarantees stability, 3rd IEEE Int. Conf. Fuzzy Syst., Orlando, 1994, pp. 1035–1039.
42. D. Driankov, H. Hellendoorn, and M. Reinfrank, An Introduction to Fuzzy Control, 2nd ed., Berlin: Springer-Verlag, 1996.
43. R. Palm, Tuning of scaling factors in fuzzy controllers using correlation functions, Proc. FUZZ-IEEE’93, San Francisco, California, March 28–April 1, 1993, pp. 691–696.
RAINER PALM Siemens AG
FUZZY CONTROL. See FUZZY LOGIC; POSSIBILITY THEORY.
Wiley Encyclopedia of Electrical and Electronics Engineering
Fuzzy Image Processing and Recognition
Standard Article
Sankar K. Pal, Machine Intelligence Unit, Indian Statistical Institute, Calcutta, IN
Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.
DOI: 10.1002/047134608X.W3506.pub2
Article Online Posting Date: June 15, 2007
Abstract

This article describes various fuzzy set theoretic tools and explores their effectiveness in representing/describing various uncertainties that might occur in an image recognition system and the ways these uncertainties can be managed in making a decision. Some examples of uncertainties that often develop in the process of recognizing a pattern are given in the next section. The Image Ambiguity and Uncertainty Measures section provides a definition of image and describes various fuzzy set theoretic tools for measuring information on grayness ambiguity and spatial ambiguity in an image. Concepts of bound functions and spect sets characterizing the flexibility in membership functions are discussed in the Flexibility in Membership Functions section. Their applications to formulate some low level vision operations (e.g., enhancement, segmentation, skeleton extraction, and edge detection), whose outputs are crucial and responsible for the overall performance of a vision system, are then presented. Some real-life applications (e.g., motion frame analysis, character recognition, remote sensing image analysis, content-based image retrieval, and brain MR image segmentation) of these methodologies and tools are then described. Finally, conclusions and discussion are provided.

Introduction
Uncertainties in a Recognition System and Relevance of Fuzzy Set Theory
Image Ambiguity and Uncertainty Measures
Grayness Ambiguity Measures
Flexibility in Membership Functions
Some Examples of Fuzzy Image Processing Operations
Some Applications
Conclusions and Discussion
Acknowledgment

Keywords: pattern recognition; machine learning; fuzzy set theory; ambiguity; image processing; vision system
FUZZY IMAGE PROCESSING AND RECOGNITION
INTRODUCTION

Pattern recognition and machine learning form a major area of research and development that encompasses the processing of pictorial and other non-numerical information obtained from interaction between science, technology, and society. A motivation for this spurt of activity in this field is the need for people to communicate with computing machines in their natural mode of communication. Another important motivation is that scientists are also concerned with the idea of designing and making intelligent machines that can carry out certain tasks as we human beings do, the most salient outcome of which is the concept of future generation computing systems.

The ability to recognize a pattern is an essential requirement for sensory intelligent machines. Pattern recognition is an essential component of the so-called “Intelligent Control Systems,” which involve processing and fusion of data from different sensors and transducers. It is also a necessary function providing “failure detection,” “verification,” and “diagnosis task.” Machine recognition of patterns can be viewed as a twofold task, consisting of learning the invariant and common properties of a set of samples characterizing a class, and of deciding that a new sample is a possible member of the class by noting that it has properties common to those of the set of samples. Therefore, the task of pattern recognition by a computer can be described as a transformation from the measurement space M to the feature space F and finally to the decision space D.

When the input pattern is a gray tone image, some processing tasks such as enhancement, filtering, noise reduction, segmentation, contour extraction, and skeleton extraction are performed in the measurement space to extract salient features from the image pattern, which is what is basically known as image processing. The ultimate aim is to understand, recognize, and interpret the image pattern from the processed information available. Such a complete image recognition/interpretation system is called a vision system, which may be viewed as consisting of three levels, namely, low level, mid level, and high level, corresponding to M, F, and D with an extent of overlapping among them.

In a pattern recognition or vision system, uncertainty can develop at any phase of the aforesaid tasks resulting from the incomplete or imprecise input information, the ambiguity/vagueness in the input image, the ill-defined and/or overlapping boundaries among the classes or regions, and the indefiniteness in defining/extracting features and relations among them. Any decision taken at a particular level will have an impact on all higher level activities. It is therefore required for a recognition system to have sufficient provision for representing these uncertainties involved at every stage, so that the ultimate output (results) of the system can be associated with the least uncertainty (and not be affected or biased very much by the earlier or lower level decisions).
UNCERTAINTIES IN A RECOGNITION SYSTEM AND RELEVANCE OF FUZZY SET THEORY

Some of the uncertainties that one encounters often while designing a pattern recognition or vision (1, 2) system will be explained in this section. Let us consider, first of all, the problem of processing and analyzing a gray tone image pattern. A gray tone image possesses some ambiguity within the pixels because of the possible multivalued levels of brightness. This pattern indeterminacy is because of inherent vagueness rather than randomness. The conventional approach to image analysis and recognition consists of segmenting (hard partitioning) the image space into meaningful regions, extracting its different features (e.g., edges, skeletons, centroid of an object), computing the various properties of and relationships among the regions, and interpreting and/or classifying the image. As the regions in an image are not always crisply defined, uncertainty can occur at every phase of the aforesaid tasks. Any decision taken at a particular level will have an impact on all higher level activities. Therefore, a recognition system (or vision system) should have sufficient provision for representing the uncertainties involved at every stage (i.e., in defining image regions, its features and relations among them, and in their matching) so that it retains as much as possible the information content of the original input image for making a decision at the highest level. The ultimate output (result) of the system will then be associated with least uncertainty (and, unlike conventional systems, it will not be biased or affected very much by the lower level decisions).

For example, consider the problem of object extraction from a scene. Now, the question is, “How can someone define exactly the target or object region in a scene when its boundary is ill-defined?” Any hard thresholding made for its extraction will propagate the associated uncertainty to the following stages, which might affect its feature analysis and recognition. The same holds for the tasks of contour extraction and skeleton extraction of a region. From the aforesaid discussion, it therefore becomes convenient, natural, and appropriate to avoid committing ourselves to a specific (hard) decision (e.g., segmentation/thresholding, edge detection, and skeletonization) by allowing the segments or skeletons or contours to be fuzzy subsets of the image, with the subsets being characterized by the possibility (degree) of a pixel belonging to them. Prewitt (3) first suggested that the results of image segmentation should be fuzzy subsets rather than ordinary subsets.

Similarly, for describing and interpreting ill-defined structural information in a pattern, it is natural to define primitives (line, corner, curve, etc.) and relations among them using labels of fuzzy sets. For example, primitives that do not lend themselves to precise definition may be defined in terms of arcs with varying grades of membership from 0 to 1 representing its belonging to more than one class. The production rules of a grammar may similarly be fuzzified to account for the fuzziness in physical relation among the primitives, thereby increasing the generative power of a grammar for syntactic recognition (4) of a pattern.
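The contrast between a hard threshold and a fuzzy subset of the image can be made concrete with a small sketch. The membership function used here, Zadeh's standard S-function, is one commonly used choice and is an assumption of this example; the parameters `a` and `c`, the toy image, and the function names are hypothetical.

```python
def s_function(x, a, c):
    """Zadeh's standard S-function with crossover point b = (a + c) / 2."""
    b = (a + c) / 2.0
    if x <= a:
        return 0.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x <= c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0

def hard_threshold(image, t):
    """Crisp segmentation: every pixel is either object (1) or background (0)."""
    return [[1 if g >= t else 0 for g in row] for row in image]

def fuzzy_object_plane(image, a, c):
    """Fuzzy segmentation: every pixel gets a degree of belonging to the object."""
    return [[s_function(g, a, c) for g in row] for row in image]

image = [[20, 90, 128], [160, 200, 250]]      # toy 2 x 3 gray-level image
print(hard_threshold(image, 128))             # [[0, 0, 1], [1, 1, 1]]
print(fuzzy_object_plane(image, 64, 192))
```

The hard threshold commits the pixel with gray level 128 fully to the object, whereas the fuzzy plane assigns it a membership of 0.5 and leaves the decision to later stages.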
J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering. Copyright © 2007 John Wiley & Sons, Inc.
The incertitude in an image pattern may be explained in terms of grayness ambiguity, spatial (geometrical) ambiguity, or both. Grayness ambiguity means “indefiniteness” in deciding a pixel as white or black. Spatial ambiguity refers to “indefiniteness” in shape and geometry (e.g., in defining centroid, sharp edge, perfect focusing, etc.) of a region. Another kind of uncertainty exists that may derive from the subjective judgment of an operator in defining the grades of membership of the object regions. This process is explained in the section on Flexibility in Membership Functions.

Let us now consider the problem of determining the boundary or shape of a class from its sampled points or prototypes. Various approaches (5–7) are described in the literature that attempt to provide an exact shape of the pattern class by determining the boundary such that it contains (passes through) some of the sample points, which need not be true. It is necessary to extend the boundaries to some extent to represent the portions possibly left uncovered by the sampled points. The extended portion should have lower possibility to be in the class than the portions explicitly highlighted by the sample points. The size of the extended regions should also decrease with the increase of the number of sample points, which leads one to define a multivalued or fuzzy (with continuum grade of belonging) boundary of a pattern class (8, 9).

Similarly, the uncertainty in classification or clustering of image points or patterns may develop from the overlapping nature of the various classes or image properties. This overlapping may result from fuzziness or randomness. In the conventional classification technique, it is usually assumed that a pattern may belong to only one class, which is not necessarily true. A pattern may have degrees of membership in more than one class. It is therefore necessary to convey this information while classifying a pattern or clustering a data set.
In the following section, we explain various fuzzy set theoretic tools for image analysis (which were developed based on the realization that many of the basic concepts in pattern analysis, for example the concept of an edge or a corner, do not lend themselves to precise definition).
IMAGE AMBIGUITY AND UNCERTAINTY MEASURES

An L-level image X (M × N) can be considered as an array of fuzzy singletons, each having a value of membership denoting its degree of possessing some property (e.g., brightness, darkness, edginess, blurredness, texture, etc.). In the notation of fuzzy sets, one may therefore write that

$$X = \{\mu_X(x_{mn})/x_{mn} : m = 1, 2, \ldots, M;\ n = 1, 2, \ldots, N\} \tag{1}$$

where µX(xmn) or µmn denotes the grade of possessing such a property µ by the (m, n)th pixel. This property µ of an image may be defined using global information, local information, or positional information, or a combination thereof, depending on the problem. Again, the aforesaid information can be used in a number of ways (in their various functional forms), depending on an individual's opinion and/or the problem at hand, to define a requisite membership function for an image property. Basic principles and operations of image processing and pattern recognition in the light of fuzzy set theory are available in Reference 10.

Let us now explain the various image information measures (deriving from both fuzziness and randomness) and tools as well as their relevance to different operations for image processing and analysis. These measures are classified mainly in two groups, namely grayness ambiguity and spatial ambiguity.

GRAYNESS AMBIGUITY MEASURES

The definitions of some of the measures that were formulated to represent grayness ambiguity in an image X with dimension M × N and levels L (based on individual pixels as well as a collection of pixels) are listed below.

rth Order Fuzzy Entropy:

$$H^r(X) = -\frac{1}{k}\sum_{i=1}^{k}\left[\mu(s_i^r)\log\mu(s_i^r) + \{1-\mu(s_i^r)\}\log\{1-\mu(s_i^r)\}\right] \tag{2}$$

where s_i^r denotes the ith combination (sequence) of r pixels in X, k is the number of such sequences, and µ(s_i^r) denotes the degree to which the combination s_i^r, as a whole, possesses some image property µ.

Hybrid Entropy:

$$H_{hy}(X) = -P_w \log E_w - P_b \log E_b \tag{3}$$

with

$$E_w = \frac{1}{MN}\sum_{m}\sum_{n}\mu_{mn}\exp(1-\mu_{mn}), \qquad E_b = \frac{1}{MN}\sum_{m}\sum_{n}(1-\mu_{mn})\exp(\mu_{mn}) \tag{4}$$

Here, µmn denotes the degree of “whiteness” of the (m, n)th pixel; Pw and Pb denote probability of occurrences of white (µmn = 1) and black (µmn = 0) pixels, respectively; and Ew and Eb denote the average likeliness (possibility) of interpreting a pixel as white and black, respectively.

Correlation:

$$C(\mu_1, \mu_2) = 1 - \frac{4}{X_1 + X_2}\sum_{m}\sum_{n}(\mu_{1mn} - \mu_{2mn})^2 \tag{5}$$

with

$$X_1 = \sum_{m}\sum_{n}(2\mu_{1mn} - 1)^2, \qquad X_2 = \sum_{m}\sum_{n}(2\mu_{2mn} - 1)^2 \tag{6}$$

Here, µ1mn and µ2mn denote the degree of possessing the properties µ1 and µ2, respectively, by the (m, n)th pixel, and C(µ1, µ2) denotes the correlation between two such properties µ1 and µ2 (defined over the same domain). These expressions (eqs. 2–6) are the versions extended to the 2-D image plane from those defined (11, 12) for a fuzzy set.

H^r(X) gives a measure of the average amount of difficulty in taking a decision whether any subset of pixels of size r possesses an image property. Note that no probabilistic concept is needed to define it. If r = 1, H^r(X) reduces to (non-normalized) entropy as defined by De Luca
and Termini (13). Hhy(X), on the other hand, represents an amount of difficulty in deciding whether a pixel possesses a certain property µmn by making a prevision on its probability of occurrence (it is assumed here that the fuzziness occurs because of the transformation of the completely black (0) and white (1) pixels through a degradation process, thereby modifying their values to lie in the intervals [0,0.5] and [0.5,1], respectively). Therefore, if µmn denotes the fuzzy set “object region”, then the amount of ambiguity in deciding µmn a member of object region is conveyed by the term hybrid entropy depending on its probability of occurrence. In the absence of fuzziness (i.e., with exact defuzzification of the gray pixels to their respective black or white version), Hhy reduces to the two-state classical entropy of Shannon (14), the states being black and white.

As a fuzzy set is a generalized version of an ordinary set, the entropy of a fuzzy set deserves to be a generalized version of classical entropy by taking into account not only the fuzziness of the set but also the underlying probability structure. In that respect, Hhy can be regarded as a generalized entropy such that classical entropy becomes its special case when fuzziness is properly removed. Note that equations (2) and (3) are defined using the concept of logarithmic gain function. Similar expressions using exponential gain function (i.e., defining the entropy of an n-state system) have been given by Pal and Pal (15–18).
All these terms, which give an idea of “indefiniteness” or fuzziness of an image, may be regarded as the measures of average intrinsic information that is received when one has to make a decision (as in pattern analysis) to classify the ensembles of patterns described by a fuzzy set. H^r(X) has the following properties:

Pr 1: H^r attains a maximum if µi = 0.5 for all i.
Pr 2: H^r attains a minimum if µi = 0 or 1 for all i.
Pr 3: H^r ≥ H*^r, where H*^r is the rth-order entropy of a sharpened version of the fuzzy set (or an image).
Pr 4: H^r is, in general, not equal to H̄^r, where H̄^r is the rth-order entropy of the complement set.
Pr 5: H^r ≤ H^(r+1) when all µi ∈ [0.5,1]; H^r ≥ H^(r+1) when all µi ∈ [0,0.5].

The “sharpened” or “intensified” version of X is such that

$$\mu_{X^*}(x_{mn}) \ge \mu_X(x_{mn}) \ \text{ if } \ \mu_X(x_{mn}) \ge 0.5 \quad \text{and} \quad \mu_{X^*}(x_{mn}) \le \mu_X(x_{mn}) \ \text{ if } \ \mu_X(x_{mn}) \le 0.5 \tag{8}$$

When r = 1, the property Pr 4 is valid only with the equal sign. The property Pr 5 (which does not occur for r = 1) implies that H^r is a monotonically nonincreasing function of r for µi ∈ [0,0.5] and a monotonically nondecreasing function of r for µi ∈ [0.5,1] (when the “min” operator has been used to get the group membership value). When all µi values are the same, H^1(X) = H^2(X) = . . . = H^r(X), which is because the difficulty in taking a decision regarding possession of a property on an individual is the
same as that of a group selected therefrom. The value of H^r would, of course, be dependent on the µi values. Again, the higher the similarity among singletons (supports), the quicker is the convergence to the limiting value of H^r. Based on this observation, an index of similarity of supports of a fuzzy set may be defined as S = H^1/H^2 (when H^2 = 0, H^1 is also zero and S is taken as 1). Obviously, when µi ∈ [0.5,1] and the min operator is used to assign the degree of possession of the property by a collection of supports, S will lie in [0, 1] as H^r ≤ H^(r+1). Similarly, when µi ∈ [0,0.5], S may be defined as H^2/H^1 so that S lies in [0, 1]. The higher the value of S, the more alike (similar) are the supports of the fuzzy set with respect to the fuzzy property µ. This index of similarity can therefore be regarded as a measure of the degree to which the members of a fuzzy set are alike. The details are available in Reference 19.

Therefore, the value of first-order fuzzy entropy (H^1) can only indicate whether the fuzziness in a set is low or high. In addition, the value of H^r, r > 1, also enables one to infer whether the fuzzy set contains similar supports (or elements). The similarity index thus defined can be successfully used for measuring interclass and intraclass ambiguity (i.e., class homogeneity and contrast) in pattern recognition and image processing problems.

H^1(X) is regarded as a measure of the average amount of information (about the gray levels of pixels) that has been lost by transforming the classic pattern (two-tone) into a fuzzy (gray) pattern X. Further details on this measure with respect to image processing problems are available in References 10 and 20–22. It is to be noted that H^1(X) reduces to zero whenever µmn is made 0 or 1 for all (m, n), no matter whether the resulting defuzzification (or transforming process) is correct. In the following discussion, it will be clear how Hhy takes care of this situation.
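The behaviour of the rth-order entropy and the similarity index can be checked numerically. The following is a minimal sketch, assuming the logarithmic (De Luca and Termini type) entropy term averaged over all combinations of r supports, with the “min” operator giving the group membership; the function names `fuzzy_entropy` and `similarity_index` are illustrative and not taken from the cited references.

```python
from itertools import combinations
from math import log


def entropy_term(mu):
    """Logarithmic ambiguity of one membership value; 0*log(0) is taken as 0."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -(mu * log(mu) + (1.0 - mu) * log(1.0 - mu))


def fuzzy_entropy(memberships, r=1):
    """Assumed form of the rth-order fuzzy entropy.

    Every combination of r supports is one sequence, and the group
    membership of a sequence is the minimum of its members.
    """
    groups = list(combinations(memberships, r))
    return sum(entropy_term(min(group)) for group in groups) / len(groups)


def similarity_index(memberships):
    """S = H^1 / H^2, intended for memberships lying in [0.5, 1]."""
    h1 = fuzzy_entropy(memberships, 1)
    h2 = fuzzy_entropy(memberships, 2)
    if h2 == 0.0:
        return 1.0  # H^1 is then zero as well, and S is taken as 1
    return h1 / h2
```

With identical supports such as `[0.8, 0.8, 0.8, 0.8]` the two entropies coincide and S is 1, whereas a spread such as `[0.6, 0.9, 0.75, 1.0]` gives a smaller S, in line with the interpretation of S as a measure of alikeness.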
Let us now discuss some of the properties of Hhy(X). In the absence of fuzziness, when MNPb pixels become completely black (µmn = 0) and MNPw pixels become completely white (µmn = 1), then Ew = Pw, Eb = Pb, and Hhy boils down to the two-state classical entropy

$$H_c = -P_w \log P_w - P_b \log P_b$$

the states being black and white. Thus Hhy reduces to Hc only when a proper defuzzification process is applied to detect (restore) the pixels. |Hhy − Hc| can therefore be treated as an objective function for enhancement and noise reduction. The lower the difference, the less the fuzziness associated with the individual symbol and the higher the accuracy in classifying them as their original value (white or black). (This property is lacking with the H^1(X) measure and the measure of Xie and Bedrosian (23), which always reduces to zero or some constant value irrespective of the defuzzification process.) In other words, |Hhy − Hc| represents an amount of information that was lost by transforming a two-tone image to a gray tone. For a given Pw and Pb (Pw + Pb = 1, 0 ≤ Pw, Pb ≤ 1), of all possible defuzzifications, the proper defuzzification of the image is the one for which Hhy is minimum.
For example, when µmn = 0.5 for all (m, n), Hhy takes a constant value and becomes independent of Pw and Pb, which is logical in the sense that the machine is unable to make a decision on the pixels because all µmn values are 0.5.

Spatial Ambiguity Measures Based on Fuzzy Geometry of Image

Many of the basic geometric properties of and relationships among regions have been generalized to fuzzy subsets. Such an extension, called fuzzy geometry (24–28), includes the topological concepts of connectedness, adjacency and surroundedness, convexity, area, perimeter, compactness, height, width, length, breadth, index of area coverage, major axis, minor axis, diameter, extent, elongatedness, and degree of adjacency. Some of these geometrical properties of a fuzzy digital image subset (characterized by a piecewise constant membership function µX(xmn), or simply µ) are listed below with illustrations. These properties may be viewed as providing measures of ambiguity in the geometry (spatial domain) of an image. In what follows, m indexes rows and n indexes columns.

Compactness (24):

$$\operatorname{comp}(\mu) = \frac{a(\mu)}{p^2(\mu)}$$

where

$$a(\mu) = \sum_{m}\sum_{n}\mu_{mn}$$

and

$$p(\mu) = \sum_{i<j}\sum_{k}\left|\mu(i) - \mu(j)\right|\,\left|A(i,j,k)\right|$$

Here, a(µ) denotes the area of µ, and p(µ), the perimeter of µ, is just the weighted sum of the lengths of the arcs A(i, j, k) (24) along which the regions µ(i) and µ(j) meet, weighted by the absolute difference of these values. Physically, compactness means the fraction of maximum area (that can be encircled by the perimeter) actually occupied by the object. In the non-fuzzy case, the value of compactness is maximum for a circle and is equal to 1/(4π). In the case of the fuzzy disc, where the membership value is only dependent on its distance from the center, this compactness value is ≥ 1/(4π). Of all possible fuzzy discs, compactness is therefore minimum for its crisp version.

Height and Width (24):

$$h(\mu) = \sum_{m}\max_{n}\mu_{mn} \tag{15}$$

and

$$w(\mu) = \sum_{n}\max_{m}\mu_{mn} \tag{16}$$

So, height/width of a digital picture is the sum of the maximum membership values of each row/column.

Length and Breadth (26, 27):

$$l(\mu) = \max_{n}\sum_{m}\mu_{mn} \tag{17}$$

and

$$b(\mu) = \max_{m}\sum_{n}\mu_{mn} \tag{18}$$

The length/breadth of an image fuzzy subset gives its longest expansion in the column/row direction. If µ is crisp, µmn = 0 or 1, then length/breadth is the maximum number of pixels in a column/row. Comparing equations 17 and 18 with 15 and 16, we notice that the length/breadth takes the summation of the entries in a column/row first and then maximizes over different columns/rows, whereas the height/width first takes the maximum of the entries in each row/column and then sums over different rows/columns.

Index of Area Coverage (26, 27):

$$\operatorname{IOAC}(\mu) = \frac{a(\mu)}{l(\mu)\,b(\mu)} \tag{19}$$

In the non-fuzzy case, the IOAC has a value of 1 for a rectangle (placed along the axes of measurement). For a circle, this value is πr²/(2r · 2r) = π/4. The IOAC of a fuzzy image represents the fraction (which may be improper also) of the maximum area (that can be covered by the length and breadth of the image) actually covered by the image. Again, note the following relationships:

$$\frac{l(\mu)}{h(\mu)} \le 1 \quad \text{and} \quad \frac{b(\mu)}{w(\mu)} \le 1 \tag{20}$$
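The row and column based measures discussed in this section can be sketched in a few lines. The code below is illustrative only: it assumes the membership plane is given as a list of rows, takes the area as the sum of all membership values, and uses function names that are not part of the cited work.

```python
def area(mu):
    """Sum of all membership values."""
    return sum(sum(row) for row in mu)


def height(mu):
    """Sum over rows of the largest membership value in each row."""
    return sum(max(row) for row in mu)


def width(mu):
    """Sum over columns of the largest membership value in each column."""
    return sum(max(column) for column in zip(*mu))


def length(mu):
    """Largest column sum (longest expansion in the column direction)."""
    return max(sum(column) for column in zip(*mu))


def breadth(mu):
    """Largest row sum (longest expansion in the row direction)."""
    return max(sum(row) for row in mu)


def index_of_area_coverage(mu):
    """Area divided by the product of length and breadth."""
    return area(mu) / (length(mu) * breadth(mu))
```

For a crisp axis-aligned rectangle the index of area coverage evaluates to 1, and for any membership plane the length never exceeds the height and the breadth never exceeds the width.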
When equality holds in equation (20), the object is either vertically or horizontally oriented. Similarly, major axis, minor axis, center of gravity, and density are also defined in Reference 27.

Degree of Adjacency (27): The degree to which two crisp regions S and T of an image are adjacent is defined as

Here, d(p) is the shortest distance between p and q, q is a border pixel (BP) of T, and p is a border pixel of S. The other symbols have the same meaning as in the previous discussion. The degree of adjacency of two regions is maximum (= 1) only when they are physically adjacent (i.e., d(p) = 0) and their membership values are also equal [i.e., µ(p) = µ(q)]. If two regions are physically adjacent, then their degree of adjacency is determined only by the difference of their membership values. Similarly, if the membership values of two regions are equal, their degree of adjacency is determined by their physical distance only. The readers may note the difference between equation (22) and the adjacency definition given in Reference 24.

FLEXIBILITY IN MEMBERSHIP FUNCTIONS

As the theory of fuzzy sets is a generalization of the classic set theory, it has greater flexibility to capture faithfully the various aspects of incompleteness or imperfection (i.e.,
deficiencies) in information of a situation. The flexibility of fuzzy set theory is associated with the elasticity property of the concept of its membership function. The grade of membership is a measure of the compatibility of an object with the concept represented by a fuzzy set. The higher the value of membership, the less the amount (or extent) to which the concept represented by a set needs to be stretched to fit an object. As the grade of membership is both subjective and dependent on context, some difficulty of adjudging the membership value still remains. In other words, the problem is how to assess the membership of an element to a set, which is an issue where opinions vary, giving rise to uncertainties. Two operators, namely “Bound Functions” (29) and “Spectral Fuzzy Sets” (30), have been defined to analyze the flexibility and uncertainty in membership function evaluation. These operators are explained below along with their significance in image analysis and pattern recognition problems. Consider, for example, a “bright image,” which may be considered as a fuzzy set. It is represented by an S-type function that is a nondecreasing function of gray value. Now, the question is, “can any such nondecreasing function be taken to represent the above fuzzy set?” Intuitively, the answer is “no.” Bounds for such an S-type membership function µ have been reported (29) based on the properties of fuzzy correlation (11). The correlation measure between two membership functions µ1 and µ2 relates the variation in their functional values. The significance of the bound functions in selecting an S-type function µ for the image segmentation problem has been reported in detail in Reference 31. It has been shown that, for detecting a minimum in the valley region of a histogram, the window length w of the function µ: [0, w] → [0,1] should be less than the distance between two peaks around that valley region. 
The ability to make the fuzzy set theoretic approach flexible and robust will be demonstrated further in the next section.

The concept of spectral fuzzy sets is used where, instead of a single unique membership function, a set of functions reflecting various opinions on membership elements is available so that each membership grade is attached to one of these functions. By giving due respect to all the opinions available for further processing, it reduces the difficulty (ambiguity) in selecting a single function. A spectral fuzzy subset F having n supports is characterized by a set or a band (spectrum) of r membership functions (reflecting r opinions) and may be represented as

$$F = \{\mu_F^i(x_j)/x_j\}, \qquad i = 1, 2, \ldots, r; \quad j = 1, 2, \ldots, n$$

where r, the number of membership functions, may be called the cardinality of the opinion set. µ_F^i(xj) denotes the degree of belonging of xj to the set F according to the ith membership function. The various properties and operations related to it have been defined by Pal and Das Gupta (30). The incertitude or ambiguity associated with this set is two-fold, namely ambiguity in assessing a membership value to an element (d1) and ambiguity in deciding whether an element can be considered to be a member of the set (d2).

The (dis)similarity between the concept of spectral fuzzy sets and those of other tools such as the probabilistic fuzzy set, interval-valued fuzzy set, fuzzy set of type 2, or ultra fuzzy set (32–36) (which have also considered the difficulty in settling a definite degree of fuzziness or ambiguity) has been explained in Reference 30. The concept has been found to be significantly useful (30) in segmentation of ill-defined regions where the selection of a particular threshold becomes questionable as far as its certainty is concerned. In other words, questions may develop like, “where is the boundary?” or “what is the certainty that a level l, say, is a boundary between object and background?” The opinions on these queries may vary from individual to individual because of the differences in opinion in assigning membership values to the various levels. In handling this uncertainty, the algorithm gives due respect to various opinions on membership of gray levels for object region, minimizes the image ambiguity d (= d1 + d2) over the resulting band of membership functions, and then makes a soft decision by providing a set of thresholds (instead of a single one) along with their certainty values. A hard (crisp) decision obviously corresponds to one with maximum d value (i.e., the level at which opinions differ most). The problems of edge detection and skeleton extraction (where incertitude occurs from ill-defined regions and various opinions on membership values) and any expert system-type application (where differences in experts’ opinions lead to an uncertainty) may also be similarly handled within this framework.

SOME EXAMPLES OF FUZZY IMAGE PROCESSING OPERATIONS

Let us now describe some algorithms to show how the aforesaid information measures and geometrical properties can be incorporated in handling uncertainties in various operations (e.g., gray level thresholding, enhancement, contour detection, and skeletonization) by avoiding hard decisions and providing output in both fuzzy and nonfuzzy (as a special case) versions. It is to be noted that these low level operations (particularly image segmentation and object extraction) play a major role in an image recognition system. As mentioned earlier, any error made in this process might propagate to feature extraction and classification.

Enhancement in Property Domain
The objective of enhancement techniques is to process a given image so that the result is more suitable than the original for a specific application. The term “specific” is, of course, problem-oriented. The techniques used here are based on the modification of pixels in the fuzzy property domain of an image (10, 20, 21). The contrast intensification operator on a fuzzy set A generates another fuzzy set A′ = INT(A) in which the fuzziness is reduced by increasing the values of µA(xmn) that are above 0.5 and decreasing those that are below it. Define this INT operator by a transformation T1 of the membership function µmn as

$$T_1(\mu_{mn}) = \begin{cases} T_1'(\mu_{mn}) = 2\mu_{mn}^2, & 0 \le \mu_{mn} \le 0.5 \\ T_1''(\mu_{mn}) = 1 - 2(1 - \mu_{mn})^2, & 0.5 \le \mu_{mn} \le 1 \end{cases} \qquad m = 1, 2, \ldots, M;\ n = 1, 2, \ldots, N \tag{21}$$
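As a small illustration, the INT transformation and its repeated application can be written as follows. This is a sketch only; the function names `intensify` and `enhance` and the parameter `r` (number of successive applications) are illustrative choices.

```python
def intensify(mu):
    """One application of the contrast intensification operator INT."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership value must lie in [0, 1]")
    if mu <= 0.5:
        return 2.0 * mu * mu
    return 1.0 - 2.0 * (1.0 - mu) ** 2


def enhance(mu, r=1):
    """Apply INT r times in succession to a single membership value."""
    for _ in range(r):
        mu = intensify(mu)
    return mu
```

A value at the cross-over point (0.5) is left unchanged, values above it move toward 1, and values below it move toward 0; after several applications the result is close to a two-level output.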
In general, each µmn in X (Eq. 1) may be modified to µ′mn to enhance the image X in the property domain by a transformation function Tr, where

$$\mu'_{mn} = T_r(\mu_{mn}) = \begin{cases} T_r'(\mu_{mn}), & 0 \le \mu_{mn} \le 0.5 \\ T_r''(\mu_{mn}), & 0.5 \le \mu_{mn} \le 1 \end{cases} \qquad r = 1, 2, \ldots \tag{22}$$

The transformation function Tr is defined as successive applications of T1 by the recursive relationship (20)

$$T_s(\mu_{mn}) = T_1\{T_{s-1}(\mu_{mn})\}, \qquad s = 1, 2, \ldots$$

and T1(µmn) represents the operator INT defined in equation (21). As r increases, the enhancement function (curve) in the µmn–µ′mn plane tends to be steeper because of the successive application of INT. In the limiting case, as r → ∞, Tr produces a two-level (binary) image. It is to be noted here that, corresponding to a particular operation of T′, one can use any of the multiple operations of T″, and vice versa, to attain a desired amount of enhancement. Similarly, some other enhancement functions can be used independently instead of those used in equation (21). The membership plane µmn for enhancing contrast around a cross-over point may be obtained as (11, 20)

$$\mu_{mn} = G(x_{mn}) = \left[1 + \frac{|\hat{x} - x_{mn}|}{F_d}\right]^{-F_e}$$
where the position of the cross-over point, the bandwidth, and hence the symmetry of the curve are determined by the fuzzifiers Fe and Fd. When x̂ = xmax (maximum level in X), µmn represents an S-type function. When x̂ = any arbitrary level l, µmn represents a π-type function. After enhancement in the fuzzy property domain, the enhanced spatial domain x′mn may be obtained from

$$x'_{mn} = G^{-1}(\mu'_{mn}), \qquad \alpha \le \mu'_{mn} \le 1$$

where G⁻¹ denotes the inverse of the transformation used to construct the membership plane and α is the value of µmn when xmn = 0.

Note that the aforesaid method provides a basic module of fuzzy enhancement. In practice, one may use it with other smoothing, noise cleaning, or enhancement operations for obtaining the desired outputs. An extension of this concept to enhance the contrast among various ill-defined regions using multiple applications of π and (1 − π) functions has been described in References 21 and 37 for edge detection of X-ray images. The edge detection operators involve max and min operations. Reference 38 demonstrates, in this regard, an attempt to use a relaxation (iterative) algorithm for fast image enhancement using various orders of S functions; convergence has also been analyzed. A fuzzy image enhancement technique has been applied by Krell et al. (39) for enhancing the quality of images taken by the electronic portal imaging device needed by clinicians to verify the shape and the location of the “therapy beam” with respect to the patient's anatomy. Lukac et al. (40) performed cDNA microarray image processing using a fuzzy vector filtering framework. Various other fuzzy enhancement operators have been developed to reduce degradation
in images (41–48). Reference 49 uses a fuzzy regularization approach to carry out blind image deconvolution. Recently, fuzzy techniques have also been used in impulse noise detection and reduction (50). Furthermore, the concept of fuzzy transformation has been developed for low level image processing applications (51).

Optimum Enhancement Operator Selection

When an image is processed for visual interpretation, it is ultimately up to the viewers to judge its quality for a specific application and how well a particular method works. The process of evaluation of image quality therefore becomes subjective, which makes the definition of a well-processed image an elusive standard for comparison of algorithm performance. Again, it is customary to have an iterative process with human interaction to select an appropriate operator for obtaining the desired processed output. For example, consider the case of contrast enhancement using a nonlinear functional mapping. Not every kind of nonlinear function will produce a desired (meaningful) enhanced version. The questions that automatically develop are “Given an arbitrary image, which type of nonlinear functional form will be best suited without prior knowledge on image statistics (e.g., in remote applications like space autonomous operations where frequent human interaction is not possible) for highlighting its object?” and “Knowing the enhancement function, how can one quantify the enhancement quality for obtaining the optimal one?” Regarding the first question, even if the image statistics are given, it is possible only to estimate approximately the function required for enhancement, and the selection of the exact functional form still needs human interaction in an iterative process. The second question, on the other hand, needs individual judgment, which makes the optimum decision subjective.
The method of optimization of the fuzzy geometrical properties and entropy has been found (52) to be successful, when applied on a set of different images, in providing quantitative indices to avoid such human iterative interaction in selecting an appropriate nonlinear function and to make the task of subjective evaluation objective.

Threshold Selection (Fuzzy Segmentation)

Given an L-level image X of dimension M × N with minimum and maximum gray values lmin and lmax, respectively, the algorithm for its fuzzy segmentation into object and background may be described as follows:

Step 1: Construct the membership plane using the standard S function as

$$\mu_{mn} = S(x_{mn}; a, b, c)$$

or

$$\mu_{mn} = 1 - S(x_{mn}; a, b, c)$$

(depending on whether the object regions possess higher or lower gray values) with cross-over point b and bandwidth Δb = b − a = c − b.

Step 2: Compute the parameter I(X), where I(X) represents either grayness ambiguity or spatial ambiguity, as stated earlier, or both.

Step 3: Vary b between lmin and lmax and select those b for which I(X) has local minima or maxima depending on I(X). (Maxima correspond to the correlation measure only.) Among the local minima/maxima, let the global one have a cross-over point at s.
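The three steps can be sketched for the particular case in which I(X) is the first-order fuzzy entropy computed from the gray-level histogram. The sketch assumes Zadeh's standard S function, a fixed bandwidth, and a search for the global minimum only; the function names and the histogram representation (a list of counts indexed by gray level) are illustrative assumptions.

```python
from math import log


def s_function(x, a, b, c):
    """Standard S function with cross-over point b = (a + c) / 2."""
    if x <= a:
        return 0.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x < c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0


def entropy_term(mu):
    """Logarithmic ambiguity of one membership value; 0*log(0) is taken as 0."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -(mu * log(mu) + (1.0 - mu) * log(1.0 - mu))


def grayness_ambiguity(histogram, b, bandwidth):
    """First-order fuzzy entropy of the membership plane with cross-over b."""
    total = sum(histogram)
    weighted = sum(
        count * entropy_term(s_function(level, b - bandwidth, b, b + bandwidth))
        for level, count in enumerate(histogram)
    )
    return weighted / total


def select_threshold(histogram, bandwidth):
    """Shift the window over the occupied gray range; return the best cross-over."""
    occupied = [level for level, count in enumerate(histogram) if count > 0]
    candidates = range(occupied[0], occupied[-1] + 1)
    return min(candidates, key=lambda b: grayness_ambiguity(histogram, b, bandwidth))
```

For a bimodal histogram, the returned cross-over point falls in the valley between the two peaks, provided the window length (twice the bandwidth) is smaller than the distance between the peaks.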
The level s, therefore, denotes the cross-over point of the fuzzy image plane µmn, which has minimum grayness and/or geometrical ambiguity. The µmn plane then can be viewed as a fuzzy segmented version of the image X. For the purpose of nonfuzzy segmentation, we can take s as the threshold (or boundary) for classifying or segmenting an image into object and background. Faster methods of computation of the fuzzy parameters are explained in Reference 27.

Note that w = 2Δb, where Δb is the bandwidth of the S function, is the length of the window (such that [0, w] → [0,1]) that is shifted over the entire dynamic range. As w decreases, the possibility of detecting some undesirable thresholds (spurious minima) increases because of the smaller value of Δb. On the other hand, an increase in w results in a higher value of fuzziness and thus leads toward the possibility of losing some of the weak minima. The criteria regarding the selection of membership functions and the length of window (i.e., w) have been reported in References 29 and 31, assuming continuous functions for both histogram and membership function. It is shown that µ should satisfy the bound criteria derived based on the correlation (see the section on Flexibility in Membership Functions). Another way of handling this uncertainty using spectral fuzzy sets for providing a soft decision is explained in Reference 30.

Let us now describe another way of extracting an object by minimizing higher order entropy (Eq. 2) of both object and background regions using an inverse π function as shown by the solid line in Fig. 1. Unlike the previous algorithm, the membership function does not need any parameter selection to control the output. Suppose s is the assumed threshold so that the gray level ranges [1, s] and [s + 1, L] denote, respectively, the object and background of the image X. The inverse π-type function to obtain µmn values of X is generated by taking the union of S[x; (s − (L − s)), s, L] and 1 − S[x; 1, s, (s + s − 1)], where S denotes the standard S function.

The resulting function as shown by the solid line makes µ lie in [0.5,1]. As the ambiguity (difficulty) in deciding a level as a member of the object or the background is maximum for the boundary level s, it has been assigned a membership value of 0.5. Ambiguity decreases as the gray value moves away from s on either side. The µmn thus obtained denotes the degree of belonging of a pixel xmn to either object or background. As s is not necessarily the midpoint of the entire gray scale, the membership function may not be a symmetric one. Therefore, the task of object extraction is to:
Step 1: Compute the rth-order fuzzy entropy of the object, H_O^r, and the background, H_B^r, considering only the spatially adjacent sequences of pixels present within the object and background, respectively. Use the “min” operator to get the membership value of a sequence of pixels.

Step 2: Compute the total rth-order fuzzy entropy of the partitioned image as H_s^r = H_O^r + H_B^r.

Step 3: Minimize H_s^r with respect to s to get the threshold for object–background classification.

Referring back to the section on Grayness Ambiguity Measures, it is seen that H^2 reflects the homogeneity among the supports in a set in a better way than H^1 does. The higher the value of r, the stronger is the validity of this fact. Thus, considering the problem of object–background classification, the improper selection of the threshold is more strongly reflected by H^r than H^(r−1).

The methods of object extraction (or segmentation) described above are all based on gray level thresholding. Another way of doing this task is by pixel classification. The details on this technique, using fuzzy c-means, fuzzy isodata, fuzzy dynamic clustering, and fuzzy relaxation, are available in References 2, 10, and 53–60. The fuzzy c-means (FCM) algorithm is a well-known clustering algorithm used for pixel classification. Here, we describe it in brief.

Fuzzy segmentation results in fuzzy partitions of X = {x1, x2, . . . , xn}, where X denotes a set of n unlabeled column vectors in R^p (i.e., each element of X is a p-dimensional feature vector). A fuzzy c-partition (c is an integer, 1 ≤ c ≤ n) is the matrix U = [µik], i = 1, 2, . . . , c, k = 1, . . . , n that satisfies the following constraints:

$$\mu_{ik} \in [0, 1]; \qquad \sum_{i=1}^{c}\mu_{ik} = 1 \ \text{ for all } k; \qquad 0 < \sum_{k=1}^{n}\mu_{ik} < n \ \text{ for all } i$$

Here, the kth column of U represents membership values of xk to the c fuzzy subsets, and µik = µi(xk) denotes the grade of membership of xk in the ith fuzzy subset. The FCM algorithm searches for a local minimum of the following objective function:

$$J_m(U, V) = \sum_{k=1}^{n}\sum_{i=1}^{c}(\mu_{ik})^m\,\|x_k - v_i\|_A^2$$

where U is a fuzzy c-partition of X, ‖·‖_A is any inner product norm, V = {v1, v2, . . . , vc} is a set of cluster centers, vi ∈ R^p, and m ∈ [1, ∞) is the weighting exponent on each fuzzy membership. For m > 1 and xk ≠ vi for all i, k, it has been shown that Jm(U, V) may be minimized only if

$$\mu_{ik} = \left[\sum_{j=1}^{c}\left(\frac{\|x_k - v_i\|_A}{\|x_k - v_j\|_A}\right)^{2/(m-1)}\right]^{-1}$$

and

$$v_i = \frac{\sum_{k=1}^{n}(\mu_{ik})^m x_k}{\sum_{k=1}^{n}(\mu_{ik})^m}$$

The FCM algorithm, when the Euclidean distance norm is considered, can only be used for hyperspherical clusters with approximately equal dimensions. To cope with clusters
having large variability in cluster shapes, densities, and the number of data points in each cluster, Gustafson and Kessel (61) used the scaled Mahalanobis distance in the FCM algorithm. By the use of a distance measure derived from maximum likelihood estimation methods, Gath and Geva (62) obtained an algorithm that is effective even when the clusters are ellipsoidal in shape and unequal in dimension. As the value of c (i.e., the number of clusters) is not always known, several cluster validity criteria have been suggested in the literature to find the optimum number of clusters. These criteria include partition coefficient, classification entropy, properties coefficient, total within-class distance of clusters, total fuzzy hypervolume of clusters, and partition density of clusters (60–63).

Generalizing the FCM algorithm further, Dave (64) proposed the fuzzy c-shells (FCS) algorithm to search for clusters that are hyperellipsoidal shells. One of its advanced versions is believed to be better than the Hough transformation (in terms of memory and speed of computation) when used for ellipse detection. It is also shown (64) that the use of fuzzy memberships improves the ability to attain global optima compared with the use of hard membership. For the same purpose, Krishnapuram et al. (65) proposed another algorithm that is claimed to be less time consuming than that of Dave.

For further information, readers may consult References 66–69. The article in Reference 66 describes a modified version of the FCM, which incorporates supervised training data. The article of Cannon et al. (67) describes an approach that reduces the computation required for the FCM by a factor of six by using lookup tables. Another simplified form of FCM in this line is mentioned in Reference 68. The authors in Reference 69 have proposed a new heuristic fuzzy clustering technique and have referred to it as the Fuzzy J-Means (FJM).
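The basic FCM iteration with the Euclidean norm can be sketched as an alternation between the standard membership update and the standard center update. The code below is a minimal illustration, not the implementation of any cited reference: the initial centers are supplied by the caller, memberships are stored one row per data point (the transpose of U), and all function names are assumptions.

```python
from math import dist


def memberships(points, centers, m=2.0):
    """Membership of every point in every cluster (one row per point)."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for x in points:
        d = [dist(x, v) for v in centers]
        zeros = [i for i, di in enumerate(d) if di == 0.0]
        if zeros:
            # A point coinciding with a center belongs fully to that center.
            row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
        else:
            row = [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]
        table.append(row)
    return table


def update_centers(points, table, m=2.0):
    """Each center is the mean of the points weighted by membership**m."""
    dims = len(points[0])
    centers = []
    for i in range(len(table[0])):
        weights = [row[i] ** m for row in table]
        total = sum(weights)
        centers.append(
            tuple(sum(w * x[j] for w, x in zip(weights, points)) / total for j in range(dims))
        )
    return centers


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Alternate the two updates until the centers stop moving (requires m > 1)."""
    for _ in range(max_iter):
        table = memberships(points, centers, m)
        new_centers = update_centers(points, table, m)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships(points, centers, m), centers
```

For pixel classification, each point would be the feature vector of one pixel (for example, its gray value as a one-element tuple), and a crisp segmentation follows by assigning each pixel to the cluster with the largest membership.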
Soft decision making has been used to develop many other segmentation algorithms for various applications such as document image processing, ultrasound image processing, satellite image analysis, MR image analysis, and remote sensing (70–78). Algorithms for applications such as classification of MR brain images (79) and microcalcification detection (80) have been successfully implemented using fuzzy techniques. Before leaving this section, we mention the work in Reference 81, which defines the concept of fuzzy objects and describes algorithms for their extraction.
Contour Detection

Edge detection is also an image segmentation technique where the contours/boundaries of various regions are extracted based on the detection of discontinuity in grayness. Here we present a method for fuzzy edge detection using an edginess measure based on $H^1$ (Eq. 2), which denotes an amount of difficulty in deciding whether a pixel can be called an edge (19). Let $N^3_{xy}$ be a 3 × 3 neighborhood of a pixel at (x, y). The edge entropy $H^E_{xy}$ of the pixel (x, y), giving a measure of edginess at (x, y), may be computed as follows. For every pixel (x, y), compute the average, maximum, and minimum values of gray levels over $N^3_{xy}$. Let us denote the average, maximum, and minimum values by Avg, Max, and Min, respectively. Now define the following parameters.
A π-type membership function (Fig. 2) is then used to compute $\mu_{xy}$ for all (x, y) ∈ $N^3_{xy}$ such that µ(A) = µ(C) = 0.5 and µ(B) = 1. It is to be noted that $\mu_{xy}$ ≥ 0.5. Such a $\mu_{xy}$, therefore, characterizes a fuzzy set “pixel intensity close to its average value,” averaged over $N^3_{xy}$. When all pixel values over $N^3_{xy}$ are either equal or close to each other (i.e., they are within the same region), such a transformation will make all $\mu_{xy}$ = 1 or close to 1. In other words, if no edge exists, pixel values will be close to each other and the µ values will be close to one, thus resulting in a low value of $H^1$. On the other hand, if an edge does exist (dissimilarity in gray values over $N^3_{xy}$), then the µ values will be farther from unity, thus resulting in a high value of $H^1$. Therefore, the entropy $H^1$ over $N^3_{xy}$ can be viewed as a measure of edginess ($H^E_{xy}$) at the point (x, y). The higher the value of $H^E_{xy}$, the stronger the edge intensity and the easier its detection. Such an entropy plane will represent the fuzzy edge-detected version of the image. The proposed entropic measure is less sensitive to noise because of the use of a dynamic membership function based on a local neighborhood. The method is also not sensitive to the direction of edges. Other edginess measures and algorithms based on fuzzy set theory are available in References 10, 21, and 37.
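The edginess computation can be sketched in Python under several assumptions that are not stated in the text above: the half-width of the membership function is taken as the larger deviation of Max or Min from Avg, a linear function stands in for the π-type function (it still gives 1 at the average and 0.5 at the two end points), and $H^1$ is taken to be a normalized logarithmic entropy. The function name `edge_entropy` is hypothetical.

```python
import numpy as np

def edge_entropy(window):
    """Edginess of the centre pixel from its 3x3 gray-level neighborhood."""
    w = np.asarray(window, dtype=float)
    avg, hi, lo = w.mean(), w.max(), w.min()
    # Assumed half-width: the larger deviation from the average.
    d = max(hi - avg, avg - lo)
    if d == 0:
        mu = np.ones_like(w)  # flat neighborhood: every membership is 1
    else:
        # Linear stand-in for the pi-type function: 1 at avg, 0.5 at avg +/- d.
        mu = 1.0 - 0.5 * np.abs(w - avg) / d
    # Logarithmic entropy normalized to [0, 1]; clipping avoids log(0).
    p = np.clip(mu, 1e-12, 1 - 1e-12)
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return float(h.mean())

flat = np.full((3, 3), 120)
step = np.array([[0, 0, 255],
                 [0, 0, 255],
                 [0, 0, 255]])
print(edge_entropy(flat), edge_entropy(step))
```

With these assumptions a uniform neighborhood yields an entropy of essentially zero, while a neighborhood that straddles a step edge yields a clearly larger value, which matches the qualitative behavior described above.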
Figure 1. Inverse π function (solid line) for computing object and background entropy.
Figure 2. π Function for computing edge entropy.
Fuzzy Skeleton Extraction

Let us now explain two methods for extracting the fuzzy skeleton (skeleton having an ill-defined boundary) of an object from a gray tone image without resorting to its (questionable) hard thresholding. The first one is based on minimization of the parameter IOAC (Eq. 19) or compactness (Eq. 12) with respect to α-cuts (the α-cut of a fuzzy set A comprises all elements of X whose membership value is greater than or equal to α, 0 < α ≤ 1) over a fuzzy “core line” (or skeleton) plane. The membership value of a pixel to the core line plane depends on its property of possessing maximum intensity, and its property of occupying vertically and horizontally middle positions from the ε-edges (pixels beyond which the membership value in the fuzzy segmented image becomes less than or equal to ε, ε > 0) of the object (82). If a nonfuzzy (or crisp) single-pixel-width skeleton is desired, it can be obtained by a contour tracing algorithm (83) that takes into account the direction of the contour. Note that the original image cannot be reconstructed, like the other conventional techniques of gray skeleton extraction (2–85) from the fuzzy skeleton obtained here. The second method is based on fuzzy medial axis transformation (FMAT) (28) using the concept of fuzzy disks. A fuzzy disk with center P is a fuzzy set in which membership depends only on the distance from P. For any fuzzy set f, a maximal fuzzy disk $g_P^f \le f$ exists centered at every point P, and f is the sup of the $g_P^f$s. (Moreover, if f is fuzzy convex, so is every $g_P^f$, but not conversely.) Let us call a set $S_f$ of points f-sufficient if every $g_P^f \le g_Q^f$ for some Q in $S_f$; evidently f is then the sup of the $g_Q^f$s. In particular, in a digital image, the set of Q’s at which $g^f$ is a (non-strict) local maximum is f-sufficient. This set is called the fuzzy medial axis of f, and the set of $g_Q^f$s is called the fuzzy medial axis transformation (FMAT) of f.
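The notion of a maximal fuzzy disk can be made concrete with a small Python sketch. It assumes Euclidean distance on the pixel grid and restricts each ring of equidistant pixels to the image domain; both are illustrative assumptions, as are the function names. Because membership in a disk may depend only on distance from the center, the largest disk that stays below f takes, on each ring, the minimum of f over that ring.

```python
import numpy as np

def maximal_fuzzy_disk(f, centre):
    """Largest fuzzy disk centred at `centre` that is pointwise <= f."""
    f = np.asarray(f, dtype=float)
    rows, cols = np.indices(f.shape)
    # Squared Euclidean distance of every pixel from the centre.
    d2 = (rows - centre[0]) ** 2 + (cols - centre[1]) ** 2
    disk = np.empty_like(f)
    for d in np.unique(d2):
        ring = d2 == d
        disk[ring] = f[ring].min()  # constant on each ring of equal distance
    return disk

def sup_of_disks(f):
    """Pointwise sup of the maximal disks centred at every pixel."""
    f = np.asarray(f, dtype=float)
    out = np.zeros_like(f)
    for centre in np.ndindex(f.shape):
        out = np.maximum(out, maximal_fuzzy_disk(f, centre))
    return out

image = np.array([[0.1, 0.4, 0.2],
                  [0.3, 0.9, 0.5],
                  [0.2, 0.6, 0.1]])
disk = maximal_fuzzy_disk(image, (1, 1))
print(disk)
print(np.allclose(sup_of_disks(image), image))
```

The final check reflects the statement that f is the sup of its maximal disks: each disk equals f at its own center and never exceeds f elsewhere. Selecting an f-sufficient subset of centers, as the FMAT does, is not attempted in this sketch.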
These definitions reduce to the standard one if f is a crisp set. For a gray tone image X (denoting the non-normalized fuzzy “bright image” plane), the FMAT algorithm computes, first of all, various fuzzy disks centered at the pixels and then retains a few (as few as possible) of them, as designated by the $g_Q$’s, so that their union can represent the entire image X. That is, the pixel value at any point t can be obtained from a union operation, as t has membership value equal to its own gray value (i.e., equal to its non-normalized membership value to the bright image plane) in one of those retained disks. Note that the above representation is redundant (i.e., some more disks can further be deleted without affecting the reconstruction). The redundancy in pixels (fuzzy disks) from the fuzzy medial axis output can be reduced by considering the criterion $g_P^f(t) \le \sup_i g_{Q_i}^f(t)$, i = 1, 2, . . ., instead of $g_P^f(t) \le g_Q^f(t)$. In other words, eliminate many other $g_P^f$s for which there exists a set of $g_Q^f$s whose sup is greater than or equal to $g_P^f$. Let RFMAT denote the FMAT after reducing its redundancy. The fuzzy medial axis is seen to provide a good skeleton of the darker (higher intensity) pixels in an image apart from its exact representation. FMAT of an image can be considered as its core (prototype) version for the purpose of image matching. It is to be mentioned here that such a representation may not be economical in a practical situation. The details on this feature and the possible approximation to make it practically feasible are available in Reference 86. Note that the membership values of the disks contain the information of image statistics. For example, if the image is smooth, the disk will not have abrupt change in its values. On the other hand, it will have abrupt change in case the image has salt and pepper noise or edginess. The concept of fuzzy MAT can therefore be used as spatial filtering (both high pass and low pass) of an image by manipulating the disk values to the extent desired and then putting them back while reconstructing the processed image. A gray-scale thinning algorithm is described in References 60 and 87 based on the concept of fuzzy connectedness between two pixels; the dark regions can be thinned without ever being explicitly segmented.

SOME APPLICATIONS

Here we provide a few applications of the methodologies and tools described before.

Motion Frame Analysis and Scene Abstraction

With rapid advancements in multimedia technology, it is increasingly common to have time-varied data like video as computer data types. Existing database systems do not have the capability to search within such information. It is a difficult problem to automatically determine one scene from another because no precise markers exist that identify where they begin and end. Moreover, divisions of scenes can be subjective, especially if transitions are subtle.
One way to estimate scene transitions is to approximate the change of information between each pair of successive frames by computing the distance between their discriminatory properties. A solution is provided in Reference 88 to the problem of scene estimation/abstraction of motion video data in the fuzzy set theoretic framework. Using various fuzzy geometrical and information measures (see the section on image ambiguity and uncertainty measures) as image features, an algorithm is developed to compute the change of information in each pair of successive frames to classify scenes/frames.
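The idea of detecting scene transitions from distances between successive feature vectors can be sketched as follows. This is only an illustration: the weighted Euclidean form, the weights, the threshold, and the feature values are all hypothetical and are not taken from Reference 88.

```python
import numpy as np

def frame_distances(features, weights):
    """Weighted Euclidean distance between each pair of successive frames."""
    x = np.asarray(features, dtype=float)
    w = np.asarray(weights, dtype=float)
    diffs = np.diff(x, axis=0)  # row k holds frame k+1 minus frame k
    return np.sqrt(np.sum(w * diffs ** 2, axis=1))

def scene_boundaries(distances, threshold):
    """Indices of frames that start a new scene (distance above threshold)."""
    return [i + 1 for i, d in enumerate(distances) if d > threshold]

# Made-up per-frame features, e.g. (entropy, compactness, length/height).
features = np.array([[0.20, 0.50, 1.0],
                     [0.21, 0.51, 1.0],
                     [0.80, 0.10, 2.0],
                     [0.79, 0.11, 2.0]])
d = frame_distances(features, weights=[1.0, 1.0, 0.5])
print(d)
print(scene_boundaries(d, threshold=0.5))
```

In this toy series the distance is small within a scene and large between frames 1 and 2, so frame 2 is reported as the start of a new scene.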
Frame similarity is measured in terms of weighted distance in fuzzy feature space. This categorization process of raw input visual data can be used to establish structure for correlation. The investigation not only attempts to determine the discrimination ability of the fuzziness measures for classifying scenes, but also enhances the capability of nonlinear, frame-accurate access to video data for applications such as video editing and visual document archival retrieval systems in multimedia environments. Such an investigation was carried out at the NASA Johnson Space Center, Texas (88). A set of digitized videos of previous space shuttle missions obtained from NASA/JSC was used (Fig. 3). The scenes were named payload deployment, onboard astronaut, remote manipulator arm, and mission control room. Experiments were conducted for various combinations of uncertainty, orientation, and shape measures. As an illustration, Fig. 4 shows a result when entropy, compactness, and length/height were considered as a feature set for computing the distance between two successive frames. Here the abscissa represents the total number of frame distances in the sampled time series, and the ordinate is the compound distance value between two successive images. Each scene consists of six frames. Therefore, a change of scene occurs at every sixth index on the abscissa. The scene separation is denoted with vertical grid lines. The effectiveness of the aforesaid fuzzy geometrical parameters is also demonstrated (89) for recognizing overlapping fingerprints with a multilayer perceptron. In the last decade, substantial advancement has occurred in video and motion analysis using fuzzy sets. Recently, a new video shot boundary detection technique using fuzzy logic has been proposed in Reference 90. The authors in Reference 91 used fuzzy C-planes clustering to propose a motion estimation technique, which is an important block in most video processing systems.
Other applications such as traffic handling (92) in video processing have also been implemented using fuzzy techniques.

Handwritten Character Recognition

Handwritten characters, like all patterns of human origin, are examples of ill-defined patterns. Hence, the recognition of handwritten characters is a very promising field for the application of pattern recognition techniques using the fuzzy approach. It has been claimed that the concept of vagueness underlying fuzzy theory is more appropriate for describing the inherent variability of such systems than the probabilistic concept of randomness. An important application of handwriting recognition is to build an efficient man-machine interface for communication between human beings and the computer. Several attempts have been made at handwritten character recognition in different languages. Here we mention a pioneering contribution of Kickert and Koppelaar (93), the subsequent developments based on their work, and then an attempt made for fuzzy feature description in this context. The 26 capital letters of the English alphabet constituting the set
are seen to be composed of the elements of the following set of “ideal” elements (93)
where ∈ is a null segment whose use will be explained shortly. Also, a set P exists of 11 ordered recognition routines capable of recognizing the “ideal” segments. Each element of P can be considered as a portion of a context-free grammar having productions of the form
with Vn being the non-terminal elements of the grammar. Each of the 11 recognition routines is applied sequentially to any unknown pattern S to be recognized as one of the members of L. Each routine attempts to recognize a given segment in a given structural context. If successful, the application of the rules in P results in a parsing of S as a vector of segments S = (x1 , x2 , . . . , xn ), where xi ∈ VT . Each letter, then, is defined by its vector of segments. Let us assume that the vectors are padded out with null segments ∈ so that all letters are defined by vectors of equal length. Each letter, therefore, can be defined as follows:
where
The element of fuzziness is introduced by associating with each segment $a_i \in V_T$ a fuzzy set on the actual pattern space. With each $a_i$ is associated a fuzzy membership function $\mu_{a_i}$ so that, given a segment $x_i$ of a pattern S, $\mu_{a_j}(x_i)$ is a measure of the degree to which the segment $x_i$ corresponds to the ideal segment $a_j$. The recognition procedure is now simply explained. The sequence of recognition rules is executed, evaluating all possible parsings of the input pattern. For each letter $H_k$ for which a parse can be made, the result is a sequence $(x_1, x_2, \ldots, x_n)$ of segments. The membership of S in $H_k$ is the intersection, in the sense of fuzzy sets, of the memberships of the segments $x_i$; that is, it is the minimum, over i, of the degree to which $x_i$ corresponds to the ith ideal segment of $H_k$.
Finally, the pattern is recognized as letter $H_m$ if its membership in $H_m$ is the largest among all letters $H_k$ for which a parse can be made.
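The decision rule just described (fuzzy intersection of segment memberships, followed by selection of the best letter) can be sketched in Python. The sketch assumes that the intersection is the "min" operator and that the best letter is the one with maximum membership; the segment grades are invented, and parsing itself is not modeled.

```python
def letter_membership(segment_grades):
    """Fuzzy intersection (min) of the grades of the parsed segments."""
    return min(segment_grades)

def recognise(grades_by_letter):
    """Return the letter with the highest membership, plus all scores."""
    scores = {letter: letter_membership(g)
              for letter, g in grades_by_letter.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical grades of three parsed segments against two candidate letters.
grades = {"A": [0.9, 0.8, 0.7],
          "H": [0.9, 0.9, 0.2]}
best, scores = recognise(grades)
print(best, scores)
```

Here the letter "H" matches two segments very well but is rejected because of a single poorly matching segment, which illustrates how strongly the "min" operator depends on the lowest grade.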
This approach was criticized by Stallings (94), who developed a Bayesian hypothesis-testing scheme for the same problem. Given a pattern S, hypothesis Hk is that the writer intended letter Hk. Associated with each decision is a cost Cij, which is the cost of choosing Hi when Hj is true. The parsing of the pattern is performed as before. Here, however, a probability is associated with each segment for a given letter. Regarding unknown densities, Stallings (94) suggests the use of maximum likelihood tests. As both membership functions and probability density functions are maps into the interval [0,1], the only difference is the use of min/max operators, where, the author argues, the “min” operator loses a lot of information and is drastically affected by one low value. The author claims that although frequentistic
Figure 3. A payload deployment sequence of four scenes as input data.
Figure 4. Distances between successive frames with feature set (entropy, compactness, length/height).
probability is not appropriate in dealing with pattern variability, subjective probability is perfectly suitable and more intuitively obvious than “grade of membership.” In a rejoinder (95), it is argued that fuzzy set theory is more flexible than is assumed in Reference 94, where all arguments are directed against a particular case (93). Recalling the idea of collectives (from property sets), where the arithmetic average replaces “min,” there remains little difference between the schemes in References 93 and 94. In a reply, Stallings insisted that the Bayesian approach is superior because it offers a convenient way of assigning costs to errors and gains to correct answers. For the recognition of handwritten English capital letters, the readers may also refer to the work described in Reference 96. Existing computational recognition methods use feature extraction to assign a pattern to a prototype class. Therefore, the recognition ability depends on the selection procedure. To handle the inherent uncertainties/imprecision in handwritten characters, Malaviya and
Peters (97) have introduced a fuzziness factor in the definition of selected pattern features. The fuzzy features are confined to their meaningfulness with the help of a multistage feature aggregation, which can be combined in a set of linguistic rules that form the fuzzy rule-base for handwritten information recognition. Note that the concept of introducing fuzziness in the definition and extraction of features and in their relations is not new. A detailed discussion is available in References 61 and 98 by Pal and others, for extraction of primitives for X-ray identification and character recognition in terms of gentle, fair, and sharp curves. A similar interpretation of the shape parameters of triangle, rectangle, and quadrangle in terms of membership for “approximate isosceles triangles,” “approximate equilateral triangles,” “approximate right triangles,” and so on has also been made (99) for their classification in a color image. However, the work in Reference 97 is significant in that it has described many global, positional, and geometrical features to account for the variabilities in
patterns, which are supported with experimental results. To represent the uncertainty in physical relations among the primitives, the production rules of a formal grammar are fuzzified to account for the fuzziness in relation among the primitives, thereby increasing the generative power of a grammar. Such a grammar is called fuzzy grammar (100–102). It has been observed (98) that the incorporation of the element of fuzziness in defining “sharp,” “fair,” and “gentle” curves in the grammars enables one to work with a much smaller number of primitives. By introducing fuzziness in the physical relations among the primitives, it was also possible to use the same set of production rules and nonterminals at each stage, which is expected to reduce, to some extent, the time required for parsing in the sense that parsing needs to be done only once at each stage, unlike the case of the non-fuzzy approach, where each string has to be parsed more than once, in general, at each stage. However, this merit has to be balanced against the fact that the fuzzy grammars are not as simple as the corresponding nonfuzzy grammars. In recent times, the use of fuzzy theory in various kinds of recognition tasks has increased significantly. Complex fuzzy systems have been designed to recognize gestures (103) and describe relative positions in images (104). The authors in Reference 105 have extended the application of fuzzy logic to recognize olfactory (smell) signals.

Detecting Man-Made Objects from Remote Sensing Images

In a remotely sensed image, the regions (objects) are usually ill-defined because of both grayness and spatial ambiguities. Moreover, the gray value assigned to a particular pixel of a remotely sensed image is the average reflectance of different types of ground covers present in the corresponding pixel area (36.25 m × 36.25 m for the Indian Remote Sensing (IRS) imagery). Therefore, a pixel may represent more than one class with a varying degree of belonging.
A multivalued recognition system (6,7) formulated based on the concept of fuzzy sets has been used for detecting curved structures from IRS images (108). The system is capable of handling various imprecise inputs and of providing multiple class choices corresponding to any input. Depending on the geometric complexity (8, 9) and the relative positions of the pattern classes in the feature space, the entire feature space is decomposed into some overlapping regions. The system uses Zadeh’s compositional rule of inference (109) to recognize the samples. The recognition system is initially applied on an IRS image to classify (based on the spectral knowledge of the image) its pixels into six classes corresponding to six land cover types, namely pond water, turbid water, concrete structure, habitation, vegetation, and open space. The green and infrared band information, being more sensitive than other band images in discriminating various land cover types, is used for the classification. The clustered images are then processed for detecting the narrow concrete structure curves. These curves include basically the roads and railway tracks. The width of such attributes has an upper bound, which was considered there
to be three pixels for practical reasons. So all the pixels lying on the concrete structure curves with width not more than three pixels were initially considered as the candidate set for the narrow curves. As a result of the low pixel resolution (36.25 m × 36.25 m for IRS imagery) of the remotely sensed images, all existing portions of such real curve segments may not be reflected as concrete structures and, as a result, the candidate pixel set may constitute some broken curve segments. To identify the curves to a better extent, a traversal through the candidate pixels was used. Before the traversing process, one also needs to thin the candidate curve patterns so that a unique traversal can be made through the existing curve segments with candidate pixels. Thus, the total procedure to find the narrow concrete structure curves consists of three parts: 1) selecting the candidate pixels for such curves, 2) thinning the candidate curve patterns, and 3) traversing the thinned patterns to make some obvious connections between different isolated curve segments. The multiple choices provided by the classifier in making a decision are used to a great extent in the traversal algorithm. Some of the movements are governed by only the second and combined choices. After the traversal, the noisy curve segments (i.e., with insignificant lengths) are discarded from the curve patterns. The residual curve segments represent the skeleton version of the curve patterns. To complete the curve pattern, the concrete structure pixels lying in the eight neighboring positions corresponding to the pixels on the above obtained narrow curve patterns are now put back. This resultant image represents the narrow concrete structure curves corresponding to an image frame (108). The results are found to agree well with the ground truths.
The classification accuracy of the recognition system (107, 108) is not only found to be good, but also its ability of providing multiple choices in making decisions is found to be very effective in detecting the road-like structures from IRS images.

Content-Based Image Retrieval (CBIR)

In the last few years, researchers have witnessed an upsurge of interest in content-based image retrieval (CBIR), which is a process of selecting similar images from a collection by measuring similarities between the features extracted from the images themselves. Real-life images are inherently fuzzy because of several uncertainties arising in the imaging process. Moreover, measuring visual similarities between images highly depends on the subjectivity of human perception of image content. As a result, fuzzy image processing for extracting visual features finds an important place in image retrieval applications. Let us explain here, in brief, an investigation carried out in Reference 110 on image retrieval based on fuzzy geometrical features. Here a fuzzy edge map is extracted for each image. Using the edge map, a fuzzy compactness vector is computed that is subsequently used for measuring the similarity between the query and the database image. The process involves extracting the possible edge candidates using the concept of Top and Bottom of the intensity surface of a smoothed image. The extracted edge candidates are assigned gradient membership value µm (P)
within (0.0 to 1.0) computed from the pixel contrast ratio over a fixed window. The selected points are categorized as weak, medium, and strong edge pixels based on their gradient membership value $\mu_m(P)$. Multilevel thresholding is performed by using the α-cut to segregate the edge pixels. Fuzzy edge maps $s_{n\alpha}$ consisting of different types of edge pixels are obtained from the candidate set $s_n$ by varying $\mu_m(P)$, from which the connected subsets $s_{n\alpha}$ as shown in Fig. 5(b) and (c) are obtained. The fuzzy compactness value is computed from the fuzzy edge map $s_{n\alpha}$, obtained at different α-cuts, to index an image of the database.

$$s_{n\alpha} = \{P \in s_n : \mu_m(P) \ge \alpha\} \tag{32}$$
where 0.5 ≤ α ≤ 1. This geometrical feature is invariant to rotation, translation, and scaling by definition. It physically means the maximum area that can be encircled by the perimeter. The similarity between the feature vectors of two images is computed by the widely used Euclidean distance metric. The retrieval results are shown in Fig. 6. From the experimental results of Fig. 6, it is seen that images are retrieved with fairly satisfactory precision. Some other significant works on image retrieval are available in References 111–113. The authors in Reference 111 propose an image retrieval system using texture similarity, whereas the authors in Reference 112 present a novel information fusion approach for use in content-based image retrieval. Retrieval of color images has been investigated in Reference 113. Recently, similarity-based online feature selection was applied to bridge the gap between high-level semantic concepts and low-level visual features in content-based image retrieval (114). Note that all the methods mentioned above use fuzzy theory to handle various kinds of ambiguities.

Segmentation of Brain Magnetic Resonance Image

Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of magnetic resonance (MR) images. A robust segmentation technique based on fuzzy set theory for brain MR images is proposed in Reference 115. The method is based on a fuzzy measure to threshold the image histogram. The image is thresholded based on a criterion of similarity between gray levels. The second-order fuzzy correlation is used for assessing such a concept. The local information of the given image is extracted through a modified co-occurrence matrix. The technique consists of two linguistic variables {bright, dark} modeled by two fuzzy subsets and a fuzzy region on the gray-level histogram.
Each of the gray levels of the fuzzy region is assigned to both defined subsets one by one, and the second-order fuzzy correlation using the modified co-occurrence matrix is calculated. First, let us define two linguistic variables {dark, bright} modeled by two fuzzy subsets of X, denoted by A and B, respectively. The fuzzy subsets A and B are associated with the histogram intervals $[x_{\min}, x_p]$ and $[x_q, x_{\max}]$, respectively, where $x_p$ and $x_q$ are the final and initial gray-level limits for these subsets, and $x_{\min}$ and $x_{\max}$ are the lowest and highest gray levels of the image, respectively.
Next, we calculate $C_A(x_{\min} : x_p)$ and $C_B(x_q : x_{\max})$, where $C_A(x_{\min} : x_p)$ is the second-order fuzzy correlation of fuzzy subset A and its two-tone version and $C_B(x_q : x_{\max})$ is the second-order fuzzy correlation of fuzzy subset B and its two-tone version using the modified co-occurrence matrix. The second-order fuzzy correlation can be expressed in the following way:

$$C(\mu_1, \mu_2) = 1 - \frac{4}{Y_1 + Y_2} \sum_{i=1}^{L} \sum_{j=1}^{L} [\mu_1(i,j) - \mu_2(i,j)]^2 \, t_{ij} \tag{33}$$
where $t_{ij}$ is the frequency of occurrence of the gray level i followed by j; that is, $T = [t_{ij}]$ is the modified co-occurrence matrix, which is given by

$$t_{ij} = \sum_{a \in X,\; b \in a_g} \frac{\delta}{1 + |\Delta|^2} \tag{34}$$

where

$$b \in a_g = \{(m, n-1), (m, n+1), (m+1, n), (m-1, n), (m-1, n-1), (m-1, n+1), (m+1, n-1), (m+1, n+1)\},$$

$$\Delta = \frac{1}{4} \max\{\, |x_{m-1,n} + x_{m-1,n+1} + x_{m,n} + x_{m,n+1} - x_{m+1,n} - x_{m+1,n+1} - x_{m+2,n} - x_{m+2,n+1}|,\; |x_{m,n-1} + x_{m,n} + x_{m+1,n-1} + x_{m+1,n} - x_{m,n+1} - x_{m,n+2} - x_{m+1,n+1} - x_{m+1,n+2}| \,\},$$

$$\delta = \begin{cases} 1 & \text{if the gray level value of } a \text{ is } i \text{ and that of } b \text{ is } j \\ 0 & \text{otherwise} \end{cases}$$

and

$$Y_k = \sum_{i=1}^{L} \sum_{j=1}^{L} [2\mu_k(i,j) - 1]^2 \, t_{ij}; \quad k = 1, 2$$
To calculate the correlation between a gray-tone image and its two-tone version, $\mu_2$ is considered as the nearest two-tone version of $\mu_1$. That is,

$$\mu_2(x) = \begin{cases} 0 & \text{if } \mu_1(x) \le 0.5 \\ 1 & \text{otherwise} \end{cases} \tag{35}$$
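Equations (33) and (35) can be combined in a short Python sketch. It takes the membership values µ1(i, j) and the co-occurrence matrix T as given arrays of the same shape and does not build T from an image; the function names and the convention used when Y1 + Y2 is zero are assumptions made here.

```python
import numpy as np

def nearest_two_tone(mu):
    """Eq. (35): 0 where the membership is <= 0.5, 1 otherwise."""
    return np.where(np.asarray(mu, dtype=float) <= 0.5, 0.0, 1.0)

def second_order_correlation(mu1, t):
    """Eq. (33) between mu1 and its nearest two-tone version."""
    mu1 = np.asarray(mu1, dtype=float)
    t = np.asarray(t, dtype=float)
    mu2 = nearest_two_tone(mu1)
    y1 = np.sum((2 * mu1 - 1) ** 2 * t)
    y2 = np.sum((2 * mu2 - 1) ** 2 * t)
    if y1 + y2 == 0:
        return 1.0  # assumed convention for an all-zero co-occurrence matrix
    return float(1 - 4 * np.sum((mu1 - mu2) ** 2 * t) / (y1 + y2))

t = np.ones((2, 2))                      # hypothetical co-occurrence counts
crisp = np.array([[0.0, 1.0], [1.0, 0.0]])
vague = np.full((2, 2), 0.5)
print(second_order_correlation(crisp, t))
print(second_order_correlation(vague, t))
```

A membership plane that is already two-tone is perfectly correlated with its two-tone version, whereas a plane with all memberships at 0.5 gives the lowest correlation under this formula.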
As the key of the proposed method is the comparison of fuzzy correlations, we have to normalize those measures, which is done by computing a normalizing factor α according to the following relation:

$$\alpha = \frac{C_A(x_{\min} : x_p)}{C_B(x_q : x_{\max})} \tag{36}$$
To obtain the segmented version of the gray-level histogram, we add to each of the subsets A and B a gray level $x_i$ picked up from the fuzzy region and form two fuzzy subsets A′ and B′ that are associated with the histogram intervals $[x_{\min}, x_i]$ and $[x_i, x_{\max}]$, where $x_p < x_i < x_q$. Then we calculate $C_{A'}(x_{\min} : x_i)$ and $C_{B'}(x_i : x_{\max})$. The ambiguity of the gray value $x_i$ is calculated as follows:

$$A(x_i) = 1 - \frac{|C_{A'}(x_{\min} : x_i) - \alpha \cdot C_{B'}(x_i : x_{\max})|}{1 + \alpha} \tag{37}$$
Finally, applying this procedure for all gray levels of the fuzzy region, we calculate the ambiguity of each gray level.
Figure 5. (a) Original image. Fuzzy edge map for candidates with (b) µm(P) ≥ 0.6 and (c) µm(P) ≥ 0.8.
Figure 6. Retrieved result (from fuzzy edge map), (a) general purpose database (b) logo retrieval from (USPTO) database, with top left image as the query image.
The process is started with $x_i = x_p + 1$, and $x_i$ is incremented one by one while $x_i < x_q$. In other words, we calculate the ambiguity by observing how the introduction of a gray level $x_i$ of the fuzzy region affects the similarity measure among gray levels in each of the modified fuzzy subsets A′ and B′. The ambiguity A is maximum for the gray level $x_i$ at which the correlations of the two modified fuzzy subsets are equal. The threshold level (T) for segmentation corresponds to the gray value with maximum ambiguity A. That is,

$$A(T) = \max_{x_p < x_i < x_q} A(x_i) \tag{38}$$
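The threshold selection of Eqs. (37) and (38) can be sketched in Python as follows. The correlations of the two modified subsets are assumed to have been computed already for every gray level of the fuzzy region; the function name and all numerical values are invented for illustration.

```python
import numpy as np

def select_threshold(levels, c_a, c_b, alpha):
    """Return the gray level with maximum ambiguity and all ambiguities.

    levels : gray levels x_i of the fuzzy region (x_p < x_i < x_q)
    c_a    : correlation of the modified dark subset for each x_i
    c_b    : correlation of the modified bright subset for each x_i
    alpha  : normalizing factor of Eq. (36)
    """
    c_a = np.asarray(c_a, dtype=float)
    c_b = np.asarray(c_b, dtype=float)
    ambiguity = 1 - np.abs(c_a - alpha * c_b) / (1 + alpha)  # Eq. (37)
    return levels[int(np.argmax(ambiguity))], ambiguity      # Eq. (38)

levels = [100, 101, 102, 103]
c_a = [0.90, 0.80, 0.70, 0.60]
c_b = [0.50, 0.60, 0.70, 0.80]
threshold, amb = select_threshold(levels, c_a, c_b, alpha=1.0)
print(threshold, amb)
```

In this made-up example the two correlations coincide at gray level 102, so the ambiguity reaches its maximum there and that level is returned as the threshold.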
As an example, we explain the merits of the proposed method in Figs. 7 and 8. Figure 7 shows the original MR images and their gray-value histograms, whereas Fig. 8 represents the fuzzy second-order correlations $C_{A'}(x_{\min} : x_i)$ and $C_{B'}(x_i : x_{\max})$ of the two modified fuzzy subsets A′ and B′ with respect to the gray level $x_i$ of the fuzzy region and the ambiguity of each gray level $x_i$. The value of α is also given there. Figure 8 also depicts the segmented image of the proposed method. The thresholds are determined according to the strength of ambiguity.
Figure 7. Original image and corresponding histogram.
Figure 8. Correlations of two fuzzy subsets, measure of ambiguity and segmented image (proposed).
Other Advances

Over the years, applications of fuzzy theory in image processing and recognition have developed extensively in many other domains. Fuzzy morphology is a tool that has received considerable attention among researchers in the field of image processing (116, 117). Bloch (118, 119) used fuzzy theory to define spatial positioning of objects in images. A fuzzy error diffusion method has been proposed in Reference 120 to perform dithering to hide quantization errors in images. Zahlmann et al. (121) applied a hybrid fuzzy image processing system to assess the damage to the blood vessels in the retina because of diabetes. A fractal coding scheme using a fuzzy image metric has been proposed in Reference 122. Adaptive schemes of digital watermarking in images and videos using a fuzzy adaptive resonance theory (fuzzy-ART) classifier have been given in Reference 123. In Reference 124, fuzzy theory has been used to represent the uncertain location of a normal Euclidean point, and its application in Doppler image sequence processing has been demonstrated. Fuzzy theory has also been used in intelligent Web image retrieval (125–127). In Reference 126, an image search engine named STRICT has been designed using fuzzy similarity measures. The authors in Reference 127 combine fuzzy text and image retrieval techniques to present a comprehensive image search engine. For an image, the histogram, which gives the frequency (probability) of occurrence of each gray value, and the co-occurrence matrix, which gives the frequency (joint probability) of occurrence of two gray values separated by a specific distance, are the first- and second-order statistics. In Reference 128, the authors used fuzzy theory to explain the inherent imprecision in the gray values of an image and defined the first- and second-order fuzzy statistics of digital images, namely, fuzzy histogram and fuzzy co-occurrence matrix, respectively. Fuzzy theory has also been used in various other applications such as automatic target detection and tracking, stereovision matching, urban structure detection in synthetic aperture radar (SAR) images, and image reconstruction (129–132).
CONCLUSIONS AND DISCUSSION
The problem of image processing and recognition under fuzziness and uncertainty has been considered. The role of fuzzy logic in representing and managing the uncertainties in these tasks was explained. Various fuzzy set theoretic tools for measuring information on grayness ambiguity and spatial ambiguity in an image were listed along with their characteristics. Some examples of image processing operations (e.g., segmentation, skeleton extraction, and edge detection), whose outputs are responsible for the overall performance of a recognition (vision) system, were considered to demonstrate the effectiveness of these tools in providing both soft and hard decisions. The significance of retaining the gray information in the form of class membership for a soft decision is evident. Uncertainty in determining a membership function in this regard and the tools for its management were also stated. Finally, a few real-life applications of these methodologies were described.
In conclusion, gray information is expensive and informative. Once it is thrown away, there is no way to get it back. Therefore, one should try to retain this information as long as possible throughout the decision-making tasks for its full use. When it is required to make a crisp decision at the highest level, one can always throw away or ignore this information.
Most of the algorithms and tools described here were developed by the author with his colleagues. Processing of color images has not been considered here. Some significant results on color image information and processing in the notion of fuzzy logic are available in References 133–137. Note that fuzzy set theory has led to the development of the concept of soft computing as a foundation for the conception and design of a high Machine IQ (MIQ) system.
The merits of fuzzy set theory have also been integrated with those of other soft computing tools (e.g., artificial neural networks, genetic algorithms, and rough sets) in the hope of building more efficient processing and recognition systems.
ACKNOWLEDGMENT
The author acknowledges Mr. Debashis Sen for his assistance in preparing the manuscript.
BIBLIOGRAPHY
1. Gonzalez, R. C.; Wintz, P. Digital Image Processing, 2nd ed.; Addison-Wesley: Reading, MA, 1987.
2. Rosenfeld, A.; Kak, A. C. Digital Picture Processing; Academic Press: New York, 1982.
3. Prewitt, J. M. S. Object Enhancement and Extraction. In Picture Processing and Psycho-Pictorics; Lipkin, B. S.; Rosenfeld, A., Eds.; Academic Press: New York, 1970, pp 75–149.
4. Fu, K. S. Syntactic Pattern Recognition and Application; Academic Press: London, 1982.
5. Murthy, C. A. On Consistent Estimation of Classes in R2 in the Context of Cluster Analysis; Ph.D. Thesis, Indian Statistical Institute, Calcutta, 1988.
6. Edelsbrunner, H.; Kirkpatrick, D. G.; Seidel, R. On the Shape of a Set of Points in a Plane. IEEE Trans. Inform. Theory 1983, 29, p 551.
7. Toussaint, G. T. Proc. 5th Int. Conf. Patt. Recog. Miami Beach, FL, 1980, pp 1324–1347.
8. Mandal, D. P.; Murthy, C. A.; Pal, S. K. Determining the Shape of a Pattern Class From Sampled Points in R2. Int. J. General Syst. 1992, 20, pp 307–339.
9. Mandal, D. P.; Murthy, C. A.; Pal, S. K. Determining the Shape of a Pattern Class: Extension to RN. Int. J. General Syst. 1997, 26(4), pp 293–320.
10. Pal, S. K.; Dutta Majumder, D. Fuzzy Mathematical Approach to Pattern Recognition; Wiley (Halsted Press): New York, 1986.
11. Murthy, C. A.; Pal, S. K.; Dutta Majumder, D. Correlation Between Two Fuzzy Membership Functions. Fuzzy Sets Syst. 1985, 17, pp 23–38.
12. Pal, N. R.; Pal, S. K. Higher Order Fuzzy Entropy and Hybrid Entropy of a Set. Inform. Sci. 1992, 61, pp 211–231.
13. De Luca, A.; Termini, S. A Definition of a Non Probabilistic Entropy in the Setting of Fuzzy Set Theory. Inform. Control 1972, 20, pp 301–312.
14. Shannon, C. E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, p 379.
15. Pal, N. R.; Pal, S. K. Entropic Thresholding. Signal Proc. 1989, 16, pp 97–108.
16. Pal, N. R.; Pal, S. K. Image Model, Poisson Distribution and Object Extraction. Int. J. Patt. Recog. Artific. Intell. 1991, 5, pp 459–483.
17. Pal, N. R.; Pal, S. K. Entropy: A New Definition and Its Applications. IEEE Trans. Syst. Man Cyberns. 1991, 21, pp 1260–1270.
18. Pal, N. R.; Pal, S. K. Some Properties of the Exponential Entropy. Inform. Sci. 1992, 66, pp 119–137.
19. Pal, S. K. Fuzzy Set Theoretic Tools for Image Analysis. In Advances in Electronics and Electron Physics, Vol. 88; Hawkes, P., Ed.; Academic Press: New York, 1994, pp 247–296.
20. Pal, S. K.; King, R. A. Image Enhancement Using Smoothing with Fuzzy Set. IEEE Trans. Syst. Man Cyberns. 1981, 11, pp 494–501.
21. Pal, S. K.; King, R. A. Histogram Equalisation with S and π Functions in Detecting X-ray Edges. Electron. Lett. 1981, 17, pp 302–304.
22. Pal, S. K. A Note on the Quantitative Measure of Image Enhancement Through Fuzziness. IEEE Trans. Pattern Anal. Machine Intell. 1982, 4, pp 204–208.
23. Xie, W. X.; Bedrosian, S. D. Experimentally Derived Fuzzy Membership Function for Gray Level Images. J. Franklin Inst. 1988, 325, pp 154–164.
24. Rosenfeld, A. The Fuzzy Geometry of Image Subsets. Patt. Recog. Lett. 1984, 2, pp 311–317.
25. Pal, S. K.; Rosenfeld, A. Image Enhancement and Thresholding by Optimization of Fuzzy Compactness. Pattern Recog. Lett. 1988, 7, pp 77–86.
26. Pal, S. K.; Ghosh, A. Index of Area Coverage of Fuzzy Image Subsets and Object Extraction. Pattern Recog. Lett. 1990, 11, pp 831–841.
27. Pal, S. K.; Ghosh, A. Fuzzy Geometry in Image Analysis. Fuzzy Sets Syst. 1992, 48(1), pp 23–40.
28. Pal, S. K.; Rosenfeld, A. A Fuzzy Medial Axis Transformation Based on Fuzzy Disk. Pattern Recog. Lett. 1991, 12, pp 585–590.
29. Murthy, C. A.; Pal, S. K. Fuzzy Thresholding: Mathematical Framework, Bound Functions and Weighted Moving Average Technique. Pattern Recog. Lett. 1990, 11, pp 197–206.
30. Pal, S. K.; Dasgupta, A. Spectral Fuzzy Sets and Soft Thresholding. Inform. Sci. 1992, 65, pp 65–97.
31. Murthy, C. A.; Pal, S. K. Histogram Thresholding by Minimizing Gray Level Fuzziness. Inform. Sci. 1992, 60, pp 107–135.
32. Klir, G. J.; Yuan, B. Fuzzy Sets and Fuzzy Logic: Theory and Applications; Prentice Hall: Englewood Cliffs, NJ, 1995.
33. Hirota, K. Concepts of Probabilistic Sets. Fuzzy Sets Syst. 1981, 5, pp 31–46.
34. Turksen, I. B. Interval Values Fuzzy Sets Based on Normal Forms. Fuzzy Sets Syst. 1986, 20, pp 191–210.
35. Mizumoto, M.; Tanaka, K. Some Properties of Fuzzy Sets of Type 2. Inform. Control 1976, 31, pp 312–340.
36. Zadeh, L. A. Making Computers Think Like People. IEEE Spectrum 1984, August, pp 26–32.
37. Pal, S. K.; King, R. A. On Edge Detection of X-ray Images Using Fuzzy Set. IEEE Trans. Pattern Anal. Machine Intell. 1983, 5, pp 69–77.
38. Li, H.; Yang, H. S. Fast and Reliable Image Enhancement Using Fuzzy Relaxation Technique. IEEE Trans. Syst. Man Cyber. 1989, 19, pp 1276–1281.
39. Krell, G.; Tizhoosh, H. R.; Lilienblum, T.; Moore, C. J.; Michaelis, B. Fuzzy Image Enhancement and Associative Feature Matching in Radio Therapy. Proc. Int. Conf. Neural Networks (ICNN'97); 1997, 3, pp 1490–1495.
40. Lukac, R.; Plataniotis, K. N.; Smolka, B.; Venetsanopoulos, A. N. cDNA Microarray Image Processing using Fuzzy Vector Filtering Frameworks. Fuzzy Sets Syst. 2005, 152(1), pp 17–35.
41. Russo, F.; Ramponi, G. Nonlinear Fuzzy Operators for Image Processing. Signal Proc. 1994, 38, pp 429–440.
42. Russo, F.; Ramponi, G. A Fuzzy Operator for the Enhancement of Blurred and Noisy Images. IEEE Trans. Image Proc. 1995, 4(8), pp 1169–1174.
43. Chatzis, V.; Pitas, I. Fuzzy Scalar and Vector Median Filters based on Fuzzy Distances. IEEE Trans. Image Proc. 1999, 8(5), pp 731–734.
44. Senel, H. G.; Peters II, R. A.; Dawant, B. Topological Median Filters. IEEE Trans. Image Proc. 2002, 11(2), pp 89–104.
45. Van De Ville, D.; Nachtegael, M.; Van der Weken, D.; Kerre, E. E.; Philips, W.; Lemahieu, I. Noise Reduction by Fuzzy Image Filtering. IEEE Trans. Fuzz. Syst. 2003, 11(4), pp 429–436.
46. Shao, M.; Barner, K. E. Optimization of Partition-based Weighted Sum Filters and Their Application to Image Denoising. IEEE Trans. Image Proc. 2006, 15(7), pp 1900–1915.
47. Lee, C.-S.; Kuo, Y.-H.; Yu, P.-T. Weighted Fuzzy Mean Filters for Image Processing. Fuzzy Sets Syst. 1997, 89(2), pp 157–180.
48. Russo, F. FIRE Operators for Image Processing. Fuzzy Sets Syst. 1999, 103(2), pp 265–275.
49. Chen, L.; Yap, K.-H. A Soft Double Regularization Approach to Parametric Blind Image Deconvolution. IEEE Trans. Image Proc. 2005, 14(5), pp 624–633.
50. Schulte, S.; Nachtegael, M.; De Witte, V.; Van der Weken, D.; Kerre, E. E. A Fuzzy Impulse Noise Detection and Reduction Method. IEEE Trans. Image Proc. 2006, 15(5), pp 1153–1162.
51. Nie, Y.; Barner, K. E. The Fuzzy Transformation and its Applications in Image Processing. IEEE Trans. Image Proc. 2006, 15(4), pp 910–927.
52. Kundu, M. K.; Pal, S. K. Automatic Selection of Object Enhancement Operator with Quantitative Justification Based on Fuzzy Set Theoretic Measures. Pattern Recog. Lett. 1990, 11, pp 811–829.
53. Bezdek, J. C. Pattern Recognition with Fuzzy Objective Function Algorithms; Plenum Press: New York, 1981.
54. Kandel, A. Fuzzy Techniques in Pattern Recognition; Wiley Interscience: New York, 1982.
55. Pedrycz, W. Fuzzy Sets in Pattern Recognition: Methodology and Methods. Pattern Recog. 1990, 23, pp 121–146.
56. Lim, Y. W.; Lee, S. U. On the Color Image Segmentation Algorithm Based on the Thresholding and the Fuzzy c-Means Technique. Pattern Recog. 1990, 23, pp 935–952.
57. Pal, S. K.; Mitra, S. Fuzzy Dynamic Clustering Algorithm. Pattern Recog. Lett. 1990, 11, pp 525–535.
58. Dave, R.; Bhaswan, K. Adaptive Fuzzy c-shells Clustering. Proc. NAFIPS 91, University of Missouri, Columbia, MO, 1991, pp 195–199.
59. Pal, N. R.; Pal, S. K. A Review on Image Segmentation Techniques. Pattern Recog. 1993, 26, pp 1277–1294.
60. Bezdek, J. C.; Pal, S. K., Eds. Fuzzy Models for Pattern Recognition: Methods that Search for Structures in Data; IEEE Press: New York, 1992.
61. Gustafson, D. E.; Kessel, W. C. Fuzzy Clustering with a Fuzzy Covariance Matrix. Proc. IEEE CDC; San Diego, CA, 1979, pp 761–766.
62. Gath, I.; Geva, A. B. Unsupervised Optimal Fuzzy Clustering. IEEE Trans. Patt. Anal. Machine Intell. 1989, 11, pp 773–781.
63. Backer, E.; Jain, A. K. A Clustering Performance Measure Based on Fuzzy Set Decomposition. IEEE Trans. Patt. Anal. Machine Intell. 1981, 3, pp 66–75.
64. Dave, R. N.; Bhaswan, K. Adaptive Fuzzy c-shells Clustering and Detection of Ellipses. IEEE Trans. Neural Networks 1992, 3, pp 643–662.
65. Krishnapuram, R.; Nasraoui, O.; Frigui, H. The Fuzzy C Spherical Shells Algorithm: A New Approach. IEEE Trans. Neural Networks 1992, 3, pp 663–671.
66. Bensaid, A. M.; Hall, L. O.; Bezdek, J. C.; Clarke, L. P. Partially Supervised Clustering for Image Segmentation. Pattern Recog. 1993, 29, pp 1033–1048.
67. Cannon, R. L.; Dave, J. V.; Bezdek, J. C. Efficient Implementation of the Fuzzy c-Means Clustering Algorithms. IEEE Trans. Patt. Anal. Mach. Intell. 1986, 8, pp 248–255.
68. Smith, N. R.; Kitney, R. I. Fast Fuzzy Segmentation of Magnetic Resonance Images: A Prerequisite for Real-Time Rendering. Proc. SPIE; 1997, 3034(2), pp 1124–1135.
69. Belacel, N.; Hansen, P.; Mladenovic, N. Fuzzy J-Means: A New Heuristic for Fuzzy Clustering. Pattern Recog. 2002, 35(10), pp 2193–2200.
70. Etemad, K.; Doermann, D.; Chellappa, R. Multiscale Segmentation of Unstructured Document Pages using Soft Decision Integration. IEEE Trans. Patt. Anal. Mach. Intell. 1997, 19(1), pp 92–96.
71. Cheng, H.-D.; Cheng, C. H.; Chiu, H. H.; Xu, H. Fuzzy Homogeneity Approach to Multilevel Thresholding. IEEE Trans. Image Proc. 1998, 7, pp 1084–1086.
72. Solaiman, B.; Debon, R.; Pipelier, F.; Cauvin, J.-M.; Roux, C. Information Fusion, Application to Data and Model Fusion for Ultrasound Image Segmentation. IEEE Trans. Biomed. Eng. 1999, 46(10), pp 1171–1175.
73. Intajag, S.; Paithoonwatanakij, K.; Cracknell, A. P. Iterative Satellite Image Segmentation by Fuzzy Hit-or-Miss and Homogeneity Index. IEE Proc. Vision, Image and Signal Processing; 2006, 153(2), pp 206–214.
74. Pednekar, A. S.; Kakadiaris, I. A. Image Segmentation Based on Fuzzy Connectedness Using Dynamic Weights. IEEE Trans. Image Proc. 2006, 15(6), pp 1555–1562.
75. Pednekar, A.; Kurkure, U.; Muthupillai, R.; Flamm, S.; Kakadiaris, I. A. Automated Left Ventricular Segmentation in Cardiac MRI. IEEE Trans. Biomed. Eng. 2006, 53(7), pp 1425–1428.
76. Pal, S. K.; Ghosh, A.; Uma Shankar, B. Segmentation of Remote Sensing Image with Fuzzy Thresholding, and Quantitative Index. Int. J. Remote Sensing 2000, 21(11), pp 2269–2300.
77. Saha, P. K.; Udupa, J. K. Optimum Image Thresholding via Class Uncertainty and Region Homogeneity. IEEE Trans. Patt. Anal. Mach. Intell. 2001, 23(7), pp 689–706.
78. Caillol, H.; Pieczynski, W.; Hilton, A. Estimation of Fuzzy Gaussian Mixture and Unsupervised Statistical Image Segmentation. IEEE Trans. Image Proc. 1997, 6(3), pp 425–440.
79. Algorri, M.-E.; Flores-Mangas, F. Classification of Anatomical Structures in MR Brain Images Using Fuzzy Parameters. IEEE Trans. Biomed. Eng. 2004, 51(9), pp 1599–1608.
80. Cheng, H.-D.; Lui, Y. M.; Freimanis, R. I. A Novel Approach to Microcalcification Detection Using Fuzzy Logic Technique. IEEE Trans. Med. Imag. 1998, 17(3), pp 442–450.
81. Udupa, J. K.; Samarasekera, S. Fuzzy Connectedness and Object Definition: Theory, Algorithms and Applications in Image Segmentation. Graphic. Models Image Proc. 1996, 58, pp 246–261.
82. Pal, S. K. Fuzzy Skeletonization of Image. Pattern Recog. Lett. 1989, 10, pp 17–23.
83. Pal, S. K.; King, R. A.; Hashim, A. A. Image Description and Primitive Extraction Using Fuzzy Sets. IEEE Trans. Syst. Man Cyberns. 1983, 13, pp 94–100.
84. Peleg, S.; Rosenfeld, A. A Min-max Medial Axis Transformation. IEEE Trans. Patt. Anal. Mach. Intell. 1981, 3, pp 208–210.
85. Salari, E.; Siy, P. The Ridge-seeking Method for Obtaining the Skeleton of Digital Images. IEEE Trans. Syst. Man Cybern. 1984, 14, pp 524–528.
86. Pal, S. K.; Wang, L. Fuzzy Medial Axis Transformation (FMAT): Practical Feasibility. Fuzzy Sets Syst. 1992, 50, pp 15–34.
87. Dyer, C. R.; Rosenfeld, A. Thinning Algorithms for Gray-Scale Pictures. IEEE Trans. Patt. Anal. Mach. Intell. 1979, 1, pp 88–89.
88. Pal, S. K.; Leigh, A. B. Motion Frame Analysis and Scene Abstraction: Discrimination Ability of Fuzziness Measures. J. Intel. Fuzzy Syst. 1995, 3, pp 247–256.
89. Pal, S. K.; Mitra, S. Noisy Fingerprint Classification Using Multi Layered Perceptron with Fuzzy Geometrical and Textural Features. Fuzzy Sets Syst. 1996, 80, pp 121–132.
90. Fang, H.; Jiang, J.; Feng, Y. A Fuzzy Logic Approach for Detection of Video Shot Boundaries. Pattern Recog. 2006, 39(11), pp 2092–2100.
91. Erdem, C. E.; Karabulut, G. Z.; Yammaz, E.; Anarim, E. Motion Estimation in the Frequency Domain Using Fuzzy C-Planes Clustering. IEEE Trans. Image Proc. 2001, 10(12), pp 1873–1879.
92. Haag, M.; Nagel, H.-H. Incremental Recognition of Traffic Situations from Video Image Sequences. Image Vision Comput. 2000, 18(2), pp 137–153.
93. Kickert, W. J. M.; Koppelaar, H. Application of Fuzzy Set Theory to Syntactic Pattern Recognition of Handwritten Capitals. IEEE Trans. Syst. Man Cyber. 1976, 6, pp 148–151.
94. Stallings, W. Fuzzy Set Theory Versus Bayesian Statistics. IEEE Trans. Syst. Man Cyber. 1977, 7, pp 216–219.
95. Jain, R. Comments on Fuzzy Set Theory Versus Bayesian Statistics. IEEE Trans. Syst. Man Cyber. 1978, 8, pp 332–333.
96. Chatterjee, B. N. Recognition of Handwritten Characters by a Fuzzy Set Theoretic Approach. Proc. Int. Conf. Advances Inform. Sci. Tech. Calcutta, 1982, pp 166–172.
97. Malaviya, A.; Peters, L. Fuzzy Feature Description of Handwritten Patterns. Pattern Recog. 1997, 30, pp 1591–1604.
98. Pathak, A.; Pal, S. K. Fuzzy Grammar in Syntactic Recognition of Skeletal Maturity from X-ray. IEEE Trans. Syst. Man Cyberns. 1986, 16, pp 657–667.
99. Huntsberger, T. L.; Rangarajan, C.; Jayaramamurthy, S. N. Representation of Uncertainty in Computer Vision Using Fuzzy Set. IEEE Trans. Comp. 1986, 35, pp 145–156.
100. Lee, E. T.; Zadeh, L. A. Note on Fuzzy Languages. Inform. Sci. 1969, 1, pp 421–434.
101. Thomason, M. G. Finite Fuzzy Automata, Regular Fuzzy Languages, and Pattern Recognition. Pattern Recog. 1973, 5, pp 383–390.
102. DePalma, G. F.; Yau, S. S. Fractionally Fuzzy Grammars with Applications to Pattern Recognition. In Fuzzy Sets and Their Applications to Cognitive and Decision Processes; Zadeh, L. A., et al., Eds.; Academic Press: London, 1975.
103. Wachs, J. P.; Stern, H.; Edan, Y. Cluster Labeling and Parameter Estimation for the Automated Setup of a Hand-Gesture Recognition System. IEEE Trans. Syst. Man Cyberns. A 2005, 35(6), pp 932–944.
104. Matsakis, P.; Keller, J. M.; Wendling, L.; Marjamaa, J.; Sjahputera, O. Linguistic Description of Relative Position in Images. IEEE Trans. Syst. Man Cyberns. B 2001, 31(4), pp 573–588.
105. Lazzerini, B.; Maggiore, A.; Marcelloni, F. FROS: A Fuzzy Logic-based Recogniser of Olfactory Signals. Pattern Recog. 2001, 34(11), pp 2215–2226.
106. Mandal, D. P.; Murthy, C. A.; Pal, S. K. Formulation of a Multivalued Recognition System. IEEE Trans. Syst. Man Cyberns. 1992, 22, pp 607–620.
107. Mandal, D. P.; Murthy, C. A.; Pal, S. K. Theoretical Performance of a Multivalued Recognition System. IEEE Trans. Syst. Man Cyberns. 1994, 24, pp 1001–1021.
108. Mandal, D. P.; Murthy, C. A.; Pal, S. K. Analysis of IRS Imagery for Detecting Man-made Objects with a Multivalued Recognition System. IEEE Trans. Syst. Man Cyberns. A 1996, 26, pp 241–247.
109. Zadeh, L. A. Fuzzy Logic and Approximate Reasoning. Synthese 1977, 30, pp 407–428.
110. Banerjee, M.; Kundu, M. K. Edge based Features for Content Based Image Retrieval. Pattern Recog. 2003, 36(11), pp 2649–2661.
111. Qi, X.; Han, Y. A Novel Fusion Approach to Content-based Image Retrieval. Pattern Recog. 2005, 38(12), pp 2449–2465.
112. Huang, P. W.; Dai, S. K. Image Retrieval by Texture Similarity. Pattern Recog. 2003, 36(3), pp 665–679.
113. Han, J.; Ma, K.-K. Fuzzy Color Histogram and its Use in Color Image Retrieval. IEEE Trans. Image Proc. 2002, 11(8), pp 944–952.
114. Jiang, W.; Er, G.; Dai, Q.; Gu, J. Similarity-based Online Feature Selection in Content-Based Image Retrieval. IEEE Trans. Image Proc. 2006, 15(3), pp 702–712.
115. Maji, P.; Kundu, M. K.; Chanda, B. Segmentation of Brain MR Images Using Fuzzy Sets and Modified Co-occurrence Matrix. IEEE Proc. International Conference on Visual Information Engineering (VIE-06); India, September, 2006.
116. Sinha, D.; Sinha, P.; Dougherty, E. R.; Batman, S. Design and Analysis of Fuzzy Morphological Algorithm for Image Processing. IEEE Trans. Fuzz. Syst. 1997, 5(4), pp 570–584.
117. Gasteratos, A.; Andreadis, I.; Tsalides, P. Fuzzy Soft Mathematical Morphology. IEE Proc. Vision, Image and Signal Processing; 1998, 145(1), pp 41–49.
118. Bloch, I. Fuzzy Relative Position between Objects in Image Processing: A Morphological Approach. IEEE Trans. Patt. Anal. Mach. Intell. 1999, 21(7), pp 657–664.
119. Bloch, I. Fuzzy Spatial Relationships for Image Processing and Interpretation: A Review. Image Vision Comput. 2005, 23(2), pp 89–110.
120. Ozdemir, D.; Akarun, L. Fuzzy Error Diffusion. IEEE Trans. Image Proc. 2000, 9(4), pp 683–690.
121. Zahlmann, G.; Kochner, B.; Ugi, I.; Schuhmann, D.; Liesenfeld, B.; Wegner, A.; Obermaier, M.; Mertz, M. Hybrid Fuzzy Image Processing for Situation Assessment [Diabetic Retinopathy]. IEEE Eng. Med. Biol. Mag. 2000, 19(1), pp 76–83.
122. Li, J.; Chen, G.; Chi, Z. A Fuzzy Image Metric with Application to Fractal Coding. IEEE Trans. Image Proc. 2002, 11(6), pp 636–643.
123. Chang, C.-H.; Ye, Z.; Zhang, M. Fuzzy-ART Based Adaptive Digital Watermarking Scheme. IEEE Trans. Circuits Syst. Video Technol. 2005, 15(1), pp 65–81.
124. Mercer, R. E.; Barron, J. L.; Bruen, A. A.; Cheng, D. Fuzzy Points: Algebra and Applications. Pattern Recog. 2002, 35(5), pp 1153–1166.
125. Shahabi, C.; Chen, Y.-S. A Unified Framework to Incorporate Soft Query into Image Retrieval Systems. Proc. International Conference on Enterprise Information Systems; Setúbal, Portugal, July 2001, pp 216–224.
126. Omhover, J.-F.; Detyniecki, M. STRICT: An Image Retrieval Platform for Queries Based on Regional Content. Proc. Third International Conference Image and Video Retrieval; 2004, pp 473–482.
127. Omhover, J.-F.; Detyniecki, M. Combining Text and Image Retrieval. Proc. EUROFUSE Workshop on Data and Knowledge Engineering; Warsaw, Poland, 2004, pp 388–398.
128. Jawahar, C. V.; Ray, A. K. Fuzzy Statistics of Digital Images. IEEE Sig. Proc. Lett. 1996, 3(8), pp 225–227.
129. Kim, B.-G.; Park, D.-J. Novel Target Segmentation and Tracking Based on Fuzzy Membership Distribution for Vision-based Target Tracking System. Image Vision Comput. 2006, in press.
130. Pajares, G.; de la Cruz, J. M. Fuzzy Cognitive Maps for Stereovision Matching. Pattern Recog. 2006, 39(11), pp 2101–2114.
131. Dell'Acqua, F.; Gamba, P. Detection of Urban Structures in SAR Images by Robust Fuzzy Clustering Algorithms: The Example of Street Tracking. IEEE Trans. Geosci. Remote Sensing 2001, 39(10), pp 2287–2297.
132. Rovetta, S.; Masulli, F. Vector Quantization and Fuzzy Ranks for Image Reconstruction. Image Vision Comput. 2006, in press.
133. Pal, S. K. Fuzzy Sets and Decision Making in Color Image Processing. In Artificial Intelligence and Applied Cybernetics; Ghosal, A., Ed.; South Asian Publishers: New Delhi, 1989, pp 89–98.
134. Lim, Y. W.; Lee, S. U. On the Color Image Segmentation Algorithm Based on the Thresholding and the Fuzzy c-Means Techniques. Pattern Recog. 1990, 23, pp 935–952.
135. Xie, W. X. An Information Measure for a Color Space. Fuzzy Sets Syst. 1990, 36, pp 157–165.
136. Hildebrand, L.; Fathi, M. Knowledge-based Fuzzy Color Processing. IEEE Trans. Syst. Man Cyberns. 2004, 34(4), pp 449–505.
137. Seaborn, M.; Hepplewhite, L.; Stonham, J. Fuzzy Colour Category Map for the Measurement of Colour Similarity and Dissimilarity. Pattern Recogn. 2005, 38(2), pp 165–177.
SANKAR K. PAL Machine Intelligence Unit, Indian Statistical Institute, Calcutta, IN
Wiley Encyclopedia of Electrical and Electronics Engineering
Fuzzy Information Retrieval and Databases
Standard Article
Frederick E. Petry, Tulane University, New Orleans, LA
Copyright © 1999 by John Wiley & Sons, Inc. All rights reserved.
DOI: 10.1002/047134608X.W3508
Article Online Posting Date: December 27, 1999
FUZZY INFORMATION RETRIEVAL AND DATABASES
Information processing and management has become one of the topics that has stimulated great interest over the past several years. The technological advances in databases and retrieval systems and the ability to access such data over the Internet have focused developments in this area. Information systems are designed to model, store, and retrieve large amounts of information effectively. From a developmental point of view, the management of unstructured information (texts), on one hand, and structured information (formatted data representing factual business information), on the other, has given rise to two different lines of research and products: information retrieval systems and database management systems. Being able to naturally handle the imprecision and vagueness that we experience in the real world of information systems is very desirable. Fuzzy set theory has proven to be a very powerful tool to handle this sort of uncertainty in many areas. In information systems, the two main issues in which uncertainty should be reflected are the representation scheme and the querying mechanism; these are discussed here.
FUZZY DATABASES
The earliest attempt to represent inexact data in databases was the introduction of the concept of null values by Codd (1). The first extensions of the relational data model that incorporated nonhomogeneous domain sets did not use fuzzy set theory. Rather, they attempted to represent null values and intervals. The ANSI/X3/SPARC report of 1975 (2), for instance, notes more than a dozen types of null. At one end of the spectrum, null means completely unknown. For example, a null value in the current salary of an employee could mean the actual value is any one of the permissible values for the salary domain set. Without resorting to fuzzy measures, a user can specify some information about a value that further restricts it. A subset or range of values of the domain set may be described within which the actual attribute value must lie. The user or the system (via functional dependencies) may specify subsets or subranges within which the actual value must not lie. Yet another option is to label null values in a manner that requires distinct nulls in different portions of the database to have a particular actual value relationship (usually equality) if they have the same label.
The semantics of the null value range from "unknown" (e.g., the current salary of an employee) to "not applicable" (e.g., subassembly number of a part that is not a subassembly) to "does not exist" (e.g., middle name of a person). These last two meanings, however, are not related to uncertainty. Codd proposes a three-valued logic using T, F, and ⊥ (null in the sense of unknown) in conjunction with the following predicates:
X θ Y ≡ ⊥ if X or Y is null and θ is <, ≤, =, ≠, ≥, or >
⊥ ∈ S ≡ ⊥ for any set S
S ⊇ {⊥} ≡ ⊥ for any set S
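Codd's three comparison predicates for the unknown null can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `NULL`, `UNKNOWN`, `compare`, `member`, and `contains_all` are hypothetical, and Python's `None` stands in for ⊥.

```python
import operator

NULL = None          # stands for the null ⊥ ("unknown")
UNKNOWN = "unknown"  # third truth value, alongside True and False

# The six comparison operators θ named in the predicate.
OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}

def compare(x, theta, y):
    """X θ Y: unknown if either operand is null, else the crisp result."""
    if x is NULL or y is NULL:
        return UNKNOWN
    return OPS[theta](x, y)

def member(x, s):
    """x ∈ S: unknown when x is null, for any set S."""
    if x is NULL:
        return UNKNOWN
    return x in s

def contains_all(s, sub):
    """S ⊇ sub: unknown when sub contains a null, for any set S."""
    if any(v is NULL for v in sub):
        return UNKNOWN
    return set(sub) <= set(s)
```

For example, `compare(50000, ">", NULL)` yields the unknown value rather than `False`, so a selection on salary neither accepts nor rejects a tuple whose salary is null.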
Because null values can carry such a variety of meanings, they cannot discriminate well enough (i.e., they are "overloaded" in the programming language sense). Two possible solutions are to maintain multiple nulls or to provide semantic interpretation external to the database.
Range Values Approach
As discussed, it is possible to have a variety of nulls with different semantics. However, these are not adequate to represent the possibility of a range of values. For example, we may not know exactly the age of a house, but we know it is in the range of 20 to 30 years. So we have an interval of values and know one is correct but do not necessarily know exactly which one. An early development in this area by Grant (3) extended the relational model to allow range values. Basically three types of values are allowed: a single number for the case of complete information; a pair of numbers (i, j) [i ≤ j] for the case of partial information in the form of a range of possible values; and finally a null value in the case of no information. To deal with comparisons of such values for purposes of defining relations and relational operators, true and maybe predicates are defined, where the maybe predicate means that it is true or maybe true. For example, consider a relation R with three tuples. For an attribute Years, the values for each tuple are: 15; 8; (20,30). It is definite that 15 ∈ R, but it is not certain if 25 is in R, so we have 25 ∈_M (maybe an element of) R. Note that, by the definition of the maybe predicate, we also have 15 ∈_M R.
The basis of the relational model is set theoretic, so we can view a relation as a set of tuples. In a set there should not normally be duplicate elements, and the issue of elimination of duplicate tuples plays a significant role in inexact and imprecise models of data. For several fuzzy database models, the elimination of redundant tuples requires careful consideration.
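The true and maybe membership predicates for range values can be illustrated with the Years attribute from the example. The sketch below is an assumption-laden reading, not Grant's formal definition: the function names are hypothetical, a range is a Python tuple `(i, j)`, and a null (`None`) is treated as possibly standing for any value.

```python
def definitely_in(value, relation):
    """True predicate: some tuple holds exactly this single value."""
    return any(v == value for v in relation)

def maybe_in(value, relation):
    """Maybe predicate: the value is in the relation, or may be."""
    for v in relation:
        if v is None:
            return True            # assumption: null could be any value
        if isinstance(v, tuple):
            low, high = v
            if low <= value <= high:
                return True        # value lies inside a range (i, j)
        elif v == value:
            return True            # definite membership implies maybe
    return False

# Years attribute of the three tuples in the example relation R.
years = [15, 8, (20, 30)]
```

With this relation, 15 is definitely in R, 25 is only maybe in R, and 40 is neither, which matches the statements 15 ∈ R, 25 ∈_M R, and 15 ∈_M R.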
In the case of a range of values, we can see some of the issues that will arise in the case of fuzzy databases. In particular, duplicate tuples are allowed because, even if they appear to be identical, they may actually stand for different values (i.e., have different interpretations). Consider the possibility of the tuple (20,30) appearing twice in the preceding relation R. In one case, it may stand for the actual value 25 and in the other, 28. If the set of possible interpretations for this range comprises the 11 values 20, 21, . . ., 30, then there can be at most 11 occurrences of the tuple value (20,30) without violating the "no duplicate tuples" rule.
J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering. Copyright © 1999 John Wiley & Sons, Inc.
Lipski's Generalized Approach to Uncertainty
Lipski (4) proposed a more general approach. He does not, for instance, assume that null means that a value is completely unknown. Given that there may be labeled or restricted value nulls, let ‖Q‖ denote all real-world objects that a query Q could represent. Let T be a database object and ‖T‖ be all real-world objects it could represent. These are also known as external and internal interpretations. Assume a relation EMPLOYEE with domains NAME and AGE. The database object T = [Bob 30–35] could represent six real-world objects (one for each year in the age range). A query Q places each database object in one of three categories.
T ∈ {surely set} if ‖Q‖ ⊃ ‖T‖
T ∈ {possible set} if ‖Q‖ ∩ ‖T‖ ≠ ∅
T ∈ {eliminated set} if ‖Q‖ ∩ ‖T‖ = ∅
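The three categories can be made concrete with the database object T = [Bob 30–35]. The following Python sketch is a simplified illustration: `interpretations` and `classify` are hypothetical names, a query is modeled as a predicate on the age, and each object receives a single label even though an object in the surely set also meets the condition for the possible set.

```python
def interpretations(low, high):
    """All ages a range-valued database object could stand for."""
    return set(range(low, high + 1))

def classify(query, t_values):
    """Place a database object in the surely, possible, or eliminated set.

    query    : predicate on an age, standing in for the query Q
    t_values : set of real-world values the object T could represent
    """
    satisfying = {v for v in t_values if query(v)}
    if satisfying == t_values:
        return "surely"       # every interpretation of T satisfies Q
    if satisfying:
        return "possible"     # some, but not all, interpretations satisfy Q
    return "eliminated"       # no interpretation satisfies Q

bob = interpretations(30, 35)  # T = [Bob 30-35], six possible ages

label_over_32 = classify(lambda age: age > 32, bob)
label_25_to_40 = classify(lambda age: 25 < age < 40, bob)
label_over_40 = classify(lambda age: age > 40, bob)
```

Here a query for ages above 32 leaves Bob in the possible set, a query for ages strictly between 25 and 40 places him in the surely set, and a query for ages above 40 eliminates him.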
For instance, the query EMPLOYEE [AGE > 32] places T in the possible set, while EMPLOYEE [AGE > 25] ∧ EMPLOYEE [AGE < 40] places T in the surely set. The first two categories are also known as the lower value ‖Q‖_* and the upper value ‖Q‖^*, and these limiting interpretations are characterized in this approach. A number of relationships that assist in evaluating this sort of query have been developed. It should be noted that because the representation of inexact data is sufficiently generalized, it becomes intimately related to the uncertainty data modeling using fuzzy sets, which we will be describing shortly.
Statistical and Probabilistic Databases
The main work in the area of statistical approaches is that of Wong (5), who handles a large class of uncertainty cases by statistical inference. This formulation approaches the uncertainty of the real-world data by assuming an ideal world of perfect information to which the incomplete data may be statistically compared. The prior information from this comparison is represented either as a distortion function or a conditional distribution. Missing and combined attributes can be dealt with by distortion functions.
The more direct method of dealing with uncertainty and incompleteness is to specifically use a probabilistic data model, and the most completely developed approach is that in which probabilities are associated with the values of the attributes (6). In this model, because each stochastic attribute is treated as a discrete probability distribution function, the probabilities for each attribute (in a tuple) must be normalized (sum to 1.0). However, it may be difficult to ascertain exact probabilities for all possible domain values. As a result, the authors of this model developed the concept of missing probabilities to account for such incompletely specified probability distributions. It permits the probabilistic model to capture uncertainty in data values as well as in the probabilities.
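A minimal sketch of a stochastic attribute with a missing probability follows. It is an interpretation under stated assumptions rather than the model of Reference 6: the function names and the salary figures are hypothetical, and the unassigned mass is read as something that could fall on any domain value, which gives a lower and an upper bound for each value.

```python
def missing_probability(dist, tol=1e-9):
    """Probability mass not yet assigned to any specific domain value."""
    total = sum(dist.values())
    if total > 1.0 + tol:
        raise ValueError("probabilities of an attribute must not exceed 1.0")
    return max(0.0, 1.0 - total)

def probability_bounds(dist, value):
    """Lower and upper probability of a value, assuming the missing mass
    could be placed on any domain value (an assumption of this sketch)."""
    p = dist.get(value, 0.0)
    return p, p + missing_probability(dist)

# Hypothetical salary attribute of one tuple; only 0.8 of the mass is known.
salary = {"40K": 0.5, "50K": 0.3}

missing = missing_probability(salary)
low_40k, high_40k = probability_bounds(salary, "40K")
low_60k, high_60k = probability_bounds(salary, "60K")
```

In this sketch the tuple can be stored before the full distribution is known: the salary is 40K with probability between 0.5 and 0.7, and an unlisted value such as 60K has probability between 0 and 0.2.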
When updating or entering data into a probabilistic relation, it is not necessary to have all information before some tuple can be entered, allowing a natural use of such uncertain information.

Fuzzy Databases Models of Imprecision

The relational model has been the dominant database model for a considerable period of time, and so it was naturally used by researchers to introduce fuzzy set theory into databases. Much of the work in the area has been in extending the basic model and query languages to permit the representation and retrieval of imprecise data. A number of related issues such as functional dependencies, security, and implementation considerations have also been investigated (7).
Two major approaches have been proposed for the introduction of fuzziness in the relational model. The first one uses the principle of replacing the ordinary equivalence among domain values by measures of nearness such as similarity relationships (8), proximity relationships (9), and distinguishability functions (10). The second major effort involves a variety of approaches that directly use possibility distributions for attribute values (11,12). There have also been some mixed models combining these approaches (13,14). We can also characterize these approaches relative to their extensions of the relational model. As we have seen in capturing incompleteness or uncertainty, it is necessary to extend the basic relational model by using non-first normal forms. In the first approach using nearness measures, the imprecision of the actual data values is implicit, using a separate relation or table for the similarity or proximity relationship. Generally with the use of possibility distributions, most approaches have some imprecise description of the data explicitly or directly represented in the basic attribute values of the relation. We characterize these approaches as being either homogeneous or heterogeneous representations. The distinguishing characteristic of an ordinary relational database (or ordinary databases of other forms) is the uniformity or homogeneity of the represented data (15). For each domain, there is a prescribed set of values from which domain values may be selected. Furthermore, each element of the domain set is of the same structure (e.g., integers, real numbers, or character strings). With the use of similarity or proximity relationships, the imprecision in domain values is implicit, and so the representation remains homogeneous. These approaches are thus closer to ordinary crisp relational models and can be shown to have properties that closely follow those of conventional relational models. 
To more directly represent uncertainty within the domain values themselves requires departure from homogeneity of representation. These models based on possibility theory provide the ability to model more forms of uncertainty. As would be expected from the increased power of representation, there is a tradeoff in more complexity of implementation. The more complex extensions of the basic relational model lead us to classify them using a heterogeneous representation. This is just a matter of degree, and some approaches may be more heterogeneous than others.

Membership Values Models. The simplest form for a fuzzy database is the attachment of a membership value (numeric or linguistic) to each tuple. This permits maintenance of homogeneous data domains and strongly typed data sets. However, the semantic content of the fuzzy membership domain is used during query processing. We will consider examples that illustrate two distinct semantics for the membership domain. In the first relation, Investment_Sites, we have tuples with attributes of [site-id, classification, membership value]: {[12, residential-1, 1.0], [14, residential-2, 0.7], [79, light-commercial, 0.85], . . .}. The membership value here denotes the degree to which the tuple belongs within the relation (16). The second example is the relation Resume_Analysis, which represents the analysis criteria of potential employees: {[physics, science, 1.0], [botany, science, 0.7], [statistics, analysis, 0.8], . . .}. In the relation, the membership value denotes the strength of the dependency between the key attribute, Subject, and the attribute Classification (17).
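As a small illustration, the Investment_Sites tuples can be held as ordinary rows with the membership value in the last column. The `select` helper and its threshold parameter are hypothetical. They only show one way the membership domain might be used at query time while the data domains stay homogeneous.

```python
# Rows follow the [site-id, classification, membership value] layout of the text.
investment_sites = [
    (12, "residential-1", 1.0),
    (14, "residential-2", 0.7),
    (79, "light-commercial", 0.85),
]


def select(relation, threshold):
    """Keep tuples whose membership reaches the threshold, strongest first."""
    kept = [row for row in relation if row[-1] >= threshold]
    return sorted(kept, key=lambda row: row[-1], reverse=True)


for row in select(investment_sites, 0.8):
    print(row)
```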
FUZZY INFORMATION RETRIEVAL AND DATABASES
Similarity-Based Fuzzy Models. In the late 1970s, Buckles and Petry (8) were the first to use similarity relationships in a relational model. Their approach attempted to generalize the concept of null and multiple-valued domains for implementation within an operational environment consistent with the relational algebra. In fact, the nonfuzzy relational database is a special case of their fuzzy relational database approach.

For each domain j in a relational database, a domain base set $D_j$ is understood. Domains for fuzzy relational databases are either discrete scalars or discrete numbers drawn from either a finite or infinite set. An example of a finite scalar domain is a set of linguistic terms. For example, consider a set of terms that can be used for subjective evaluation of a patient’s health: {critical, severe, poor, so-so, average, good, excellent}. The fuzzy model uses a similarity relationship to allow the comparison of these linguistic terms. The domain values of a particular tuple may also be single scalars or numbers (including null) or a sequence of scalars or numbers. Consider, for example, the assessments made in the triage database to permit ranking of patient treatment. If we include linguistic descriptions of the severity of patients and combine these with procedure time estimates, we have tuples in the relation such as {[p1, {so-so, average}, {20, 30}], [p2, poor, {20, 50}], [p3, {poor, severe}, {80–120}], . . .}.

The identity relation used in nonfuzzy relational databases induces equivalence classes (most frequently singleton sets) over a domain D, which affects the results of certain operations and the removal of redundant tuples. The identity relation is replaced in this fuzzy relational database by an explicitly declared similarity relation (18) of which the identity relation is a special case.
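A sketch of such a declared similarity relation over the health terms is given here. The numeric levels and groupings in `LEVELS` are invented for illustration. The check tests reflexivity, symmetry, and max-min transitivity by brute force, and `partition` shows how a threshold level induces equivalence classes, with level 1.0 reducing to the identity relation.

```python
from itertools import product

TERMS = ["critical", "severe", "poor", "so-so", "average", "good", "excellent"]

# Hypothetical nested groupings: two terms are similar to the highest level
# at which some block contains both of them.
LEVELS = {
    0.8: [{"critical", "severe"}, {"so-so", "average"}, {"good", "excellent"}],
    0.6: [{"critical", "severe", "poor"}],
    0.4: [{"so-so", "average", "good", "excellent"}],
    0.2: [set(TERMS)],
}


def similarity(x, y):
    if x == y:
        return 1.0
    return max(level for level, blocks in LEVELS.items()
               if any(x in block and y in block for block in blocks))


def is_similarity_relation(terms, s):
    """Brute-force check of reflexivity, symmetry, and max-min transitivity."""
    reflexive = all(s(x, x) == 1.0 for x in terms)
    symmetric = all(s(x, y) == s(y, x) for x, y in product(terms, repeat=2))
    transitive = all(
        s(x, z) >= max(min(s(x, y), s(y, z)) for y in terms)
        for x, z in product(terms, repeat=2)
    )
    return reflexive and symmetric and transitive


def partition(terms, s, alpha):
    """Group terms whose similarity reaches alpha (valid for a transitive s)."""
    blocks = []
    for term in terms:
        for block in blocks:
            if s(term, block[0]) >= alpha:
                block.append(term)
                break
        else:
            blocks.append([term])
    return blocks


print(is_similarity_relation(TERMS, similarity))
print(partition(TERMS, similarity, 0.6))
print(partition(TERMS, similarity, 1.0))
```

Building the relation from nested groupings is a design choice that guarantees transitivity. A hand-filled table would need the brute-force check before use.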
A similarity relation $s_D(x, y)$ for a given domain D is a mapping of every pair of elements in the domain onto the unit interval [0, 1] with the following three properties, for $x, y, z \in D$:

1. Reflexive: $s_D(x, x) = 1$
2. Symmetric: $s_D(x, y) = s_D(y, x)$
3. Transitive: $s_D(x, z) \ge \max_{y \in D} \min[s_D(x, y), s_D(y, z)]$

Next the basic concepts of fuzzy tuples and interpretations must be described. A key aspect of most fuzzy relational databases is that domain values need not be atomic. A domain value $d_i$, where i is the index of the attribute in the tuple, is defined to be a subset of its domain base set $D_i$. That is, any member of the power set may be a domain value except the null set. Let $P(D_i)$ denote the power set of $D_i$ with the null set removed. A fuzzy relation R is a subset of the set cross product $P(D_1) \times P(D_2) \times \cdots \times P(D_m)$. Membership in a specific relation r is determined by the underlying semantics of the relation. For instance, if $D_1$ is the set of major cities and $D_2$ is the set of countries, then (Paris, Belgium) $\in P(D_1) \times P(D_2)$, but it is not a member of the relation A (capital-city, country).

A fuzzy tuple t is any member of both r and $P(D_1) \times P(D_2) \times \cdots \times P(D_m)$. An arbitrary tuple is of the form $t_i = [d_{i1}, d_{i2}, \ldots, d_{im}]$ where $d_{ij} \subseteq D_j$. An interpretation $\alpha = [a_1, a_2, \ldots, a_m]$ of a tuple $t_i = [d_{i1}, d_{i2}, \ldots, d_{im}]$ is any value assignment such that $a_j \in d_{ij}$ for all j. In summary, the space of interpretations is the set cross product $D_1 \times D_2 \times \cdots \times D_m$. However, for any particular
relation, the space is limited by the set of valid tuples. Valid tuples are determined by an underlying semantics of the relation. Note that in an ordinary relational database, a tuple is equivalent to its interpretation.

Some aspects of the max-min transitivity in a similarity relation can cause difficulty in modeling the relationship between domain elements. It can be difficult to formulate the transitive property of the relationship correctly. Furthermore, at some α level, domain elements only weakly related can be forced together in a merged set of retrieved values. The essential characteristic that produces the desirable properties of uniqueness and well-defined operations is partitioning of the attribute domains by the similarity relationship. Shenoi and Melton (9) show how to use proximity relations (nontransitive) for the generation of partitions of domains. The fuzzy relational model is extended by replacing similarity relations with proximity relations on the scalar domains. Recall that a proximity relation P(x, y) is reflexive and symmetric but not necessarily transitive. This can also be related to a more generalized approach to equivalence relations for a fuzzy database model (19).

Possibility Theory-Based Database Models. In the possibility theory-based approach (11,20), the available information about the value of a single-valued attribute A for a tuple t is represented by a possibility distribution $\pi_{A(t)}$ on $D \cup \{e\}$, where D is the domain of the attribute A and e is an extra element that stands for the case when the attribute does not apply to t. The possibility distribution $\pi_{A(t)}$ can be viewed as a fuzzy restriction of the possible value of A(t) and defines a mapping from $D \cup \{e\}$ to [0, 1].
For example, the information “Paul has considerable experience” ($\pi_{e(p)}$) will be represented by

$$
\forall d \in D: \quad \pi_{e(p)}(e) = 0 \quad \text{and} \quad \pi_{e(p)}(d) = \mu_c(d)
$$

Here $\mu_c$ is a membership function that represents the vague predicate “considerable” in a given context, such as the number of years of experience or the number of years of education. It is important to notice that the values restricted by a possibility distribution are considered as mutually exclusive. The degree $\pi_{A(t)}(d)$ rates the possibility that $d \in D$ is the correct value of the attribute A for the tuple t. Note that $\pi_{A(t)}(d) = 1$ only means that d is a completely possible value for A(t), but it does not mean that it is certain that d is the value of A for the tuple (or in other words that d is necessarily the value of A for t), unless

$$
\forall d' \neq d, \quad \pi_{A(t)}(d') = 0
$$

Moreover, the possibility distribution $\pi_{A(t)}$ should be normalized on $D \cup \{e\}$ (i.e., $\exists d \in D$ such that $\pi_{A(t)}(d) = 1$ or $\pi_{A(t)}(e) = 1$). This means that it must be the case that at least one value of the attribute domain is completely possible or that the attribute does not apply. The following null value situations may be handled in this framework:

1. Value of A for t is completely unknown: $\forall d \in D$, $\pi_{A(t)}(d) = 1$, $\pi_{A(t)}(e) = 0$.
2. The attribute A does not apply for the tuple t: $\forall d \in D$, $\pi_{A(t)}(d) = 0$, $\pi_{A(t)}(e) = 1$.
3. It is not clear whether situation 1 or 2 applies: $\forall d \in D$, $\pi_{A(t)}(d) = 1$, and $\pi_{A(t)}(e) = 1$.
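The three null-value situations, together with a precisely known value, can be written as dictionaries over the domain plus the extra element. The domain of years, the marker name `NOT_APPLICABLE`, and the helper functions are hypothetical. The sketch only checks the normalization condition stated in the text.

```python
NOT_APPLICABLE = "e"      # the extra element e of the text
DOMAIN = range(0, 6)      # hypothetical domain: years of experience 0..5


def unknown(domain):
    """Situation 1: every domain value fully possible, e impossible."""
    return {**{d: 1.0 for d in domain}, NOT_APPLICABLE: 0.0}


def inapplicable(domain):
    """Situation 2: no domain value possible, e fully possible."""
    return {**{d: 0.0 for d in domain}, NOT_APPLICABLE: 1.0}


def unknown_or_inapplicable(domain):
    """Situation 3: everything, including e, fully possible."""
    return {**{d: 1.0 for d in domain}, NOT_APPLICABLE: 1.0}


def precise(domain, value):
    """A precisely known value is a singleton distribution."""
    return {**{d: 1.0 if d == value else 0.0 for d in domain}, NOT_APPLICABLE: 0.0}


def is_normalized(dist):
    """At least one element of D or e must be completely possible."""
    return max(dist.values()) == 1.0


for dist in (unknown(DOMAIN), inapplicable(DOMAIN),
             unknown_or_inapplicable(DOMAIN), precise(DOMAIN, 3)):
    print(is_normalized(dist), dist)
```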
Thus, such an approach is able to represent, in a unified manner, precise values (represented by singletons), null values, and ill-known values (imprecise ones represented by crisp sets or vague ones represented by fuzzy sets). In this approach, multiple-valued attributes can be formally dealt with in the same manner as single-valued ones, provided that possibility distributions defined on the power set of the attribute domains rather than on the attribute domains themselves are used. Indeed, in the case of multiple-valued attributes, the mutually exclusive possibilities are represented by subsets of values.

Possibility and Necessity Measures. If two values a and b are described by their respective possibility distributions $\pi_a$ and $\pi_b$, then they can be compared according to the extension principle (21). This leads to two degrees, expressing the extent to which the values possibly and necessarily satisfy the comparison relation. For equality, these degrees are given by

$$
\begin{aligned}
\mathrm{poss}(a = b) &= \sup_{x,y} \min\bigl(\pi_a(x), \pi_b(y), \mu_{=}(x, y)\bigr) \\
\mathrm{nec}(a = b) &= 1 - \sup_{x,y} \min\bigl(\pi_a(x), \pi_b(y), \mu_{\neq}(x, y)\bigr) \\
&= \inf_{x,y} \max\bigl(1 - \pi_a(x), 1 - \pi_b(y), \mu_{=}(x, y)\bigr)
\end{aligned}
$$

Of course, when a and b are precisely known, these two degrees collapse (and take their value in {0, 1}) because there is no uncertainty. Otherwise, the fact that two attribute values (in the same tuple or in two distinct tuples) are represented by the same possibility distribution does not imply that these values must be equal. For instance, if John’s experience is “considerable” and Paul’s experience is also “considerable,” John and Paul may still have different amounts (e.g., years) of experience. This point is just a generalization of what happens with null values (if John’s experience and Paul’s experience are completely unknown, both are represented by a null value, whatever its internal representation, even though their years of experience are potentially distinct). The equality of two incompletely known values must be made explicit and could be handled in the relational model by extending the notion of marked nulls.

Querying Fuzzy Relational Databases

In systems that are relationally structured and use fuzzy set concepts, nearly all developments have considered various extensions of the relational algebra. Its syntactic structure is modified to the extent that additional specifications are required. Use of the relational calculus with a similarity model has also been studied (22). The relational calculus provides a nonprocedural specification for a query and can be extended more easily to a higher-level query language.

Similarity-Based Querying. To illustrate the process of query evaluation for similarity databases, we examine a generalized form of Boolean queries that may also be used to retrieve information (23). The details of query evaluation can be seen more easily in this sort of query. A query $Q(a_i, a_h, \ldots, a_k)$ is an expression of one or more factors combined by disjunctive or conjunctive Boolean operators: $V_i$ op $V_h$ op $\cdots$ op $V_k$. In order to be well formed with respect to a relation r having domain sets $D_1, D_2, \ldots, D_m$, each factor $V_j$ must be

1. a domain element a, $a \in D_j$, where $D_j$ is a domain set for r, or
2. a domain element modified by one or more linguistic modifiers (e.g., NOT, VERY, MORE-OR-LESS).
The relation r may be one of the original database relations or one obtained as a result of a series of fuzzy relational algebra operations. Fuzzy semantics apply to both operators and modifiers. An example query is

MORE-OR-LESS big and NOT VERY VERY heavy

where “big” is an abbreviation of the term (SIZE = big) in a relation having a domain called SIZE. The value “heavy” is likewise an abbreviation. The linguistic hedge VERY can be interpreted as CON(F), concentration, and MORE-OR-LESS as DIL(F), dilation.

A membership value of a tuple in a response relation r is assigned according to the possibility of its matching the query specifications. Let $a \in D_j$ be an arbitrary element. The membership value $\mu_a(b)$, $b \in D_j$, is defined based on the similarity relation $s_j(a, b)$ over the domain. The query $Q(\cdot)$ induces a membership value $\mu_Q(t)$ for a tuple t in the response r as follows:

1. Each interpretation $I = [a'_1, a'_2, \ldots, a'_m]$ of t determines a value $\mu_{a_j}(a'_j)$ for each domain element $a_j$ of $Q(a_i, a_h, \ldots, a_k)$.
2. Evaluation of the modifiers and operators in $Q(\cdot)$ over the membership values $\mu_{a_j}(a'_j)$ yields $\mu_Q(I)$, the membership value of the interpretation with respect to the query.
3. Finally, $\mu_Q(t) = \max_{I \text{ of } t} \{\mu_Q(I)\}$.

In short, the membership value of a tuple represents the best matching interpretation. The response relation is then the set of tuples having nonzero membership values. In practice, it may be more realistic to consider only the tuple with the highest value.

Possibility-Based Framework for Querying. There are several approaches for querying relational databases where some incompletely known attribute values are represented by possibility distributions. One may distinguish between an approach that is set in a pure possibilistic framework (11) (approximate reasoning under uncertainty) and others that do not use such a strict theoretic framework (24–26).
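Before looking at query conditions, the possibility and necessity degrees for equality introduced earlier can be sketched for finite distributions. The function names, the crisp equality relation `crisp_equal`, and the distribution chosen for “considerable” are assumptions for illustration. Values absent from a dictionary are treated as having possibility 0.

```python
from itertools import product


def crisp_equal(x, y):
    """Ordinary equality used as the comparison relation (an assumption)."""
    return 1.0 if x == y else 0.0


def possibility_equal(pi_a, pi_b, mu_eq=crisp_equal):
    """sup over (x, y) of min(pi_a(x), pi_b(y), mu_eq(x, y))."""
    return max(min(pa, pb, mu_eq(x, y))
               for (x, pa), (y, pb) in product(pi_a.items(), pi_b.items()))


def necessity_equal(pi_a, pi_b, mu_eq=crisp_equal):
    """inf over (x, y) of max(1 - pi_a(x), 1 - pi_b(y), mu_eq(x, y))."""
    return min(max(1.0 - pa, 1.0 - pb, mu_eq(x, y))
               for (x, pa), (y, pb) in product(pi_a.items(), pi_b.items()))


# Hypothetical distribution for "considerable" experience, in years.
considerable = {5: 0.5, 10: 1.0, 15: 1.0}
john, paul = considerable, considerable

print(possibility_equal(john, paul), necessity_equal(john, paul))
print(possibility_equal({10: 1.0}, {10: 1.0}), necessity_equal({10: 1.0}, {10: 1.0}))
print(possibility_equal({10: 1.0}, {15: 1.0}), necessity_equal({10: 1.0}, {15: 1.0}))
```

With these assumed values, two people sharing the same vague distribution are possibly but not necessarily equal, while precisely known values give degrees in {0, 1}.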
According to the possibilistic view (11), when a condition applies to imperfectly known data, the result of a query evaluation can no longer be a single value. Because the precise values of some attributes for some items are not known, the fact that these items do or do not satisfy the query (to some degree) may be uncertain. This is why the two degrees attached to two points of view are used: the extent to which it is possible (resp. certain) that the condition is satisfied. From the possibility distributions $\pi_{A(t)}$ and a subset P (ordinary or fuzzy), one can compute the fuzzy set $\Pi P$ (resp. $NP$) of the items whose A-value possibly (resp. necessarily) satisfies the condition P. The membership degrees of a tuple t to $\Pi P$ and $NP$ are, respectively, given by (27)
$$
\begin{aligned}
\mu_{\Pi P}(t) &= \Pi(P; A(t)) = \sup_{d \in D} \min\bigl(\mu_P(d), \pi_{A(t)}(d)\bigr) \\
\mu_{NP}(t) &= N(P; A(t)) = 1 - \Pi(\bar{P}; A(t)) \\
&= 1 - \sup_{d \in D \cup \{e\}} \min\bigl(\mu_{\bar{P}}(d), \pi_{A(t)}(d)\bigr) \\
&= \inf_{d \in D \cup \{e\}} \max\bigl(\mu_P(d), 1 - \pi_{A(t)}(d)\bigr)
\end{aligned}
$$

$\Pi(P; A(t))$ estimates to what extent at least one value restricted by $\pi_{A(t)}$ is compatible with P, and $N(P; A(t))$ estimates to what extent all the values more or less possible for A(t) are included in P. It can be shown that $\Pi P$ and $NP$ always satisfy the inclusion relation $\Pi P \supseteq NP$ (i.e., $\forall t$, $\mu_{NP}(t) \le \mu_{\Pi P}(t)$), provided that $\pi_{A(t)}$ is normalized. If John’s age and the fuzzy predicate “middle-aged” are represented according to a possibility distribution, the evaluation of the condition John’s age = “middle-aged” is based on the computation of the values

$$
\min\bigl(\pi_{ja}(u), \mu_{ma}(u)\bigr) \quad \text{and} \quad \max\bigl(1 - \pi_{ja}(u), \mu_{ma}(u)\bigr)
$$

Thus, in case of incomplete information, it is possible to compute the set of items that more or less possibly satisfy an elementary condition and to distinguish the items that more or less certainly satisfy this condition.

FUZZY INFORMATION RETRIEVAL

Information retrieval systems (IRS) are concerned with the representation, storage, and accessing of a set of documents. These documents are often in the form of textual information items or records of variable length and format, such as books and journal articles (28). The specific aim of an IRS is to evaluate users’ queries for information based on a content analysis of the documents stored in the archive. In response to a user query, the IRS must identify what documents deal with the information being requested via the query and retrieve those that satisfy the query. Fuzzy IR models have been defined to overcome the limitations of the crisp Boolean IR model so as to deal with

1. discriminated (and possibly ranked) answers reflecting the variable relevance of the documents with respect to queries
2. imprecision and incompleteness in characterizing the information content of documents
3.
vagueness and incompleteness in the formulation of queries

Fuzzy extended Boolean models constitute a superstructure of the Boolean model by means of which existing Boolean IRSs can be extended without redesigning them completely. The softening of the retrieval activity in order to rank the retrieved items in decreasing order of their presumed relevance to a user query can greatly improve the effectiveness of such systems. This objective has been approached by extending the Boolean models at various levels.

1. Fuzzy extension of document representation: The aim here is to provide more specific and exhaustive representations of the documents’ information content in order to lower the imprecision and incompleteness of the Boolean indexing. This is done by incorporating significance degrees, or index term weights, in the representation of documents (29).
2. Fuzzy generalization of Boolean query language: The objective here is to render the query language more expressive and natural than crisp Boolean expressions in order to capture the vagueness of user needs as well as simplify the user system interaction. This is carried out at two levels. The first is through the definition of more expressive, as well as soft, selection criteria that allow the specification of different importance levels of the search terms. Query languages based on numeric query term weights with different semantics have been presented as an aid to define more expressive selection criteria (30,31). Also, an evolution of these approaches introduced linguistic query weights specified by fuzzy variables (e.g., important or very important) to express different levels of importance for query terms (32).

Incorporating fuzzy representations for documents in a Boolean IRS is a sufficient condition to improve the system with the ranking ability. As a consequence of this extension, the exact matching applied by a Boolean system can be softened to a partial matching mechanism, evaluating, for each document, the anticipated degree of satisfaction of the document with regard to a user’s query. The value thus generated is called a retrieval status value (RSV) and is used as the basis for ranking the documents. This ranking is used for retrieval and display of those documents.

Fuzzy knowledge-based IRS models have been defined to index and retrieve documents in specific subject areas. To date, it has been found that IRSs are not adequate to deal with general collections.
Reference 33 uses rules to represent semantic links between concepts; the nature of the links (e.g., synonymous terms, broader terms, narrower terms) and the strength of the links (represented by weights) are stored in the knowledge base and are defined by experts in the field. This is used to expand the query evaluation, by applying an inference process that allows one to find information that the user did not explicitly request but that is deemed “likely” to be of interest.

Fuzzy Indexing Procedures

In an information retrieval system, the generation of a representation of each document’s subject content is called indexing. The basic problem is to capture and synthesize the meaning of a document written in natural language. In defining an indexing procedure (which can be either manual or automatic), one must first consider retrieval performance, via a document representation that allows the IRS to retrieve all the relevant documents and none of the nonrelevant documents in response to a user query, and then also consider exhaustivity (describing fully all aspects of a document’s contents). The Boolean retrieval model can be associated with automatic text indexing. This model provides a crisp representation of the information content of a document. A document is
formally represented by the set of its index terms:

$$
R(d) = \{t \mid t \in T,\ F(d, t) > 0\} \quad \text{for } d \in D
$$

in which the indexing (membership) function F correlating terms and documents is restricted to {0, 1}. Of course, F(d, t) = 1 implies the presence of term t in document d; and F(d, t) = 0 implies the absence of the term in the document.

To improve the Boolean retrieval with a ranking ability, the Boolean representation has been extended within fuzzy set theory by allowing the indexing function F to take on values in the unit interval [0, 1]. Here, the index term weight F(d, t) represents the degree of significance of the concept as represented by term t in document d. This value can be specified between no significance [F(d, t) = 0] and full significance [F(d, t) = 1] and allows a ranking of the retrieval output, providing improved user satisfaction and system performance. Consequently, a document is represented as a fuzzy set of terms

$$
R(d) = \{\langle t, \mu_d(t) \rangle \mid t \in T\} \quad \text{for } d \in D
$$

in which $\mu_d(t) = F(d, t)$. This implies that F is a fuzzy set membership function, measuring the degree to which term t belongs to document d (34). Through this extension, the retrieval mechanism can compute the estimated relevance of each document relative to the query, expressed by a numeric score called a retrieval status value. The RSV denotes how well a document seemingly satisfies the query (35,36). The definition of the criteria for an automatic computation of F(d, t) is a crucial aspect; generally this value is defined on the basis of statistical measurements with the aim of optimizing retrieval performance.

Fuzzy Associations. Another concept linked to automatic indexing to enhance the retrieval of documents is that based on fuzzy associations, named fuzzy associative information retrieval models (37–40). These associative information retrieval models work by retrieving additional documents that are not indexed by the terms in a given query but are indexed by other terms, associated descriptors, that are related to the query terms. Fuzzy association in information retrieval generally refers to the use of fuzzy thesauri where the links between terms are weighted to indicate strength of association. Moreover, this notion includes generalizations such as fuzzy pseudothesauri (41) and fuzzy associations based on a citation index (42). Ogawa et al. (43) propose a keyword connection matrix to represent similarities between keywords so as to reduce the difference between relationship values initially assigned using statistical information and a user’s evaluation. Generally, a fuzzy association between two sets $X = \{x_1, \ldots, x_m\}$ and $Y = \{y_1, \ldots, y_n\}$ is formally defined as a fuzzy relation:

$$
f: X \times Y \to [0, 1]
$$

By varying the semantics of the sets X and Y in information retrieval, different kinds of fuzzy associations can be derived.
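A fuzzy thesaurus is one such association, with both X and Y taken as the set of index terms. The sketch below stores documents as fuzzy sets of terms and uses the thesaurus to reach documents not indexed by the query term itself. The index weights, the thesaurus entries, the symmetric lookup, and the max-min combination in `expanded_weight` are all assumptions chosen for illustration, not a method prescribed by the cited models.

```python
# Hypothetical index: document -> {term: F(d, t)}.
index = {
    "d1": {"fuzzy": 0.9, "database": 0.6},
    "d2": {"possibility": 0.8},
    "d3": {"retrieval": 0.7},
}

# Hypothetical fuzzy thesaurus: weighted links between terms.
thesaurus = {("fuzzy", "possibility"): 0.7, ("database", "retrieval"): 0.4}


def association(x, y):
    """Strength of the link between two terms (treated as symmetric here)."""
    if x == y:
        return 1.0
    return thesaurus.get((x, y), thesaurus.get((y, x), 0.0))


def expanded_weight(doc_terms, query_term):
    """Best min(link strength, index weight) over the document's terms."""
    return max((min(association(query_term, t), w) for t, w in doc_terms.items()),
               default=0.0)


def retrieve(query_term):
    """Rank documents with a nonzero expanded weight, highest first."""
    scores = {d: expanded_weight(terms, query_term) for d, terms in index.items()}
    hits = [(d, s) for d, s in scores.items() if s > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


print(retrieve("fuzzy"))
```

Here `d2` is retrieved for the query term `fuzzy` only through its associated descriptor `possibility`.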
Fuzzy Querying

Two factors have been independently taken into account to extend the Boolean query language: making the selection criteria more powerful, and softening and enriching the aggregation operators. First, consider the basic query processing model. The main aim in extending the selection criteria is to provide users with the possibility of specifying differing importances of terms in order to determine which documents should be relevant. This has been achieved by preserving the Boolean structure of the query language and by associating with each term a numeric value to synthesize importance.

Now, for Q = {a set of user queries for documents}, define $a(q, t): Q \times T \to [0, 1]$, where a(q, t) is the importance of term t in describing the query q and is called a query term weight. It is here that one begins to introduce problems in terms of maintaining the Boolean lattice (44). Because of that, certain mathematical properties can be imposed on F, but more directly on a and on the matching procedure. Moreover, there is a problem in developing a mathematical model that will preserve the semantics (i.e., the meaning) of the user query. The weight a can be interpreted as an importance weight, as a threshold, or as a description of the “perfect” document.

Let $g: [0, 1] \times [0, 1] \to [0, 1]$; that is, g(F, a) is the RSV for a query q of one term t, with query weight a, with respect to a given document d, which has index term weight F(d, t) for the same term t. This function g can be interpreted as the evaluation of the document in question along the dimension of the term t if the actual query has more than one term. It has been suggested that terms be evaluated from the bottom up, evaluating a given document against each term in the query and then combining those evaluations according to the query structure (45).
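The bottom-up evaluation can be sketched as a recursive walk over a query tree. The tuple-based query encoding, the function names, and the use of min, max, and complement for AND, OR, and NOT are assumptions of this sketch. The product used for g is only one illustrative choice, passed in as a parameter so that other g functions can be substituted.

```python
def g_product(index_weight, query_weight):
    """One illustrative single-term evaluation: g(F, a) = F * a."""
    return index_weight * query_weight


def rsv(query, doc, g=g_product):
    """Evaluate a weighted Boolean query against one document, bottom up.

    query is ("term", t, a) or (op, subquery, ...) with op in AND, OR, NOT;
    doc maps each term t to its index term weight F(d, t).
    """
    op = query[0]
    if op == "term":
        _, term, weight = query
        return g(doc.get(term, 0.0), weight)
    values = [rsv(sub, doc, g) for sub in query[1:]]
    if op == "AND":
        return min(values)
    if op == "OR":
        return max(values)
    if op == "NOT":
        return 1.0 - values[0]
    raise ValueError(f"unknown operator: {op}")


# Hypothetical document and queries.
doc = {"fuzzy": 0.8, "database": 0.5}
both = ("AND", ("term", "fuzzy", 1.0), ("term", "database", 0.5))
either = ("OR", ("term", "fuzzy", 1.0), ("term", "database", 0.5))

print(rsv(both, doc), rsv(either, doc))
```

Each term is scored on its own before the operators combine the scores, so the same `g` serves single-term and compound queries alike.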
Reference 46 shows that this criterion for a g function, called separability (47), preserves a homomorphism between the document evaluations for single-term queries and the document evaluations for complex Boolean queries. A first formulation of the g function treats the a values as relative importance weights; for example, one could specify g = F · a. However, this can lead to problems, such as when using an AND (44). In this case, a very small value of a for one of the terms in an AND query will dominate the min function and force a decision based on the least important (smallest a) term, which is just the opposite of what is desired by the user. This problem is precisely what prompted some researchers to consider g functions that violate separability (31,48). To achieve consistency in the formalization of weighted Boolean queries, some approaches do not maintain all the properties of the Boolean lattice: Kantor (49) generates a mathematical formulation of the logical relationships between weighted queries, using a vapid query with all zero weights.

FUTURE DIRECTIONS

Several specialized aspects not covered in this article are of increasing research importance. Fuzzy functional dependencies relate to several issues for fuzzy databases including database design and integrity management (50,51). The actual
application of uncertainty in deployed database systems is following two directions. The first is the addition of uncertainty in object-oriented databases (52,53). This is due to newer developments in object-oriented databases and their inherent capabilities such as encapsulated methods. Another direction is that of fuzzy front-end querying (54,55). This approach allows a general use with existing databases and also permits fuzzy querying of crisp data. A good general survey of some of the issues in these directions is given in (56).
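A toy illustration of fuzzy querying over crisp data follows. The trapezoidal membership function for “middle-aged,” its breakpoints, the employee rows, and the `fuzzy_select` helper are hypothetical and do not describe any of the cited front-end systems.

```python
def trapezoid(a, b, c, d):
    """Membership function rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


# Hypothetical fuzzy predicate and crisp relation of (name, age) rows.
middle_aged = trapezoid(30, 40, 50, 60)
employees = [("ann", 28), ("bob", 35), ("cy", 45), ("di", 58)]


def fuzzy_select(rows, mu):
    """Attach a degree to each crisp row and keep the nonzero matches, best first."""
    scored = [(name, mu(age)) for name, age in rows]
    return sorted((row for row in scored if row[1] > 0),
                  key=lambda row: row[1], reverse=True)


print(fuzzy_select(employees, middle_aged))
```

The stored data stay crisp; only the query predicate is fuzzy, which is why such a front end can sit on top of an existing database.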
BIBLIOGRAPHY

1. E. Codd, Extending the database relational model to capture more meaning, ACM Trans. Database Syst., 4 (2): 156–174, 1979.
2. Anonymous, American National Standards Institute study group on database management: Interim report, ACM SIGMOD Rec., 7: 25–56, 1975.
3. J. Grant, Incomplete information in a relational database, Fundamenta Informaticae, 3 (4): 363–378, 1980.
4. W. Lipski, On semantic issues connected with incomplete information databases, ACM Trans. Database Syst., 4 (3): 262–296, 1979.
5. E. Wong, A statistical approach to incomplete information in database systems, ACM Trans. Database Syst., 7 (4): 479–488, 1982.
6. D. Barbara, H. Garcia-Molina, and D. Porter, The management of probabilistic data, IEEE Trans. Knowl. Data Eng., 4 (4): 487–502, 1992.
7. F. Petry, Fuzzy Databases: Principles and Applications, Boston: Kluwer, 1996.
8. B. Buckles and F. Petry, A fuzzy model for relational databases, Fuzzy Sets and Systems, 7 (3): 213–226, 1982.
9. S. Shenoi and A. Melton, Proximity relations in fuzzy relational databases, Fuzzy Sets and Systems, 31 (2): 287–296, 1989.
10. M. Anvari and G. Rose, Fuzzy relational databases, in J. Bezdek (ed.), The Analysis of Fuzzy Information Vol. II, Boca Raton FL: CRC Press, 1987, pp. 203–212.
11. H. Prade and C. Testemale, Generalizing database relational algebra for the treatment of incomplete/uncertain information and vague queries, Information Sci., 34 (2): 115–143, 1984.
12. M. Zemankova and A. Kandel, Implementing imprecision in information systems, Information Sci., 37 (1): 107–141, 1985.
13. E. Rundensteiner, L. Hawkes, and W. Bandler, On nearness measures in fuzzy relational data models, Int. J. Approximate Reasoning, 3 (4): 267–298, 1989.
14. J. Medina, O. Pons, and M. Vila, Gefred: A generalized model to implement fuzzy relational databases, Information Sci., 47 (5): 234–254, 1994.
15. B. Buckles and F. Petry, Uncertainty models in information and database systems, J. Information Sci.: Principles and Practice, 11 (1): 77–87, 1985.
16. C. Giardina, Fuzzy databases and fuzzy relational associative processors, Technical Report, Hoboken NJ: Stevens Institute of Technology, 1979.
17. J. Baldwin, Knowledge engineering using a fuzzy relational inference language, Proc. IFAC Symp. on Fuzzy Information Knowledge Representation and Decision Analysis, pp. 15–21, 1983.
18. L. Zadeh, Similarity relations and fuzzy orderings, Information Sci., 3 (3): 177–200, 1971.
19. S. Shenoi and A. Melton, An extended version of the fuzzy relational database model, Information Sci., 51 (1): 35–52, 1990.
20. H. Prade, Lipski’s approach to incomplete information databases restated and generalized in the setting of Zadeh’s possibility theory, Information Systems, 9 (1): 27–42, 1984.
21. L. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems, 1 (1): 3–28, 1978.
22. B. Buckles, F. Petry, and H. Sachar, A domain calculus for fuzzy relational databases, Fuzzy Sets and Systems, 29 (4): 327–340, 1989.
23. B. Buckles and F. Petry, Query languages for fuzzy databases, in J. Kacprzyk and R. Yager (eds.), Management Decision Support Systems Using Fuzzy Sets and Possibility Theory, Koln GR: Verlag TUV Rheinland, 1985, pp. 241–252.
24. M. Umano, Retrieval from fuzzy database by fuzzy relational algebra, in E. Sanchez and M. Gupta (eds.), Fuzzy Information, Knowledge Representation and Decision Analysis, New York: Pergamon, 1983, pp. 1–6.
25. Y. Takahashi, A fuzzy query language for relational database, IEEE Trans. Syst. Man. Cybern., 21 (6): 1576–1579, 1991.
26. H. Nakajima, T. Sogoh, and M. Arao, Fuzzy database language and library: fuzzy extension to SQL, Proc. Second International Conference on Fuzzy Systems, Los Alamitos, CA: IEEE Computer Society Press, 1993, pp. 477–482.
27. D. Dubois and H. Prade (with the collaboration of H. Farreny, R. Martin-Clouaire, and C. Testemale), Possibility Theory: An Approach to Computerized Processing of Uncertainty, New York: Plenum, 1988.
28. G. Salton and M. J. McGill, Introduction to Modern Information Retrieval, New York: McGraw-Hill, 1983.
29. T. Radecki, Fuzzy set theoretical approach to document retrieval, Information Processing and Management, 15 (5): 247–260, 1979.
30. D. A. Buell and D. H. Kraft, A model for a weighted retrieval system, J. Amer. Soc. Information Sci., 32 (3): 211–216, 1981.
31. A. Bookstein, Fuzzy requests: An approach to weighted Boolean searches, J. Amer. Soc. Information Sci., 31 (4): 240–247, 1980.
32. G. Bordogna and G. Pasi, A fuzzy linguistic approach generalizing Boolean information retrieval: A model and its evaluation, J. Amer. Soc. Information Sci., 44 (2): 70–82, 1993.
33. D. Lucarella, Uncertainty in information retrieval: An approach based on fuzzy sets, Ninth Annual Int. Phoenix Conference on Computers and Communications, Los Alamitos CA: IEEE Computer Society Press, 1990, pp. 809–814.
34. L. J. Mazlack and L. Wonboo, Identifying the most effective reasoning calculi for a knowledge-based system, IEEE Trans. Syst. Man. Cybern., 23 (5): 404–409, 1993.
35. D. A. Buell, A general model of query processing in information retrieval systems, Information Processing and Management, 17 (5): 236–247, 1981.
36. C. V. Negoita, On the notion of relevance in information retrieval, Kybernetes, 2 (3): 112–121, 1973.
37. S. Miyamoto, Fuzzy Sets in Information Retrieval and Cluster Analysis, Boston: Kluwer, 1990.
38. S. Miyamoto, Two approaches for information retrieval through fuzzy associations, IEEE Trans. Syst. Man. Cybern., 19 (1): 123–130, 1989.
39. E. Neuwirth and L. Reisinger, Dissimilarity and distance coefficients in automation-supported thesauri, Information Systems, 7 (1): 54–67, 1982.
40. T. Radecki, Mathematical model of information retrieval system based on the concept of fuzzy thesaurus, Information Processing and Management, 12 (5): 298–317, 1976.
41. S. Miyamoto and K. Nakayama, Fuzzy information retrieval based on a fuzzy pseudothesaurus, IEEE Trans. Syst. Man. Cybern., 16 (2): 237–243, 1986.
42. K. Nomoto et al., A document retrieval system based on citations using fuzzy graphs, Fuzzy Sets and Systems, 38 (2): 191–202, 1990. 43. Y. Ogawa, T. Morita, and K. Kobayashi, A fuzzy document retrieval system using the keyword connection matrix and a learning method, Fuzzy Sets and Systems, 39 (2): 163–179, 1991. 44. D. H. Kraft and D. A. Buell, Fuzzy sets and generalized Boolean retrieval systems, Int. J. Man-Machine Studies, 19 (1): 45–56, 1983. 45. S. C. Carter and D. H. Kraft, A generalization and clarification of the Waller-Kraft wish-list, Information Processing and Management, 25 (1): 15–25, 1989. 46. M. Bartschi, An overview of information retrieval subjects, Computer, 18 (5): 67–74, 1985. 47. W. G. Waller and D. H. Kraft, A mathematical model of a weighted Boolean retrieval system, Information Processing and Management, 15 (3): 235–245, 1979. 48. R. R. Yager, A note on weighted queries in information retrieval systems, J. Amer. Soc. Information Sci., 38 (1): 47–51, 1987. 49. P. B. Kantor, The logic of weighted queries, IEEE Trans. Syst. Man. Cybern., 11 (12): 151–167, 1981. 50. G. Chen, E. Kerre, and J. Vandenbulcke, A computational algorithm for the FFD transitive closure and a complete axiomatization of fuzzy functional dependency, Int. J. of Intelligent Systems, 9 (3): 421–440, 1994. 51. P. Saxena and B. Tyagi, Fuzzy functional dependencies and independencies in extended fuzzy relational database models, Fuzzy Sets and Systems, 69 (1): 65–89, 1995. 52. P. Bosc and O. Pivert, SQLf: A relational database language for fuzzy querying, IEEE Trans. Fuzzy Syst., 3 (1): 1–17, 1995. 53. J. Kacprzyk and S. Zadrozny, FQUERY for ACCESS: Fuzzy querying for Windows-based DBMS, in P. Bosc and J. Kacprzyk (eds.) Fuzziness in Database Management Systems, Heidelberg GR: Physica-Verlag, 1995, pp. 415–435. 54. R. George et al., Uncertainty management issues in the objectoriented data model, IEEE Trans. Fuzzy Syst., 4 (2): 179–192, 1996. 55. V. Cross, R. 
DeCaluwe, and N. VanGyseghem, A perspective from the Fuzzy Object Data Management Group, Proc. 6th Int. Conf. on Fuzzy Systems, Los Alamitos, CA: IEEE Computer Society Press, 1997, pp. 721–728. 56. V. Cross, Fuzzy information retrieval, J. Intelligent Information Syst., 11 (3): 115–123, 1994.
FREDERICK E. PETRY Tulane University
Wiley Encyclopedia of Electrical and Electronics Engineering
Fuzzy Model Fundamentals (Standard Article)
Fabrizio Russo, University of Trieste, Italy
Copyright © 1999 by John Wiley & Sons, Inc. All rights reserved.
DOI: 10.1002/047134608X.W3501
Article Online Posting Date: December 27, 1999
The sections in this article are: Fuzziness and Uncertainty; Fuzzy Sets; Arithmetic of Fuzzy Numbers; Fuzzy Relations; Fuzzy Aggregation Connectives; Linguistic Variables and Fuzzy Systems; Parameterized Membership Functions.
FUZZY MODEL FUNDAMENTALS

The concept of the fuzzy set was introduced in 1965 by Zadeh (1). After this important event, a large number of theoretical contributions were proposed and the formal framework of fuzzy set theory grew fast. For several years, fuzzy models were mainly devoted to specific problems in the areas of pattern recognition and decision-making (2, 3). In the mid-1980s, the successful development of fuzzy controllers opened up new vistas in the application of fuzzy models to engineering problems. Rule-based approaches emerged, in particular, as a powerful and general methodology for information processing. As a result, fuzzy systems became very attractive and the number of applications increased very rapidly in different fields (4–6).

During the first half of the 1990s, important relationships with artificial neural networks were established. Fuzzy and neural techniques were presented from a common perspective (7), and new structures able to combine the advantages of fuzzy and neural paradigms were proposed (8–16). Fuzzy set computing is now a well-established problem-solving technology which aims at replacing (or improving) classical methods in a growing number of research and application areas including control systems, pattern recognition, data classification, signal processing, and low-level and high-level computer vision (17–28).

The aim of this article is not to provide a thorough description of all concepts of fuzzy models. There is a large body of fuzzy literature devoted to this purpose. This article rather aims at presenting an up-to-date selection of the most useful concepts from an electronic engineering perspective. For this reason, theoretical aspects and mathematical formalism will be kept to a minimum.
FUZZINESS AND UNCERTAINTY

One of the key features of fuzzy models is their ability to deal with the uncertainty which typically affects physical systems and human activities. Unlike classical methods which resort to a crisp Yes/No approach, fuzzy models adopt a gradual approach which deals with degrees (or grades) of certainty.

Let us focus on a simple example. If we observe the object A depicted in Fig. 1, we can easily see that it represents a square. How do we describe the object B in the same figure? It is more or less a square. It does not belong to the (crisp) class of squares, because it possesses round corners. However, it may partially belong to a fuzzy class of squares. Its degree of membership could be, for example, 0.8 (where unity denotes full membership). Conversely, object D is more or less a circle. It does not belong to the (crisp) class of circles, because it possesses straight lines. However, it may belong to a fuzzy class of circles to a certain extent. Depending on their shapes, all objects in Fig. 1 possess degrees of membership to both fuzzy classes.

This simple example also highlights the difference between fuzziness and probability. This important subject has been addressed by different authors in the literature (3–17). It suffices here to observe that probability is related to the occurrence of events, whereas fuzziness is not. Again, let us focus on the object B in Fig. 1. A sentence like “Is it probably a square?” is quite inappropriate to address the uncertainty which affects our process of characterizing the object. The object is not exactly a square. It is more or less a square.

Fuzzy concepts represent the basis of human thinking and decision-making. Sentences are very often characterized by vagueness and linguistic imprecision. As an example, if we are driving a car, we could act according to the following statement: “If the speed is low and the vehicle ahead is more or less far away, then moderately increase the speed.” Despite their vague appearance, fuzzy concepts represent a powerful way to condense information about real life. The great success of fuzzy models is the result of the combination of the following key features:

1. Effectiveness in representing the knowledge about a problem
2. Effectiveness in processing this knowledge by adopting a numerical framework

FUZZY SETS

A fuzzy set can be considered a generalization of a classical (“crisp”) set. In classical set theory, the degree of membership of an element to a set is either zero (no membership) or unity (full membership). The membership of an element to a crisp set, say A, is described by the characteristic function $\chi_A$:

$$\chi_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$

No partial membership is allowed. Fuzzy set theory permits us to deal with partial membership. A fuzzy set F is indeed represented as a set of ordered pairs (2):

$$F = \{ (x, \mu_F(x)) \mid x \in U \} \qquad (2)$$

where U is the universe of discourse (i.e., the collection of objects where the fuzzy set is defined) and $\mu_F(x)$ is the membership function that maps U to the real interval [0, 1]:

$$\mu_F : U \to [0, 1]$$
For each element $x \in U$, the function $\mu_F(x)$ yields a real number which represents the degree (or grade) of membership of x to the fuzzy set F ($0 \le \mu_F(x) \le 1$). As an example, let us consider the fuzzy set F = numbers close to 4. A possible membership function $\mu_F$ describing this fuzzy set is represented in Fig. 2. It can be observed that the maximum degree of membership is obtained for x = 4: $\mu_F(4) = 1$. The closer a number is to 4, the higher its membership to F. On the contrary, a number very different from 4 is assigned a low (or zero) membership degree, as it should be. The difference between fuzzy and crisp sets is graphically highlighted in the same figure, which shows the characteristic function $\chi_A$ of the crisp set A = real numbers between 3 and 5. According to the “crisp” nature of set A, we observe a hard transition from full membership to no membership and vice versa.

As a second example, let U = {0, 1, 2, . . . , 255} be the set of integers ranging from 0 to 255. Such a universe may represent the set of possible gray levels (or luminances) of a digitized image, as represented in Fig. 3 (0 = black, 255 = white). Let us define three fuzzy sets labeled dark (DK), medium (MD), and bright (BR) by means of the membership functions $\mu_{DK}$, $\mu_{MD}$, and $\mu_{BR}$ depicted in the same figure. It is worth pointing out that almost all pixel luminances possess a nonzero degree of membership to more than one fuzzy set. For example, if we choose a pixel luminance x = 135 as shown in Fig. 3, we have $\mu_{DK}(x) = 0.08$, $\mu_{MD}(x) = 0.91$ and $\mu_{BR}(x) = 0.24$.

The concept of membership function plays a key role in fuzzy modeling. Indeed, properties and operators dealing with fuzzy sets can be easily defined in terms of membership functions. The use of linguistic labels to identify fuzzy sets is also quite common. Linguistic labels are often associated with simple operations which change or modify the “shape” of a fuzzy set.

J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering. Copyright © 2007 John Wiley & Sons, Inc.

Figure 1. Example of crisp and fuzzy classes.

Figure 2. Example of membership and characteristic functions.

Figure 3. Example of fuzzy sets dark (DK), medium (MD), and bright (BR).

Complement of a Fuzzy Set. The complement $\bar{F}$ of fuzzy set F is described by the membership function:

$$\mu_{\bar{F}}(x) = 1 - \mu_F(x)$$
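As a rough illustration of the ideas above, the following Python sketch contrasts a membership function for "numbers close to 4" with the characteristic function of the crisp set of reals between 3 and 5, and then builds the complement. The triangular shape and its half-width of 2 are assumptions made for the example (the exact curve of Fig. 2 is not reproduced here), and all function names are hypothetical.

```python
def triangular(x, center, half_width):
    """Triangular membership: 1 at center, falling linearly to 0 at center +/- half_width."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def close_to_4(x):
    # Assumed shape for the fuzzy set "numbers close to 4" (half-width 2 is illustrative).
    return triangular(x, 4.0, 2.0)


def crisp_between_3_and_5(x):
    # Characteristic function of the crisp set A = [3, 5]: only 0 or 1 is possible.
    return 1.0 if 3.0 <= x <= 5.0 else 0.0


def complement(mu):
    """Return the membership function of NOT F, i.e. x -> 1 - mu(x)."""
    return lambda x: 1.0 - mu(x)


not_close_to_4 = complement(close_to_4)

if __name__ == "__main__":
    for x in (2.0, 3.0, 4.0, 4.5, 6.0):
        print(x, close_to_4(x), crisp_between_3_and_5(x), not_close_to_4(x))
```

The fuzzy set changes gradually around 4, whereas the crisp set jumps between 0 and 1 at the interval boundaries.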
The linguistic label usually adopted for the complement is “NOT.” As an example, the membership function of fuzzy set NOT DARK is represented in Fig. 4.

Union of Fuzzy Sets. The union $F_{un} = F_1 \cup F_2$ of fuzzy sets $F_1$ and $F_2$ is described by the membership function:

$$\mu_{F_{un}}(x) = \max\{\mu_{F_1}(x), \mu_{F_2}(x)\}$$
The commonly used linguistic label for the union is “OR.” The membership function of fuzzy set DARK OR MEDIUM is shown in Fig. 5.

Intersection of Fuzzy Sets. The intersection $F_{int} = F_1 \cap F_2$ of fuzzy sets $F_1$ and $F_2$ is described by the membership function:

$$\mu_{F_{int}}(x) = \min\{\mu_{F_1}(x), \mu_{F_2}(x)\}$$

The associated label is “AND.” The membership function of fuzzy set DARK AND MEDIUM is represented in Fig. 6. It should be noted that the above definitions are generalizations of the corresponding definitions for crisp sets.

Linguistic Modifiers. Linguistic modifiers (also called linguistic hedges) operate on membership functions in order to modify the meaning of the corresponding fuzzy set. Two popular modifiers are described here. Concentration is a modifier that operates on the membership function of a fuzzy set F in order to decrease values smaller than unity. A commonly used definition (2) is:

$$\mu_{\mathrm{CON}(F)}(x) = [\mu_F(x)]^2$$

(Remember that $0 \le \mu_F(x) \le 1$.) The typical linguistic label is “VERY.” The membership function of fuzzy set VERY DARK is depicted in Fig. 7. Dilation is a modifier that operates on the membership function of a fuzzy set F in order to increase values smaller than unity. A typical definition is given by the following relationship:

$$\mu_{\mathrm{DIL}(F)}(x) = [\mu_F(x)]^{1/2}$$

The associated label is “MORE OR LESS.” The membership function of fuzzy set MORE OR LESS DARK is represented in Fig. 7 too. Other fuzzy modifiers can be found in Refs. 2 and 17.

Figure 4. Complement of fuzzy set DARK.

Figure 5. Union of fuzzy sets DK and MD.

Figure 6. Intersection of fuzzy sets DK and MD.

Figure 7. Examples of concentration and dilation.

We previously used Eq. (2) to generically represent a fuzzy set F. When the universe U is continuous, the following expression is also adopted in the fuzzy literature (17):

$$F = \int_U \mu_F(x)/x$$

On the contrary, when U is discrete, fuzzy set F is often expressed in the following form:

$$F = \sum_{x_i \in U} \mu_F(x_i)/x_i$$
Of course, integral and summation symbols in the above expressions do not mean integration and arithmetic addition. They are used to denote the collection of all elements $x \in U$. The slash symbol is also typically adopted to associate x with the corresponding degree of membership.

Let us introduce some specific terminology.

Support. The support of a fuzzy set F on the universe U is the crisp set S(F) formed by the elements having nonzero degree of membership:

$$S(F) = \{ x \in U \mid \mu_F(x) > 0 \}$$
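The operators introduced so far (union, intersection, concentration, dilation, and support) can be sketched on a discrete universe, where a fuzzy set is simply a mapping from elements to degrees of membership. The membership values below are made-up illustrative numbers, not values read from Fig. 3, and the function names are hypothetical.

```python
# Discrete fuzzy sets stored as {element: degree of membership}.
# Degrees are illustrative only; both sets share the same universe.
DARK = {0: 1.0, 64: 0.7, 128: 0.1, 192: 0.0, 255: 0.0}
MEDIUM = {0: 0.0, 64: 0.4, 128: 1.0, 192: 0.4, 255: 0.0}


def f_union(f1, f2):
    """OR: pointwise maximum of the two membership functions."""
    return {x: max(f1[x], f2[x]) for x in f1}


def f_intersection(f1, f2):
    """AND: pointwise minimum of the two membership functions."""
    return {x: min(f1[x], f2[x]) for x in f1}


def concentration(f):
    """VERY: squaring lowers every degree smaller than unity."""
    return {x: mu ** 2 for x, mu in f.items()}


def dilation(f):
    """MORE OR LESS: the square root raises every degree smaller than unity."""
    return {x: mu ** 0.5 for x, mu in f.items()}


def support(f):
    """Crisp set of the elements with nonzero degree of membership."""
    return {x for x, mu in f.items() if mu > 0.0}


if __name__ == "__main__":
    print(f_union(DARK, MEDIUM))
    print(f_intersection(DARK, MEDIUM))
    print(support(DARK))
```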
Crossover Point. The crossover point of a fuzzy set F is an element $x_c$ with membership degree $\mu_F(x_c) = 0.5$.

Fuzzy Singleton. A fuzzy singleton is a fuzzy set whose support is a single element x with $\mu_F(x) = 1$.

Normal Fuzzy Set. A fuzzy set F is said to be normal if $\max_{x \in U} \{\mu_F(x)\} = 1$.

α-Level Set. The α-level set (α-cut) of fuzzy set F is the crisp set defined by the following relationship (2):

$$F_\alpha = \{ x \in U \mid \mu_F(x) \ge \alpha \}$$

The strong α-level set is defined as:

$$F'_\alpha = \{ x \in U \mid \mu_F(x) > \alpha \}$$
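A minimal sketch of the α-level set and its strong variant for a discrete fuzzy set follows. The fuzzy set `F` and the function name `alpha_cut` are illustrative assumptions.

```python
def alpha_cut(f, alpha, strong=False):
    """Crisp set of elements whose membership is >= alpha (or > alpha if strong)."""
    if strong:
        return {x for x, mu in f.items() if mu > alpha}
    return {x for x, mu in f.items() if mu >= alpha}


# Illustrative discrete fuzzy set, stored as {element: degree of membership}.
F = {1: 0.2, 2: 0.5, 3: 1.0, 4: 0.5, 5: 0.2}

if __name__ == "__main__":
    print(alpha_cut(F, 0.5))               # elements with membership >= 0.5
    print(alpha_cut(F, 0.5, strong=True))  # elements with membership > 0.5
```

Note that the strong α-level set for α = 0 coincides with the support of the fuzzy set.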
Convex Fuzzy Set. A fuzzy set F is said to be convex (4) if its support is a set of real numbers and the following relation applies for all $x \in [x_1, x_2]$ over any interval $[x_1, x_2]$:

$$\mu_F(x) \ge \min\{\mu_F(x_1), \mu_F(x_2)\}$$

A more general definition resorts to the concept of α-level set. A fuzzy set is convex if all its α-level sets are convex (as crisp sets).
Extension Principle. The extension principle is commonly used to generalize crisp mathematical concepts to fuzzy sets (2). Let F be a fuzzy set on U and let y = f(x) denote a function from U to V ($f: U \to V$). By extending the function f, the fuzzy set f(F) of V is defined as follows (4):

$$\mu_{f(F)}(y) = \sup_{x \in U:\, f(x) = y} \mu_F(x)$$

The fuzzy set f(F) is also expressed by

$$f(F) = \int_U \mu_F(x)/f(x)$$

A simple example is depicted in Fig. 8.

Figure 8. Example of a fuzzy relation.

Figure 9. Example of extended addition.
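For a discrete fuzzy set, the extension principle can be sketched in a few lines: every element is mapped through the function, and when several elements share the same image the largest degree of membership is kept. The fuzzy set `F`, the choice of f(x) = x² and the function name `extend` are assumptions made for this illustration.

```python
def extend(f, fuzzy_set):
    """Image of a discrete fuzzy set under f, keeping the maximum degree per image point."""
    image = {}
    for x, mu in fuzzy_set.items():
        y = f(x)
        image[y] = max(image.get(y, 0.0), mu)
    return image


# Illustrative fuzzy set on the integers, stored as {element: degree of membership}.
F = {-2: 0.3, -1: 0.8, 0: 1.0, 1: 0.6, 2: 0.1}

# Squaring maps -1 and 1 to the same point, so the larger degree (0.8) survives.
squared = extend(lambda x: x * x, F)

if __name__ == "__main__":
    print(squared)
```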
ARITHMETIC OF FUZZY NUMBERS

Fuzzy Numbers. A fuzzy number is a normal and convex fuzzy set such that (2):

1. Only one element (called the mean value) has membership degree equal to unity.
2. Its membership function is piecewise continuous.

In practice the above definition is often modified in order to include trapezoid-shaped fuzzy sets. Fuzzy arithmetic resorts to the extension principle in order to extend algebraic operations from crisp to fuzzy numbers. Since computational efficiency is an element of paramount importance for many applications, a simplified representation of a fuzzy number, called “LR representation,” is often adopted. A fuzzy number is of LR type (2) if its membership function is defined by means of two reference functions L (left) and R (right):

$$\mu_F(x) = \begin{cases} L\!\left(\dfrac{x_m - x}{\alpha}\right) & x \le x_m \\[2mm] R\!\left(\dfrac{x - x_m}{\beta}\right) & x \ge x_m \end{cases}$$

where $x_m$ is the mean value and $\alpha$ ($\alpha > 0$) and $\beta$ ($\beta > 0$) are called the left and right spreads, respectively. A fuzzy number of LR type is symbolically denoted by $(x_m, \alpha, \beta)_{LR}$. The choice of functions L(u) and R(u) depends on the context. A fuzzy interval of LR type is very similarly defined by the membership function:

$$\mu_F(x) = \begin{cases} L\!\left(\dfrac{x_m - x}{\alpha}\right) & x \le x_m \\[2mm] 1 & x_m \le x \le \bar{x}_m \\[2mm] R\!\left(\dfrac{x - \bar{x}_m}{\beta}\right) & x \ge \bar{x}_m \end{cases}$$

A fuzzy interval is symbolically denoted by $(x_m, \bar{x}_m, \alpha, \beta)_{LR}$. As mentioned above, the extension principle is used to extend some algebraic operations to fuzzy numbers. Let $F_1$ and $F_2$ be two fuzzy numbers of LR type: $F_1 = (x_{m1}, \alpha_1, \beta_1)_{LR}$, $F_2 = (x_{m2}, \alpha_2, \beta_2)_{LR}$. The following relations can be used to define extended addition and subtraction (2):

$$(x_{m1}, \alpha_1, \beta_1)_{LR} + (x_{m2}, \alpha_2, \beta_2)_{LR} = (x_{m1} + x_{m2},\ \alpha_1 + \alpha_2,\ \beta_1 + \beta_2)_{LR}$$

$$(x_{m1}, \alpha_1, \beta_1)_{LR} - (x_{m2}, \alpha_2, \beta_2)_{LR} = (x_{m1} - x_{m2},\ \alpha_1 + \beta_2,\ \beta_1 + \alpha_2)_{LR}$$

Figure 10. (a) Signal corrupted by impulse noise. (b) Result of fuzzy filtering.

As an example, let us consider the fuzzy numbers “about 5” = $(5, 3, 3)_{LR}$ and “about 10” = $(10, 3, 3)_{LR}$. In order to give fuzzy numbers a triangular shape, we adopt the following reference functions: L(u) = R(u) = max{0, 1 − u}. The result of the extended addition $(5, 3, 3)_{LR} + (10, 3, 3)_{LR} = (15, 6, 6)_{LR}$ is depicted in Fig. 9.
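The “about 5” plus “about 10” example can be reproduced with a small sketch of LR fuzzy numbers. The class name `LRNumber` and its fields are hypothetical; the membership method hard-codes the triangular reference functions L(u) = R(u) = max{0, 1 − u} used in the example, and the subtraction rule (spreads crossing over) is the usual one for identical left and right reference functions, which is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LRNumber:
    mean: float   # x_m, the element with membership 1
    left: float   # alpha, left spread (> 0)
    right: float  # beta, right spread (> 0)

    def __add__(self, other):
        # Extended addition: means add, left spreads add, right spreads add.
        return LRNumber(self.mean + other.mean,
                        self.left + other.left,
                        self.right + other.right)

    def __sub__(self, other):
        # Extended subtraction (assuming L = R): the spreads of `other` swap sides.
        return LRNumber(self.mean - other.mean,
                        self.left + other.right,
                        self.right + other.left)

    def membership(self, x):
        # Triangular reference functions L(u) = R(u) = max(0, 1 - u).
        if x <= self.mean:
            u = (self.mean - x) / self.left
        else:
            u = (x - self.mean) / self.right
        return max(0.0, 1.0 - u)


about_5 = LRNumber(5, 3, 3)
about_10 = LRNumber(10, 3, 3)
about_15 = about_5 + about_10  # LRNumber(mean=15, left=6, right=6)

if __name__ == "__main__":
    print(about_15, about_15.membership(12))
```

Working on the three parameters only is what makes the LR representation computationally cheap: no membership function has to be sampled to add two fuzzy numbers.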
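FUZZY RELATIONS

A fuzzy relation R between sets U and V is a fuzzy set characterized by a membership function $\mu_R : U \times V \to [0, 1]$ and is expressed by (2):

$$R = \{ ((x, y), \mu_R(x, y)) \mid (x, y) \in U \times V \}$$

As an example, let U = V be a set of real numbers. The relation “x is much larger than y” can be described by the membership function:

If sets U and V represent finite sets $U = \{x_1, x_2, \ldots, x_m\}$ and $V = \{y_1, y_2, \ldots, y_n\}$, a fuzzy relation R can be described by an m × n matrix (4):

$$R = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix}$$

where $a_{i,j} = \mu_R(x_i, y_j)$ represents the strength of association between a pair of elements.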
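In the finite case a fuzzy relation is just a matrix of membership degrees, which makes operations on relations easy to compute. As an illustration, the max–min composition discussed in the Composition paragraph of this section reduces to a matrix product in which multiplication is replaced by min and addition by max. The matrices below (patients versus symptoms, symptoms versus diseases) contain made-up values, and the function name is hypothetical.

```python
def max_min_composition(r1, r2):
    """Compose an m x k relation with a k x n relation using max over min."""
    inner = len(r2)
    cols = len(r2[0])
    return [
        [max(min(row[k], r2[k][j]) for k in range(inner)) for j in range(cols)]
        for row in r1
    ]


# Rows: patients 1 and 2; columns: symptoms A and B (illustrative degrees).
R1 = [[0.8, 0.2],
      [0.1, 0.9]]

# Rows: symptoms A and B; columns: diseases X and Y (illustrative degrees).
R2 = [[0.7, 0.3],
      [0.4, 1.0]]

# Resulting relation between patients (rows) and diseases (columns).
R12 = max_min_composition(R1, R2)  # [[0.7, 0.3], [0.4, 0.9]]

if __name__ == "__main__":
    print(R12)
```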
Composition. Let $R_1$ and $R_2$ be two fuzzy relations defined in different product spaces:

$$R_1 = \{ ((x, y), \mu_{R_1}(x, y)) \mid (x, y) \in U \times V \}, \qquad R_2 = \{ ((y, z), \mu_{R_2}(y, z)) \mid (y, z) \in V \times W \}$$

The above relations can be combined by means of the operation “composition.” A variety of methods have been proposed in the literature (2). For example, we could be interested in combining the relations $R_1$ (patients, symptoms) and $R_2$ (symptoms, diseases) in order to discover relationships between patients and diseases. The so-called max–min composition yields a resulting fuzzy relation described as follows:

$$\mu_{R_1 \circ R_2}(x, z) = \max_{y \in V} \min\{\mu_{R_1}(x, y), \mu_{R_2}(y, z)\}$$

The max–∗ composition is a more general definition of composition (2). It is defined by the following membership function:

$$\mu_{R_1 * R_2}(x, z) = \max_{y \in V} \{\mu_{R_1}(x, y) * \mu_{R_2}(y, z)\}$$

FUZZY AGGREGATION CONNECTIVES

Minimum and maximum operators represent the simplest way to aggregate different degrees of membership. More sophisticated choices are available in the literature. They resort to fuzzy aggregation connectives. Fuzzy aggregation connectives are (possibly nonlinear) functions that map a set of membership (or certainty) values $\mu_1, \mu_2, \ldots, \mu_n$ to the real interval [0, 1]. Fuzzy aggregation connectives can be grouped into the following classes (13, 16, 29):

1. Union connectives
2. Intersection connectives
3. Compensative connectives

Union Connectives

The simplest aggregation connective of union type is the previously mentioned “max” operator. A useful generalization is represented by the family of union aggregators defined by Yager (30):

$$y_U(\mu_1, \mu_2, \ldots, \mu_n) = \min\left\{1, \left(\sum_{i=1}^{n} \mu_i^p\right)^{1/p}\right\}$$

It can be observed that $\lim_{p \to \infty} y_U(\mu_1, \mu_2, \ldots, \mu_n) = \max(\mu_1, \mu_2, \ldots, \mu_n)$. Thus, the range of this connective is between max and unity. In this respect, this aggregation connective is more optimistic than the max operator (13). By varying the value of parameter p from zero to +∞, different aggregation strategies can be realized.

Intersection Connectives

The simplest aggregation connective of intersection type is the very popular “min” operator. A useful generalization is represented by the family of intersection aggregators defined by (30):

$$y_I(\mu_1, \mu_2, \ldots, \mu_n) = 1 - \min\left\{1, \left(\sum_{i=1}^{n} (1 - \mu_i)^p\right)^{1/p}\right\}$$

It can be observed that $\lim_{p \to \infty} y_I(\mu_1, \mu_2, \ldots, \mu_n) = \min(\mu_1, \mu_2, \ldots, \mu_n)$. Thus, the range of this connective is between min and zero. This aggregation connective is more pessimistic than the min operator. As in the previous case, different aggregation strategies can be realized by suitably varying the value of parameter p.

Compensative Connectives

Compensative connectives can be categorized into the following classes depending on their aggregation structure:

1. Mean operators
2. Hybrid operators

Mean Connectives. A mean connective is a mapping $m: [0, 1] \times [0, 1] \to [0, 1]$ such that

1. $m(\mu_1, \mu_2) \ge m(\mu_3, \mu_4)$ if $\mu_1 \ge \mu_3$ and $\mu_2 \ge \mu_4$
2. $\min(\mu_1, \mu_2) \le m(\mu_1, \mu_2) \le \max(\mu_1, \mu_2)$

A useful mean connective is the generalized mean (31). By using this connective, different degrees of certainty (or criteria) can be suitably weighted in order to take care of their relative importance:

$$m(\mu_1, \mu_2, \ldots, \mu_n) = \left(\sum_{i=1}^{n} w_i \mu_i^p\right)^{1/p}$$
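The Yager union and intersection families and the generalized mean can be compared numerically with a short sketch. The function names are hypothetical, the weights of the generalized mean are assumed to sum to one, and p is assumed to be nonzero (and the degrees strictly positive when p is negative).

```python
def yager_union(mus, p):
    """Yager union: min(1, (sum of mu^p)^(1/p)); tends to max(mus) as p grows."""
    return min(1.0, sum(mu ** p for mu in mus) ** (1.0 / p))


def yager_intersection(mus, p):
    """Yager intersection: 1 - min(1, (sum of (1-mu)^p)^(1/p)); tends to min(mus)."""
    return 1.0 - min(1.0, sum((1.0 - mu) ** p for mu in mus) ** (1.0 / p))


def generalized_mean(mus, weights, p):
    """Weighted generalized mean; weights are assumed to sum to 1 and p != 0."""
    return sum(w * mu ** p for w, mu in zip(weights, mus)) ** (1.0 / p)


if __name__ == "__main__":
    degrees = [0.3, 0.6]
    for p in (1, 2, 50):
        print(p,
              yager_union(degrees, p),
              yager_intersection(degrees, p),
              generalized_mean(degrees, [0.5, 0.5], p))
```

For small p the union saturates toward unity and the intersection toward zero, while for large p they approach max and min; the generalized mean always stays between min and max.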
In the generalized mean the weights satisfy $\sum_{i=1}^{n} w_i = 1$. It is worth pointing out that this connective yields all values between min and max by varying the parameter p between $p \to -\infty$ and $p \to +\infty$.

Hybrid Connectives. Hybrid connectives combine outputs of union and intersection operators (29). This combination is generally performed using a multiplicative or an additive model as follows:

$$y = y_U^{\gamma}\, y_I^{1 - \gamma}$$

$$y = \gamma\, y_U + (1 - \gamma)\, y_I$$

where $y_U$ and $y_I$ denote the outputs of union and intersection operators. The degree of compensation between these components depends on the value of the parameter $\gamma$. The multiplicative γ-model proposed by Zimmermann and Zysno (32) adopts union and intersection components based on products:

$$y = \left(\prod_{i=1}^{n} \mu_i^{w_i}\right)^{1 - \gamma} \left(1 - \prod_{i=1}^{n} (1 - \mu_i)^{w_i}\right)^{\gamma}$$

where $\sum_{i=1}^{n} w_i = n$ and $0 \le \gamma \le 1$. The additive γ-model is defined by

$$y = (1 - \gamma) \prod_{i=1}^{n} \mu_i^{w_i} + \gamma \left(1 - \prod_{i=1}^{n} (1 - \mu_i)^{w_i}\right)$$

The additive γ-model adopting Yager’s union and intersection is defined by

$$y = (1 - \gamma) \left[1 - \min\left\{1, \left(\sum_{i=1}^{n} (1 - \mu_i)^p\right)^{1/p}\right\}\right] + \gamma \min\left\{1, \left(\sum_{i=1}^{n} \mu_i^p\right)^{1/p}\right\}$$

LINGUISTIC VARIABLES AND FUZZY SYSTEMS

As mentioned in the section entitled “Fuzzy Sets,” fuzzy models permit us to express concepts in a way that is very close to human thinking. In fact, linguistic labels can be associated with fuzzy sets in order to form sentences like “the pixel luminance is very bright,” “the voltage is low,” “the temperature is high,” and so on. In this respect, quantities such as pixel luminance, voltage, and temperature can be interpreted as linguistic variables, that is, variables whose values are words or sentences (17). For example, the linguistic variable pixel luminance can be decomposed into a set of terms such as dark, medium, and bright (Fig. 3) which correspond to fuzzy sets in its universe of discourse. Fuzzy rules permit us to express a processing strategy in a form that mimics human decision making. For example:

If x is low and y is medium, then z is large.

The typical IF–THEN structure of a fuzzy rule includes a group of antecedent clauses which define conditions and a consequent clause which identifies the corresponding action. In general, fuzzy systems adopt rules to map fuzzy sets to fuzzy sets (7). Many engineering applications, however, require techniques which map scalar inputs to scalar outputs. We can address this issue by adding an input fuzzifier and an output defuzzifier to the classical model (17). The result is a very important class of fuzzy systems which are able to map scalar inputs to one (or more) scalar output(s). Since the successful application of these systems is playing a key role in the widespread diffusion of fuzzy techniques, we shall describe their structure in detail. Let us consider a fuzzy system which maps M input variables $x_1, x_2, \ldots, x_M$ to one output variable y by means of N fuzzy rules $R_1, R_2, \ldots, R_N$. Such a system can be expressed in the following form:

$$\begin{aligned} R_1 &: \text{IF } (x_1 \text{ is } A_{1,1}) \text{ AND } (x_2 \text{ is } A_{2,1}) \text{ AND} \ldots \text{AND } (x_M \text{ is } A_{M,1}) \text{ THEN } (y \text{ is } B_1) \\ R_2 &: \text{IF } (x_1 \text{ is } A_{1,2}) \text{ AND } (x_2 \text{ is } A_{2,2}) \text{ AND} \ldots \text{AND } (x_M \text{ is } A_{M,2}) \text{ THEN } (y \text{ is } B_2) \\ &\ \ \vdots \\ R_N &: \text{IF } (x_1 \text{ is } A_{1,N}) \text{ AND } (x_2 \text{ is } A_{2,N}) \text{ AND} \ldots \text{AND } (x_M \text{ is } A_{M,N}) \text{ THEN } (y \text{ is } B_N) \end{aligned}$$
where $A_{i,j}$ ($1 \le i \le M$, $1 \le j \le N$) is the fuzzy set associated with the ith input variable in the jth rule and $B_j$ is the fuzzy set associated with the output variable in the same rule. The set of fuzzy rules as a whole is called a rulebase. Since the fuzzy rulebase contains the necessary information to process the data, it represents the knowledge base of the system. The knowledge base is numerically processed by the fuzzy inference mechanism. For a given set of input data, the inference mechanism evaluates the degrees of activation of the component rules and then combines their resulting effects. More precisely, let $\lambda_j$ be the degree of activation (or satisfaction) of the jth rule. This degree can be evaluated by using the following relation:

$$\lambda_j = \min\{\mu_{A_{1,j}}(x_1), \mu_{A_{2,j}}(x_2), \ldots, \mu_{A_{M,j}}(x_M)\}$$
where $\mu_{A_{i,j}}$ denotes the membership function of fuzzy set $A_{i,j}$. It should be noticed that the choice of an intersection connective to aggregate membership degrees depends on the presence of the “AND” for combining the antecedent clauses in each fuzzy rule. Of course, different aggregation connectives (see the section entitled “Fuzzy Aggregation Connectives”) can be adopted depending on the specific problem. The degree of activation $\lambda_j$ yields the following effect on fuzzy set $B_j$, which identifies the consequent action of the jth rule. A new fuzzy set $B'_j$ is generated, whose membership function is defined by

$$\mu_{B'_j}(u) = \lambda_j * \mu_{B_j}(u)$$
Two different inference schemes are commonly used. If the correlation-product inference is adopted (7), symbol “∗” denotes the product operator. If, on the other hand, the correlation-minimum inference is chosen, symbol “∗” denotes the minimum operator. Fuzzy sets $B'_j$ (j = 1, . . . , N) are then combined in order to obtain a resulting fuzzy set B. If we resort to the union, the corresponding membership function $\mu_B(u)$ is given by

$$\mu_B(u) = \max_{j = 1, \ldots, N} \mu_{B'_j}(u)$$

If we adopt the additive model (7), on the contrary, we obtain

$$\mu_B(u) = K \sum_{j=1}^{N} \mu_{B'_j}(u)$$

where K is a scaling factor that limits the degree of membership to unity. As a final step, we want to derive a scalar value from the fuzzy set B. A very popular technique is the so-called “centroid” or “center of gravity” method, which yields the output y as follows:

$$y = \frac{\int_V u\, \mu_B(u)\, du}{\int_V \mu_B(u)\, du}$$

where V denotes the support of fuzzy set B. (If this support is discrete, summation should replace the integral symbol. Of course, integral and summation symbols here denote integration and arithmetic addition.) Notice that if we adopt the additive scheme, we can evaluate the output y by means of the centroids $y'_j$ of the component fuzzy sets $B'_j$:

$$y = \frac{\sum_{j=1}^{N} w'_j\, y'_j}{\sum_{j=1}^{N} w'_j} \qquad (41)$$

where

$$w'_j = \int_V \mu_{B'_j}(u)\, du \qquad (42)$$

$$y'_j = \frac{\int_V u\, \mu_{B'_j}(u)\, du}{\int_V \mu_{B'_j}(u)\, du} \qquad (43)$$

Let us adopt correlation-product inference. Relations (42) and (43) become:

$$w'_j = \lambda_j \int_V \mu_{B_j}(u)\, du = \lambda_j w_j \qquad (44)$$

$$y'_j = \frac{\lambda_j \int_V u\, \mu_{B_j}(u)\, du}{\lambda_j \int_V \mu_{B_j}(u)\, du} = y_j \qquad (45)$$

Thus, we can express relation (41) as follows:

$$y = \frac{\sum_{j=1}^{N} \lambda_j w_j y_j}{\sum_{j=1}^{N} \lambda_j w_j} \qquad (46)$$

Relation (46) is very attractive from the point of view of computational efficiency. In fact, the component terms $w_j$ and $y_j$ do not depend on $\lambda_j$. If all consequent fuzzy sets $B_j$ have the same shape, i.e., $w_j = w$ (j = 1, . . . , N), we finally obtain

$$y = \frac{\sum_{j=1}^{N} \lambda_j y_j}{\sum_{j=1}^{N} \lambda_j} \qquad (47)$$

Figure 11. Fuzzy sets positive (PO), zero (ZE), and negative (NE).
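Relations (46) and (47) translate directly into a few lines of code. The sketch below assumes that the degrees of activation, the areas $w_j$ and the centroids $y_j$ of the consequent fuzzy sets are already available as lists; the function name and the error raised when no rule is active are choices made for this illustration.

```python
def additive_centroid(activations, centroids, areas=None):
    """Output of relation (46); with areas=None all areas are equal, giving relation (47)."""
    if areas is None:
        areas = [1.0] * len(activations)  # equal shapes: the common area cancels out
    num = sum(l * w * c for l, w, c in zip(activations, areas, centroids))
    den = sum(l * w for l, w in zip(activations, areas))
    if den == 0.0:
        raise ValueError("no rule is active, the output is undefined")
    return num / den


if __name__ == "__main__":
    # Two rules with consequent centroids 0 and 10, activated at 0.2 and 0.8.
    print(additive_centroid([0.2, 0.8], [0.0, 10.0]))
    # A wider second consequent set pulls the output further toward its centroid.
    print(additive_centroid([0.2, 0.8], [0.0, 10.0], areas=[1.0, 3.0]))
```

Because the areas and centroids do not depend on the activations, they can be computed once and reused for every input.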
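With equally shaped consequent fuzzy sets, the final output only depends on the degrees of activation of the fuzzy rules and on the centroids of the original consequent fuzzy sets.

Let us consider a simple example. Let $\{s_k\}$ be the digitized signal in the range [0, L − 1] depicted in Fig. 10(a). This signal represents a staircase waveform corrupted by impulse noise. Suppose we want to design a filter able to reduce (or possibly cancel) the noise pulses (33). Let $s_k$ be the sample to be processed at the time k. Let $\Delta_{k-1} = s_k - s_{k-1}$ and $\Delta_{k+1} = s_k - s_{k+1}$ be the amplitude differences between this element and the neighboring samples $s_{k-1}$ and $s_{k+1}$, respectively. In order to estimate the noise amplitude $n_k$, we may use the following fuzzy system:

$$\begin{aligned} R_1 &: \text{IF } (\Delta_{k-1} \text{ is PO}) \text{ AND } (\Delta_{k+1} \text{ is PO}) \text{ THEN } (n_k \text{ is PO}) \\ R_2 &: \text{IF } (\Delta_{k-1} \text{ is NE}) \text{ AND } (\Delta_{k+1} \text{ is NE}) \text{ THEN } (n_k \text{ is NE}) \\ R_3 &: \text{IF } (\Delta_{k-1} \text{ is ZE}) \text{ AND } (\Delta_{k+1} \text{ is ZE}) \text{ THEN } (n_k \text{ is ZE}) \end{aligned}$$

where PO (positive), ZE (zero), and NE (negative) are triangular fuzzy sets represented in Fig. 11. The first fuzzy rule ($R_1$) aims at detecting a positive noise pulse (i.e., a noise pulse whose amplitude is higher than that of the neighborhood). The second fuzzy rule ($R_2$) aims at detecting a negative noise pulse (i.e., a noise pulse whose amplitude is lower than that of the neighborhood). The third fuzzy rule ($R_3$) deals with the absence of any noise pulse (i.e., with the case of an uncorrupted sample). Formally, we have $A_{1,1}$ = PO, $A_{2,1}$ = PO, $A_{1,2}$ = NE, $A_{2,2}$ = NE, $A_{1,3}$ = ZE, $A_{2,3}$ = ZE, $B_1$ = PO, $B_2$ = NE, $B_3$ = ZE. The degrees of activation $\lambda_1^{(k)}$, $\lambda_2^{(k)}$, $\lambda_3^{(k)}$ of the three rules at the time k are evaluated by

$$\begin{aligned} \lambda_1^{(k)} &= \min\{\mu_{PO}(\Delta_{k-1}), \mu_{PO}(\Delta_{k+1})\} \\ \lambda_2^{(k)} &= \min\{\mu_{NE}(\Delta_{k-1}), \mu_{NE}(\Delta_{k+1})\} \\ \lambda_3^{(k)} &= \min\{\mu_{ZE}(\Delta_{k-1}), \mu_{ZE}(\Delta_{k+1})\} \end{aligned}$$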
Figure 12. Fuzzy sets zero (ZE) and nonzero (NZ).

Suppose we adopt correlation-product inference and the additive model. Since all fuzzy sets have the same shape, the output is given by relation (47). We observe, in particular, that the centroids have the following values (Fig. 11): $y_{PO} = L - 1$, $y_{ZE} = 0$ and $y_{NE} = -L + 1$. Thus, we have

$$n_k = (L - 1)\, \frac{\lambda_1^{(k)} - \lambda_2^{(k)}}{\lambda_1^{(k)} + \lambda_2^{(k)} + \lambda_3^{(k)}}$$
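The three-rule noise estimator can be sketched as follows. The exact triangles of Fig. 11 are not reproduced here: the code assumes PO, ZE and NE are triangles of half-width L − 1 centered on L − 1, 0 and −L + 1, which matches the stated centroids but is otherwise an assumption. Function names are hypothetical, and the end samples are left unprocessed for simplicity.

```python
def tri(u, center, half_width):
    """Triangular membership centered on `center` with the given half-width."""
    return max(0.0, 1.0 - abs(u - center) / half_width)


def estimate_noise(prev, cur, nxt, levels=256):
    """Estimate the noise amplitude n_k from a sample and its two neighbors."""
    top = levels - 1
    d_prev, d_next = cur - prev, cur - nxt
    # Assumed shapes for PO, NE and ZE (see the note above).
    l1 = min(tri(d_prev, top, top), tri(d_next, top, top))    # R1: positive pulse
    l2 = min(tri(d_prev, -top, top), tri(d_next, -top, top))  # R2: negative pulse
    l3 = min(tri(d_prev, 0.0, top), tri(d_next, 0.0, top))    # R3: no pulse
    total = l1 + l2 + l3
    if total == 0.0:
        return 0.0  # no rule fires: leave the sample untouched
    return top * (l1 - l2) / total


def fuzzy_filter(signal, levels=256):
    """Subtract the estimated noise from every interior sample."""
    out = list(signal)
    for k in range(1, len(signal) - 1):
        out[k] = signal[k] - estimate_noise(signal[k - 1], signal[k], signal[k + 1], levels)
    return out


if __name__ == "__main__":
    # A staircase (100 -> 200) with one positive pulse at index 2.
    print(fuzzy_filter([100, 100, 200, 100, 100, 200, 200]))
```

In this example the isolated pulse is pulled back to the level of its neighbors, while the genuine step at the end of the staircase is left in place because only one of the two differences is large.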
Figure 13. Resulting edge map.
data. In general, neuro-fuzzy models can be successfully adopted to find the most appropriate rulebase for a given application. PARAMETERIZED MEMBERSHIP FUNCTIONS
Let s k = sk − nk be the output of the filter. The result of the application is shown in Fig. 10(b). As a second example, let us consider the digitized image in Fig. 3. Let xi,j be the pixel luminance at location (i, j). Let i,j −1 = xi,j − xi,j −1 and i−1,j = xi,j − xi−1,j be the luminance differences between this element and the neighboring pixels at locations (i, j − 1) and (i − 1, j), respectively. Let us suppose we want to detect edges in the image—that is, possible object borders (34). Our goal is to produce another image (called “edge map”) where dark pixels denote uniform regions and bright pixels denote possible object contours. In order to perform this task, we define a pair of fuzzy rules as follows:
where yi,j is the luminance of the pixel at location (i, j) in the edge map. Zero (ZE) and nonzero (NZ) are fuzzy sets in the interval [−L + 1, L − 1] (Fig. 12). White (WH) and black (BL) are fuzzy singletons centered on L − 1 and zero. We can evaluate the degrees of activation λ(i,j ) 1 and λ(i,j ) 2 by using simple intersection and union aggregators:
Fuzzy systems are powerful tools for data processing. However, it is not always necessary to express fuzzy reasoning in the form of rules. Sometimes one (or more) parameterized fuzzy sets suffice. As an example, let us consider the filtering of Gaussian noise in digital images. Noise with a Gaussian-like distribution is very often encountered during image acquisition. Our goal is to reduce the noise without (significantly) blurring the image details. A simple idea is to adopt a fuzzy weighted mean filter for this purpose (39, 40). Again, let us suppose we deal with digitized images having L gray levels (typically L = 256). Let $x_{i,j}$ be the pixel luminance at location (i, j) in the noisy image and let $\Delta_{i+m,j+n} = x_{i,j} - x_{i+m,j+n}$ be the luminance difference between this element and the neighboring pixel at location (i+m, j+n). The output $y_{i,j}$ of the fuzzy weighted mean filter is defined by the following relationships:

$$y_{i,j} = \sum_{m=-N}^{N} \sum_{n=-N}^{N} w_{i+m,j+n}\, x_{i+m,j+n} \tag{55}$$

$$w_{i+m,j+n} = \frac{\mu_{SM}(\Delta_{i+m,j+n})}{\sum_{m=-N}^{N} \sum_{n=-N}^{N} \mu_{SM}(\Delta_{i+m,j+n})} \tag{56}$$
where $\mu_{SM}(u)$ is the membership function of the fuzzy set small. Let us define this set by resorting to a bell-shaped parameterized function:

$$\mu_{SM}(u) = \exp\left\{-\left(\frac{u}{c}\right)^2\right\} \tag{57}$$

The output $y_{i,j}$ is yielded by
The result is shown in Fig. 13. Fuzzy inference schemes different from that described above are also possible. As an example, the well-known Takagi–Sugeno model (4, 35) has found wide application in the design of fuzzy controllers. More sophisticated approaches are also available in the literature (36–38). In any case an appropriate choice of fuzzy sets and rules plays a key role in determining the desired behavior of a fuzzy system. If we adopt parameterized membership functions, we can try to acquire the optimal fuzzy set shapes from a set of training
Figure 14. Graphical representation of the membership function $\mu_{SM}(u)$ for three different values of the parameter c.

A graphical representation of $\mu_{SM}(u)$ is depicted in Fig. 14 for three different values of the parameter c (u ≥ 0). According to Eqs. (55)–(56), the algorithm performs a weighted mean of the luminance values in a (2N+1) × (2N+1) window around $x_{i,j}$. The weights are chosen according to a simple fuzzy model: small luminance differences (possibly) denote noise, while large luminance differences denote object contours. Thus, when $\Delta_{i+m,j+n}$ is small, the corresponding $w_{i+m,j+n}$ is large and vice versa. As a result, the processing gradually excludes pixel luminances that are different from $x_{i,j}$ in order to preserve image details. The value of the parameter c mainly depends upon the variance of the Gaussian noise. Typically, this value is chosen so that a suitable performance index is maximized, for example, the well-known peak signal-to-noise ratio (PSNR), which is defined as:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\sum_i \sum_j (L-1)^2}{\sum_i \sum_j (y_{i,j} - s_{i,j})^2} \right) \tag{58}$$
where $s_{i,j}$ and $y_{i,j}$ denote the pixel luminances of the original noise-free image and the filtered image, respectively, at location (i, j). This procedure is briefly depicted in Fig. 15. An example of processed data is also reported in Fig. 16. We generated the picture in Fig. 16a by adding Gaussian noise with variance σ² = 100 to the original noise-free image. The result of the application of the fuzzy filter is reported in Fig. 16b (N = 2). Details of the noisy and the processed images are respectively depicted in Figs. 16c and 16d for visual inspection. The noise reduction is apparent, especially in the uniform regions of the image. According to our previous observation, we chose the parameter value that gives the maximum PSNR (Fig. 17). Larger values would increase the image blur; smaller values would leave some noise unprocessed. It is worth pointing out that we can define the same filtering operation by resorting to the concept of fuzzy relation. In an equivalent way, we can formally define the weights of the filter as follows:

$$w_{i+m,j+n} = \frac{\mu_{EQ}(x_{i,j}, x_{i+m,j+n})}{\sum_{m=-N}^{N} \sum_{n=-N}^{N} \mu_{EQ}(x_{i,j}, x_{i+m,j+n})} \tag{59}$$
where $\mu_{EQ}(u,v)$ is the parameterized membership function that describes the fuzzy relation "u is equal to v":

$$\mu_{EQ}(u,v) = \exp\left\{-\left(\frac{u-v}{c}\right)^2\right\} \tag{60}$$
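The filter of Eqs. (55), (59), and (60), together with the PSNR of Eq. (58), can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: NumPy is used for the array arithmetic, the function names are hypothetical, and border pixels are handled by edge replication, a choice the text does not specify.

```python
import numpy as np

def mu_eq(u, v, c):
    """Eq. (60): membership of the fuzzy relation "u is equal to v"."""
    return np.exp(-((u - v) / c) ** 2)

def fuzzy_weighted_mean(x, N=2, c=40.0):
    """Fuzzy weighted mean filter, Eqs. (55) and (59).

    x is a 2-D array of luminances. Borders are padded by edge
    replication (an assumption, not taken from the text).
    """
    x = np.asarray(x, dtype=float)
    rows, cols = x.shape
    padded = np.pad(x, N, mode="edge")
    num = np.zeros_like(x)  # sum of mu_EQ * neighbor luminance
    den = np.zeros_like(x)  # sum of mu_EQ (normalization of Eq. (59))
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            # neighbor[i, j] holds x[i+m, j+n] for every pixel (i, j)
            neighbor = padded[N + m:N + m + rows, N + n:N + n + cols]
            w = mu_eq(x, neighbor, c)
            num += w * neighbor
            den += w
    # den >= 1 everywhere because the central pixel has weight 1
    return num / den

def psnr(y, s, L=256):
    """Eq. (58): peak signal-to-noise ratio of y against the reference s."""
    y = np.asarray(y, dtype=float)
    s = np.asarray(s, dtype=float)
    return 10.0 * np.log10(y.size * (L - 1) ** 2 / np.sum((y - s) ** 2))
```

Under these assumptions, the tuning procedure of Fig. 15 amounts to evaluating `psnr(fuzzy_weighted_mean(noisy, N, c), original)` for a set of candidate values of c and keeping the value that gives the maximum.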
A graphical representation of $\mu_{EQ}(u,v)$ is shown in Fig. 18 (c = 40). The fuzzy weighted mean filter is not the only available scheme for reducing Gaussian noise. Other approaches are possible (40). For example, we can adopt fuzzy models to estimate the noise amplitude $g_{i,j}$ and then subtract it from
Figure 16. (a) Image corrupted by Gaussian noise, (b) filtered image, (c) detail of the noisy image, (d) detail of the filtered image.
Figure 15. Block diagram of the procedure for parameter tuning.
> $x_{i,j-1}$, $x_{i,j}$ > $x_{i-1,j}$, $x_{i,j}$ < $x_{i,j-1}$, $x_{i,j}$ < $x_{i-1,j}$. As a result, $\mu_{DI}(x_{i,j}, x_{i,j-1}) \approx 1$ and/or $\mu_{DI}(x_{i,j}, x_{i-1,j}) \approx 1$, and the output of the edge detector is $y_{i,j} \approx L-1$. Conversely, in the presence of a uniform region, we have $x_{i,j} \approx x_{i,j-1}$ and $x_{i,j} \approx x_{i-1,j}$. Thus the output becomes $y_{i,j} = 0$, as it should be. The parameters b and c control the actual behavior of the edge detector. Large values of these parameters can be chosen to decrease its sensitivity to fine details and to noise.
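The behavior of the edge detector of Eq. (65) can be illustrated with a short Python sketch. The two-parameter form of $\mu_{DI}$ used here is only an assumption made for the sake of the example (zero for differences up to b, then a bell-shaped rise governed by c); it is not claimed to be the definition adopted in the article. The function names and the replication of the first row and column are hypothetical choices as well.

```python
import numpy as np

def mu_di(u, v, b, c):
    """ASSUMED membership for "u is different from v" (illustrative only):
    0 when |u - v| <= b, otherwise 1 - exp(-((|u - v| - b) / c)^2)."""
    d = np.abs(u - v)
    return np.where(d <= b, 0.0, 1.0 - np.exp(-(((d - b) / c) ** 2)))

def fuzzy_edge_map(x, b=10.0, c=30.0, L=256):
    """Eq. (65): y = (L - 1) * MAX{mu_DI(x, left), mu_DI(x, upper)}."""
    x = np.asarray(x, dtype=float)
    # Neighbors at (i, j-1) and (i-1, j); the first column and the first
    # row are compared with themselves (border handling is an assumption).
    left = np.concatenate([x[:, :1], x[:, :-1]], axis=1)
    up = np.concatenate([x[:1, :], x[:-1, :]], axis=0)
    return (L - 1) * np.maximum(mu_di(x, left, b, c), mu_di(x, up, b, c))
```

With this assumed membership function, a uniform region maps to 0 and a sharp luminance step maps to a value close to L − 1, in agreement with the description above.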
Figure 17. Filtering performance.
the pixel luminance $x_{i,j}$, as follows:

$$g_{i,j} = \frac{1}{8} \sum_{m=-1}^{1} \sum_{n=-1}^{1} (x_{i,j} - x_{i+m,j+n})\, \mu_{SI}(x_{i,j}, x_{i+m,j+n}) \tag{61}$$

$$y_{i,j} = x_{i,j} - g_{i,j} \tag{62}$$
where µSI (u,v) is the parameterized membership function of fuzzy relation “u is similar to v”. A possible definition is given by the following relationship (Fig. 19): 1 5c − |u − v| |u − v|
(63)
Often, more parameters can increase the effectiveness of the fuzzy processing. For example, let us define the fuzzy relation “u is different from v” by adopting the following two-parameters function: 0
|u − v|
1 − exp{−(
$$y_{i,j} = (L-1)\,\mathrm{MAX}\{\mu_{DI}(x_{i,j}, x_{i,j-1}),\ \mu_{DI}(x_{i,j}, x_{i-1,j})\} \tag{65}$$

The operation is very simple. In the presence of an object border, at least one of the following inequalities occurs: $x_{i,j}$
Figure 19. Graphical representation of the membership function µSI (u,v) describing the fuzzy relation “u is similar to v”.
Figure 20. Graphical representation of the membership function µDI (u,v) describing the fuzzy relation “u is different from v”.
BIBLIOGRAPHY
1. L. A. Zadeh, Fuzzy sets, Inf. Control, 8: 338–353, 1965.
2. H.-J. Zimmermann, Fuzzy Set Theory and Its Applications, Norwell, MA: Kluwer, 1996.
3. J. C. Bezdek and S. K. Pal, Fuzzy Models for Pattern Recognition, Piscataway, NJ: IEEE Press, 1992.
4. T. Terano, K. Asai, and M. Sugeno, Fuzzy Systems Theory and Its Applications, New York: Academic Press, 1992.
5. A. L. Ralescu (ed.), Applied Research in Fuzzy Technology, Norwell, MA: Kluwer, 1994.
6. A. Kandel, Fuzzy Expert Systems, Boca Raton, FL: CRC Press, 1991.
7. B. Kosko, Neural Networks and Fuzzy Systems, Englewood Cliffs, NJ: Prentice-Hall, 1992.
8. J. M. Zurada et al. (eds.), Computational Intelligence Imitating Life, Piscataway, NJ: IEEE Press, 1994.
9. S. K. Pal and S. Mitra, Multilayer perceptron, fuzzy sets and classification, IEEE Trans. Neural Netw., 3: 683–697, 1992.
10. J.-S. R. Jang, Self-learning fuzzy controllers based on temporal back-propagation, IEEE Trans. Neural Netw., 3: 714–723, 1992.
11. H. R. Berenji and P. Khedkar, Learning and tuning fuzzy logic controllers through reinforcements, IEEE Trans. Neural Netw., 3: 724–740, 1992.
12. H. Takagi et al., Neural networks designed on approximate reasoning architecture and their applications, IEEE Trans. Neural Netw., 3: 752–760, 1992.
13. J. M. Keller, R. Krishnapuram, and F. C.-H. Rhee, Evidence aggregation networks for fuzzy logic inference, IEEE Trans. Neural Netw., 3: 761–769, 1992.
14. L.-X. Wang and J. M. Mendel, Fuzzy basis functions, universal approximation and orthogonal least-squares learning, IEEE Trans. Neural Netw., 3: 807–814, 1992.
15. A. Ghosh et al., Self-organization for object extraction using a multilayer neural network and fuzziness measures, IEEE Trans. Fuzzy Syst., 1: 1993.
16. R. Krishnapuram and J. Lee, Fuzzy-connective-based hierarchical aggregation networks for decision making, Fuzzy Sets Syst., 46 (1): 11–27, 1992.
17. J. M. Mendel, Fuzzy logic systems for engineering: A tutorial, Proc. IEEE, 83: 345–377, 1995.
18. W. Pedrycz (ed.), Fuzzy Modelling: Paradigms and Practice, Norwell, MA: Kluwer, 1996.
19. L. H. Tsoukalas, R. E. Uhrig, and L. A. Zadeh, Fuzzy and Neural Approaches in Engineering, Wiley, 1997.
20. R. C. Berkan and S. L. Trubatch, Fuzzy Systems Design Principles: Building Fuzzy If-Then Rule Bases, Wiley, 1997.
21. F. Russo, Edge Detection in Noisy Images Using Fuzzy Reasoning, IEEE Transactions on Instrumentation and Measurement, vol. 47, n. 5, October 1998, pp. 1102–1105.
22. F. Russo, Recent Advances in Fuzzy Techniques for Image Enhancement, IEEE Transactions on Instrumentation and Measurement, vol. 47, n. 6, December 1998, pp. 1428–1434.
23. F. Russo, Noise Removal from Image Data Using Recursive Neurofuzzy Filters, IEEE Transactions on Instrumentation and Measurement, vol. 49, n. 2, April 2000, pp. 307–314.
24. S. K. Pal, A. Ghosh, and M. K. Kundu (eds.), Soft-computing Techniques for Image Processing, Physica-Verlag, Heidelberg, Germany, 2000.
25. M. Russo and L. C. Jain (eds.), Fuzzy Learning and Applications, CRC Press, Germany, 2001.
26. T. J. Ross, Fuzzy Logic with Engineering Applications, 2nd Edition, Wiley, 2004.
27. W. Siler and J. J. Buckley, Fuzzy Expert Systems and Fuzzy Reasoning, Wiley, 2005.
28. A. R. Várkonyi-Kóczy, A. Rövid, and M. da Graça Ruano, Soft-Computing-Based Car Body Deformation and EES Determination for Car Crash Analysis Systems, IEEE Transactions on Instrumentation and Measurement, vol. 55, n. 6, December 2006, pp. 2304–2312.
29. J. Keller, Fuzzy logic and neural networks for computer vision, IEEE Video Tutorial, 1992.
30. R. R. Yager, On a general class of fuzzy connectives, Fuzzy Sets Syst., 4: 235–241, 1980.
31. H. Dyckoff and W. Pedrycz, Generalized means as a model of compensation connectives, Fuzzy Sets Syst., 14 (2): 143–154, 1984.
32. H. J. Zimmermann and P. Zysno, Latent connectives in human decision making, Fuzzy Sets Syst., 4 (1): 37–51, 1980.
33. F. Russo, Fuzzy systems in instrumentation: Fuzzy signal processing, IEEE Trans. Instrum. Meas., 45: 683–689, 1996.
34. F. Russo and G. Ramponi, Edge detection by FIRE operators, Proc. 3rd IEEE Int. Conf. Fuzzy Syst., FUZZ-IEEE '94, Orlando, FL, June 1994, pp. 249–253.
35. P. Baranyi, Y. Yam, A. R. Várkonyi-Kóczy, R. J. Patton, P. Michelberger, and M. Sugiyama, SVD-Based Complexity Reduction to TS Fuzzy Models, IEEE Transactions on Industrial Electronics, vol. 49, no. 2, April 2002, pp. 433–443.
36. Q. Liang and J. M. Mendel, Interval Type-2 Fuzzy Logic Systems: Theory and Design, IEEE Transactions on Fuzzy Systems, vol. 8, n. 5, October 2000, pp. 535–550.
37. J. M. Mendel and R. I. Bob John, Type-2 Fuzzy Sets Made Simple, IEEE Transactions on Fuzzy Systems, vol. 10, n. 2, April 2002, pp. 117–127.
38. J. M. Mendel and H. Wu, Type-2 Fuzzistics for Symmetric Interval Type-2 Fuzzy Sets: Part 1, Forward Problems, IEEE Transactions on Fuzzy Systems, vol. 14, n. 6, December 2006, pp. 781–792.
39. S. Peng and L. Lucke, Fuzzy filtering for mixed noise removal during image processing, in Proc. FUZZ-IEEE'94, Orlando, FL, June 26–29, 1994, pp. 89–93.
40. F. Russo, Nonlinear Filters Based on Fuzzy Models, in Nonlinear Image Processing (eds. S. K. Mitra and G. L. Sicuranza), Academic Press, San Diego, CA, USA, 2001, pp. 355–374.
41. F. Russo, Technique for Image Denoising Based on Adaptive Piecewise Linear Filters and Automatic Parameter Tuning, IEEE Transactions on Instrumentation and Measurement, vol. 55, n. 4, August 2006, pp.
FABRIZIO RUSSO University of Trieste, Italy
Wiley Encyclopedia of Electrical and Electronics Engineering
Fuzzy Neural Nets
Standard Article
Witold Pedrycz, University of Manitoba, Winnipeg, Canada
Copyright © 1999 by John Wiley & Sons, Inc. All rights reserved.
DOI: 10.1002/047134608X.W3507
Article Online Posting Date: December 27, 1999
The sections in this article are
Neurocomputing in Fuzzy Set Technology
A Linguistic Interpretation of Computing With Neural Networks
Fuzzy Neural Computing Structures
Fuzzy Neurocomputing - an Architectural Fusion of Fuzzy and Neural Network Technology
Conclusions
FUZZY NEURAL NETS There is no doubt that neurocomputing and fuzzy set technology were dominant information technologies of the 1990s. The dominant paradigm of neurocomputing (1) is concerned with parallel and distributed processing realized by a vast number of simple processing units known as (artificial) neurons. The neural networks are universal approximators, meaning that they can approximate continuous relationships to any desired accuracy. This feature is intensively exploited in a vast number of applications of neural networks, in areas such as pattern recognition, control, and system identification. The underlying philosophy of fuzzy sets is that of a generalization of set theory with an intent of formalization of concepts with gradual boundaries (2). Through the introduction of fuzzy sets one develops a suitable conceptual and algorithmic framework necessary to cope with the most suitable level of information granularity. All constructs arising from the ideas of fuzzy sets hinge on the notion of information granularity and linguistic nonnumeric information, in particular. It becomes apparent that the technologies of fuzzy sets and neural networks are complementary: fuzzy sets deliver a suitable conceptual framework while neural networks furnish us with all necessary learning capabilities. The fuzzy set–neural networks synergy has already led to a number of interesting architectures that are usually referred to as fuzzy–neural systems or fuzzy neural networks. These are the models, the development of which heavily depends upon fuzzy sets and neurocomputing. The contribution
J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering. Copyright © 1999 John Wiley & Sons, Inc.
of the two technologies could vary from case to case. In some cases, we envision a significant dominance of fuzzy sets with some additional learning slant supported by neural networks. In other scenarios, one can witness structures that are essentially neural networks with some structural enhancements coming from the theory of fuzzy sets. In a nutshell, the existing diversity of such approaches calls for a systematic treatment that helps us understand the benefits of the symbiosis and makes the design of such systems more systematic. This study is organized in a way that unveils the main architectural and learning issues of synergy between neural networks and fuzzy sets. The agenda of this article is twofold:
• First, we propose a general taxonomy of hybrid fuzzy–neural topologies by studying various temporal and architectural aspects of this symbiosis.
• Second, our intent is to review some representative examples of hybrid structures that illustrate the already introduced typology thoroughly.

NEUROCOMPUTING IN FUZZY SET TECHNOLOGY

Generally speaking, in the overall fuzzy set–neural network hybrid methodology, neural networks are providers of useful computational and learning facilities. Fuzzy sets, as based on the mechanisms of set theory and multivalued logic, are chiefly preoccupied with the variety of aspects of knowledge representation. At the same time they tend to be somewhat weaker as far as their processing capabilities are concerned (interestingly enough, this claim becomes valid in the case of all constructs originating from set theory). In particular, set-theoretic operations do not cope explicitly with repetitive information and cannot reflect this throughout their outcomes; the most evident examples arise in terms of the highly noninteractive maximum and minimum operations. More specifically, the result of the minimum (or maximum) operation relies on the extreme argument and ignores the remaining elements. For instance, min(0.30, 0.9, 0.95, 0.87, 0.96) is the same as min(0.30, 0.41, 0.32, 0.33, 0.31).

There are a number of instances in which neural networks are used directly to support or realize computing with fuzzy sets. In general, in most of these cases, neural networks are aimed at straightforward computing through the utilization of membership values. Similarly, there are various approaches spearheaded along the line of the development of neural networks with the intent of processing fuzzy information. In this case there is not too much direct interaction and influence originating from the theory of fuzzy sets. It is essentially a way in which neural networks are aimed at the calibration of fuzzy sets: the construction of their membership functions is completed in the setting of the numeric data available at hand.
Fuzzy Sets in the Technology of Neurocomputing

The key role of fuzzy sets is to enhance neural networks by incorporating knowledge-oriented mechanisms. Generally speaking, these knowledge-based enhancements of neural networks are threefold:
• Preprocessing of training data that could easily lead to the improvement in learning and/or enhanced robustness characteristics of the network
• Enhancements of specific training procedures through knowledge-based learning schemes (including learning metarules)
• Linguistic interpretation of results produced by neural networks
Each of these areas has specific and highly representative instances. We review them to expose the reader to the very nature of some important functional links between fuzzy sets and neural networks.

Fuzzy Sets in the Preprocessing and Utilization of Training Data

The function of fuzzy sets in this particular framework is to deliver an interface between the data (environment) and the neural network regarded primarily as a processing vehicle. As visualized in Fig. 1, the original data are transformed within the framework of the fuzzy set interface: the resulting format could be very different from the one encountered in the original environment. The intent of the interface is to expose the network to the most essential features of the data that need to be captured through the subsequent mechanisms of learning. These features are usually revealed as a part of the underlying domain knowledge. The notion of a cognitive perspective develops a suitable learning environment. By selecting a collection of so-called linguistic landmarks (2,3), one can readily meet several important objectives:
• Performing a nonlinear normalization of the training data. By transforming any real data, any pattern $x \in \mathbb{R}^n$ becomes converted into the corresponding element of a highly dimensional unit hypercube.
• Defining a variable (as opposed to fixed) processing resolution carried out within the resulting neural networks.
• Coping with uncertainty in the training data.
Let us briefly elaborate on the nature of these enhancements.

Figure 1. Fuzzy sets in interfacing neural networks with data environment: transforming data into a format assuring computational efficiency of neurocomputation. (Blocks shown: data, fuzzy set interface, neural network.)

Nonlinear Data Normalization. For each coordinate (variable) we define c linguistic terms. These are denoted en bloc as $A = \{A_1, A_2, \ldots, A_c\}$ for the first coordinate, $B = \{B_1, B_2, \ldots, B_c\}$ for the second, etc. Then the linguistic preprocessing P carries out the mapping of the form

$$P : \mathbb{R}^n \to [0, 1]^{nc} \tag{1}$$
More specifically, a numeric input x invokes (activates) a series of linguistic terms $A_1, A_2, \ldots, A_c, B_1, B_2, \ldots$, etc. As we are concerned with n-dimensional inputs with c labels (fuzzy sets) associated with each of them, we end up with n · c activation levels situated in the unit interval. Or, in other words, the results of this nonlinear transformation are located in the (n · c)-dimensional unit hypercube. Observe also that this preprocessing serves as a useful nonlinear data normalization. In contrast, the commonly exploited linear transformation defined as

$$\frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$
(where $x_{\min}$ and $x_{\max}$ are the bounds of the variable) does not exhibit any nonlinear effect. The positive effect of data normalization has often been underlined in many studies on neural networks. Normalization is always recommended, especially if the ranges of the individual variables are very distinct, say [0, 0.05] vis-à-vis [10^6, 10^8]. The direct use of raw (unscaled) data could easily lead to completely unsuccessful learning. The nonlinear effect conveyed by Eq. (1) stems from the nonlinear membership functions of the linguistic terms. The linguistic preprocessing increases the dimensionality of the problem; however, it could also decrease the learning effort. A similar speedup effect in training is commonly observed in radial basis function (RBF) neural networks (4,5). The improvement in the performance achieved in this setting stems from the fact that the individual receptive fields modeled by the RBFs identify homogeneous regions in the multidimensional space of input variables. Subsequently, the updates of the connections of the hidden layers are less demanding, as a preliminary structure has already been established and the learning is oriented towards less radical changes of the connections and practically embarks on some calibration of the receptive fields. By modifying the form of the RBFs themselves, some regions exhibiting a significant variability of the approximated function are made smaller so that a single linear unit can easily adjust. Similarly, the regions over which the approximated function does not change drastically can be made quite large by adapting radial basis functions of lower resolution. In general, this leads to the concept of multiresolution-like (fractal-oriented) neural networks.

Variable Processing Resolution. By defining the linguistic terms (modeling landmarks) and specifying their distribution along the universe of discourse we can orient (focus) the main learning effort of the network. To clarify this idea, let us refer to Fig. 2.
Figure 2. Fuzzy quantization (partition, discretization) delivered by linguistic terms. Note a diversity of information granularity (variable processing resolution) captured by the respective linguistic terms. (Regions marked in the figure: Ω1, Ω2, Ω3, Ω4.)
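The linguistic preprocessing of Eq. (1) and the linear transformation of Eq. (2) can be contrasted in a brief Python sketch. It is only an illustration: Gaussian membership functions are assumed for the linguistic terms (the text allows triangular, Gaussian, and other forms), and the function names, centers, and widths are hypothetical. Placing the centers closer together in a region gives that region a finer processing resolution.

```python
import numpy as np

def linear_normalize(x, x_min, x_max):
    """Eq. (2): linear scaling of a variable onto [0, 1]."""
    return (x - x_min) / (x_max - x_min)

def linguistic_preprocess(x, centers, widths):
    """Eq. (1): map x in R^n to n*c activation levels in [0, 1].

    centers and widths are (n, c) arrays describing c linguistic terms
    per coordinate; Gaussian membership functions are an assumption.
    """
    x = np.asarray(x, dtype=float)[:, None]   # shape (n, 1)
    centers = np.asarray(centers, dtype=float)
    widths = np.asarray(widths, dtype=float)
    activations = np.exp(-((x - centers) / widths) ** 2)
    return activations.ravel()                # shape (n * c,)

# Two variables with very distinct ranges, three terms each (illustrative values)
centers = np.array([[0.0, 0.025, 0.05], [1e6, 5e7, 1e8]])
widths = np.array([[0.0125, 0.0125, 0.0125], [2.5e7, 2.5e7, 2.5e7]])
levels = linguistic_preprocess([0.025, 1e6], centers, widths)
```

Here `levels` has n · c = 6 entries, all located in the unit interval regardless of the original ranges of the two variables.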
The partition of the variable through A assigns a high level of information granularity to some regions (say Ω1 and Ω2) and sensitizes the learning mechanism accordingly. On the other hand, the data points falling under Ω3 are regarded internally (at the level at which they are perceived by the network) as equivalent (by leading to the same numeric representation in the unit hypercube).

Uncertainty Representation. The factor of uncertainty or imprecision can be quantified by exploiting some uncertainty measures as introduced in the theory of fuzzy sets. The underlying rationale is to equip the internal format of information available to the network with some indicators describing how uncertain a given piece of data is. Considering possibility and necessity measures, this quantification is straightforward: once $\mathrm{Poss}(X, A_k) \neq \mathrm{Nec}(X, A_k)$, then X is regarded as uncertain (the notion of uncertainty is also context-sensitive and depends on $A_k$) (6). For numerical data one always arrives at the equality of these two measures, which underlines the complete certainty of X. In general, the higher the gap between the possibility and necessity measures, $\mathrm{Poss}(X, A_k) = \mathrm{Nec}(X, A_k) + \delta$, the higher the uncertainty level associated with X. The uncertainty gap attains its maximum for δ = 1. The way of treating the linguistic term makes a real difference between the architecture outlined above and the standard RBF neural networks. The latter do not have any provisions to deal with and quantify uncertainty. The forms of the membership functions (RBFs) are very much a secondary issue. In general, one can expect that the fuzzy sets used therein can exhibit a variety of forms (triangular, Gaussian, etc.), while RBFs are usually more homogeneous (e.g., all assume Gaussian-like functions). Furthermore, there are no specific restrictions on the number of RBFs used as well as their distribution across the universe of discourse. For fuzzy sets one restricts this number to a maximum of 9 terms (more exactly, 7 ± 2); additionally we make sure that the fuzzy sets are kept distinct and thus retain a clear semantic identity that supports their interpretation.

Knowledge-Based Learning Schemes

Fuzzy sets influence neural networks as far as learning mechanisms and interpretation of the constructed networks are concerned. The set of examples discussed in this section illustrates this point.

Metalearning and Fuzzy Sets

Even though guided by detailed gradient-based formulas, the learning of neural networks can be enhanced by making use of some domain knowledge acquired via intense experimentation (learning). By running a mixture of successful and unsuccessful learning sessions one can gain qualitative knowledge on what an efficient learning scenario should look like. In particular, some essential qualitative associations can be established by linking the performance of the learning process and the parameters of the training scheme being utilized. Two detailed examples follow. The highly acclaimed backpropagation (BP) scheme used in training multilevel neural networks is based upon the gradient of the performance index (objective index) Q. The basic update formula reads as

$$w_{ij} = w_{ij} - \alpha \frac{\partial Q}{\partial w_{ij}}$$
where $w_{ij}$ stands for a connection (weight) between the two neurons (i and j). The positive learning rate is denoted by α. Similarly, $\partial Q / \partial w_{ij}$ describes a gradient of Q expressed with respect to $w_{ij}$. Obviously, higher values of α result in more profound changes (updates) of the connection. Higher values of α could result in faster learning that, unfortunately, comes at the expense of its stability (oscillations and overshoots in the values of Q). Under such circumstances, one may easily end up with a diverging process of learning. After a few learning sessions one can easily reveal some qualitative relationships that could be conveniently encapsulated in the form of "if–then" rules (7):

if there are changes of Q (ΔQ), then there are changes in α

A collection of such learning rules (metarules) is shown in Table 1. These learning rules are fairly monotonic (yet not symmetric) and fully comply with our intuitive observations when it comes to the representation of the supervisory aspects of the learning procedures in neural networks. In general, any increase in Q calls for some decrease of α; when Q decreases, then the increases in α need to be made more conservative. The linguistic terms in the corresponding rules are defined in the space (universe) of changes of Q (antecedents) and a certain subset of [0,1] (conclusions). Similarly, the BP learning scheme can be augmented by taking into account a momentum term; the primary intent of this expansion is to suppress eventual oscillations of the performance index or reduce its amplitude. This makes the learning more stable, yet adds one extra adjustable learning parameter in the update rule itself. The learning metarules can also be formulated at the level of some critical parameters of the networks. The essence of the ensuing approach is to modify the activation functions of the neurons in the network. Consider the sigmoid nonlinearity (that is commonly encountered in many neural architectures)

$$y = \frac{1}{1 + \exp(-\gamma u)}$$
We assume that the steepness factor of the sigmoid function (γ) is modifiable. As the changes of the connections are evidently affected by this parameter (γ), we can easily set up metarules of the form:

if the performance index is Qo then γ is γo

Table 1. BP-Oriented Learning Rules*
ΔQ:  NB  NM  NS  Z  PS  PM  PB
Δα:  PB  PM  PS  Z  NM  NB  NB
*NB, negative big; NM, negative medium; NS, negative small; Z, zero; PS, positive small; PM, positive medium; PB, positive big.

As before, it is intuitively straightforward to set up a collection of the detailed learning rules. Summing up, we highlight two crucial design issues:
• The considered approach is fully experimental. The role of fuzzy sets is to represent and to summarize the available domain knowledge properly.
• While the control protocol (rules) seems to be universal, the universes of discourse should be modified according to the current application (problem). In other words, the basic linguistic terms occurring therein need to be adjusted (calibrated). The realization of this phase calls for some additional computational effort, which could somewhat offset the benefits originating from the availability of the domain knowledge.

Fuzzy Clustering in Revealing Relationships within Data

In the second approach the domain knowledge about learning is acquired through some preprocessing of training data prior to running any specific learning scheme. This is the case in the construction known as a fuzzy perceptron (8). In an original setting, a single-layer perceptron is composed of a series of linear processing units equipped with threshold elements. The basic perceptron-based scheme of learning is straightforward. Let us start with a two-category multidimensional classification problem. If the considered patterns are linearly separable, then there exists a linear discriminant function f such that

$$f(x, w) = w^T x \begin{cases} > 0 & \text{if } x \text{ is in class 1} \\ < 0 & \text{if } x \text{ is in class 2} \end{cases}$$

where $x, w \in \mathbb{R}^{n+1}$. The dimensionality of the original space of patterns ($\mathbb{R}^n$) has been increased due to a constant term standing in the linear discriminant function, say $f(x, w) = w_0 \cdot 1 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$. After multiplying the class 2 patterns by −1, we obtain the system of positive inequalities

$$f(x_k, w) > 0$$
where k = 1, 2, ..., N. The learning concerns a determination of the connections (w) so that all inequalities are satisfied. The expression $f(x_k, w) = 0$ defines a hyperplane partitioning the patterns; all class 1 patterns are located at the same side of this hyperplane. Assume that the $x_k$'s are linearly separable. The perceptron algorithm (shown as follows) guarantees that the discriminating hyperplane (vector w) is found in a finite number of steps.

do for all vectors $x_k$, k = 1, 2, ..., N
    if $w^T x_k \le 0$, then update the weights (connections): $w = w + c x_k$, c > 0
end; the loop is repeated until no updates of w occur

The crux of the preprocessing phase as introduced by Keller and Hunt (8) is to carry out the clustering of data and determine the prototypes of the clusters as well as compute the class membership of the individual patterns. The ensuing membership values are used to monitor the changes. Let $u_{1k}$ and $u_{2k}$ be the membership grades of the kth pattern. Necessarily, $u_{1k} + u_{2k} = 1$. The outline of the learning algorithm is the same as before. The only difference is that the updates of the weights are governed by the expression

$$w = w + c\, x_k\, |u_{1k} - u_{2k}|^p$$

where p > 1. These modifications depend very much on the belongingness of the current pattern to the class. If $u_{1k} = u_{2k} = 0.5$, then the correction term is equal to zero and no update occurs. On the other hand, if $u_{1k} = 1$, then the updates of the weights are the same as those encountered in the original perceptron algorithm. In comparison to the previous approaches, the methods stemming from this category require some extra computing (preprocessing) but relieve us from the calibration of the linguistic terms (fuzzy sets) standing in the learning metarules.
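The perceptron loop and its fuzzy modification described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of Keller and Hunt (8): the membership grades are assumed to be already available from a prior clustering step, the function name and the `max_epochs` safeguard are hypothetical, and patterns with $u_{1k} = u_{2k} = 0.5$ are simply skipped, since their correction term is zero.

```python
import numpy as np

def fuzzy_perceptron(X, labels, u1, c=1.0, p=2.0, max_epochs=1000):
    """Perceptron learning with membership-weighted updates.

    X      : (N, n) array of patterns
    labels : +1 for class 1, -1 for class 2
    u1     : membership grades in class 1 (u2 = 1 - u1), assumed given
    Using u1 = 1 for class 1 and u1 = 0 for class 2 recovers the
    original (crisp) perceptron update w = w + c * x_k.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=float)
    u1 = np.asarray(u1, dtype=float)
    # Augment with the constant term and multiply class 2 patterns by -1,
    # so that the goal becomes w^T z_k > 0 for every k.
    Z = np.hstack([np.ones((len(X), 1)), X]) * labels[:, None]
    strength = np.abs(u1 - (1.0 - u1)) ** p   # |u1k - u2k|^p
    w = np.zeros(Z.shape[1])
    for _ in range(max_epochs):               # safeguard (an assumption)
        updated = False
        for z, s in zip(Z, strength):
            if w @ z <= 0 and s > 0:
                w = w + c * s * z
                updated = True
        if not updated:                       # no updates of w occurred
            break
    return w
```

Patterns with grades close to 0.5 move the hyperplane only slightly, so ambiguous data have little influence on the final weights.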
A LINGUISTIC INTERPRETATION OF COMPUTING WITH NEURAL NETWORKS This style of the usage of fuzzy sets is advantageous in some topologies of neural networks, especially those having a substantial number of outputs and whose learning is carried out in unsupervised form. Fuzzy sets are aimed at the interpretation of results produced by such architectures and facilitates processes of data mining. To illustrate the idea, we confine ourselves to self-organizing maps. An important property of such architecture is their ability to organize multidimensional patterns in such a way that their vicinity (neighborhood) in the original space is retained when the patterns are mapped onto a certain low-dimensional space so that the map attempts to preserve the main topological properties of the data set. Quite often, the maps are considered in the form of the two-dimensional arrays of regularly distributed processing elements. The mechanism of self-organization is established via competitive learning; the unit that is the ‘‘closest’’ to the actual pattern is given an opportunity to modify its connections and follow the pattern. These modifications are also allowed to affect the neurons situated in the nearest neighborhood of the winning neuron (node) of the map. Once the training has been completed, the map can locate any multidimensional pattern on the map by identifying the most active processing unit. Subsequently, the linguistic labels are essential components of data mining by embedding the activities of the network in a certain linguistic context. This concept is visualized in Fig. 3. Let us consider that for each variable we have specified a particular linguistic term (context) defined as a fuzzy set in the corresponding space, namely A1,A2, . . . and An1 for x1, B1,B2, . . . and Bn2 for x2, etc. When exposed to any input pattern, the map responds with the activation levels computed at each node in the grid. 
The logical context leads to an extra two-dimensional grid, the elements of which are activated based on the corresponding activation levels of the nodes located at the lower layer as well as the level of the contexts assumed for the individual variables. These combinations are of the AND form: the upper grid is constructed as a series of AND neurons. The activation region obtained in this way indicates how much the linguistic description (descriptors) $A_i$ and $B_j$ and $C_k$ and . . . "covers" (activates) the data space. The higher the activation level of the region, the more visible the imposed linguistic pattern within the data set. By performing an analysis of this type for several linguistic data descriptors, one can develop a collection of descriptors that cover the entire data space.

Figure 3. Self-organizing (Kohonen) map and its linguistic interpretation produced by an additional interpretation layer. This layer is activated by the predefined linguistic terms, that is, fuzzy sets ($A_i$, $B_j$, $C_k$, etc.) activated by the inputs of the map. [Diagram labels: inputs $x_1$, $x_2$, $x_3$; Kohonen map; linguistic terms (fuzzy sets) $A_i$, $B_j$, $C_k$.]

We may eventually require that this collection cover the entire map to a high extent, meaning that

\[ \exists_{\alpha > 0}\ \ \forall_{i,j = 1,2,\ldots,n}\quad N(i,j,c) \ge \alpha, \qquad c \in C \]
where $N(i,j,c)$ is the response of the neuron located at node $(i,j)$ and considered (placed) in context $c$ from a certain family of contexts $C$. Other criteria could also be anticipated; for example, one may request that the linguistic descriptions be well separated, meaning that their corresponding activation regions in the map are kept almost disjoint.

FUZZY NEURAL COMPUTING STRUCTURES

The examples discussed in the preceding section have revealed a diversity of approaches to building neural network–fuzzy architectures. With this in mind, we distinguish between two key facets that should be considered in any design endeavor:

• Architectural
• Temporal

These properties are exemplified in the sense of the plasticity and explicit knowledge representation of the resulting neural network–fuzzy structure. The strength of the interaction itself can vary from the level at which the technologies of fuzzy sets and neurocomputing are loosely combined and barely coexist to the highest level, where a genuine fusion between the technologies emerges.
Architectures of Fuzzy–Neural Network Systems

The essence of the architectural interaction of fuzzy sets and neural networks is visualized in Fig. 4. This point has already been made clear through the studies in the previous section. By and large, the role of fuzzy sets is more visible at the input and output layers of any multilayer network structure. The input and output layers are much more oriented toward capturing the semantics of data than toward pure numeric processing.

Temporal Aspects of Interaction in Fuzzy–Neural Network Systems

The temporal aspects of interaction arise when dealing with the various levels of intensity of learning (Fig. 4). Again, the updates of the connections are much more vigorous at the hidden layers; we conclude that their plasticity (that is, their ability to modify the values of the connections) is higher than that of the layers situated close to the input and output.

Figure 4. Architectural and temporal synergy of fuzzy set constructs and neural networks. Architectural level: fuzzy sets contribute to the construction of preprocessing and postprocessing modules (input and output layer) and are aimed at knowledge representation, whereas numeric processing and learning occur at the level of the hidden layers of the entire architecture. Temporal interactions: most parametric learning occurs at the level of hidden layers, which exhibit high plasticity. [Diagram labels: preprocessing (fuzzy sets, knowledge representation); neurocomputing (numeric processing and learning); postprocessing (fuzzy sets, knowledge representation); plasticity plotted against layer number.]

FUZZY NEUROCOMPUTING: AN ARCHITECTURAL FUSION OF FUZZY AND NEURAL NETWORK TECHNOLOGY

In this section we concentrate on a certain category of hybrid processing in which the neurons combine a series of features that are essential to neural networks and symbolic processing. In fact, this is one of the approaches among those reported in the literature (9,10,11). Very often these basic constructs (fuzzy neurons) are exploited as generic building blocks in the development of fuzzy–neural network architectures.

Classes of Fuzzy Neurons

We elaborate on the three models that are representative of most of the existing hybrid architectures as developed in the setting of fuzzy sets and neurocomputing. The first fuzzy set–oriented construct, proposed by Lee and Lee (11), was contrasted with the generic model of the neuron discussed by McCulloch and Pitts as a binary device. Let us recall that the basic binary neuron has n excitatory inputs $(e_1, e_2, \ldots, e_n)$ and m inhibitory inputs $(i_1, i_2, \ldots, i_m)$. The firing of the neuron is binary: y is set to 1 if all inhibitory inputs are set to 0 and the sum of all excitatory inputs exceeds a threshold level,

\[ \sum_{i=1}^{n} e_i > T \]

The main generalization proposed by Lee and Lee (11) was to consider the activity of the neuron to be continuous. More specifically, the output of the neuron is one of the positive numbers $u_i$ in [0, 1], $i = 1, 2, \ldots, p$, which means that the output (y) reads as

\[ y = \begin{cases} u_i & \text{if the neuron is firing} \\ 0 & \text{otherwise} \end{cases} \]

The firing rules are also restated accordingly:

1. All inhibitory inputs are set to 0.
2. The sum of excitatory inputs must be equal to or greater than a threshold T.
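The two firing rules for the binary neuron and for the Lee and Lee neuron can be contrasted in a short sketch. The function names are hypothetical, and because the text does not say how the output grade $u_i$ is selected, it is simply passed in as the parameter `u`; that choice is an assumption of this example.

```python
def binary_neuron(excitatory, inhibitory, threshold):
    """Binary neuron: y = 1 iff all inhibitory inputs are 0 and sum(e) > T."""
    if any(v != 0 for v in inhibitory):
        return 0
    return 1 if sum(excitatory) > threshold else 0


def lee_lee_neuron(excitatory, inhibitory, threshold, u):
    """Fuzzy neuron sketch: y = u (a grade in [0, 1]) when firing, else 0.

    Firing requires all inhibitory inputs to be 0 and sum(e) >= T.
    How u is chosen is not specified here; it is supplied by the caller.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    if any(v != 0 for v in inhibitory):
        return 0.0
    return u if sum(excitatory) >= threshold else 0.0


# With sum(e) equal to T the binary neuron stays silent (strict inequality),
# whereas the fuzzy neuron fires with grade u.
y_binary = binary_neuron([1, 1], [0], 2)
y_fuzzy = lee_lee_neuron([1, 1], [0], 2, 0.7)
```

The only differences between the two units are the graded output and the non-strict threshold comparison.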
The original study (11) illustrates the application of such fuzzy neurons to a synthesis of fuzzy automata.

An idea proposed by Buckley and Hayashi (12) is to develop a fuzzy neuron in the sense that its connections are viewed as fuzzy sets (more precisely, fuzzy numbers); similarly, we consider the inputs to be fuzzy numbers. The underlying formula of the neuron generalizes that of the pure numeric neuron and is expressed as

\[ Y = f\!\left( \sum_{i=1}^{n} W_i X_i + \Theta \right) \]

Here the connections $W_i$, the bias $\Theta$, and the inputs $X_i$ are fuzzy numbers. Moreover, the operations (summation and product) are viewed in terms of fuzzy set operations. Note that the output of the neuron is a fuzzy number. All the operations in this expression are carried out via the extension principle. To illustrate the relevant calculations, let us consider the fuzzy neuron with two inputs ($X_1$ and $X_2$). In light of the extension principle, the output of the fuzzy neuron reads as

\[ Y(y) = \sup\{ \min[ X_1(x_1), X_2(x_2), W_1(w_1), W_2(w_2), \Theta(\vartheta) ] \} \]

where the supremum is taken over all the arguments satisfying the nonlinear constraint

\[ y = f\!\left( \sum_{i=1}^{2} w_i x_i + \vartheta \right) \]
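The sup-min computation can be made concrete for fuzzy numbers with finitely many support points. The following is only a sketch under stated assumptions: each fuzzy number is represented as a dictionary mapping a value to its membership grade, the function name is hypothetical, and the identity is used for f so that the arithmetic stays exact.

```python
from itertools import product


def fuzzy_neuron_output(X1, X2, W1, W2, Theta, f=lambda s: s):
    """Extension-principle output of a two-input fuzzy neuron.

    Every argument is a discrete fuzzy number {value: membership grade}.
    Y(y) is the largest min-grade over all argument combinations with
    y = f(w1*x1 + w2*x2 + theta).
    """
    Y = {}
    for (x1, a), (x2, b), (w1, c), (w2, d), (t, e) in product(
            X1.items(), X2.items(), W1.items(), W2.items(), Theta.items()):
        y = f(w1 * x1 + w2 * x2 + t)
        grade = min(a, b, c, d, e)
        Y[y] = max(Y.get(y, 0.0), grade)   # sup over combinations reaching y
    return Y


# Illustrative (made-up) fuzzy numbers with integer support points.
Y = fuzzy_neuron_output(
    X1={1: 1.0},
    X2={0: 1.0, 1: 0.5},
    W1={1: 1.0},
    W2={1: 1.0, 2: 0.6},
    Theta={0: 1.0},
)
```

The loop visits every combination of support points, so the cost grows with the product of the support sizes; this is the overhead that restricted (for example, trapezoidal) representations are meant to avoid.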
The idea developed by Bortolan (13) makes the previous concept of the fuzzy neuron more computationally attractive by restricting the form of the input fuzzy sets, as well as the connections of the neuron, to trapezoidal fuzzy sets $T(x; a, b, c, d)$; see Fig. 5. Such piecewise-linear membership functions represent uncertain variables. This restriction drastically reduces the computational overhead. The pertinent version of the well-known δ (delta) learning algorithm is covered in Ref. 13.

Fuzzy Logic Neurons

The main rationale behind this choice (14,15,16) is that the resulting neural networks effortlessly combine learning capabilities with the mechanism of knowledge representation in its explicit manner. The neurons are split into two main categories, namely aggregative and referential processing units. We discuss here aggregative neurons. In what follows, t-norms are denoted by t (or T), and s-norms are denoted by s (or S).
Figure 5. An example of a trapezoidal fuzzy number: a and d denote the lower and upper bounds of the linguistic concept. The elements situated between b and c belong to the concept at degree 1 and are indistinguishable.
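Returning to the trapezoidal fuzzy sets $T(x; a, b, c, d)$ of Fig. 5, a membership function of this shape might be written as below. The function name and the linear interpolation on the two slopes are assumptions consistent with the caption; this is a sketch rather than the formulation of Ref. 13.

```python
def trapezoid(x, a, b, c, d):
    """Membership grade of x in the trapezoidal fuzzy set T(x; a, b, c, d).

    a and d bound the support; grades equal 1 on [b, c] and change
    linearly on [a, b] and [c, d].
    """
    if not a <= b <= c <= d:
        raise ValueError("parameters must satisfy a <= b <= c <= d")
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:                       # rising edge, here b > a is guaranteed
        return (x - a) / (b - a)
    return (d - x) / (d - c)        # falling edge, here d > c is guaranteed


grades = [trapezoid(x, 1, 2, 3, 5) for x in (0, 1.5, 2.5, 4)]
```

Only four parameters describe each fuzzy set, which is what makes this restricted form cheap to store and to propagate.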
The n-input OR processing unit is governed by the expression $y = \mathrm{OR}(\mathbf{x}; \mathbf{w})$, namely

\[ y = \mathop{S}_{i=1}^{n} \,(w_i \; t \; x_i) \]

The inputs of the neuron are described by $\mathbf{x}$, while the vector $\mathbf{w}$ summarizes its connections. The computations of the output (y) rely on the use of some triangular norms (s- and t-norms). Observe that if $\mathbf{w} = \mathbf{1}$, then $y = \mathrm{OR}(x_1, x_2, \ldots, x_n)$, so that the neuron reduces to a standard OR gate encountered in digital logic. The AND neuron is described in the following form: $y = \mathrm{AND}(\mathbf{x}; \mathbf{w})$, or equivalently

\[ y = \mathop{T}_{i=1}^{n} \,(w_i \; s \; x_i) \]
Note that the composition operation used here applies the s- and t-norms in reversed order. An important class of fuzzy neural networks concerns the approximation of mappings between unit hypercubes (namely, from $[0,1]^n$ to $[0,1]^m$, or to $[0,1]$ for $m = 1$). These mappings are realized in a logic-based format. To fully comprehend the fundamental idea behind this architecture, let us note some very simple yet powerful concepts from the realm of two-valued systems. The well-known Shannon's theorem states that any Boolean function $\{0,1\}^n \to \{0,1\}$ can be uniquely represented as a logical sum (union) of minterms (a so-called SOM representation) or, equivalently, as a product of some maxterms (known as a POM representation). By a minterm we mean an AND combination of all the input variables of the function; they can appear either in a direct or in a complemented (negated) form. Similarly, a maxterm consists of the variables occurring in their OR combination. A complete list of minterms and maxterms for Boolean functions of two variables consists of the expressions

$\bar{x}_1$ AND $\bar{x}_2$, $\bar{x}_1$ AND $x_2$, $x_1$ AND $\bar{x}_2$, $x_1$ AND $x_2$ for minterms
$\bar{x}_1$ OR $\bar{x}_2$, $\bar{x}_1$ OR $x_2$, $x_1$ OR $\bar{x}_2$, $x_1$ OR $x_2$ for maxterms

From a functional point of view, the minterms can be identified with the AND neurons, while the OR neurons can be used to produce the corresponding maxterms. It is also noticeable that the connections of these neurons are restricted to the two-valued set $\{0,1\}$, therefore making these neurons two-valued selectors. Taking into account the fundamental representation of the Boolean functions, two complementary (dual) architectures are envisioned. In the first case, the network includes a single hidden layer that is constructed with the aid of the AND neurons and an output layer consisting of the OR neurons (SOM version of the network). The dual type of network is of the POM type, in which the hidden layer consists of some OR neurons while the output layer is formed by the AND neurons.
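The OR and AND neurons, and a SOM-type logic processor built from them, can be sketched as follows. Everything specific here is an assumption made for illustration: the product is used as the t-norm and the probabilistic sum as the s-norm, the function names are hypothetical, and the exclusive-OR function is chosen as the Boolean target. Inputs are augmented with their complements so that minterms can select negated variables.

```python
from functools import reduce


def t_norm(a, b):
    return a * b                    # product t-norm (illustrative choice)


def s_norm(a, b):
    return a + b - a * b            # probabilistic sum s-norm


def or_neuron(x, w):
    """y = S_i (w_i t x_i); a connection w_i = 0 switches input i off."""
    return reduce(s_norm, (t_norm(wi, xi) for wi, xi in zip(w, x)), 0.0)


def and_neuron(x, w):
    """y = T_i (w_i s x_i); a connection w_i = 1 switches input i off."""
    return reduce(t_norm, (s_norm(wi, xi) for wi, xi in zip(w, x)), 1.0)


def som_xor(x1, x2):
    """SOM logic processor: hidden AND neurons (minterms), output OR neuron."""
    z = [x1, x2, 1 - x1, 1 - x2]            # direct and complemented inputs
    h1 = and_neuron(z, [0, 1, 1, 0])        # x1 AND (NOT x2)
    h2 = and_neuron(z, [1, 0, 0, 1])        # (NOT x1) AND x2
    return or_neuron([h1, h2], [1, 1])


truth_table = {(a, b): som_xor(a, b) for a in (0, 1) for b in (0, 1)}
```

At the vertices of the unit square the network reproduces the Boolean function exactly; for inputs strictly inside the square it returns intermediate grades that depend on the chosen norms.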
Two points are worth making here that contrast the logic processors (LP) in their continuous and two-valued versions:

1. The logic processor represents or approximates data. For Boolean data, assuming that all the input combinations are different, we are talking about a representation of the corresponding Boolean function. In this case the POM and SOM versions of the logic processors for the same Boolean function are equivalent.
2. The logic processor used for continuous data approximates a certain unknown fuzzy function. The equivalence of the POM and SOM types of the obtained logic processors is not guaranteed at all. Moreover, the approximation exhibits an inherent logical flavor that does not necessarily lead to the same approximation accuracy as achieved for "classic" neural networks. This should not be regarded as a shortcoming, as in return we obtain some essential transparency of the neural architecture, which can be easily interpreted in the form of "if–then" statements. This is the most evident enhancement of the architecture in an attempt to alleviate the black box nature inherent in most neural networks.

CONCLUSIONS

Fuzzy sets and neurocomputing are two supplementary technologies. Their two-way integration is not only possible but highly beneficial. The knowledge-based faculties are well handled by the technology of fuzzy sets, while the learning activities are chiefly addressed by neural networks. Interestingly, there are a number of new constructs combining the ideas stemming from fuzzy sets and neural networks. We have investigated various levels of synergy and proposed a consistent classification of the systems emerging as an outcome of the symbiosis of these two technologies.

BIBLIOGRAPHY

1. D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing, Cambridge, MA: MIT Press, 1986.
2. L. A. Zadeh, Fuzzy sets and information granularity. In M. M. Gupta, R. K. Ragade, and R. R. Yager (eds.), Advances in Fuzzy Set Theory and Applications, Amsterdam: North Holland, 1979, pp. 3–18.
3. W. Pedrycz, Selected issues of frame of knowledge representation realized by means of linguistic labels, Int. J. Intelligent Systems, 7: 155–170, 1992.
4. S. Chen, C. F. N. Cowan, and P. M. Grant, Orthogonal least squares learning algorithm for radial basis function networks, IEEE Trans. Neural Networks, 2: 302–309, 1991.
5. J. Moody and C. Darken, Fast learning networks of locally-tuned processing units, Neural Computation, 1: 281–294, 1989.
6. D. Dubois and H. Prade, Possibility Theory: An Approach to Computerized Processing of Uncertainty, New York: Plenum Press, 1988.
7. F. M. Silva and L. B. Almeida, Acceleration techniques for the back-propagation algorithm. In Lecture Notes in Computer Science, Berlin: Springer-Verlag, 1990, Vol. 412, pp. 110–119.
8. J. M. Keller and D. J. Hunt, Incorporating fuzzy membership functions into the perceptron algorithm, IEEE Trans. Pattern Anal. Mach. Intell., PAMI-7: 693–699, 1985.
9. H. Ishibuchi, R. Fujioka, and H. Tanaka, An architecture of neural networks for input vectors of fuzzy numbers, Proc. IEEE Int. Conf. Fuzzy Systems (FUZZ-IEEE '92), San Diego, March 8–12, 1992, pp. 1293–1300.
10. I. Requena and M. Delgado, R-FN: A model of fuzzy neuron, Proc. 2nd Int. Conf. Fuzzy Logic Neural Networks (IIZUKA '92), Iizuka, Japan, July 17–22, 1992, pp. 793–796.
11. S. C. Lee and E. T. Lee, Fuzzy neural networks, Math. Biosci., 23: 151–177, 1975.
12. J. Buckley and Y. Hayashi, Fuzzy neural networks: A survey, Fuzzy Sets Systems, 66: 1–14, 1994.
13. G. Bortolan, Neural networks for the processing of fuzzy sets. In M. Marinaro and P. G. Morasso (eds.), Proceedings of the International Conference on Artificial Neural Networks, London: Springer-Verlag, 1994, pp. 181–184.
14. W. Pedrycz, Fuzzy neural networks and neurocomputations, Fuzzy Sets Systems, 56: 1–28, 1993.
15. W. Pedrycz, Fuzzy Sets Engineering, Boca Raton, FL: CRC, 1995.
16. W. Pedrycz and A. F. Rocha, Fuzzy-set based models of neurons and knowledge-based networks, IEEE Trans. Fuzzy Systems, 1: 254–266, 1993.
WITOLD PEDRYCZ University of Manitoba
Wiley Encyclopedia of Electrical and Electronics Engineering
Fuzzy Pattern Recognition
Standard Article
James C. Bezdek (1) and Ludmila Kuncheva (2)
(1) University of West Florida, Pensacola, FL
(2) University of Wales, Bangor, Sofia, Bulgaria
Copyright © 1999 by John Wiley & Sons, Inc. All rights reserved.
DOI: 10.1002/047134608X.W3505
Article Online Posting Date: December 27, 1999
Abstract

The sections in this article are

Pattern Recognition: Data, Label Vectors, and Measures of Similarity
Fuzzy Cluster Analysis
Fuzzy Classifier Design
Feature Analysis
Remarks on Applications of Fuzzy Pattern Recognition

Keywords: c-means clustering models; classifier design; feature analysis; fuzzy clustering; fuzzy models; k-nearest neighbor classifier; label vectors; nearest prototype classifier; partitions of data
FUZZY PATTERN RECOGNITION
J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering. Copyright © 1999 John Wiley & Sons, Inc.

Fuzzy sets were introduced by Zadeh (1) to represent nonstatistical uncertainty. Suppose you must advise a driving student when to apply the brakes of a car. Would you say "begin braking 74.2 feet from the crosswalk"? Or would you say "apply the brakes pretty soon"? You would choose the second instruction because the first one is too precise to be implemented. So, precision can be useless, while vague directions can be interpreted and acted upon. Fuzzy sets are used to endow computational models with the ability to recognize, represent, manipulate, interpret, and use (act on) nonstatistical imprecision.

Conventional (crisp) sets contain objects that satisfy precise properties. The set $H = \{ r \in \mathbb{R} \mid 6 \le r \le 8 \}$ is crisp. H can be described by its membership function,

\[ m_H(r) = \begin{cases} 1 & 6 \le r \le 8 \\ 0 & \text{otherwise} \end{cases} \]

Since $m_H$ maps all real numbers onto the two points $\{0, 1\}$, crisp sets correspond to 2-valued logic; every real number either is in H or is not. Consider the set F of real numbers that are close to seven. Since "close to seven" is fuzzy, there is not a unique membership function for F. Rather, the modeler must decide, based on the potential application and imprecise properties of F, what $m_F$ should be. Properties that seem plausible for this F include: (1) normality ($m_F(7) = 1$); (2) unimodality (only $m_F(7) = 1$); (3) the closer r is to 7, the closer $m_F(r)$ is to 1, and conversely; and (4) symmetry (numbers equally far left and right of 7 should have equal memberships). Infinitely many functions satisfy these intuitive constraints. For example, $m_{1F}(r) = e^{-(r-7)^2}$ and $m_{2F}(r) = 1/(1 + (r-7)^2)$. Notice that no physical entity corresponds to F. Fuzzy sets are realized only through membership functions, so it is correct to call $m_F$ the fuzzy set F, even though it is a function. Formally, every function $m: X \mapsto [0, 1]$ could be a fuzzy subset of any set X, but functions like this become fuzzy sets when and only when they match some intuitively plausible semantic description of imprecise properties of the objects in X.

A question that continues to spark much debate is whether or not fuzziness is just a clever disguise for probability. The answer is no. Fuzzy memberships represent similarities of objects to imprecisely defined properties; probabilities convey information about relative frequencies. Another common misunderstanding is that fuzzy models are offered as replacements for crisp or probabilistic models. But most schemes that use fuzziness use it in the sense of embedding: Conventional structure is preserved as a special case of fuzzy structure, just as the real numbers are a special case of the complex numbers. Zadeh (2) first discussed models that had both fuzziness and probability. A recent publication about this is special issue 2(1) of the IEEE Transactions on Fuzzy Systems, 1994.

PATTERN RECOGNITION: DATA, LABEL VECTORS, AND MEASURES OF SIMILARITY

There are two major approaches to pattern recognition: numerical (3) and syntactic (4). Discussed here are three areas of numerical pattern recognition for object data: clustering, classifier design, and feature analysis. The earliest reference to fuzzy pattern recognition was Bellman et al. (5). Fuzzy techniques for numerical pattern recognition are now fairly mature. Reference 6 is an edited collection of 51 papers on this subject that span the development of the field from 1965 to 1991. Object data are represented as $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, a set of n feature vectors in feature space $\mathbb{R}^p$. The jth object is a physical entity such as a fish, medical patient, and so on.
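Column vector $\mathbf{x}_j$ is the object's numerical representation; $x_{kj}$ is its kth feature. There are four types of class labels: crisp, fuzzy, probabilistic, and possibilistic. Let integer c denote the number of classes, $1 < c < n$. Define three sets of label vectors in $\mathbb{R}^c$ as follows. With $[0,1]^c = [0,1] \times \cdots \times [0,1]$ (c times), let

\[ N_{pc} = \{ \mathbf{y} \in \mathbb{R}^c : y_i \in [0,1]\ \forall i,\ y_i > 0\ \exists i \} = [0,1]^c - \{\mathbf{0}\} \tag{1} \]

\[ N_{fc} = \Big\{ \mathbf{y} \in N_{pc} : \sum_{i=1}^{c} y_i = 1 \Big\} \tag{2} \]

\[ N_{hc} = \{ \mathbf{y} \in N_{fc} : y_i \in \{0,1\}\ \forall i \} = \{ \mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_c \} \tag{3} \]

In Eq. (1) $\mathbf{0}$ is the zero vector in $\mathbb{R}^c$. Note that $N_{hc} \subset N_{fc} \subset N_{pc}$. $N_{hc}$ is the canonical (unit vector) basis of Euclidean c-space. The vector $\mathbf{e}_i = (0, 0, \ldots, 1, \ldots, 0)^T$, with the 1 in the ith place, is the ith vertex of $N_{hc}$ and is the crisp label for class i, $1 \le i \le c$. $N_{fc}$, a piece of a hyperplane, is the convex hull of $N_{hc}$. The vector $\mathbf{y} = (0.1, 0.6, 0.3)^T$ is a label vector in $N_{f3}$; its entries lie between 0 and 1 and are constrained to sum to 1. If $\mathbf{y}$ is a label vector for some $\mathbf{z} \in \mathbb{R}^p$ generated by, say, the fuzzy c-means clustering method, $\mathbf{y}$ is a fuzzy label for $\mathbf{z}$. If $\mathbf{y}$ came from a method such as maximum likelihood estimation in mixture decomposition, it would be a probabilistic label for $\mathbf{z}$. $N_{pc} = [0,1]^c - \{\mathbf{0}\}$ is the unit hypercube in $\mathbb{R}^c$, excluding the origin. Vectors such as $\mathbf{y} = (0.4, 0.2, 0.7)^T$ are possibilistic label vectors in $N_{p3}$. Labels in $N_{pc}$ are produced, for example, by possibilistic clustering algorithms (7) and neural networks (8).

Most pattern recognition models are based on statistical or geometrical properties of substructure in X. Two key concepts for describing geometry are angle and distance. Let A be any positive-definite $p \times p$ matrix. For vectors $\mathbf{x}, \mathbf{v} \in \mathbb{R}^p$,

\[ \langle \mathbf{x}, \mathbf{v} \rangle_A = \mathbf{x}^T A \mathbf{v} \tag{4} \]

\[ \|\mathbf{x}\|_A = \sqrt{\mathbf{x}^T A \mathbf{x}} \tag{5} \]

and

\[ \delta_A(\mathbf{x}, \mathbf{v}) = \|\mathbf{x} - \mathbf{v}\|_A = \sqrt{(\mathbf{x} - \mathbf{v})^T A (\mathbf{x} - \mathbf{v})} \tag{6} \]

are the inner product, norm (length), and norm metric (distance) induced on $\mathbb{R}^p$ by A. The most important instances of Eq. (6), together with their common names and inducing matrices, are

\[ \|\mathbf{x} - \mathbf{v}\|_I = \sqrt{(\mathbf{x} - \mathbf{v})^T (\mathbf{x} - \mathbf{v})} \qquad \text{Euclidean, } A = I \tag{7} \]

\[ \|\mathbf{x} - \mathbf{v}\|_{D^{-1}} = \sqrt{(\mathbf{x} - \mathbf{v})^T D^{-1} (\mathbf{x} - \mathbf{v})} \qquad \text{Diagonal, } A = D^{-1} \tag{8} \]

\[ \|\mathbf{x} - \mathbf{v}\|_{M^{-1}} = \sqrt{(\mathbf{x} - \mathbf{v})^T M^{-1} (\mathbf{x} - \mathbf{v})} \qquad \text{Mahalanobis, } A = M^{-1} \tag{9} \]

In Eq. (7) I is the $p \times p$ identity matrix. Equations (8) and (9) use the covariance matrix of X, $M = \mathrm{cov}(X) = \sum_{k=1}^{n} (\mathbf{x}_k - \mathbf{v})(\mathbf{x}_k - \mathbf{v})^T / n$, where $\mathbf{v} = \sum_{k=1}^{n} \mathbf{x}_k / n$. D is the diagonal matrix extracted from M by deletion of its off-diagonal entries. A second family of commonly used lengths and distances comprises the Minkowski norms and norm metrics:

\[ \|\mathbf{x}\|_q = \Big( \sum_{j=1}^{p} |x_j|^q \Big)^{1/q}, \qquad q \ge 1 \tag{10} \]

\[ \delta_q(\mathbf{x}, \mathbf{v}) = \|\mathbf{x} - \mathbf{v}\|_q = \Big( \sum_{j=1}^{p} |x_j - v_j|^q \Big)^{1/q}, \qquad q \ge 1 \tag{11} \]

Three are commonly used:

\[ \|\mathbf{x} - \mathbf{v}\|_1 = \sum_{j=1}^{p} |x_j - v_j| \qquad \text{City block (1-norm); } q = 1 \tag{12} \]

\[ \|\mathbf{x} - \mathbf{v}\|_2 = \Big( \sum_{j=1}^{p} |x_j - v_j|^2 \Big)^{1/2} \qquad \text{Euclidean (2-norm); } q = 2 \tag{13} \]

\[ \|\mathbf{x} - \mathbf{v}\|_\infty = \max_{1 \le j \le p} \{ |x_j - v_j| \} \qquad \text{Sup or Max norm; } q \to \infty \tag{14} \]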
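The two distance families of Eqs. (6) and (11)–(14) can be compared numerically with a small sketch. The function names and the list-of-rows matrix representation are assumptions for illustration, and the matrix A is taken as given (no covariance estimation or inversion is attempted here).

```python
import math


def minkowski(x, v, q):
    """delta_q(x, v) as in Eq. (11); q = math.inf gives the sup norm, Eq. (14)."""
    diffs = [abs(a - b) for a, b in zip(x, v)]
    if q == math.inf:
        return max(diffs)
    if q < 1:
        raise ValueError("q must be >= 1")
    return sum(d ** q for d in diffs) ** (1.0 / q)


def a_norm_distance(x, v, A):
    """delta_A(x, v) = sqrt((x - v)^T A (x - v)) as in Eq. (6).

    A is a positive-definite matrix given as a list of rows.
    """
    d = [a - b for a, b in zip(x, v)]
    p = len(d)
    quad = sum(d[i] * A[i][j] * d[j] for i in range(p) for j in range(p))
    return math.sqrt(quad)


x, v = [1.0, 2.0], [4.0, 6.0]
city_block = minkowski(x, v, 1)                           # 3 + 4
euclidean = minkowski(x, v, 2)                            # sqrt(9 + 16)
sup_norm = minkowski(x, v, math.inf)                      # max(3, 4)
euclid_via_A = a_norm_distance(x, v, [[1, 0], [0, 1]])    # A = I
```

Using the identity matrix in `a_norm_distance` and q = 2 in `minkowski` yields the same number, which is the overlap between the two families.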
Equations (7) and (13) both give the Euclidean norm metric, the only one that belongs to both the inner product and Minkowski norm metric families.

FUZZY CLUSTER ANALYSIS

This field comprises three problems: tendency assessment, clustering, and validation. Given an unlabeled data set X, is there substructure in X? This is clustering tendency: should you look for clusters at all? Very few methods, fuzzy or otherwise, address this problem. Jain and Dubes (9) discuss some formal methods for assessment of cluster tendency, but most users begin clustering without checking the data for possible tendencies. Why? Because it is impossible to guess what structure your data may have in p dimensions, so hypothesis tests cast against structure that cannot be verified are hard to interpret. The usefulness of tendency assessment lies with its ability to rule out certain types of cluster structure. Different clustering algorithms produce different partitions of X, and it is never clear which one(s) may be most useful. Once clusters are obtained, how shall we pick the best clustering solution (or solutions)? This is cluster validation (4,5,9,10). Brevity precludes a discussion of this topic here.

Clustering (or unsupervised learning) in unlabeled X is the assignment of (hard or fuzzy or probabilistic or possibilistic) label vectors to the $\{\mathbf{x}_k\}$. Cluster substructure is represented by a $c \times n$ matrix $U = [U_1 \ldots U_k \ldots U_n] = [u_{ik}]$, where $U_k$ denotes the kth column of U. A c-partition of X belongs to one of three sets:

\[ M_{pcn} = \{ U \in \mathbb{R}^{cn} : U_k \in N_{pc}\ \forall k \} \tag{15} \]

\[ M_{fcn} = \Big\{ U \in M_{pcn} : U_k \in N_{fc}\ \forall k;\ 0 < \sum_{k=1}^{n} u_{ik}\ \forall i \Big\} \tag{16} \]

\[ M_{hcn} = \{ U \in M_{fcn} : U_k \in N_{hc}\ \forall k \} \tag{17} \]

Equations (15)–(17) define, respectively, the sets of possibilistic, fuzzy or probabilistic, and crisp c-partitions of X. Each column of U in $M_{pcn}$ ($M_{fcn}$, $M_{hcn}$) is a label vector from $N_{pc}$ ($N_{fc}$, $N_{hc}$). Note that $M_{hcn} \subset M_{fcn} \subset M_{pcn}$. The reason these matrices are called partitions follows from the interpretation of $u_{ik}$. If U is crisp or fuzzy, $u_{ik}$ is the membership of $\mathbf{x}_k$ in the ith partitioning fuzzy subset (cluster) of X. If U is probabilistic, $u_{ik}$ is usually the (posterior) probability that, given $\mathbf{x}_k$, it came from class i. When U is possibilistic, $u_{ik}$ is the typicality of $\mathbf{x}_k$ to class i.

Since definite class assignments are often the ultimate goal, labels in $N_{pc}$ or $N_{fc}$ are usually transformed into crisp labels. Most noncrisp partitions are converted to crisp ones using the hardening function $H: N_{pc} \mapsto N_{hc}$, that is,

\[ H(\mathbf{y}) = \mathbf{e}_i \iff \|\mathbf{y} - \mathbf{e}_i\|_2 \le \|\mathbf{y} - \mathbf{e}_j\|_2 \iff y_i \ge y_j; \qquad j \ne i \tag{18} \]

In Eq. (18), ties are broken randomly. H finds the crisp label vector $\mathbf{e}_i$ in $N_{hc}$ closest to $\mathbf{y}$. Alternatively, H finds the maximum coordinate of $\mathbf{y}$ and assigns the corresponding crisp label to the object $\mathbf{z}$ that $\mathbf{y}$ labels. For fuzzy partitions, hardening each column of U with Eq. (18) is called defuzzification by maximum membership (MM):

\[ U_{MM,k} = H(U_k) = \mathbf{e}_i \iff u_{ik} \ge u_{jk}\ \ \forall j \ne i; \qquad 1 \le k \le n \tag{19} \]

Crisp c-partitions of X obtained this way are denoted by $U^H = [H(U_1) \ldots H(U_n)]$.

Example 1. Let $O = \{o_1 = \text{peach}, o_2 = \text{plum}, o_3 = \text{nectarine}\}$, and let $c = 2$. Typical 2-partitions of O are as follows:

            U1 ∈ M_h23          U2 ∈ M_f23          U3 ∈ M_p23
Object      o1    o2    o3      o1    o2    o3      o1    o2    o3
Peaches     1     0     0       1     0.2   0.4     1     0.2   0.5
Plums       0     1     1       0     0.8   0.6     0     0.8   0.8

The nectarine, $o_3$, is labeled by the last column of each partition, and in the crisp case it must be (erroneously) given full membership in one of the two crisp subsets partitioning X. In $U_1$, $o_3$ is labeled plum. Noncrisp partitions enable models to (sometimes!) avoid such mistakes. The last column of $U_2$ allocates most (0.6) of the membership of $o_3$ to the plums class but also assigns a lesser membership (0.4) to $o_3$ as a peach. $U_3$ illustrates a possibilistic partition, and its third column exhibits a possibilistic label for the nectarine. The values in the third column indicate that this nectarine is more typical of plums than of peaches. Columns like the ones for the nectarine in $U_2$ and $U_3$ serve a useful purpose: Lack of strong membership in a single class is a signal to "take a second look." In this example the nectarine is a hybrid of peaches and plums, and the memberships shown for it in the last column of either $U_2$ or $U_3$ seem more plausible physically than crisp assignment of $o_3$ to an incorrect class. $M_{pcn}$ and $M_{fcn}$ can be more realistic than $M_{hcn}$ because boundaries between many classes of real objects are badly delineated (i.e., really fuzzy). $M_{fcn}$ reflects the degrees to which the classes share $\{o_k\}$ because of the constraint that $\sum_{i=1}^{c} u_{ik} = 1$. $M_{pcn}$ reflects the degrees of typicality of $\{o_k\}$ with respect to the prototypical (ideal) members of the classes. Finally, observe that $U_1 = U_2^H = U_3^H$. Crisp partitions of data do not possess the information content to suggest fine details of infrastructure such as hybridization or mixing that are available in $U_2$ and $U_3$. Here, hardening $U_2$ and $U_3$ with H destroys useful information.

The c-Means Clustering Models

How can we find partitions of X such as those in Example 1? The c-means models are used more widely than any other clustering methods for this purpose. The optimization problem that defines the hard (H), fuzzy (F), and possibilistic (P) c-means (HCM, FCM, and PCM, respectively) models is

\[ \min_{(U,V)} \Big\{ J_m(U, V; \mathbf{w}) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m D_{ik}(\mathbf{x}_k, \mathbf{v}_i) + \sum_{i=1}^{c} w_i \sum_{k=1}^{n} (1 - u_{ik})^m \Big\} \tag{20} \]

where

$U \in M_{hcn}$, $M_{fcn}$, or $M_{pcn}$, depending on the approach;
$V = (\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_c) \in \mathbb{R}^{cp}$, where $\mathbf{v}_i$ specifies the ith point prototype;
$\mathbf{w} = (w_1, w_2, \ldots, w_c)^T$, where $w_i \ge 0$ are user-specified penalty weights;
$m \ge 1$ is a weighting exponent that controls the degree of fuzzification of U; and
$D_{ik}(\mathbf{x}_k, \mathbf{v}_i) = D_{ik}$ is the deviation of $\mathbf{x}_k$ from the ith cluster prototype.
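The hardening operation of Eqs. (18) and (19) can be checked against the partitions of Example 1 with a short sketch. The function name and the row-list matrix layout are assumptions; ties are resolved here in favor of the lowest class index to keep the example deterministic, whereas the text breaks ties randomly.

```python
def harden(U):
    """Maximum-membership hardening of a c x n partition matrix.

    U is a list of c rows with n entries each. Every column is replaced by
    the crisp label of its largest entry (lowest index wins on ties).
    """
    c, n = len(U), len(U[0])
    H = [[0] * n for _ in range(c)]
    for k in range(n):
        column = [U[i][k] for i in range(c)]
        winner = column.index(max(column))
        H[winner][k] = 1
    return H


# Partitions of Example 1 (rows: peaches, plums; columns: o1, o2, o3).
U1 = [[1, 0, 0], [0, 1, 1]]
U2 = [[1, 0.2, 0.4], [0, 0.8, 0.6]]
U3 = [[1, 0.2, 0.5], [0, 0.8, 0.8]]

U2_hard = harden(U2)
U3_hard = harden(U3)
```

Both hardened matrices coincide with $U_1$, so the graded information about the nectarine in the third column is lost, as the example points out.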
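Optimizing $J_m(U, V; \mathbf{w})$ when $D_{ik}$ is an inner product norm metric, $D_{ik} = \|\mathbf{x}_k - \mathbf{v}_i\|_A^2$, is usually done by alternating optimization (AO) through the first-order necessary conditions on (U, V):

HCM: Minimize over $M_{hcn} \times \mathbb{R}^{cp}$: $m = 1$; $w_i = 0\ \forall i$. (U, V) may minimize $J_1$ only if

\[ u_{ik} = \begin{cases} 1; & \|\mathbf{x}_k - \mathbf{v}_i\|_A \le \|\mathbf{x}_k - \mathbf{v}_j\|_A,\ j \ne i \\ 0; & \text{otherwise} \end{cases} \qquad \forall i, k; \text{ ties are broken randomly} \tag{21} \]

\[ \mathbf{v}_i = \sum_{k=1}^{n} u_{ik} \mathbf{x}_k \Big/ \sum_{k=1}^{n} u_{ik} \qquad \forall i \tag{22} \]

FCM: Minimize over $M_{fcn} \times \mathbb{R}^{cp}$: assume $\|\mathbf{x}_k - \mathbf{v}_i\|_A^2 > 0\ \forall i, k$; $m > 1$; $w_i = 0\ \forall i$. (U, V) may minimize $J_m$ only if

\[ u_{ik} = \Big[ \sum_{j=1}^{c} \big( \|\mathbf{x}_k - \mathbf{v}_i\|_A / \|\mathbf{x}_k - \mathbf{v}_j\|_A \big)^{2/(m-1)} \Big]^{-1} \qquad \forall i, k \tag{23} \]

\[ \mathbf{v}_i = \sum_{k=1}^{n} u_{ik}^m \mathbf{x}_k \Big/ \sum_{k=1}^{n} u_{ik}^m \qquad \forall i \tag{24} \]

PCM: Minimize over $M_{pcn} \times \mathbb{R}^{cp}$: $m > 1$; $w_i > 0\ \forall i$. (U, V) may minimize $J_m$ only if

\[ u_{ik} = \Big[ 1 + \big( \|\mathbf{x}_k - \mathbf{v}_i\|_A^2 / w_i \big)^{1/(m-1)} \Big]^{-1} \qquad \forall i, k \tag{25} \]

\[ \mathbf{v}_i = \sum_{k=1}^{n} u_{ik}^m \mathbf{x}_k \Big/ \sum_{k=1}^{n} u_{ik}^m \qquad \forall i \tag{26} \]

The HCM/FCM/PCM-AO Algorithms: Inner Product Norms Case

Store:    Unlabeled object data $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\} \subset \mathbb{R}^p$
Pick:     Number of clusters: $1 < c < n$. Rule of thumb: limit c to $c \le \sqrt{n}$
          Maximum number of iterations: T
          Weighting exponent: $1 \le m < \infty$ ($m = 1$ for HCM)
          Norm for similarity of data to prototypes in $J_m$: $\langle \mathbf{x}, \mathbf{x} \rangle_A = \|\mathbf{x}\|_A^2 = \mathbf{x}^T A \mathbf{x}$
          Norm for termination criterion: $E_t = \|V_t - V_{t-1}\|_{err}$
          Termination threshold: $0 < \epsilon$
          Weights for penalty terms: $w_i > 0\ \forall i$ ($\mathbf{w} = \mathbf{0}$ for FCM/HCM)
Guess:    Initial prototypes: $V_0 = (\mathbf{v}_{1,0}, \ldots, \mathbf{v}_{c,0}) \in \mathbb{R}^{cp}$ {or initial partition $U_0 \in M_{pcn}$}
Iterate:  For t = 1 to T: {reverse U and V if initializing with $U_0 \in M_{pcn}$}
              Calculate $U_t$ with $V_{t-1}$ and (21, 23, or 25)
              Update $V_{t-1}$ to $V_t$ with $U_t$ and (22, 24, or 26)
              If $E_t \le \epsilon$, exit for loop; else next t
          $(U, V) = (U_t, V_t)$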
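The alternating optimization loop for the fuzzy case can be sketched in plain Python as follows. This is an illustrative implementation of the FCM membership and prototype updates with the Euclidean norm (A = I), not the authors' code; the function name, the list-based data layout, and the default parameter values are assumptions. The data and initial prototypes are those listed for Example 2 in Table 1, and the sketch assumes, as the text does, that no data point coincides with a prototype.

```python
import math


def fcm(X, V0, m=2.0, eps=0.01, T=50):
    """Fuzzy c-means by alternating optimization, Euclidean norm, m > 1."""
    V = [list(v) for v in V0]
    c, n, p = len(V), len(X), len(X[0])
    U = None
    for _ in range(T):
        # Squared Euclidean distances D[i][k] = ||x_k - v_i||^2.
        D = [[sum((xk[j] - vi[j]) ** 2 for j in range(p)) for xk in X]
             for vi in V]
        # Membership update; (d_i/d_j)^(2/(m-1)) equals (D_i/D_j)^(1/(m-1)).
        U = [[1.0 / sum((D[i][k] / D[j][k]) ** (1.0 / (m - 1.0))
                        for j in range(c))
              for k in range(n)] for i in range(c)]
        # Prototype update: weighted means with weights u_ik^m.
        V_new = []
        for i in range(c):
            wts = [u ** m for u in U[i]]
            total = sum(wts)
            V_new.append([sum(wt * xk[j] for wt, xk in zip(wts, X)) / total
                          for j in range(p)])
        # Termination test on the change in prototypes.
        change = math.sqrt(sum((a - b) ** 2
                               for vn, vo in zip(V_new, V)
                               for a, b in zip(vn, vo)))
        V = V_new
        if change <= eps:
            break
    return U, V


# Example 2 data (x1, x2) and initial prototypes, copied from Table 1.
X = [(1.00, 0.60), (1.75, 0.40), (1.30, 0.10), (0.80, 0.20), (1.10, 0.70),
     (1.30, 0.60), (0.90, 0.50), (1.60, 0.60), (1.40, 0.15), (1.00, 0.10),
     (2.00, 0.70), (2.00, 1.10), (1.90, 0.80), (2.20, 0.80), (2.30, 1.20),
     (2.50, 1.15), (2.70, 1.00), (2.90, 1.10), (2.80, 0.90), (3.00, 1.05)]
V0 = [(1.57, 0.61), (2.85, 1.01)]

U, V = fcm(X, V0)
```

Run on these data, the sketch should terminate near the FCM prototypes reported in Table 1; small differences in the last digits are possible because of rounding in the published initialization. Swapping the membership update for Eq. (21) or Eq. (25) would give the HCM and PCM variants of the same loop.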
In theory, iterate sequences of these algorithms possess subsequences that converge to either local minima or saddle points of their objective functions (6). In practice they almost always terminate at useful solutions within a reasonable number of iterations. Justifying a choice of m in FCM or PCM is a challenge. FCM-AO will produce equimembership partitions that approach U = [1/c] as m → ∞; but in practice, terminal partitions usually have memberships very close to (1/c) for values of m not much larger than 20. At the other extreme, as m approaches 1 from above, FCM reduces to HCM, and terminal partitions become more and more crisp. Thus, m controls the degree of fuzziness exhibited by the soft boundaries in U. Most users choose m in the range [1.1, 5], with m = 2 an overwhelming favorite.

Example 2. Table 1 lists the coordinates of 20 two-dimensional points X = {x1, . . ., x20}. Figure 1(a) plots the data. HCM, FCM, and PCM were applied to X with the following protocols: The similarity and termination norms were both Euclidean; c = p = 2; n = 20; ε = 0.01, T = 50, m = 2 for FCM and PCM; initialization for HCM and FCM was at the V0 shown below the columns labeled U10 and U20; initialization for PCM was the terminal Vf from FCM shown below the FCM columns labeled U1f and U2f; and the weights for PCM were fixed at w1 = 0.15, w2 = 0.16. All three algorithms terminated in less than 10 iterations at the partition matrices Uf (rows are shown transposed) and point prototypes Vf shown in the table. HCM and FCM began with the first 16 points in crisp cluster 1. HCM terminated with 10 points in each cluster as indicated by the boundaries in Figure 1(b). FCM and PCM terminated at the fuzzy and possibilistic partitions of X shown in Table 1. The difference between these two partitions can be seen, for example, by looking at the memberships of point x7 in both clusters (the values are underlined in Table 1). The fuzzy memberships are (0.96, 0.04), which sum to 1 as they must. This indicates that x7 is a very strong member of fuzzy cluster 1 and is barely related to cluster 2. The PCM values are (0.58, 0.06). These numbers indicate that x7 is a fairly typical member of cluster 1 (on a scale from 0 to 1), while it cannot be regarded as typical of cluster 2. When hardened with Eq. (19), the FCM and PCM partitions coincide with the HCM result; that is, U_HCM = U^H_FCM = U^H_PCM. This is hardly ever the case for data sets that do not have compact, well-separated clusters. Data point x13, partially underlined in Table 1, is more or less in between the two clusters. Its memberships, the fuzziest ones in the FCM partition (0.41, 0.59), point to this anomaly. The possibilities (0.22, 0.31) in the PCM partition are also low and roughly equal, indicating that x13 is not typical of either cluster. Finally, note that HCM estimates of the subsample means of the two groups (v1 for points 1–10 and v2 for points 11–20 in Table 1 and Fig. 1) are exact. The FCM estimates differ from the means by at most 0.07, and the PCM estimates differ from the means by at most 0.10. In this simple data set then, all three algorithms produce roughly the same results. The apples and pears in the first column of Table 1 and the point z in Fig. 1 are discussed in the next section.

FUZZY CLASSIFIER DESIGN
A classifier is any function D: ℝ^p → N_pc. The value y = D(z) is the label vector for z in ℝ^p. D is a crisp classifier if D[ℝ^p] = N_hc. Designing a classifier means the following: Use X to find a specific D from a specified family of functions (or algorithms). If the data are labeled, finding D is called supervised learning. Classifier models based on statistical, heuristic and network structures are discussed elsewhere in this Encyclopedia. This section describes some of the basic (and often most useful) classifier designs that have fuzzy generalizations.

Table 1. Example 2 Data, Initialization, and Terminal Outputs of HCM, FCM, and PCM

                            Initialization      HCM            FCM            PCM
e_i      i    x1     x2      U10    U20       U1f    U2f     U1f    U2f     U1f    U2f
apple    1   1.00   0.60      1      0         1      0      0.97   0.03    0.70   0.07
apple    2   1.75   0.40      1      0         1      0      0.77   0.23    0.35   0.16
apple    3   1.30   0.10      1      0         1      0      0.96   0.04    0.49   0.07
apple    4   0.80   0.20      1      0         1      0      0.94   0.06    0.36   0.05
apple    5   1.10   0.70      1      0         1      0      0.95   0.05    0.72   0.08
apple    6   1.30   0.60      1      0         1      0      0.97   0.03    0.90   0.10
apple    7   0.90   0.50      1      0         1      0      0.96   0.04    0.58   0.06
apple    8   1.60   0.60      1      0         1      0      0.84   0.16    0.51   0.15
apple    9   1.40   0.15      1      0         1      0      0.95   0.05    0.51   0.08
apple   10   1.00   0.10      1      0         1      0      0.95   0.05    0.42   0.05
pear    11   2.00   0.70      1      0         0      1      0.33   0.67    0.19   0.34
pear    12   2.00   1.10      1      0         0      1      0.19   0.81    0.14   0.43
pear    13   1.90   0.80      1      0         0      1      0.41   0.59    0.22   0.31
pear    14   2.20   0.80      1      0         0      1      0.10   0.90    0.13   0.59
pear    15   2.30   1.20      1      0         0      1      0.04   0.96    0.08   0.75
pear    16   2.50   1.15      1      0         0      1      0.01   0.99    0.07   0.90
pear    17   2.70   1.00      0      1         0      1      0.01   0.99    0.06   0.73
pear    18   2.90   1.10      0      1         0      1      0.05   0.95    0.05   0.45
pear    19   2.80   0.90      0      1         0      1      0.03   0.97    0.05   0.56
pear    20   3.00   1.05      0      1         0      1      0.06   0.94    0.04   0.36

Prototypes (x1, x2):
Sample means:    v1  = (1.22, 0.40)   v2  = (2.43, 0.98)
Initialization:  v10 = (1.57, 0.61)   v20 = (2.85, 1.01)
HCM:             v1f = (1.22, 0.40)   v2f = (2.43, 0.98)
FCM:             v1f = (1.21, 0.41)   v2f = (2.50, 1.00)
PCM:             v1f = (1.23, 0.50)   v2f = (2.45, 1.02)

The Nearest Prototype Classifier

Synonyms for the word prototype include vector quantizer, signature, template, codevector, paradigm, centroid, and exemplar. The common denominator in all prototype generation schemes is a mathematical definition of how well prototype v_i represents a set of vectors X_i. Any measure of similarity or dissimilarity on ℝ^p can be used; the usual choice is one of the distances at Eqs. (7)–(9) or (12)–(14).

Definition (1-np classifier). Given (V, E) = {(v_i, e_i): i = 1, . . ., c} ∈ ℝ^{cp} × N_hc^c, c crisply labeled prototypes (one per class) and any distance measure δ on ℝ^p. The crisp nearest prototype (1-np) classifier D_{V,E,δ} is defined, for z ∈ ℝ^p, as

\[
\text{Decide } z \in \text{class } i \iff D_{V,E,\delta}(z) = e_i \iff \delta(z, v_i) \le \delta(z, v_j) \quad \forall j \ne i \tag{27}
\]
Equation (27) says: Find the closest prototype to z, and assign its label to z. Ties are broken randomly. For example, the Euclidean distances from the point z = (2, 0.5)^T to the subsample means (shown as dashed lines in Fig. 1) are ‖z − v2‖ = 0.64 < ‖z − v1‖ = 0.79, so z acquires the label of v2; that is, z is in class 2. If the first 10 points are class 1 = apples and the second 10 points are class 2 = pears as shown in column 1 of Table 1, then the crisp labels for the 20 data points are e_i = e1 = (1, 0)^T, i = 1, . . ., 10; e_i = e2 = (0, 1)^T, i = 11, . . ., 20, and z is declared a pear by D_{V,E,δ2}.

The notation for D_{V,E,δ} emphasizes that there are three ways to alter Eq. (27): We can change V, E, or δ. First, as the measure of distance δ changes with V and E fixed, it is possible that the label assigned by Eq. (27) will too. If we use the 1-norm distance at Eq. (12) instead of the 2-norm distance at Eq. (13), then ‖z − v1‖_1 = 0.89 < ‖z − v2‖_1 = 0.91, so the decision is reversed: z is in class 1 = apples. Finally, if we use Eq. (14), then ‖z − v1‖_∞ = 0.78 > ‖z − v2‖_∞ = 0.48, so the label for z with this distance reverts to class 2 = pears. This shows why it is important to choose the distance carefully and understand the effect of changing it when using D_{V,E,δ}.

Second, we can change the prototype set V while holding E and δ fixed. The crisp 1-np design can be implemented using prototypes from any algorithm that produces them. D_{V,E,δ} is crisp because of E, even if V comes from a fuzzy, probabilistic, or possibilistic algorithm. Table 1 shows four different sets of prototypes for the data: the sample means v1 and v2, which coincide with the HCM estimates, and the FCM and PCM prototypes. Repeating the calculations of the last paragraph with the FCM or PCM prototypes leads here to the same labels for z using the three distances in Eqs. (12)–(14) because the sets of prototypes are nearly equal. But generally, this is not the case.

Third, the crisp labels E can be softened while holding V and δ fixed. In this case a more sophisticated approach based on aggregation of the soft label information possessed by several close prototypes is needed. This is a special case of the classifier we turn to next.
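The dependence of the 1-np decision on the distance can be checked with a short sketch. It assumes the rounded subsample means printed in Table 1 as prototypes and takes the 1-norm, 2-norm, and sup norm as the distances of Eqs. (12)–(14); the function and variable names are illustrative only. Because the prototypes are rounded, the computed distances may differ slightly from the values quoted in the text, which were presumably obtained from unrounded means.

```python
import numpy as np

# Prototypes (subsample means as printed in Table 1) and the query point z.
V = np.array([[1.22, 0.40],   # v1, class 1 = apples
              [2.43, 0.98]])  # v2, class 2 = pears
LABELS = ["apple", "pear"]
z = np.array([2.0, 0.5])


def nearest_prototype(z, V, order):
    """Return (index of the closest prototype, all distances) for a given norm.

    Ties are resolved by taking the first minimum, not randomly as in the text.
    """
    dists = np.linalg.norm(V - z, ord=order, axis=1)
    return int(np.argmin(dists)), dists


for name, order in [("1-norm", 1), ("2-norm", 2), ("sup norm", np.inf)]:
    i, d = nearest_prototype(z, V, order)
    print(f"{name}: distances = {np.round(d, 2)}, label = {LABELS[i]}")
```

Under these assumptions the 2-norm and sup norm pick v2 (pear) while the 1-norm picks v1 (apple), which is the reversal described above.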
Figure 1. (a) The 20-point data set for Examples 2 and 3. (b) Clustering and classification results for Examples 2 and 3. [Both panels plot x2 against x1; panel (b) marks the prototypes, the points x2, x7, x8, x11, x13, and x14, the point z = (2.0, 0.5)^T, and the Euclidean 5-nn disk.]
The Crisp k-Nearest Neighbor Classifier

Another widely used classifier with fuzzy and possibilistic generalizations is the k-Nearest Neighbor (k-nn) rule, which requires labeled samples from each class. As an example, the symbols in the first column of Table 1 enable each point in the data to serve as a labeled prototype. The crisp k-nn rule finds the k nearest neighbors (points in X) to z, and then it aggregates the votes of the neighbors for each class. The majority vote determines the label for z. Only two parameters must be selected to implement this rule: k, the number of nearest neighbors to z; and δ, a measure of nearness (usually distance) between pairs of vectors in ℝ^p.

Definition (k-nn Classifier). Given (X, U) = {(x_k, U_k): k = 1, . . ., n} ∈ ℝ^{np} × N_pc^n and any distance measure δ on ℝ^p. Let z ∈ ℝ^p and let U_(1) . . . U_(k) denote the columns of U corresponding to the k nearest neighbors of z. Aggregate votes (full or partial) for each class in the label vector D_{(X,U),k,δ}(z) = Σ_{j=1}^{k} U_(j)/k. The crisp k-nn classifier is defined as

\[
\text{Decide } z \in i \iff H\big(D_{(X,U),k,\delta}(z)\big) = e_i \tag{28}
\]
Example 3. Figure 1(b) shows a shaded disk with radius ‖x8 − z‖_2 = 0.41 centered at z which corresponds to the k = 5-nn rule with Euclidean distance for δ. The disk captures three neighbors (x11, x13, and x14) labeled pears in Table 1 and two neighbors (x2 and x8) labeled apples in Table 1. This 5-nn rule labels z a pear, realized by Eq. (28) as follows:

\[
D_{(X,U_{\mathrm{crisp}}),5,\delta_2}(z) = \frac{\sum_{j=1}^{5} U_{\mathrm{crisp}(j)}}{5}
= \frac{1}{5}\left[\begin{pmatrix}1\\0\end{pmatrix} + \begin{pmatrix}1\\0\end{pmatrix} + \begin{pmatrix}0\\1\end{pmatrix} + \begin{pmatrix}0\\1\end{pmatrix} + \begin{pmatrix}0\\1\end{pmatrix}\right]
= \begin{pmatrix}0.4\\0.6\end{pmatrix} \tag{29}
\]

\[
H\big(D_{(X,U_{\mathrm{crisp}}),5,\delta_2}(z)\big) = H\begin{pmatrix}0.4\\0.6\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix} = e_2 \tag{30}
\]

To see that k and δ affect the decision made by (28), Table 2 shows the labeling that (28) produces for z using k = 1 to 5 and the three distances shown in Eqs. (12)–(14) with U_crisp. Distances from z to each of its five nearest neighbors are shown in the upper third of Table 2. The five nearest neighbors are ranked in the same order by all three distances, x(1) = x11 being closest to z, and x(5) = x8 being furthest from z, where x(k) is the kth ranked nearest neighbor to z. L(x(k)) is the crisp label for x(k) from Table 1. The label sets, in order from left to right, that are used for each of the 15 decisions (3 distances by 5 rules) are shown in the middle third of Table 2. Lq(z) in the lower third of Table 2 is the crisp label assigned to z by each k-nn rule for the q = 1, 2, and ∞ distances.

Whenever there is a tie, the label assigned to z is arbitrary. There are two kinds of ties: label ties and distance ties. The 1-nn rule labels z a pear with all three distances. All three rules yield a label tie using k = 2, so either label may be assigned to z by these three classifiers. For k = 3 the 1 and 2 norm distances label z a pear. The sup norm experiences a distance tie between x(3) and x(4) at k = 3, but both points are labeled pear so the decision is still pear regardless of how the tie is resolved. At k = 4 the 1 norm has a distance tie between x(4) and x(5). Since these two points have different labels, the output of this classifier will depend on which point is selected to break the distance tie. If the apple is selected, resolution of the distance tie results in a label tie, and a second tie must be broken. If the distance tie breaker results in the pear, there are three pears and one apple as in the other two cases at k = 4. And finally, for k = 5 all three classifiers agree that z is a pear. Table 2 illustrates that the label assigned by (28) is dependent on both k and δ.

Equation (28) is well-defined for fuzzy and possibilistic labels. If, for example, we use the FCM labels from Table 1 for the five nearest neighbors to z instead of the crisp labels used in Example 3, we have
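\[
D_{(X,U_{\mathrm{FCM}}),5,\delta_2}(z) = \frac{\sum_{j=1}^{5} U_{\mathrm{FCM}(j)}}{5}
= \frac{1}{5}\left[\begin{pmatrix}0.33\\0.67\end{pmatrix} + \begin{pmatrix}0.77\\0.23\end{pmatrix} + \begin{pmatrix}0.41\\0.59\end{pmatrix} + \begin{pmatrix}0.10\\0.90\end{pmatrix} + \begin{pmatrix}0.84\\0.16\end{pmatrix}\right]
= \begin{pmatrix}0.49\\0.51\end{pmatrix} \tag{31}
\]

\[
H\big(D_{(X,U_{\mathrm{FCM}}),5,\delta_2}(z)\big) = H\begin{pmatrix}0.49\\0.51\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix} = e_2 \;\Rightarrow\; z = \text{pear} \tag{32}
\]

Equation (32) is a crisp decision based on fuzzy labels, so it is still a crisp k-nn rule. Possibilistic labels for these five points from Table 1 would result in the same decision here, but this is not always the case. If all 20 sets of FCM and PCM memberships from Table 1 are used in Eq. (28), the 20-nn rules based on the HCM, FCM, and PCM columns in Table 1 yield

\[
H\big(D_{(X,U_{\mathrm{HCM}}),20,\delta_2}(z)\big) = H\!\left(\frac{\sum_{j=1}^{20} U_{\mathrm{HCM}(j)}}{20}\right) = H\begin{pmatrix}0.50\\0.50\end{pmatrix} \;\Rightarrow\; \text{tie} \tag{33}
\]

\[
H\big(D_{(X,U_{\mathrm{FCM}}),20,\delta_2}(z)\big) = H\!\left(\frac{\sum_{j=1}^{20} U_{\mathrm{FCM}(j)}}{20}\right) = H\begin{pmatrix}0.52\\0.48\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix} = e_1 \;\Rightarrow\; z = \text{apple} \tag{34}
\]

\[
H\big(D_{(X,U_{\mathrm{PCM}}),20,\delta_2}(z)\big) = H\!\left(\frac{\sum_{j=1}^{20} U_{\mathrm{PCM}(j)}}{20}\right) = H\begin{pmatrix}0.33\\0.31\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix} = e_1 \;\Rightarrow\; z = \text{apple} \tag{35}
\]

Table 2. The k-nn Rule Labels z for Three Distances and Five Sets of Neighbors

Distances from z to the Ranked Neighbors
k    x(k)    L(x(k))    δ1(z, x(k))    δ2(z, x(k))    δ∞(z, x(k))
1    x11     pear       0.20           0.20           0.20
2    x2      apple      0.35           0.27           0.25
3    x13     pear       0.40           0.32           0.30
4    x14     pear       0.50           0.36           0.30
5    x8      apple      0.50           0.41           0.40

Labels of the Ranked Neighbors
k    Ranked neighbors           δ1                                   δ2                                   δ∞
1    x11                        pear                                 pear                                 pear
2    x11, x2                    pear, apple (label tie)              pear, apple (label tie)              pear, apple (label tie)
3    x11, x2, x13               pear, apple, pear                    pear, apple, pear                    pear, apple, pear (δ tie)
4    x11, x2, x13, x14          pear, apple, pear, pear (δ tie)      pear, apple, pear, pear              pear, apple, pear, pear
5    x11, x2, x13, x14, x8      pear, apple, pear, pear, apple       pear, apple, pear, pear, apple       pear, apple, pear, pear, apple

Output Label for z
k    Ranked neighbors           L1(z)                L2(z)        L∞(z)
1    x11                        pear                 pear         pear
2    x11, x2                    label tie            label tie    label tie
3    x11, x2, x13               pear                 pear         pear
4    x11, x2, x13, x14          label or δ tie       pear         pear
5    x11, x2, x13, x14, x8      pear                 pear         pear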
Terminology. The output of Eq. (31) and the argument of H in Eq. (34) are fuzzy labels based on fuzzy labels. Even though the final outputs are crisp in these two equations, some writers refer to the overall crisp decision as the fuzzy k-nn rule. More properly, however, the fuzzy k-nn rule is the algorithm that produces the fuzzy label which is subsequently
hardened in Eq. (28). Similarly, the input or argument of H in (35) is properly regarded as the output of the possibilistic k-nn rule, but some authors prefer to call the output of Eq. (35) the possibilistic k-nn rule. The important point is that if all 20 labels are used, the rule based on crisp labels is ambiguous, while the fuzzy and possibilistic based rules both label z an apple. This shows that the type of label also impacts the decision made by Eq. (28).

The Crisp (Fuzzy and Possibilistic) k-nn Algorithms

Problem:  To label z in ℝ^p
Store:    Labeled object data X = {x1, . . ., xn} ⊂ ℝ^p and label matrix U ∈ N_pc^n
Pick:     k = number of nn's and δ: ℝ^p × ℝ^p → ℝ^+ = any metric on ℝ^p
Find:     The n distances {δ_j ≡ δ(z, x_j): j = 1, 2, . . ., n}
Rank:     δ_(1) ≤ δ_(2) ≤ · · · ≤ δ_(k) ≤ δ_(k+1) ≤ . . . ≤ δ_(n), where the first k ranked distances give the k-nn indices
Compute:  D_{(X,U),k,δ}(z) = Σ_{j=1}^{k} U_(j) / k
Do:       Decide z ∈ i ⇔ H(D_{(X,U),k,δ}(z)) = e_i
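The Find, Rank, Compute, and Do steps above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the data and memberships are copied from Table 1, label vectors are stored as rows (the transpose of the article's column convention), hardening H is assumed to be a simple arg-max that reports a label tie as None, and all function names are hypothetical.

```python
import numpy as np

# Points x1..x20 and cluster memberships copied from Table 1 (rows = points).
X = np.array([
    [1.00, 0.60], [1.75, 0.40], [1.30, 0.10], [0.80, 0.20], [1.10, 0.70],
    [1.30, 0.60], [0.90, 0.50], [1.60, 0.60], [1.40, 0.15], [1.00, 0.10],
    [2.00, 0.70], [2.00, 1.10], [1.90, 0.80], [2.20, 0.80], [2.30, 1.20],
    [2.50, 1.15], [2.70, 1.00], [2.90, 1.10], [2.80, 0.90], [3.00, 1.05],
])
U_crisp = np.array([[1, 0]] * 10 + [[0, 1]] * 10, dtype=float)
fcm_u1 = np.array([0.97, 0.77, 0.96, 0.94, 0.95, 0.97, 0.96, 0.84, 0.95, 0.95,
                   0.33, 0.19, 0.41, 0.10, 0.04, 0.01, 0.01, 0.05, 0.03, 0.06])
U_fcm = np.column_stack([fcm_u1, 1.0 - fcm_u1])   # FCM rows sum to 1
U_pcm = np.column_stack([
    [0.70, 0.35, 0.49, 0.36, 0.72, 0.90, 0.58, 0.51, 0.51, 0.42,
     0.19, 0.14, 0.22, 0.13, 0.08, 0.07, 0.06, 0.05, 0.05, 0.04],
    [0.07, 0.16, 0.07, 0.05, 0.08, 0.10, 0.06, 0.15, 0.08, 0.05,
     0.34, 0.43, 0.31, 0.59, 0.75, 0.90, 0.73, 0.45, 0.56, 0.36],
])
z = np.array([2.0, 0.5])


def knn_label_vector(z, X, U, k, order=2):
    """Find/Rank/Compute: average the label vectors of the k nearest neighbors."""
    dists = np.linalg.norm(X - z, ord=order, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]  # distance ties go to the lower index
    return U[nearest].mean(axis=0)


def harden(label_vector):
    """Do: return the crisp label e_i of the largest entry, or None on a label tie."""
    top = np.flatnonzero(np.isclose(label_vector, label_vector.max()))
    if len(top) > 1:
        return None
    e = np.zeros_like(label_vector)
    e[top[0]] = 1.0
    return e


for name, U in [("crisp", U_crisp), ("FCM", U_fcm), ("PCM", U_pcm)]:
    for k in (5, 20):
        d = knn_label_vector(z, X, U, k)
        print(f"{name:5s} k={k:2d}: D(z) = {np.round(d, 2)}, H(D(z)) = {harden(d)}")
```

Resolving distance ties by index order is a design choice of this sketch; the text leaves tie-breaking arbitrary, so a different rule could change the k = 4 decision under the 1-norm.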
FEATURE ANALYSIS

Methods that explore and improve raw data are broadly characterized as feature analysis. This includes scaling, normalization, filtering, and smoothing. Any transformation Φ: ℝ^p → ℝ^q does feature extraction when applied to X. Usually q ≪ p, but there are cases where q ≥ p too. Examples of feature extraction transformations include Fourier transforms, principal components, and features such as the digital gradient, mean, range, and standard deviation from intensities in image windows. Feature selection consists of choosing subsets of the original measured features. Here Φ projects X onto a coordinate subspace of ℝ^p. The goals of extraction and selection are as follows: to improve the data for solving a particular problem; to compress feature space to reduce time and space complexity; and to eliminate redundant (dependent) and unimportant (for the problem at hand) features.

Figure 2. Feature selection and extraction on a 30 point data set. [The scatterplot of X in the (x1, x2) plane is shown together with its projections X1, X2, and (X1 + X2)/2.]
Example 4. The center of Fig. 2 is a scatterplot of 30 two-dimensional points X = {(x1, x2)} whose coordinates are listed in Table 3. The data are indexed so that points 1–10, 11–20, and 21–30 correspond to the three visually apparent clusters. Projection of X onto the first and second coordinate axes results in the one-dimensional data sets X1 and X2; this illustrates feature selection. The one-dimensional data set (X1 + X2)/2 in Fig. 2 (plotted to the right of X, not to scale) is made by averaging the coordinates of each vector in X. Geometrically, this amounts to orthogonal projection of X onto the line x1 = x2; this illustrates feature extraction. Visual inspection should convince you that the three clusters seen in X, X1 and (X1 + X2)/2 will be properly detected by most clustering algorithms. Projection of X onto its second axis, however, mixes the data and results in just two clusters in X2. This suggests that projections of high-dimensional data into visual dimensions cannot be relied upon to show much about cluster structure in the original data.

The results of applying FCM to these four data sets with c = 3, m = 2, ε = 0.01, and the Euclidean norm for both termination and Jm are shown in Table 3, which also shows the initialization used. Only memberships in the first cluster are shown. As expected, FCM discovers three very distinct fuzzy clusters in X, X1, and (X1 + X2)/2. Table 3 shows the three clusters blocked into their visually apparent subsets of 10 points each. For X, X1, and (X1 + X2)/2, all memberships for the first 10 points are ≥ 0.97, and memberships of the remaining 20 points in this cluster are ≤ 0.07. For X2, however, this cluster has eight anomalies with respect to the original data. When column U1 of X2 is hardened, this cluster contains the 12 points (marked with an asterisk in Table 3) numbered 3, 4, 5, 6, 7, 8, 10, 24, 25, 27, 28, and 30; the last five of these belong to cluster 3 in X, and the points numbered 1, 2, and 9 should belong to this cluster, but do not.

Table 3. Terminal FCM Partitions (Cluster 1 Only) for the Data Sets in Example 4

                                    Initialization      Terminal U1 for data set
Point    x1     x2    (x1+x2)/2    U10   U20   U30     X      X1     (X1+X2)/2   X2
x1       1.5    2.5   2            1     0     0       0.99   1.00   1.00        0.00
x2       1.7    2.6   2.15         0     1     0       0.99   1.00   0.99        0.03
x3       1.2    2.2   1.7          0     0     1       0.99   0.99   0.98        0.96*
x4       1.8    2     1.9          1     0     0       1.00   1.00   1.00        0.92*
x5       1.7    2.1   1.9          0     1     0       1.00   1.00   1.00        0.99*
x6       1.3    2.3   1.8          0     0     1       0.99   0.99   0.99        0.63*
x7       2.1    2     2.05         1     0     0       0.99   0.99   1.00        0.92*
x8       2.3    1.9   2.1          0     1     0       0.97   0.98   1.00        0.82*
x9       2      2.4   2.2          0     0     1       0.99   1.00   0.98        0.17
x10      1.9    2.2   2.05         1     0     0       1.00   1.00   1.00        0.96*

x11      6      1.2   3.6          0     1     0       0.01   0.01   0.01        0.02
x12      6.6    1     3.8          0     0     1       0.00   0.00   0.00        0.00
x13      5.9    0.9   3.4          1     0     0       0.02   0.02   0.07        0.02
x14      6.3    1.3   3.8          0     1     0       0.00   0.00   0.00        0.07
x15      5.9    1     3.45         0     0     1       0.02   0.02   0.05        0.00
x16      7.1    1     4.05         1     0     0       0.01   0.01   0.02        0.00
x17      6.5    0.9   3.7          0     1     0       0.00   0.00   0.00        0.02
x18      6.2    1.1   3.65         0     0     1       0.00   0.00   0.01        0.00
x19      7.2    1.2   4.2          1     0     0       0.02   0.02   0.03        0.02
x20      7.5    1.1   4.3          0     1     0       0.03   0.03   0.04        0.00

x21      10.1   2.5   6.3          0     0     1       0.01   0.01   0.01        0.00
x22      11.2   2.6   6.9          1     0     0       0.00   0.00   0.00        0.03
x23      10.5   2.5   6.5          0     1     0       0.01   0.01   0.00        0.00
x24      12.2   2.3   7.25         0     0     1       0.01   0.01   0.01        0.63*
x25      10.5   2.2   6.35         1     0     0       0.01   0.01   0.01        0.96*
x26      11     2.4   6.7          0     1     0       0.00   0.00   0.00        0.17
x27      12.2   2.2   7.2          0     0     1       0.01   0.01   0.01        0.96*
x28      10.2   2.1   6.15         1     0     0       0.01   0.01   0.02        0.99*
x29      11.9   2.7   7.3          0     1     0       0.01   0.01   0.01        0.09
x30      11.5   2.2   6.85         0     0     1       0.00   0.00   0.00        0.96*
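The difference between selection and extraction in Example 4 can be illustrated with a small sketch. It uses three points from Table 3 (x1, x11, and x21, one per visually apparent cluster); the function names are illustrative and not part of the original article. The last function checks the geometric remark that averaging the two coordinates agrees with orthogonal projection onto the line x1 = x2: each coordinate of the projected point equals the average.

```python
import numpy as np

# Points x1, x11 and x21 from Table 3, one from each visually apparent cluster.
X = np.array([[1.5, 2.5], [6.0, 1.2], [10.1, 2.5]])


def select_feature(X, j):
    """Feature selection: keep coordinate j only (projection onto a coordinate axis)."""
    return X[:, j]


def average_features(X):
    """Feature extraction: Phi(x) = (x1 + x2) / 2."""
    return X.mean(axis=1)


def project_onto_diagonal(X):
    """Orthogonal projection of each row of X onto the line x1 = x2."""
    u = np.ones(X.shape[1]) / np.sqrt(X.shape[1])  # unit vector along the diagonal
    return np.outer(X @ u, u)


print("X1          :", select_feature(X, 0))
print("X2          :", select_feature(X, 1))
print("(X1 + X2)/2 :", average_features(X))
print("projection  :", project_onto_diagonal(X))
```

For these three points the second coordinate takes the values 2.5, 1.2, and 2.5, so points from the first and third clusters coincide in X2, while X1 and the averaged feature keep all three apart.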
REMARKS ON APPLICATIONS OF FUZZY PATTERN RECOGNITION

Retrieval from the Science Citation Index for years 1994–1997 on titles and abstracts that contain the keyword combinations "fuzzy" and either "clustering" or "classification" yields 460 papers. Retrievals against "fuzzy" and either "feature selection" or "feature extraction" yield 21 papers. This illustrates that the literature contains some examples of fuzzy models for feature analysis, but they are widely scattered because this discipline is very data-dependent and, hence, almost always done on a case-by-case basis. A more interesting metric for the importance of fuzzy models in pattern recognition lies in the diversity of application areas represented by the titles retrieved. Here is a partial sketch:

Chemistry. Analytical, computational, industrial, chromatography, food engineering, brewing science.
Electrical Engineering. Image and signal processing, neural networks, control systems, informatics, automation, robotics, remote sensing and control, optical engineering, computer vision, parallel computing, networking, dielectrics, instrumentation and measurement, speech recognition, solid-state circuits.
Geology/Geography. Photogrammetry, geophysical research, geochemistry, biogeography, archeology.
Medicine. Magnetic resonance imaging, diagnosis, tomography, roentgenology, neurology, pharmacology, medical physics, nutrition, dietetic sciences, anesthesia, ultramicroscopy, biomedicine, protein science, neuroimaging, drug interaction.
Physics. Astronomy, applied optics, earth physics.
Environmental Sciences. Soils, forest and air pollution, meteorology, water resources.

Thus, it seems fair to assert that this branch of science and engineering has established a niche as a useful way to approach pattern recognition problems.

BIBLIOGRAPHY

1. L. A. Zadeh, Fuzzy sets, Inf. Control, 8: 338–352, 1965.
2. L. A. Zadeh, Probability measures of fuzzy events, J. Math. Anal. Appl., 23: 421–427, 1968.
3. R. Duda and P. Hart, Pattern Classification and Scene Analysis, New York: Wiley-Interscience, 1973.
4. K. S. Fu, Syntactic Pattern Recognition and Applications, Englewood Cliffs, NJ: Prentice Hall, 1982.
5. R. E. Bellman, R. Kalaba, and L. A. Zadeh, Abstraction and pattern classification, J. Math. Anal. Appl., 13: 1–7, 1966.
6. J. C. Bezdek and S. K. Pal, Fuzzy Models for Pattern Recognition, Piscataway, NJ: IEEE Press, 1992.
7. R. Krishnapuram and J. Keller, A possibilistic approach to clustering, IEEE Trans. Fuzzy Syst., 1 (2): 98–110, 1993.
8. Y. H. Pao, Adaptive Pattern Recognition and Neural Networks, Reading, MA: Addison-Wesley, 1989.
9. A. Jain and R. Dubes, Algorithms for Clustering Data, Englewood Cliffs, NJ: Prentice Hall, 1988.
10. J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, New York: Plenum, 1981.

Reading List

G. Klir and T. Folger, Fuzzy Sets, Uncertainty and Information, Englewood Cliffs, NJ: Prentice Hall, 1988.
D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications, New York: Academic Press, 1980.
H. J. Zimmermann, Fuzzy Set Theory—and Its Applications, 2nd ed., Boston: Kluwer, 1990.
D. Schwartz, G. Klir, H. W. Lewis, and Y. Ezawa, Applications of fuzzy sets and approximate reasoning, Proc. IEEE, 82: 482–498, 1994.
A. Kandel, Fuzzy Techniques in Pattern Recognition, New York: Wiley-Interscience, 1982.
S. K. Pal and D. K. Dutta Majumder, Fuzzy Mathematical Approach to Pattern Recognition, New York: Wiley, 1986.
B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Approach to Machine Intelligence, Englewood Cliffs, NJ: Prentice Hall, 1991.

Journals

IEEE Transactions on Fuzzy Systems; IEEE Transactions on Neural Networks; IEEE Transactions on Systems, Man, and Cybernetics; Fuzzy Sets and Systems; International Journal of Approximate Reasoning; International Journal of Intelligent Systems; Intelligent Automation and Soft Computing; Uncertainty, Fuzziness and Knowledge Based Systems; Journal of Intelligent and Fuzzy Systems

JAMES C. BEZDEK
University of West Florida

LUDMILA KUNCHEVA
University of Wales, Bangor

FUZZY QUERYING. See FUZZY INFORMATION RETRIEVAL AND DATABASES.
Wiley Encyclopedia of Electrical and Electronics Engineering
Fuzzy Statistics
Standard Article
Rudolf Kruse (1), Jörg Gebhardt (2), María Angeles Gil (3)
(1) Otto-von-Guericke University of Magdeburg, Magdeburg, Germany; (2) University of Braunschweig, Braunschweig, Germany; (3) University of Oviedo, Oviedo, Spain
Copyright © 1999 by John Wiley & Sons, Inc. All rights reserved.
DOI: 10.1002/047134608X.W3509
Article Online Posting Date: December 27, 1999
Abstract
The sections in this article are: Introduction; Why Develop Fuzzy Statistics?; Fuzzy Statistics Based on Fuzzy Perceptions or Reports of Existing Numerical Data; Fuzzy Statistics Based on Existing Fuzzy-Valued Data; Other Studies on Fuzzy Statistics; Some Examples of Fuzzy Statistics; Additional Remarks.
FUZZY STATISTICS

The aim of this article is to give a summary view of many concepts, results, and methods to deal with statistical problems in which some elements are either fuzzily perceived, or reported, or valued. Different handy approaches to model and manage univariate problems are examined and a few techniques from them are gathered. Multivariate statistics with fuzzy elements are briefly discussed, and finally two examples illustrating the use of some models and procedures in the article are included.

J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering. Copyright © 1999 John Wiley & Sons, Inc.

INTRODUCTION

Applications of statistics occur in many fields, and the general theory of statistics has been developed by considering the common features of these fields. Three major branches of statistics are descriptive statistics, inferential statistics, and statistical decision making. All of them, and especially the latter two, are closely related to the concept of uncertainty. In making inferential statements or statistical decisions, the statistician or decision maker is usually unsure of certain characteristics of a random experiment. Uncertainty involved in statistical problems is traditionally assumed to be due to randomness (or unpredictability of the outcomes or events occurring in any performance of the experiment). To deal with this type of uncertainty, probability theory has become a well-developed mathematical apparatus. However, in several fields of applications of statistics, other types of uncertainty often arise. Thus, in social sciences, psychology, engineering, communications, and so on, statistical problems can include observed or reported data like very long, quite fast, a few people, more or less in agreement, and good yield. The entry of fuzzy set theory has allowed dealing with the type of uncertainty referred to as fuzziness or vagueness (or difficulty of defining sharply the elements, that is, outcomes, events or data, in the problem). We are now going to summarize the models and some relevant methods stated in the literature to manage and solve statistical decision problems involving both randomness and fuzziness.

WHY DEVELOP FUZZY STATISTICS?

The basic model in statistics is a mathematical idealization which is used to describe a random experiment.
This model is given by a probability space (Ω, 𝒜, P_θ), θ ∈ Θ, where

• Ω is the sample space, which is defined so that each element of Ω denotes an experimental outcome, and any experimental performance results in an element of Ω.
• 𝒜 is a class of events of interest (which are assumed to be identifiable with subsets of Ω), this class being a σ-field on Ω, and
• P_θ is a probability measure defined on 𝒜 (that is, a real function from 𝒜 which is nonnegative, normalized, and σ-additive), which often involves some uncertain elements that will be generically denoted by θ (unknown parameter value, unknown state of nature, or unknown subindex), Θ being the parameter, state, or index space.

The mechanism of this model can be summarized as follows:

\[
\Theta \to \{P_\theta,\ \theta \in \Theta\} \to \Omega \to \mathcal{A}
\]
\[
\theta \xrightarrow{\ \text{experimental distribution}\ } P_\theta \xrightarrow{\ \text{experimental performance}\ } \omega \xrightarrow{\ \text{event of interest}\ } A \quad (\text{occurs if } \omega \in A)
\]

To develop a more operational model to describe random experiments, the outcomes can be "converted" into numerical values by associating with each outcome ω ∈ Ω a real (or vectorial) value, so that the interest is not focused on the outcomes but on the associated values. The rule formalizing this association is referred to as a random variable, and it is assumed to be Borel-measurable to guarantee that many useful probabilities can be computed. The incorporation of a random variable to the former model induces a probability space, (ℝ, B_ℝ, P^X_θ), θ ∈ Θ (or in general (ℝ^k, B_ℝ^k, P^X_θ), θ ∈ Θ, with k ≥ 1), P^X_θ denoting the induced probability. The mechanism of the induced model can be summarized as follows:

\[
\Theta \to \{P_\theta,\ \theta \in \Theta\} \to \Omega \to \mathbb{R} \to \mathcal{B}_{\mathbb{R}}
\]
\[
\theta \xrightarrow{\ \text{experimental distribution}\ } P_\theta \xrightarrow{\ \text{experimental performance}\ } \omega \xrightarrow{\ \text{random variable}\ } X(\omega) \xrightarrow{\ \text{event of interest}\ } [X \in B] \quad (\text{occurs if } X(\omega) \in B)
\]

The preceding two models could be enlarged if a Bayesian context is considered. In this context θ would behave as a random variable, so that the parameter, state, or index would be specified in accordance with a prior distribution. Indeed, the uncertainty involved in the preceding models corresponds to randomness, which arises in the experimental performance (and in the specification of θ if a Bayesian framework is considered). However, statistical problems can also involve fuzziness. More precisely, either the numerical values associated with experimental outcomes can be fuzzily perceived or reported, or the values associated with experimental outcomes can be fuzzy, or the events of interest can be identifiable with fuzzy subsets of the sample space. We are now going to recall the most well-developed approaches to model and handle problems involving both fuzzy imprecision and probabilistic uncertainty.

FUZZY STATISTICS BASED ON FUZZY PERCEPTIONS OR REPORTS OF EXISTING NUMERICAL DATA

In this section we consider situations in which the experimental outcomes have been converted into numerical values, by means of a classical random variable, but numerical data are fuzzily perceived or reported by the statistician or observer. In this way, when we say that an item is expensive or a day is cool, there exist some underlying numerical values (the exact price and temperature) so we are considering fuzzy perceptions or reports of some existing real-valued data. The scheme of such situations is the following:

\[
\Theta \to \{P_\theta,\ \theta \in \Theta\} \to \Omega \to \mathbb{R} \to \mathcal{F}(\mathbb{R})
\]
\[
\theta \xrightarrow{\ \text{experimental distribution}\ } P_\theta \xrightarrow{\ \text{experimental performance}\ } \omega \xrightarrow{\ \text{random variable}\ } X(\omega) \xrightarrow{\ \text{perception or report}\ } \tilde{V}
\]

where F(ℝ) means the class of fuzzy subsets Ṽ of ℝ. To deal with these types of situations we can consider two different approaches. The first one is based on the concept of fuzzy random variable, as intended by Kwakernaak (1,2), and
Kruse and Meyer (3). The second one is based on the concept of fuzzy information [Okuda et al. (4), Tanaka et al. (5)]. The essential differences between these approaches lie in the nature of the parameters and in the probabilistic assessments. Thus, parameters in the first approach are assumed to be either fuzzy (fuzzy perceptions of unknown classical parameters) or crisp, whereas parameters in the second one are always assumed to be crisp (the classical parameters of the original random variable). On the other hand, in the approach based on fuzzy random variables probabilities often refer to fuzzy variable values and inferences are commonly fuzzy, whereas in the approach based on fuzzy information probabilities are initially assessed to the underlying numerical values and inferences are crisp.

Approach Based on Fuzzy Random Variables

Whenever an extension of probability theory to nonstandard data is going to be established, the fundamental aim is to provide an appropriate concept of a generalized random variable that allows one to verify the validity of essential limit theorems such as the strong law of large numbers and the central limit theorem. In the case of set-valued data, the generalized random variable is a random set [Matheron (6), Kendall (7), Stoyan et al. (8)], for which a strong law of large numbers was proved in Artstein and Vitale (9). With regard to fuzzy data and the basic notions of a fuzzy random variable [Kwakernaak (1)], and random fuzzy set (which are also referred to in the literature as fuzzy random variables) introduced by Puri and Ralescu (10), an analogous theorem can be formulated [Ralescu (11), Kruse (12), Miyakoshi and Shimbo (13), Klement et al. (14), and Kruse and Meyer (3)]. It supports the development of a fuzzy probability theory and thus, the laying down of the concepts for mathematical statistics on fuzzy sets.
The monographs Kruse and Meyer (3), and Bandemer and Näther (15) describe the theoretical and practical methods of fuzzy statistics in much detail. For comparable discussions and alternative approaches we mention, for instance, Gil (16), Czogala and Hirota (17), Hirota (18), Viertl (19,20), Tanaka (21), Kandel (22).

The background of fuzzy statistics can be distinguished from two different viewpoints of modelling imperfect information using fuzzy sets. The first one regards a fuzzy datum as an existing object, for example a physical grey scale picture. Therefore, this view is called the physical interpretation of fuzzy data. The second view, the epistemic interpretation, applies fuzzy data to imperfectly specify a value that is existing and precise, but not measurable with exactitude under the given observation conditions. Thus, the first view does not examine real-valued data, but objects that are more complex. In the most simple case we need multivalued data, as they turn up in the field of random sets.

In this section we restrict ourselves to considering the second view (since the first view is considered in the next section), namely the extension of traditional probability theory and mathematical statistics from the treatment of real-valued crisp data to handling fuzzy data in their epistemic interpretation as possibility distributions [Zadeh (23), Dubois and Prade (24)]. The statistical analysis of this sort of data was first studied by Kwakernaak (1,2). An extensive investigation of relevant aspects of statistical inference in the presence of possibilistic data can be found in Kruse and Meyer (3). The corresponding methods have been incorporated into the software tool SOLD (Statistics on Linguistic Data) that offers several operations for analysing fuzzy random samples [Kruse (25), Kruse and Gebhardt (26)].

In the following sections we introduce the concept of a fuzzy random variable and outline how to use it for the development of a theory of fuzzy probability and fuzzy statistics. Additionally, we show some implementation aspects and features of the mentioned software tool SOLD.

Fuzzy Random Variables. Introducing the concept of a fuzzy random variable means that we deal with situations in which two different types of uncertainty appear simultaneously, namely randomness and possibility. Randomness refers to the description of a random experiment by a probability space (Ω, 𝒜, P), and we assume that the whole information that is relevant for further analysis of any outcome of the random experiment can be expressed with the aid of a real number, so that we can specify a mapping U : Ω → ℝ, which assigns to each outcome in Ω its random value in ℝ, U being a random variable. Possibility as a second kind of uncertainty in our description of a random experiment has to be involved whenever we are not in the position to fix the random values U(ω) as crisp numbers in ℝ, but only to imperfectly specify these values by possibility distributions on ℝ. In this case, the random variable U : Ω → ℝ changes into a fuzzy random variable 𝒳 : Ω → F(ℝ) with F(ℝ) = {Ṽ | Ṽ : ℝ → [0, 1]} denoting the class of all fuzzy subsets (unnormalized possibility distributions) of the real numbers. A fuzzy random variable 𝒳 : Ω → F(ℝ) is interpreted as a (fuzzy) perception of an inaccessible usual random variable U0 : Ω → ℝ, which is called the original of 𝒳.
The basic idea is to assume that the considered random experiment is characterized by U0, but the available description of its attached random values U0(웆) is imperfect in the sense that their most specific specification is the possibility distribution X 웆 ⫽ X (웆). In this case, for any r 僆 ⺢, the value X 웆(r) quantifies the degree of possibility with which the proposition U0(웆) ⫺ r is regarded as being true. More particularly, X 웆(r) ⫺ 0 means that there is no supporting evidence for the possibility of truth of U0(웆) ⫽ r, whereas X 웆(r) ⫺ 1 means that there is no evidence against the possibility of truth of U0(웆) ⫽ r, so that this proposition is fully possible, and X 웆(r) 僆 (0, 1) reflects that there is evidence that supports the truth of the proposition as well as evidence that contradicts it, based on a set of competing contexts for the specification of U0(웆). Recent research activities in possibility theory have delivered a variety of different approaches to the semantic background of a degree of possibility, similar to the several interpretations that have been proposed with respect to the meaning of subjective probabilities [Shafer (27), Nguyen (28), Kampe´ de Fe´riet (29), Wang (30), Dubois et al. (31)]. A quite promising way of interpreting a possibility distribution X 웆 : ⺢ 씮 [0, 1] is that of viewing X 웆 in terms of the context approach [Gebhardt (32), Gebhardt and Kruse (33,34)]. It is very important to provide such semantical underpinnings in order to obtain a well-founded concept of a fuzzy random variable. On the other hand, the following results are presented in a way that makes it sufficient for the reader to confine himself to the intuitive view of a possibility distribution X 웆
184
FUZZY STATISTICS
as a gradual constraint on the set ⺢ of possible values [Zadeh (23)]. The concept of a fuzzy random variable is a reasonable extension of the concept of a usual random variable in the many practical applications of random experiments where the implicit assumption of data precision seems to be an inappropriate simplification rather than an adequate modelling of the real physical conditions. Considering possibility distributions allows one to involve uncertainty (due to the probabilities of occurrence of competing specification contexts) as well as imprecision [due to the context-dependent set-valued specifications of U0(웆)]. For this reason a frequent case in applications, namely using error intervals instead of crisp points for measuring U0(웆), is covered by the concept of a fuzzy random variable. Note that fuzzy random variables describe situations where the uncertainty and imprecision in observing a random value U0(웆) is functionally dependent on the respective outcome 웆. If observation conditions are not influenced by the random experiment, so that for any 웆1, 웆2, 僆 ⍀, the equality of random values U0(웆1) and U0(웆2) does not imply that their imperfect specifications with the aid of possibility functions are the same, then theoretical considerations in fuzzy statistics become much simpler, since in this case we do not need anymore a concept of a fuzzy random variable. It suffices to generalize operations of traditional statistical inference for crisp data to operations on possibility distributions using the well-known extension principle [Zadeh (35)]. We reconsider this topic later. After the semantical underpinnings and aims of the concept of a fuzzy random variable have been clarified, we will now present its full formal definition and show how to use it for a probability theory based on fuzzy sets. Let F N(⺢) be the class of all normal fuzzy sets of the real line. 
Moreover, let $\mathcal{F}_c(\mathbb{R})$ denote the class of all upper semicontinuous fuzzy sets $\tilde V \in \mathcal{F}_N(\mathbb{R})$, which means that for all $\alpha \in (0, 1]$ the $\alpha$-cuts $\tilde V_\alpha = \{x \in \mathbb{R} \mid \tilde V(x) \ge \alpha\}$ are compact real sets.

Definition. Let $(\Omega, \mathcal{A}, P)$ be a probability space. A function $X : \Omega \to \mathcal{F}_c(\mathbb{R})$ is called a fuzzy random variable if and only if

$$\inf X_\alpha : \Omega \to \mathbb{R},\ \omega \mapsto \inf (X(\omega))_\alpha \quad \text{and} \quad \sup X_\alpha : \Omega \to \mathbb{R},\ \omega \mapsto \sup (X(\omega))_\alpha$$

are Borel-measurable for all $\alpha \in (0, 1)$.

The notion of a fuzzy random variable and the related notion of a probabilistic set were introduced by several authors in different ways. From a formal viewpoint, the definition in this section is similar to that of Kwakernaak (1,2) and Miyakoshi and Shimbo (13). Puri and Ralescu (10) and Klement et al. (14) considered fuzzy random variables [hereafter referred to as random fuzzy sets] as measurable mappings whose values are fuzzy subsets of $\mathbb{R}^k$ or, more generally, of a Banach space; this approach involves distances on spaces of fuzzy sets and measurability of random elements valued in a metric space.

Fuzzy Probability Theory and Descriptive Statistics. Our generalization of concepts of traditional probability theory to a fuzzy probability theory is based on the idea that a fuzzy random variable is considered as a (fuzzy) perception of an inaccessible usual random variable $U_0 : \Omega \to \mathbb{R}$, which we referred to as the unknown original of $X$. Let $\mathcal{H} = \{U \mid U : \Omega \to \mathbb{R} \text{ and } U \text{ Borel-measurable w.r.t. } (\Omega, \mathcal{A})\}$ be the set of all one-dimensional random variables w.r.t. $(\Omega, \mathcal{A}, P)$. If only fuzzy data are available, then it is of course not possible to identify one of the candidates in $\mathcal{H}$ as the true original of $X$, but we can evaluate the degree of possibility $\mathrm{Orig}_X(U)$ of the truth of the statement "$U$ is the original of $X$," determined by the following possibility distribution $\mathrm{Orig}_X$ on $\mathcal{H}$:

$$\mathrm{Orig}_X : \mathcal{H} \to [0, 1], \quad U \mapsto \inf_{\omega \in \Omega} \{X_\omega(U(\omega))\}$$
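The definition of $\mathrm{Orig}_X$ can be made concrete on a finite probability space, where the infimum over $\Omega$ becomes a minimum. The following is only an illustrative sketch: the two-outcome experiment, the triangular possibility distributions, and the names `triangular` and `orig_degree` are assumptions introduced here and do not come from the article.

```python
from typing import Callable, Dict, Hashable

Membership = Callable[[float], float]


def triangular(left: float, peak: float, right: float) -> Membership:
    """Triangular possibility distribution with support [left, right] (illustrative)."""
    def mu(r: float) -> float:
        if r == peak:
            return 1.0
        if left < r < peak:
            return (r - left) / (peak - left)
        if peak < r < right:
            return (right - r) / (right - peak)
        return 0.0
    return mu


def orig_degree(X: Dict[Hashable, Membership], U: Dict[Hashable, float]) -> float:
    """Orig_X(U) = inf over outcomes w of X_w(U(w)); a plain min on a finite Omega."""
    return min(X[w](U[w]) for w in X)


# Hypothetical experiment with two outcomes, each perceived only vaguely.
X = {"w1": triangular(0.0, 1.0, 2.0), "w2": triangular(2.0, 3.0, 4.0)}

U_a = {"w1": 1.0, "w2": 3.0}  # candidate original hitting both peaks
U_b = {"w1": 1.5, "w2": 3.0}  # halfway down the slope on w1
U_c = {"w1": 1.0, "w2": 5.0}  # outside the support on w2

print(orig_degree(X, U_a), orig_degree(X, U_b), orig_degree(X, U_c))
```

A candidate is fully possible as the original only if it is fully possible at every outcome, and a single outcome that rules it out drives the degree to zero.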
The definition of $\mathrm{Orig}_X$ shows relationships to random set theory [Matheron (6)] in that, for all $\alpha \in [0, 1]$, $(\mathrm{Orig}_X)_\alpha$ coincides with the set of all selectors of the random set $X_\alpha : \Omega \to \mathcal{B}_{\mathbb{R}}$, $X_\alpha(\omega) = (X_\omega)_\alpha$. Zadeh's extension principle, which can be justified by the context approach mentioned previously [Gebhardt and Kruse (33)], helps us to define fuzzifications of well-known notions of probability theory. As an example, consider the generalization of characteristic parameters of crisp random variables to fuzzy random variables: if $\gamma(U)$ is a characteristic of a classical random variable $U : \Omega \to \mathbb{R}$, then

$$\gamma(X) : \mathbb{R} \to [0, 1], \quad t \mapsto \sup_{U \in \mathcal{H},\ \gamma(U) = t}\ \inf_{\omega \in \Omega} \{X_\omega(U(\omega))\}$$

turns out to be the corresponding characteristic of a fuzzy random variable. For example, expected value and variance of a fuzzy random variable are defined as follows:

Definition.
(a) $\tilde E(X) : \mathbb{R} \to [0, 1]$, $t \mapsto \sup\{\mathrm{Orig}_X(U) \mid U \in \mathcal{H},\ E(|U|) < \infty,\ E(U) = t\}$, is called the expected value of $X$.
(b) $\widetilde{\mathrm{Var}}(X) : \mathbb{R} \to [0, 1]$, $t \mapsto \sup\{\mathrm{Orig}_X(U) \mid U \in \mathcal{H},\ E(|U - E(U)|^2) < \infty,\ E[(U - E(U))^2] = t\}$, is the variance of $X$.

There are also definitions of a real-valued variance [see Bandemer and Näther (15), Näther (36), and recently Körner (37)], but they are introduced on the basis of the Fréchet approach and are in fact concerned with random fuzzy sets.

In a similar way, other notions of probability theory and descriptive statistics can be generalized to fuzzy data. Based on the semantically well-founded concept of a fuzzy random variable, the fuzzification step is quite simple, since it only requires an appropriate application of the extension principle. The main theoretical problem consists in finding simplifications that support the development of efficient algorithms for calculations in fuzzy statistics. It turns out that the horizontal representation of fuzzy sets and possibility distributions by the family of their $\alpha$-cuts is more appropriate than the vertical representation, which attaches a membership degree or a degree of possibility to each element of the domain of the respective fuzzy set or possibility distribution. The horizontal representation has the advantage that it reduces operations on fuzzy sets and possibility distributions to operations on $\alpha$-cuts [Kruse et al. (38)]. Nevertheless, many algorithms for efficient computations in fuzzy statistics require deeper theoretical effort. For more details, see Kruse and Meyer (3).

Complexity problems may also result from nontrivial structures of probability spaces. Tractability therefore often means that one has to confine oneself to finite probability spaces or to use appropriate approximation techniques [Kruse and Meyer (3)]. The following theorem shows that, under certain restrictions on $X$, a convenient representation of the fuzzy expected value can be derived, as one example of a characteristic of a fuzzy random variable $X$.

Theorem. Let $X : \Omega \to \mathcal{F}_c(\mathbb{R})$ be a finite fuzzy random variable such that $X(\Omega) = \{\tilde V_1, \ldots, \tilde V_n\}$ and $p_i = P(\{\omega \in \Omega \mid X_\omega = \tilde V_i\})$, $i = 1, \ldots, n$. Then

$$\left( \left[ \sum_{i=1}^{n} p_i \inf (\tilde V_i)_\alpha,\ \sum_{i=1}^{n} p_i \sup (\tilde V_i)_\alpha \right] \right)_{\alpha \in (0, 1]}$$

is an $\alpha$-cut representation of $\tilde E(\mathrm{co}(X))$, where $\mathrm{co}(X) : \Omega \to \mathcal{F}_c(\mathbb{R})$ is defined by $\mathrm{co}(X)(\omega) = \mathrm{co}(X_\omega)$, with $\mathrm{co}(X_\omega)$ denoting the convex hull of $X_\omega$.

Fuzzy Statistics. When fuzzy sets are applied in mathematical statistics we have to consider two conceptually different approaches. The first one strictly refers to the concept of a fuzzy random variable. It assumes that, given a generic $X : \Omega \to \mathcal{F}_c(\mathbb{R})$ and a fuzzy random sample $X_1, \ldots, X_n$ independent and identically distributed from the distribution of $X$, the realization of an underlying random experiment is formalized by a tuple $(\tilde V_1, \ldots, \tilde V_n) \in [\mathcal{F}_c(\mathbb{R})]^n$ of fuzzy-valued outcomes. Kruse and Meyer (3) verified that all important limit theorems (e.g., the strong law of large numbers, the central limit theorem, and the Glivenko–Cantelli theorem) remain valid in the more general context of fuzzy random variables. From this it follows that the extension of mathematical statistics from crisp to fuzzy data is well-founded.

As an example of the generalization of an essential theorem we present a fuzzy data version of the strong law of large numbers. More general versions can be found in Klement et al. (14), Meyer (39), and Kruse and Meyer (3).

Theorem. Let $\{X_i\}_{i \in \mathbb{N}}$ be an i.i.d. sequence on the probability space $(\Omega, \mathcal{A}, P)$ with the generic fuzzy random variable $X : \Omega \to \mathcal{F}_c(\mathbb{R})$. Let $E(|\inf (X_i)_0|) < \infty$ and $E(|\sup (X_i)_0|) < \infty$. Then there exists a null set $N$ (i.e., a set with probability zero) such that for all $\omega \in \Omega \setminus N$
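$$\lim_{n \to \infty} d_\infty\!\left( \frac{1}{n} \sum_{i=1}^{n} X_i(\omega),\ \tilde E(X) \right) = 0$$

where $d_\infty$ is the so-called generalized Hausdorff metric [first introduced by Puri and Ralescu (40)], defined for $\tilde V, \tilde W \in \mathcal{F}_c(\mathbb{R})$ as follows:

$$d_\infty(\tilde V, \tilde W) = \sup_{\alpha \in (0, 1]} d_H(\tilde V_\alpha, \tilde W_\alpha)$$

$d_H$ being the Hausdorff metric on the collection $\mathcal{K}_c(\mathbb{R})$ of nonempty compact (and often assumed to be convex) subsets of $\mathbb{R}$, defined for $A, B \in \mathcal{K}_c(\mathbb{R})$ by

$$d_H(A, B) = \max\left\{ \sup_{a \in A} \inf_{b \in B} |a - b|,\ \sup_{b \in B} \inf_{a \in A} |a - b| \right\}$$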
with $|\cdot|$ denoting the Euclidean norm in $\mathbb{R}$.

The second approach to statistics with fuzzy data is not based on the concept of a fuzzy random variable, but rather on the presupposition that there is a generic random variable $U : \Omega \to \mathbb{R}$, a crisp random sample $U_1, \ldots, U_n$ (that is, $U_1, \ldots, U_n$ are i.i.d. from the distribution of $U$), and a corresponding realization $(u_1, \ldots, u_n)$ imperfectly specified by $(\tilde V_1, \ldots, \tilde V_n) \in [\mathcal{F}_c(\mathbb{R})]^n$.

Furthermore, let $U_1, \ldots, U_n$ be i.i.d. from the distribution function $F_U$ and let $T(u_1, \ldots, u_n)$ be a realization of a statistical function $T(U_1, \ldots, U_n)$. The target is to calculate the corresponding fuzzy statistical function $T(\tilde V_1, \ldots, \tilde V_n)$.

As an example consider the problem of computing fuzzy parameter tests. Suppose that $F_U$ depends on a parameter $\theta \in \Theta$ of a predefined parameter space $\Theta \subseteq \mathbb{R}^k$, $k \in \mathbb{N}$. Let $\mathcal{D}$ be a class of distribution functions, $F_U \in \mathcal{D}$, $D : \Theta \to \mathcal{D}$ a mapping, and $\Theta_0, \Theta_1 \subseteq \Theta$ two disjoint sets of parameters. A function $\Phi : \mathbb{R}^n \to \{0, 1\}$ is called a nonrandomized parameter test for $(\delta, \Theta_0, \Theta_1)$ with respect to $D$, based on a given significance level $\delta \in (0, 1)$, null hypothesis $H_0 : \theta \in \Theta_0$, and alternative hypothesis $H_1 : \theta \in \Theta_1$, if and only if $\Phi$ is Borel-measurable and, for all $\theta \in \Theta_0$, $E(\Phi(U_1, \ldots, U_n) \mid P) \le \delta$ holds for $U_1, \ldots, U_n$ i.i.d. from $F_U$.

By applying the extension principle we obtain the corresponding fuzzy parameter test, where the calculation of $\Phi(\tilde V_1, \ldots, \tilde V_n)$ often turns out to be a time-consuming task. For this reason we present one of the simple extensions, the fuzzy chi-square test.

Theorem. Let $\mathcal{N}$ be the class of all normal distributions $N(\mu, \sigma^2)$ and $U : \Omega \to \mathbb{R}$ an $N(\mu_0, \hat\sigma^2)$-distributed random variable with given expected value $\mu_0$, but unknown $\hat\sigma \in \Theta = \mathbb{R}^+$. Define $D : \Theta \to \mathcal{N}$, $D(\sigma) = N(\mu_0, \sigma^2)$, $\Theta_0 = \{\sigma_0\}$, $\Theta_1 = \Theta \setminus \Theta_0$, and choose $U_1, \ldots, U_n$ i.i.d. from $F_U$ and $\delta \in (0, 1)$. Suppose $\Phi : \mathbb{R}^n \to \{0, 1\}$ to be the nonrandomized double-sided chi-square test for $(\delta, \Theta_0, \Theta_1)$ with respect to $D$. If $(\tilde V_1, \ldots, \tilde V_n) \in [\mathcal{F}_c(\mathbb{R})]^n$, then $\Phi(\tilde V_1, \ldots, \tilde V_n)$ is the realization of the corresponding fuzzy chi-square test. For $\alpha \in (0, 1]$ we obtain

$$\left(\Phi(\tilde V_1, \ldots, \tilde V_n)\right)_\alpha = \begin{cases} \{0\} & \text{iff } I_\alpha(\tilde V_1, \ldots, \tilde V_n) > \sigma_0^2\, \chi^2_{\delta/2}(n) \text{ and } S_\alpha(\tilde V_1, \ldots, \tilde V_n) < \sigma_0^2\, \chi^2_{1-\delta/2}(n) \\ \{1\} & \text{iff } S_\alpha(\tilde V_1, \ldots, \tilde V_n) \le \sigma_0^2\, \chi^2_{\delta/2}(n) \text{ or } I_\alpha(\tilde V_1, \ldots, \tilde V_n) \ge \sigma_0^2\, \chi^2_{1-\delta/2}(n) \\ \{0, 1\} & \text{otherwise} \end{cases}$$

where $\chi^2_{\delta/2}(n)$ denotes the $\delta/2$-quantile of the chi-square distribution with $n$ degrees of freedom, and

$$I_\alpha(\tilde V_1, \ldots, \tilde V_n) = \sum_{i=1,\ \inf(\tilde V_i)_\alpha \ge \mu_0}^{n} \left( \inf(\tilde V_i)_\alpha - \mu_0 \right)^2 + \sum_{i=1,\ \sup(\tilde V_i)_\alpha \le \mu_0}^{n} \left( \mu_0 - \sup(\tilde V_i)_\alpha \right)^2$$

$$S_\alpha(\tilde V_1, \ldots, \tilde V_n) = \sum_{i=1}^{n} \max\left\{ \left( \inf(\tilde V_i)_\alpha - \mu_0 \right)^2,\ \left( \sup(\tilde V_i)_\alpha - \mu_0 \right)^2 \right\}$$
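The $\alpha$-cut form of the fuzzy chi-square test lends itself to a direct computation once each $\alpha$-cut is given by its infimum and supremum. The sketch below makes several assumptions that are not in the article: the $\alpha$-cuts are closed intervals, the chi-square quantiles are passed in by the caller (the values used are standard table values for $n = 3$, $\delta = 0.10$), and all function names and sample data are hypothetical.

```python
from typing import List, Set, Tuple

Interval = Tuple[float, float]  # (inf, sup) of one alpha-cut, assumed convex


def i_alpha(cuts: List[Interval], mu0: float) -> float:
    """Smallest value of sum (x_i - mu0)^2 with each x_i inside its alpha-cut."""
    total = 0.0
    for lo, hi in cuts:
        if lo >= mu0:
            total += (lo - mu0) ** 2
        elif hi <= mu0:
            total += (mu0 - hi) ** 2
        # cuts containing mu0 in their interior contribute nothing
    return total


def s_alpha(cuts: List[Interval], mu0: float) -> float:
    """Largest value of sum (x_i - mu0)^2 with each x_i inside its alpha-cut."""
    return sum(max((lo - mu0) ** 2, (hi - mu0) ** 2) for lo, hi in cuts)


def fuzzy_chi2_cut(cuts: List[Interval], mu0: float, sigma0_sq: float,
                   q_low: float, q_high: float) -> Set[int]:
    """Alpha-cut of the fuzzy test: {0} accept, {1} reject, {0, 1} undecided."""
    lower, upper = sigma0_sq * q_low, sigma0_sq * q_high
    i_val, s_val = i_alpha(cuts, mu0), s_alpha(cuts, mu0)
    if i_val > lower and s_val < upper:
        return {0}
    if s_val <= lower or i_val >= upper:
        return {1}
    return {0, 1}


# Table values of the chi-square quantiles for n = 3 and delta = 0.10.
Q_LOW, Q_HIGH = 0.352, 7.815

narrow = [(0.5, 1.0), (-1.2, -0.8), (0.9, 1.1)]   # precise observations
wide = [(-3.0, 3.0), (-0.1, 0.1), (2.5, 3.0)]     # very imprecise observations
far = [(3.0, 3.5), (2.0, 2.2), (-2.5, -2.0)]      # far away from mu0

for cuts in (narrow, wide, far):
    print(fuzzy_chi2_cut(cuts, mu0=0.0, sigma0_sq=1.0, q_low=Q_LOW, q_high=Q_HIGH))
```

With imprecise data the interval $[I_\alpha, S_\alpha]$ can straddle a critical value, in which case the cut contains both decisions and the test result at level $\alpha$ remains undecided.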
The SOLD System: An Implementation. As an example of the application of many of the concepts, methods, and results discussed here, we briefly present the software tool SOLD (Statistics on Linguistic Data) [Kruse and Gebhardt (26)], which supports the modelling and statistical analysis of linguistic data that are representable by fuzzy sets. An application of the SOLD system consists of two steps, which have to be considered separately with regard to their underlying concepts.

In the first step (specification phase) SOLD enables its user to create an application environment (e.g., to analyze weather data) that consists of a finite set of attributes (e.g., clouding, temperature, precipitation) with their domains (intervals of real numbers, e.g., [0, 100] for the clouding of the sky in %). For each attribute $A$ the user states several (possibly parameterized) elementary linguistic values (e.g., cloudy or approximately 75% as fuzzy degrees of the clouding of the sky) and defines for each of these values $w$ the fuzzy set $\tilde V_w$ that shall be associated with it. For this purpose SOLD provides 15 different classes of parameterized fuzzy sets of $\mathbb{R}$ (e.g., triangular, rectangular, trapezoidal, Gaussian, and exponential functions) as well as 16 logical and arithmetical operators (and, or, not, +, −, *, /, **) and functions (e.g., exp, log, min, max) that are generalized to fuzzy sets using the extension principle. The application of context-free generic grammars $G_A$ permits the combination of elementary linguistic values by logical operators (and, or, not) and linguistic hedges (very, considerable) to increase or decrease the specificity of fuzzy data. In this way, formal languages $L(G_A)$ are obtained, which consist of the linguistic expressions that are permitted to describe the values of the attributes $A$ (e.g., cloudless or fair as a linguistic expression with respect to the attribute clouding).
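The specification phase can be pictured with a few lines of code. The following is not SOLD's implementation or interface; it is a minimal sketch that assumes trapezoidal membership functions, the common min/max/complement connectives, and Zadeh's classical reading of the hedge "very" as squaring. All names and parameter values are made up for illustration.

```python
from typing import Callable

Membership = Callable[[float], float]


def trapezoid(a: float, b: float, c: float, d: float) -> Membership:
    """Rises on [a, b], equals 1 on [b, c], falls on [c, d]; 0 elsewhere."""
    def mu(x: float) -> float:
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


def f_and(u: Membership, v: Membership) -> Membership:
    return lambda x: min(u(x), v(x))


def f_or(u: Membership, v: Membership) -> Membership:
    return lambda x: max(u(x), v(x))


def f_not(u: Membership) -> Membership:
    return lambda x: 1.0 - u(x)


def very(u: Membership) -> Membership:
    """Hedge that increases specificity (assumed here to be squaring)."""
    return lambda x: u(x) ** 2


# Hypothetical linguistic values for the attribute "clouding" on [0, 100] %.
cloudless = trapezoid(0.0, 0.0, 10.0, 20.0)
fair = trapezoid(10.0, 30.0, 40.0, 60.0)

cloudless_or_fair = f_or(cloudless, fair)
very_fair = very(fair)

print(cloudless_or_fair(15.0), very_fair(20.0), f_not(cloudless)(15.0))
```

Each composite expression is again a membership function on the attribute domain, which is what allows a grammar to build arbitrarily nested linguistic expressions from a small set of elementary values.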
In the second step (analysis phase) the application environments created in the specification phase can be applied to describe realizations of random samples by tuples of linguistic expressions. Since the random samples consist of existing numeric values that generally cannot be observed exactly, the fuzzy sets related to the particular linguistic expressions are interpreted epistemically as possibility distributions. The SOLD system allows one to determine convex fuzzy estimators for several characteristic parameters of the generic random variables for the considered attributes (e.g., for the expected value, variance, $p$-quantile, and range). In addition, SOLD calculates fuzzy estimates for the unknown parameters of several classes of given distributions and also determines fuzzy tests for one- or two-sided hypotheses with regard to the parameters of normally distributed random variables.

The algorithms incorporated in this tool are based on the original results about fuzzy statistics that were presented in the monograph by Kruse and Meyer (3). For reasons of efficiency, only fuzzy sets of the classes $\mathcal{F}_{D_k}(\mathbb{R})$ are employed in SOLD, namely the subclasses of $\mathcal{F}_N(\mathbb{R})$ that consist of the fuzzy sets with membership degrees in $\{0, 1/k, \ldots, 1\}$ and $\alpha$-cuts that are representable as the union of a finite number of closed intervals. In this case the operations to be performed can be reduced to the $\alpha$-cuts of the involved fuzzy sets [Kruse et al. (38)]. Nevertheless, the simplification achieved by this restriction does not guarantee an efficient implementation, since operations on $\alpha$-cuts are not equivalent to elementary interval arithmetic. The difficulties that arise can already be recognized in the following example of determining a fuzzy estimator for the variance.

Let $U : \Omega \to \mathbb{R}$ be a random variable defined with respect to a probability space $(\Omega, \mathcal{A}, P)$ and $F_U$ its distribution function. From a realization $(u_1, \ldots, u_n) \in \mathbb{R}^n$ of a random sample $(U_1, \ldots, U_n)$ with random variables $U_i : \Omega \to \mathbb{R}$, $i = 1, \ldots, n$, $n \ge 2$, that are independent and identically distributed according to $F_U$, the parameter $\mathrm{Var}(U)$ can be estimated with the help of the variance of the random sample, defined as
$$S_n(U_1, \ldots, U_n) = \frac{1}{n-1} \sum_{i=1}^{n} \left( U_i - \frac{1}{n} \sum_{j=1}^{n} U_j \right)^2$$
$S_n(U_1, \ldots, U_n)$ is an unbiased, consistent estimator for $\mathrm{Var}(U)$. If $(\tilde V_1, \ldots, \tilde V_n) \in [\mathcal{F}_{D_k}(\mathbb{R})]^n$ is the specification of a fuzzy observation of $(u_1, \ldots, u_n)$ for a given $k \in \mathbb{N}$, then by applying the extension principle we obtain the following fuzzy estimator for $\mathrm{Var}(U)$:

$$\hat S_n : [\mathcal{F}_{D_k}(\mathbb{R})]^n \to \mathcal{F}_{D_k}(\mathbb{R})$$

$$\hat S_n(\tilde V_1, \ldots, \tilde V_n)(y) = \sup\left\{ \min\{\tilde V_1(x_1), \ldots, \tilde V_n(x_n)\} \mid (x_1, \ldots, x_n) \in \mathbb{R}^n \text{ and } S_n(x_1, \ldots, x_n) = y \right\}$$

For $\alpha \in (0, 1]$, this leads to the $\alpha$-cuts

$$\left( \hat S_n(\tilde V_1, \ldots, \tilde V_n) \right)_\alpha = S_n\left( (\tilde V_1)_\alpha, \ldots, (\tilde V_n)_\alpha \right) = \left\{ y \;\middle|\; \exists (x_1, \ldots, x_n) \in \prod_{i=1}^{n} (\tilde V_i)_\alpha : S_n(x_1, \ldots, x_n) = y \right\}$$

$$= \left\{ y \;\middle|\; \exists (x_1, \ldots, x_n) \in \prod_{i=1}^{n} (\tilde V_i)_\alpha : \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \frac{1}{n} \sum_{j=1}^{n} x_j \right)^2 = y \right\}$$

We have

$$S_n\left( (\tilde V_1)_\alpha, \ldots, (\tilde V_n)_\alpha \right) \subseteq \frac{1}{n-1} \sum_{i=1}^{n} \left( (\tilde V_i)_\alpha - \frac{1}{n} \sum_{j=1}^{n} (\tilde V_j)_\alpha \right)^2$$

and equality does not hold in general, so that $S_n[(\tilde V_1)_\alpha, \ldots, (\tilde V_n)_\alpha]$ cannot be determined by elementary interval arithmetic. Therefore, the creation of SOLD had to be preceded by further mathematical considerations that were helpful for the development of efficient algorithms for the calculation of fuzzy estimators. Some results can be found in Kruse and Meyer (3) and Kruse and Gebhardt (41).

The fuzzy set $\nu$ calculated during the analysis phase by statistical inference with regard to an attribute $A$ (e.g., a fuzzy estimate of the variance of the temperature) is not transformed back to a linguistic expression by SOLD, as might be expected at first glance. The fundamental problem is that in general no $w \in L(G_A)$ can be found for which $\nu \equiv \tilde V_w$ holds. Consequently one is left with a linguistic approximation of $\nu$, that is, with finding those linguistic expressions $w$ of $L(G_A)$ whose interpretations $\tilde V_w$ approximate the fuzzy set under consideration as accurately as possible. The distance between two fuzzy sets is measured with the help of the generalized Hausdorff metric $d_\infty$.
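For fuzzy sets with finitely many membership levels, such as those of the classes used in SOLD, the supremum in $d_\infty$ runs over finitely many $\alpha$-cuts. The sketch below additionally assumes that every $\alpha$-cut is a single closed interval, in which case the Hausdorff distance reduces to the larger of the two endpoint differences. The dictionary representation, the function names, and the candidate expressions are hypothetical.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]
FuzzySet = Dict[float, Interval]  # alpha level -> alpha-cut (assumed one interval)


def hausdorff_interval(a: Interval, b: Interval) -> float:
    """Hausdorff distance between two compact intervals of the real line."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def d_inf(v: FuzzySet, w: FuzzySet) -> float:
    """Generalized Hausdorff metric: largest cut-wise distance over all levels."""
    return max(hausdorff_interval(v[alpha], w[alpha]) for alpha in v)


def closest_expression(candidates: Dict[str, FuzzySet], nu: FuzzySet) -> str:
    """Name of the candidate whose fuzzy set is closest to nu in d_inf."""
    return min(candidates, key=lambda name: d_inf(candidates[name], nu))


# Fuzzy result nu with k = 2, i.e. membership levels 0.5 and 1.0.
nu = {0.5: (1.0, 4.0), 1.0: (2.0, 3.0)}

candidates = {
    "low": {0.5: (0.0, 3.0), 1.0: (1.0, 2.0)},
    "medium": {0.5: (1.5, 4.5), 1.0: (2.5, 3.0)},
}

for name, fuzzy_set in candidates.items():
    print(name, d_inf(fuzzy_set, nu))
print(closest_expression(candidates, nu))
```

Searching a finite list of candidates is only a stand-in for the optimization over the whole language $L(G_A)$, which is in general much harder.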
The aim of this linguistic approximation is to determine a $w_{\mathrm{opt}} \in L(G_A)$ that satisfies, for all $w \in L(G_A)$,

$$d_\infty(\tilde V_{w_{\mathrm{opt}}}, \nu) \le d_\infty(\tilde V_w, \nu)$$

Since this optimization problem is in general very difficult and can lead to unsatisfactory approximations if $L(G_A)$ is chosen unfavorably (Hausdorff distance too large or linguistic expressions too complicated), SOLD uses the language $L(G_A)$ only to name the fuzzy data that appear in the random samples related to $A$ in an expressive way. SOLD calculates the Hausdorff distance $d_\infty(\tilde V_w, \nu)$ between $\nu$ and a fuzzy set $\tilde V_w$, provided by the user as a linguistic expression $w \in L(G_A)$ that appears suitable, but does not carry out a linguistic approximation by itself, since the resulting linguistic expression would not be very useful for making a decision on the basis of the statistical inference.

Before concluding this section, it should be mentioned that the addition of fuzzy numbers and their product by a real number, based on Zadeh's extension principle (in fact, interval arithmetic), do not preserve all the properties of the real-valued case, so that in most situations
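$$\frac{1}{n-1} \sum_{i=1}^{n} \left( (\tilde V_i)_\alpha - \frac{1}{n} \sum_{j=1}^{n} (\tilde V_j)_\alpha \right)^2$$

is not equivalent to

$$\frac{1}{n-1} \left[ \sum_{i=1}^{n} \left( (\tilde V_i)_\alpha \right)^2 - \frac{1}{n} \left( \sum_{j=1}^{n} (\tilde V_j)_\alpha \right)^2 \right]$$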
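A small numerical experiment shows how far elementary interval arithmetic can drift from the true range of the sample variance. This is a sketch under stated assumptions: the $\alpha$-cuts are closed intervals, only the upper end of the exact range is computed (it is attained at a vertex of the box because the sample variance is a convex function), and the helper names are invented for this illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def i_add(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def i_sub(a: Interval, b: Interval) -> Interval:
    return (a[0] - b[1], a[1] - b[0])


def i_scale(c: float, a: Interval) -> Interval:
    """Multiply by a nonnegative constant c."""
    return (c * a[0], c * a[1])


def i_sq(a: Interval) -> Interval:
    lo, hi = a
    if lo >= 0:
        return (lo * lo, hi * hi)
    if hi <= 0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))


def sample_variance(xs: Sequence[float]) -> float:
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def exact_max(cuts: List[Interval]) -> float:
    """Exact upper end of S_n over the box: a convex function peaks at a vertex."""
    return max(sample_variance(vertex) for vertex in product(*cuts))


def naive_centered(cuts: List[Interval]) -> Interval:
    """Interval evaluation of 1/(n-1) * sum (X_i - mean)^2."""
    n = len(cuts)
    total = (0.0, 0.0)
    for c in cuts:
        total = i_add(total, c)
    mean = i_scale(1.0 / n, total)
    acc = (0.0, 0.0)
    for c in cuts:
        acc = i_add(acc, i_sq(i_sub(c, mean)))
    return i_scale(1.0 / (n - 1), acc)


def naive_expanded(cuts: List[Interval]) -> Interval:
    """Interval evaluation of 1/(n-1) * (sum X_i^2 - (sum X_i)^2 / n)."""
    n = len(cuts)
    total, squares = (0.0, 0.0), (0.0, 0.0)
    for c in cuts:
        total = i_add(total, c)
        squares = i_add(squares, i_sq(c))
    return i_scale(1.0 / (n - 1), i_sub(squares, i_scale(1.0 / n, i_sq(total))))


cuts = [(0.0, 1.0), (0.0, 1.0)]
print(exact_max(cuts))       # true largest sample variance over the box
print(naive_centered(cuts))  # encloses the true range but is too wide
print(naive_expanded(cuts))  # a different enclosure, with a negative lower end
```

The two algebraically equivalent formulas give two different intervals, and both overshoot the exact upper end, because interval arithmetic treats every occurrence of the same $\alpha$-cut as an independent quantity.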
Approach Based on Fuzzy Information

If a random experiment involving a classical (one-dimensional) random variable $X$ is formalized by means of the induced probability space $(\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P^X_\theta)$, $\theta \in \Theta$, in which $\theta$ means a (real or vectorial) parameter value, then in accordance with Okuda et al. (4) and Tanaka et al. (5) we have the following definition.

Definition. An element $\tilde V \in \mathcal{F}(\mathbb{R})$ such that $\tilde V$ is a Borel-measurable function from $\mathbb{R}$ to $[0, 1]$ and $\mathrm{supp}\, \tilde V \subseteq X(\Omega)$ is called fuzzy information associated with $(\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P^X_\theta)$, $\theta \in \Theta$.

As we have mentioned previously, the approach based on fuzzy information considers that the available probabilities refer to the distribution of the classical random variable. Zadeh (42) suggested a probabilistic assessment of fuzzy information from the probability distribution of the original random variable, which can be described as follows:

Definition. The probability of the fuzzy information $\tilde V$ induced from $P^X_\theta$ is given by the Lebesgue–Stieltjes integral

$$P^X_\theta(\tilde V) = \int_{\mathbb{R}} \tilde V(x)\, dP^X_\theta(x)$$

(which can be considered as a particularization of LeCam's probabilistic definition (43,44) for single-stage experiments).

When the induced probability space corresponds to a random sample of size $k$ from a one-dimensional random variable $X$, and the perceptions or reports from $X$ are fuzzy, the sample fuzzy information $(\tilde V_1, \ldots, \tilde V_k)$ is often intended as the element of $\mathcal{F}(\mathbb{R}^k)$ given by the product aggregation of $\tilde V_1, \ldots,$ and $\tilde V_k$ (that is, $(\tilde V_1, \ldots, \tilde V_k)(x_1, \ldots, x_k) = \tilde V_1(x_1) \cdot \ldots \cdot \tilde V_k(x_k)$ for all $(x_1, \ldots, x_k) \in \mathbb{R}^k$).

Finally, the class $\mathcal{C}$ of the available fuzzy perceptions/reports should be assumed to be a fuzzy partition [in Ruspini's sense (45)], that is, $\sum_{\tilde V \in \mathcal{C}} \tilde V(x) = 1$ for all $x \in \mathbb{R}$, which is usually referred to as a fuzzy information system associated with the random experiment $(\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P^X_\theta)$, $\theta \in \Theta$. Of course, if $\mathcal{C}$ is a fuzzy information system, then $\sum_{\tilde V \in \mathcal{C}} P^X_\theta(\tilde V) = 1$ for all $\theta \in \Theta$.

On the basis of the model we have just presented, several statistical problems involving fuzzy experimental data can be formulated and solved. We now summarize most of the methods developed in the literature to deal with these problems, and describe a few of them in more detail.

Parameter Estimation from Fuzzy Information. The aim of the point parameter estimation problem on the basis of fuzzy experimental data is to make use of the information contained in these data to determine a single value to be employed as an estimate of the unknown value of the nonfuzzy parameter $\theta$. To this purpose, the classical maximum likelihood method has been extended by using Zadeh's probabilistic definition [see Gil and Casals (46), Gil et al. (47,48)], and properties of this extension have been examined.

Another technique suggested to solve this problem has been introduced [Corral and Gil (49), Gil et al. (47,48)] to supply an operational approximation of the extended maximum likelihood estimates. This technique is defined as follows:

Definition. Let $(\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P^X_\theta)$, $\theta \in \Theta$, be a random experiment in which $\{P^X_\theta, \theta \in \Theta\}$ is a parametric family of probability measures dominated by the counting or the Lebesgue measure, $\lambda$. Assume that the set $\mathbb{X}_n = \{(x_1, \ldots, x_n) \mid L(x_1, \ldots, x_n; \theta) > 0\}$ does not depend on $\theta$. If we consider the sample fuzzy information $(\tilde V_1, \ldots, \tilde V_n)$ from the experiment, then the value $\theta^*(\tilde V_1, \ldots, \tilde V_n) \in \Theta$, if it exists, such that

$$I\left( \tilde V_1, \ldots, \tilde V_n; \theta^*(\tilde V_1, \ldots, \tilde V_n) \right) = \inf_{\theta \in \Theta} I(\tilde V_1, \ldots, \tilde V_n; \theta)$$

with

$$I(\tilde V_1, \ldots, \tilde V_n; \theta) = - \int_{\mathbb{X}_n} |(\tilde V_1, \ldots, \tilde V_n)|(x_1, \ldots, x_n)\, \log L(x_1, \ldots, x_n; \theta)\, d\lambda(x_1) \ldots d\lambda(x_n)$$

which is the Kerridge inaccuracy between the membership function of the "standardized form" [Saaty (50)] of $(\tilde V_1, \ldots, \tilde V_n)$, denoted by $|(\tilde V_1, \ldots, \tilde V_n)|(\cdot)$, and the likelihood function of $\theta$, $L(\cdot\,; \theta)$, is called the minimum inaccuracy estimate of $\theta$ for the sample fuzzy information $(\tilde V_1, \ldots, \tilde V_n)$.

Some of the most valuable properties of the preceding method [see Corral and Gil (49), Gil et al. (47,48), Gebhardt et al. (51)] are those concerning the existence and uniqueness of the minimum inaccuracy solutions:

Theorem. Let $(\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P^X_\theta)$, $\theta \in \Theta$, be a random experiment in which $\{P^X_\theta, \theta \in \Theta\}$ is a parametric family of probability measures dominated by the counting or the Lebesgue measure. Assume that the experiment satisfies the following regularity conditions: (i) $\Theta$ is a real interval which is not a singleton; (ii) the set $\mathbb{X}_n$ does not depend on $\theta$; (iii) $P^X_\theta$ is associated with a parametric distribution function which is regular with respect to all its second $\theta$-derivatives in $\Theta$. Suppose that the sample fuzzy information $(\tilde V_1, \ldots, \tilde V_n)$ satisfies the following regularity conditions: (iv) $\int_{\mathbb{X}_n} |(\tilde V_1, \ldots, \tilde V_n)|(x_1, \ldots, x_n)\, d\lambda(x_1) \ldots d\lambda(x_n) < \infty$ and $I(\tilde V_1, \ldots, \tilde V_n; \theta) < \infty$ for all $\theta \in \Theta$; (v) the product function $|(\tilde V_1, \ldots, \tilde V_n)|(\cdot) \log L(\cdot\,; \theta)$ is "regular" with respect to all its first and second $\theta$-derivatives in $\Theta$, in the sense that

$$\frac{\partial}{\partial \theta} I(\tilde V_1, \ldots, \tilde V_n; \theta) = - \int_{\mathbb{X}_n} |(\tilde V_1, \ldots, \tilde V_n)|(x_1, \ldots, x_n)\, \frac{\partial}{\partial \theta} \log L(x_1, \ldots, x_n; \theta)\, d\lambda(x_1) \ldots d\lambda(x_n)$$
and

$$\frac{\partial^2}{\partial \theta^2} I(\tilde V_1, \ldots, \tilde V_n; \theta) = - \int_{\mathbb{X}_n} |(\tilde V_1, \ldots, \tilde V_n)|(x_1, \ldots, x_n)\, \frac{\partial^2}{\partial \theta^2} \log L(x_1, \ldots, x_n; \theta)\, d\lambda(x_1) \ldots d\lambda(x_n)$$

Under the regularity conditions (i)–(v), if there is an estimator of $\theta$ for the (nonfuzzy) simple random sample $X^n \equiv (\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n}, P^X_\theta)$, $\theta \in \Theta$, whose variance attains the Fréchet–Cramér–Rao bound, then the inaccuracy equation, $\partial/\partial\theta\, I(\tilde V_1, \ldots, \tilde V_n; \theta) = 0$, admits a solution minimizing the inaccuracy $I(\tilde V_1, \ldots, \tilde V_n; \theta)$ with respect to $\theta$ in $\Theta$.

Moreover, under the regularity conditions (i)–(v), let $T(X^n)$ be an estimator of $\theta$ for the (nonfuzzy) simple random sample $X^n$, attaining the Fréchet–Cramér–Rao lower bound for the variance, and whose expected value is given by $E(T) = h(\theta)$ ($h$ being a one-to-one real-valued function on $\Theta$). Then, the inaccuracy equation for the sample fuzzy information $(\tilde V_1, \ldots, \tilde V_n)$ admits a unique solution minimizing the inaccuracy $I(\tilde V_1, \ldots, \tilde V_n; \theta)$ and taking on the value $\theta^*(\tilde V_1, \ldots, \tilde V_n) \in \Theta$ such that

$$h\left( \theta^*(\tilde V_1, \ldots, \tilde V_n) \right) = \int_{\mathbb{X}_n} |(\tilde V_1, \ldots, \tilde V_n)|(x_1, \ldots, x_n)\, T(x_1, \ldots, x_n)\, d\lambda(x_1) \ldots d\lambda(x_n)$$

The aim of the interval estimation problem on the basis of fuzzy experimental data is to make use of the information contained in these data to determine an interval to be employed as an estimate of the unknown value of the nonfuzzy parameter $\theta$. To this purpose, Corral and Gil (52) have stated a procedure to construct confidence intervals.

Testing Statistical Hypotheses from Fuzzy Information. The aim of the problem of testing a statistical hypothesis on the basis of fuzzy experimental data is to make use of the information contained in these data either to conclude whether or not a given assumption about the experimental distribution can be accepted, or to determine how likely or unlikely the fuzzy sample information is if the hypothesis is true (depending on whether we use a concrete significance level or compute the $p$-value, respectively).

To this purpose, techniques based exactly or asymptotically on the Neyman–Pearson optimality criterion have been extended. More precisely, the Neyman–Pearson test of two simple hypotheses [Casals et al. (53), Casals and Gil (54)] and the likelihood ratio test [Gil et al. (48)] for fuzzy data have been developed.

On the other hand, some significance tests, like the chi-square and the likelihood ratio test for goodness of fit, have also been extended to deal with fuzzy sample information [see Gil and Casals (46), Gil et al. (47,48)]. In particular, the last technique can be presented as follows [see Gil et al. (48)]:

Theorem. Let $(\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P^X_\theta)$, $\theta \in \Theta$, be a random experiment and let $\mathcal{C}$ be a finite fuzzy information system associated with it. Consider the null hypothesis $H_0 : P^X_\theta \equiv Q$. Then, the test rejecting $H_0$ if, and only if, the sample fuzzy information $(\tilde V_1, \ldots, \tilde V_n)$ satisfies

$$\Gamma(\tilde V_1, \ldots, \tilde V_n) = 2 \sum_{\tilde V \in \mathcal{C}} \nu(\tilde V) \log \frac{\nu(\tilde V)}{n\, Q(\tilde V)} > c^*$$

where $\nu(\tilde V)$ is the observed absolute frequency of $\tilde V$ in $(\tilde V_1, \ldots, \tilde V_n)$, $Q(\tilde V) = \int_{\mathbb{R}} \tilde V(x)\, dQ(x)$ is the (induced) expected probability of $\tilde V$ if $Q$ is the experimental distribution, and $c^*$ is the $1 - \alpha$ fractile of the chi-square distribution with $r - 1$ degrees of freedom ($r$ being the cardinality of $\mathcal{C}$), is a test at a significance level approximately $\alpha$ for large $n$. More precisely, under $H_0$ the statistic $\Gamma$ is asymptotically distributed as a $\chi^2_{r-1}$.

In Gil et al. (48), the last test has been generalized to deal with composite parameter hypotheses.

Statistical Decision Making from Fuzzy Information. The aim of the problem of statistical decision making from fuzzy experimental data is to make use of the information contained in these data to make a choice from a set of possible actions, when the consequences of choosing a decision are assumed to depend on some uncertainties (states).

If a Bayesian context is considered, so that a prior distribution associated with the state space is defined, the extension of the Bayes principle of choice among actions has been developed [see Okuda et al. (4), Tanaka et al. (5), Gil et al. (55), Gil (56)]. In Gebhardt et al. (51) the extensive and normal forms of Bayesian decision analysis have been described, and conditions for their equivalence have been given.

As particularizations of the Bayes principle for statistical decision making from fuzzy data, the Bayes point estimation and hypothesis testing techniques have been established [see Gil et al. (57), Gil (56), Casals et al. (53,58)]. In addition, studies on the Bayesian testing of fuzzy statistical hypotheses and on sequential tests from fuzzy data can be found in the literature [see Casals and Salas (59), Pardo et al. (60), Casals (61), Casals and Gil (62)]. As an instance of these studies, we can recall the Bayesian test of two simple fuzzy hypotheses, which has been stated [Casals (61)] as follows:

Theorem. Let $(\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P^X_\theta)$, $\theta \in \Theta$, be a random experiment and let $\pi$ be a prior distribution on a measurable space $(\Theta, \mathcal{D})$ defined on $\Theta$. Let $\tilde\Theta_0$ be a fuzzy subset of $\Theta$, and let $\tilde\Theta_0^c$ be its complement (in Zadeh's sense). If $A = \{a_0, a_1\}$ is the action space, with $a_0$ = accepting the hypothesis "$\theta$ is $\tilde\Theta_0$" and $a_1$ =
accepting the hypothesis "$\theta$ is $\tilde\Theta_0^c$", and we consider the real-valued loss function $L : \{\tilde\Theta_0, \tilde\Theta_0^c\} \times A \to \mathbb{R}$ such that $L(\tilde\Theta_0, a_0) = L(\tilde\Theta_0^c, a_1) = 0$, $L(\tilde\Theta_0^c, a_0) = c_0 > 0$ and $L(\tilde\Theta_0, a_1) = c_1 > 0$, then there exists a Bayes test with respect to the prior distribution $\pi$ which chooses $a_0$ if, and only if, $(\tilde V_1, \ldots, \tilde V_n)$ satisfies

$$\int_\Theta \int_{\mathbb{R}^n} \tilde\Theta_0(\theta)\, (\tilde V_1, \ldots, \tilde V_n)(x_1, \ldots, x_n)\, dP^X_\theta(x_1) \ldots dP^X_\theta(x_n)\, d\pi(\theta) > \frac{c_0}{c_1} \int_\Theta \int_{\mathbb{R}^n} \tilde\Theta_0^c(\theta)\, (\tilde V_1, \ldots, \tilde V_n)(x_1, \ldots, x_n)\, dP^X_\theta(x_1) \ldots dP^X_\theta(x_n)\, d\pi(\theta)$$

and $a_1$ otherwise.

On the other hand, and still in a Bayesian context, some criteria to compare fuzzy information systems have been developed. In this sense, we can refer to the criterion based on the extension of the Raiffa and Schlaifer EVSI [expected value of sample information (63)] [see Gil et al. (55)], and to that combining this extension with an informational measure [see Gil et al. (64)].

Quantification of the Information Contained in Fuzzy Experimental Data. The quantification of the information contained in data about the experimental distribution is commonly carried out through a measure of the amount of information associated with the experiment.

To this purpose, the expected Fisher amount of information, the Shannon information, the Jeffreys invariant of the Kullback–Leibler divergence, and the Csiszár parametric and nonparametric information have been extended to fuzzy information systems, and their properties have been examined [see Gil et al. (65,66), Pardo et al. (67), Gil and Gil (68), Gil and López (69)]. Several criteria to compare fuzzy information systems have been developed on the basis of these measures, and the suitability of these criteria, along with their agreement with the extension of Blackwell's sufficiency comparison [introduced by Pardo et al. (70)], has been analyzed.

The preceding measures and criteria have additionally been employed to discuss the loss of information due to fuzziness in experimental data [see Gil (16,56), Okuda (71)]. This discussion has been used to examine the problem of choosing the appropriate size of the sample fuzzy information to guarantee the achievement of a desirable level of information, or the increase of the fuzzy sample size with respect to the nonfuzzy one needed to compensate for the loss of information when only fuzzy experimental data are available.
The main conclusion in this last study for the well-known Fisher information measure [Fisher (72,73)] is gathered in the following result [Gil and López (69)]:

Theorem. Let $X \equiv (\mathbb{R}, \mathcal{B}_{\mathbb{R}}, P_\theta^X)$, $\theta \in \Theta$, be a random experiment and let $\mathcal{C}$ be a fuzzy information system associated with it. The value of the Fisher information function associated with $X$, $I_F^X(\theta)$, is greater than or equal to that associated with $\mathcal{C}$, $I_F^{\mathcal{C}}(\theta)$, for all $\theta \in \Theta$, where

$$I_F^{\mathcal{C}}(\theta) = \sum_{\tilde{V} \in \mathcal{C}} \left[\frac{\partial}{\partial\theta} \log P_\theta^X(\tilde{V})\right]^2 P_\theta^X(\tilde{V})$$
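As a numerical illustration of the theorem, the following sketch compares the two Fisher information values for an exponential experiment. Everything specific in it is an assumption made for illustration and is not taken from the article: the density $\theta e^{-\theta x}$ with $\theta = 0.05$, the node positions, the piecewise-linear memberships (built so that they sum to 1 at every point), and the helper names `hat_memberships`, `prob_fuzzy_event`, and `fisher_information_fuzzy`. Probabilities of fuzzy events are computed as integrals of the membership function against the density, following Zadeh (42), and the derivative with respect to $\theta$ is approximated by a central difference.

```python
import math


def hat_memberships(nodes):
    """Piecewise-linear memberships peaking at each node; they sum to 1 everywhere."""
    last = len(nodes) - 1

    def make(k):
        def mu(x):
            if x <= nodes[0]:
                return 1.0 if k == 0 else 0.0
            if x >= nodes[last]:
                return 1.0 if k == last else 0.0
            for j in range(last):
                if nodes[j] <= x <= nodes[j + 1]:
                    w = (x - nodes[j]) / (nodes[j + 1] - nodes[j])
                    if k == j:
                        return 1.0 - w
                    return w if k == j + 1 else 0.0
        return mu

    return [make(k) for k in range(len(nodes))]


def prob_fuzzy_event(mu, theta, upper=600.0, steps=6000):
    """P_theta(V) = integral of mu(x) * theta * exp(-theta * x) over [0, inf).

    Trapezoid rule on [0, upper]; `upper` must lie beyond the last node,
    where mu is constant, so the remaining tail is added in closed form.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * mu(x) * theta * math.exp(-theta * x)
    return total * h + mu(upper) * math.exp(-theta * upper)


def fisher_information_fuzzy(memberships, theta, eps=1e-5):
    """Sum over the fuzzy events of (d/dtheta log P)^2 * P, i.e. (dP)^2 / P."""
    total = 0.0
    for mu in memberships:
        p = prob_fuzzy_event(mu, theta)
        dp = (prob_fuzzy_event(mu, theta + eps)
              - prob_fuzzy_event(mu, theta - eps)) / (2.0 * eps)
        total += dp * dp / p
    return total


nodes = [0.0, 10.0, 20.0, 40.0, 80.0]      # hypothetical "around x minutes" labels
system = hat_memberships(nodes)
theta = 0.05
info_exact = 1.0 / theta ** 2              # Fisher information of the exponential model
info_fuzzy = fisher_information_fuzzy(system, theta)
print(info_fuzzy, info_exact, info_fuzzy <= info_exact)
```

The ratio `info_exact / info_fuzzy` indicates how much information is lost by reporting only the fuzzy labels under these assumed memberships.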
On the other hand, the smallest size $n$ of the sample fuzzy information from $\mathcal{C}$ which can be guaranteed to be at least as informative on the average as a (nonfuzzy) random sample of size $m$ from $X$ is given by

$$n = \left[\, \sup_{\theta \in \Theta} \frac{m\, I_F^X(\theta)}{I_F^{\mathcal{C}}(\theta)} \,\right]$$

with $[\,\cdot\,]$ denoting the greatest integer part.

FUZZY STATISTICS BASED ON EXISTING FUZZY-VALUED DATA

In this section we consider situations in which the experimental outcomes have been (directly) converted into fuzzy values, by associating with each outcome $\omega \in \Omega$ a fuzzy number or, more generally, an element of $\mathcal{F}_c(\mathbb{R}^k)$, $k \ge 1$, where $\mathcal{F}_c(\mathbb{R}^k)$ will denote henceforth the class of fuzzy subsets $\tilde{V}$ of $\mathbb{R}^k$ such that for each $\alpha \in [0, 1]$ the $\alpha$-cut $\tilde{V}_\alpha$ is compact (that is, $\tilde{V}$ is upper semicontinuous), $\tilde{V}_1 \neq \emptyset$, $\tilde{V}_0 = \mathrm{cl}[\mathrm{co}(\mathrm{supp}\, \tilde{V})]$ is compact, and often $\tilde{V}_\alpha$ is assumed to be convex for all $\alpha \in [0, 1]$.

In these situations random fuzzy sets, as defined by Puri and Ralescu (10) [see also Klement et al. (14), Ralescu (74)] and originally and most commonly called in the literature fuzzy random variables, represent an appropriate model. The scheme of such situations is the following:
$$\Theta \longrightarrow \{P_\theta,\ \theta \in \Theta\} \longrightarrow \Omega \longrightarrow \mathcal{F}_c(\mathbb{R}^k)$$

$$\theta \xrightarrow{\ \text{experimental distribution}\ } P_\theta \xrightarrow{\ \text{experimental performance}\ } \omega \xrightarrow{\ \text{random variable}\ } \tilde{V}$$

where now $\theta$ represents a subindex.

To formalize the concept of random fuzzy set in Puri and Ralescu's sense, we have first to remark that $\mathcal{F}_c(\mathbb{R}^k)$ can be endowed with a linear structure with the fuzzy addition and product by a real number based on Zadeh's extension principle (35) (although $\mathcal{F}_c(\mathbb{R}^k)$ is not a vector space with these operations), and it can be endowed with the $d_\infty$ metric, defined as indicated in the first approach in the previous section, with $|\cdot|$ denoting the Euclidean norm in $\mathbb{R}^k$. $(\mathcal{F}_c(\mathbb{R}^k), d_\infty)$ is a complete nonseparable metric space [see Puri and Ralescu (10), Klement et al. (14)]. Then,

Definition. Given the probability space $(\Omega, \mathcal{A}, P)$ and the metric space $(\mathcal{F}_c(\mathbb{R}^k), d_\infty)$, a random fuzzy set associated with this space is a Borel measurable function $\mathcal{X} : \Omega \to \mathcal{F}_c(\mathbb{R}^k)$. A random fuzzy set $\mathcal{X}$ is said to be integrably bounded if $\|\mathcal{X}_0\| \in L^1(\Omega, \mathcal{A}, P)$ (i.e., $\|\mathcal{X}_0\|$ is integrable with respect to $(\Omega, \mathcal{A}, P)$), where $\|\mathcal{X}_0(\omega)\| = d_H(\{0\}, \mathcal{X}_0(\omega)) = \sup_{x \in \mathcal{X}_0(\omega)} |x|$ for all $\omega \in \Omega$.

If $\mathcal{X}$ is a random fuzzy set in Puri and Ralescu's sense, the set-valued mapping $\mathcal{X}_\alpha : \Omega \to \mathcal{K}_c(\mathbb{R}^k)$ defined by $\mathcal{X}_\alpha(\omega) = (\mathcal{X}(\omega))_\alpha$ for all $\omega \in \Omega$ is a compact (often convex) random set for all $\alpha \in [0, 1]$ (i.e., a Borel-measurable function from $\Omega$ to $\mathcal{K}_c(\mathbb{R}^k)$). When $\mathcal{X}$ is a random fuzzy set, an average value of $\mathcal{X}$ should be essentially fuzzy. In this sense, for an integrably bounded random fuzzy set, the fuzzy expected value has been introduced by Puri and Ralescu (10) as follows:

Definition. If $\mathcal{X}$ is an integrably bounded random fuzzy set associated with the probability space $(\Omega, \mathcal{A}, P)$, then the fuzzy expected value of $\mathcal{X}$ is the unique fuzzy subset of $\mathbb{R}^k$,
$\tilde{E}(\mathcal{X})$, satisfying that $(\tilde{E}(\mathcal{X}))_\alpha = \int_\Omega \mathcal{X}_\alpha\, dP$ for all $\alpha \in (0, 1]$, $\int_\Omega \mathcal{X}_\alpha\, dP$ being the Aumann integral of the random set $\mathcal{X}_\alpha$ (75), that is, $\int_\Omega \mathcal{X}_\alpha\, dP = \{E(f) \mid f \in L^1(\Omega, \mathcal{A}, P),\ f \in \mathcal{X}_\alpha \text{ a.s. } [P]\}$, where $E(f)$ is the (classical) expected value of the (real-valued) $L^1(\Omega, \mathcal{A}, P)$-random variable $f$.

Zhong and Zhou (76) have proven that in the case in which $k = 1$ and mappings are $\mathcal{F}_c(\mathbb{R})$-valued (although the assumption of compactness for $\mathcal{X}_0(\omega)$ is not presupposed), Puri and Ralescu's definition coincides with that of Kruse and Meyer. On the basis of the last model [whose mathematical background has also been examined in Diamond and Kloeden (77)], several studies have been developed. We are now going to summarize most of them.

Probabilistic Bases of Random Fuzzy Sets

Several studies based on Puri and Ralescu's definition have been devoted to establishing proper probabilistic bases to develop statistical studies. Among these bases, we can point out the following: the characterization of random fuzzy sets and integrably bounded random fuzzy sets, as $d_H$- and $d_\infty$-limits of sequences and dominated sequences, respectively, of certain operational types of random fuzzy sets [see López-Díaz (78), López-Díaz and Gil (79,80)]. As a consequence of this characterization, two practical ways for the computation of the fuzzy expected value of an integrably bounded random fuzzy set exist. In this way, the following characterizations of integrably bounded random fuzzy sets have been presented in detail in López-Díaz and Gil (79,80):

Theorem. Let $(\Omega, \mathcal{A}, P)$ be a probability space. A fuzzy-valued mapping $\mathcal{X} : \Omega \to \mathcal{F}_c(\mathbb{R}^k)$ is an integrably bounded random fuzzy set associated with $(\Omega, \mathcal{A}, P)$ if, and only if, there exists a sequence of simple (that is, having finite image) random fuzzy sets, $\{\mathcal{X}_m\}_m$, $\mathcal{X}_m : \Omega \to \mathcal{F}_c(\mathbb{R}^k)$, associated with $(\Omega, \mathcal{A}, P)$, and a function $h : \Omega \to \mathbb{R}$, $h \in L^1(\Omega, \mathcal{A}, P)$, such that $\|(\mathcal{X}_m)_0(\omega)\| \le h(\omega)$ for all $\omega \in \Omega$ and $m \in \mathbb{N}$, and such that

$$\lim_{m \to \infty} d_H\big((\mathcal{X}_m)_\alpha(\omega), \mathcal{X}_\alpha(\omega)\big) = 0$$

for all $\omega \in \Omega$ and for each $\alpha \in (0, 1]$.

On the other hand, $\mathcal{X}$ is an integrably bounded random fuzzy set associated with $(\Omega, \mathcal{A}, P)$ if, and only if, there exists a sequence of random fuzzy sets associated with $(\Omega, \mathcal{A}, P)$, $\{\mathcal{X}_m\}_m$, $\mathcal{X}_m : \Omega \to \mathcal{F}_c(\mathbb{R}^k)$, with simple $\alpha$-cut functions, $(\mathcal{X}_m)_\alpha$, and a function $h : \Omega \to \mathbb{R}$, $h \in L^1(\Omega, \mathcal{A}, P)$, such that $\|(\mathcal{X}_m)_0(\omega)\| \le h(\omega)$ for all $\omega \in \Omega$ and $m \in \mathbb{N}$, and such that

$$\lim_{m \to \infty} d_\infty\big(\mathcal{X}_m(\omega), \mathcal{X}(\omega)\big) = 0$$

for all $\omega \in \Omega$.

In López-Díaz and Gil (81,82), conditions are given to compute iterated fuzzy expected values of random fuzzy sets, irrespective of the order of integration. Some limit theorems, such as a strong law of large numbers and a central limit theorem (in which the notion of normal random fuzzy set is introduced), have been obtained for these random fuzzy sets [see Ralescu and Ralescu (83), Klement et al. (14,84), Negoita and Ralescu (85), Ralescu (74)].
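For a simple random fuzzy set (finitely many fuzzy values), the $\alpha$-cut characterization of the fuzzy expected value given above reduces to a probability-weighted sum of $\alpha$-cut endpoints when the values are convex. The sketch below illustrates this for $k = 1$ with triangular fuzzy numbers. The `(left, peak, right)` representation, the two outcomes, their probabilities, and the function names are assumptions chosen for illustration.

```python
def alpha_cut(tri, alpha):
    """alpha-cut [inf, sup] of a triangular fuzzy number (left, peak, right)."""
    left, peak, right = tri
    return (left + alpha * (peak - left), right - alpha * (right - peak))


def fuzzy_expected_cut(values, probs, alpha):
    """alpha-cut of the fuzzy expected value of a simple random fuzzy set.

    With finitely many convex compact values, the Aumann integral of the
    alpha-cut mapping is the interval of probability-weighted endpoints.
    """
    lo = sum(p * alpha_cut(v, alpha)[0] for v, p in zip(values, probs))
    hi = sum(p * alpha_cut(v, alpha)[1] for v, p in zip(values, probs))
    return (lo, hi)


values = [(0.0, 1.0, 2.0), (2.0, 4.0, 6.0)]   # two possible fuzzy outcomes
probs = [0.5, 0.5]                            # their probabilities

for alpha in (0.0, 0.5, 1.0):                 # alpha = 0 gives the closed support here
    print(alpha, fuzzy_expected_cut(values, probs, alpha))
```

In this toy case the fuzzy expected value is again triangular, with each of its three parameters equal to the weighted mean of the corresponding parameters of the outcomes.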
On the other hand, Li and Ogura (86–90) have studied set-valued functions and random fuzzy sets whose $\alpha$-cut functions are closed rather than compact. The completeness of the space of these random fuzzy sets and the existence theorem of conditional expectations have been obtained. Furthermore, regularity theorems and convergence theorems in the Kuratowski–Mosco sense have been proven for both closed set-valued and fuzzy-valued martingales, sub- and super-martingales, by using the martingale selection method instead of the embedding method, which is the usual tool in studies for compact ones.

Inferential Statistics from Random Fuzzy Sets

Several inferential problems (point estimation, interval estimation, and hypothesis testing) concerning fuzzy parameters of random fuzzy sets have been analyzed [see, for instance, Ralescu (11,74,91,92), Ralescu and Ralescu (83,93)]. Some useful results in traditional statistics, like the Brunn–Minkowski and the Jensen inequalities, have been extended to random fuzzy sets [see Ralescu (74,92)]. The one extending the well-known and valuable Jensen inequality can be presented as follows:

Theorem. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\mathcal{X} : \Omega \to \mathcal{F}_c(\mathbb{R}^k)$ be an integrably bounded random fuzzy set associated with $(\Omega, \mathcal{A}, P)$. If $\varphi : \mathcal{F}_c(\mathbb{R}^k) \to \mathbb{R}$ is a convex function (that is, $\varphi((\lambda \odot \tilde{V}) \oplus ((1 - \lambda) \odot \tilde{W})) \le \lambda \varphi(\tilde{V}) + (1 - \lambda)\varphi(\tilde{W})$ for all $\tilde{V}, \tilde{W} \in \mathcal{F}_c(\mathbb{R}^k)$), then

$$\varphi\big(\tilde{E}(\mathcal{X})\big) \le E(\varphi \circ \mathcal{X})$$
The problem of quantifying the relative inequality associated with random fuzzy sets has been studied [see Corral et al. (94), López-García (95), Gil et al. (96)]. This study introduces some measures of the extent or magnitude of the inequality associated with fuzzy-valued variables (like some linguistic or opinion variables, and so on), which could not be quantified by means of classical indices [like those given by Gastwirth (97)]. As a consequence, the class of fields to which the measurement of inequality can be applied will significantly increase (so that not only economics and industry, but also psychology, social sciences, engineering, medicine, etc., will benefit from the conclusions of this study). The main properties of the classical inequality indices, like the mean independence, the population homogeneity, the principles of transfers, the Schur-convexity, the symmetry, and the continuity (in terms of $d_\infty$), are preserved for the fuzzy-valued indices introduced in López-García (95) and Gil et al. (96). In López-García (95), convenient software to compute the fuzzy expected value and the fuzzy inequality indices has been developed. This software permits an easy graphical representation of the computed values by integrating it in commercial applications.

The problem of measuring the mean dispersion of a random fuzzy set with values in $\mathcal{F}_c(\mathbb{R})$ with respect to a concrete element in $\mathcal{F}_c(\mathbb{R})$ (and, in particular, with respect to the fuzzy expected value of the random fuzzy set) has been examined in Lubiano et al. (98). The suggested measure, which will be referred to as the $\vec{\lambda}$-mean square dispersion, is real-valued, since it has been introduced not only as a summary measure of the extent of the dispersion, but rather with the purpose of comparing populations or random fuzzy sets when necessary. The approach to get the extension of the variance for a random fuzzy set differs from that by Näther (36) and Körner (37), although it coincides with it for particular choices of $\vec{\lambda}$.

Another statistical problem which has been discussed is that of estimating some population characteristics associated with random fuzzy sets (like the fuzzy inequality index) in random samplings from finite populations [see López-García (95), López-García et al. (99)]. As an example of the results obtained in the last discussions, the following result has been stated [see López-García (95), López-García et al. (99)]:

Theorem. In the simple random sampling of size $n$ from a population of $N$ individuals or sampling units, an unbiased [up to additive equivalences, $\sim_\oplus$, Mareš (100)] fuzzy estimator of the fuzzy hyperbolic population index is that assigning to the sample $[\tau]$ the fuzzy value
$$\widehat{\tilde{I}_H(\mathcal{X})}^{\,s}([\tau]) = \frac{1}{(n-1)N} \odot \left[\, n(N-1) \odot \tilde{I}_H(\mathcal{X}[\tau]) \oplus (n-N) \odot \tilde{I}_H^{wv}(\mathcal{X}[\tau]) \,\right]$$

where $\mathcal{X}[\tau]$ means the random fuzzy set $\mathcal{X}$ as distributed on $[\tau]$, $\tilde{I}_H(\mathcal{X}[\tau])$ is the sample fuzzy hyperbolic index in $[\tau]$, which is given by the fuzzy value such that for each $\alpha \in (0, 1]$

$$\big(\tilde{I}_H(\mathcal{X}[\tau])\big)_\alpha = \left[\, E\!\left(\frac{E(\inf(\mathcal{X}[\tau])_\alpha)}{\sup(\mathcal{X}[\tau])_\alpha} - 1\right),\ E\!\left(\frac{E(\sup(\mathcal{X}[\tau])_\alpha)}{\inf(\mathcal{X}[\tau])_\alpha} - 1\right) \right]$$

and $\tilde{I}_H^{wv}(\mathcal{X}[\tau])$ is the expected within-values hyperbolic inequality in sample $[\tau]$, that is,

$$\tilde{I}_H^{wv}(\mathcal{X}[\tau]) = \frac{1}{n} \odot \left[\, \tilde{I}_H(\mathcal{X}[U_{\tau 1}]) \oplus \ldots \oplus \tilde{I}_H(\mathcal{X}[U_{\tau n}]) \,\right]$$

with $\mathcal{X}[U_{\tau i}]$ denoting the random fuzzy set degenerate at the fuzzy value of $\mathcal{X}$ on the $i$-th individual of sample $[\tau]$, $i = 1, \ldots, n$.

Statistical Decision Making with Fuzzy Utilities

A general handy model to deal with single-stage decision problems with fuzzy-valued consequences has been presented [Gil and López-Díaz (101), López-Díaz (78), Gebhardt et al. (51)], the model being based on random fuzzy sets in Puri and Ralescu's sense. This problem was previously discussed in the literature of fuzzy decision analysis [see Watson et al. (102), Freeling (103), Tong and Bonissone (104), Dubois and Prade (105), Whalen (106), Gil and Jain (107), Lamata (108)]. Gil and López-Díaz's model has a wider application than the previous ones (in the case of real-valued assessments of probabilities).

The aim of the problem of statistical decision making with fuzzy utilities on the basis of real-valued experimental data is to use the information contained in these data to help the decision maker in taking an appropriate action chosen from a set of possible ones, when the consequence of the choice is assumed to be the interaction of the action selected and the state which actually occurs.
By using the concepts of random fuzzy set and its fuzzy expected value, and the ranking of fuzzy numbers given by Campos and González (109) (the $\lambda$-average ranking method), the concept of fuzzy utility function in the fuzzy expected utility approach has been introduced [Gil and López-Díaz (101), López-Díaz (78)]. Gil and López-Díaz (101) [see also Gebhardt et al. (51)] have developed Bayesian analyses (in both the normal and the extensive form) of the statistical decision problem with fuzzy utilities, and conditions have been given for the equivalence of these two forms of the Bayesian analysis. An interesting conclusion obtained in this study indicates that the axiomatic developments establishing the fundamentals of the real-valued utility functions in the expected utility approach also establish the fundamentals of the fuzzy utility function in the fuzzy expected utility approach. Thus [see Gil and López-Díaz (101), López-Díaz (78), Gebhardt et al. (51)]:

Theorem. Consider a decision problem with reward space $\mathcal{R}$ and space of lotteries $\mathcal{P}$. If $S$ is a set of axioms guaranteeing the existence of a bounded real-valued utility on $\mathcal{R}$, which is unique up to an increasing linear transformation, then $S$ also ensures the existence of a class of fuzzy utility functions on $\mathcal{R}$.

An analysis of the structure properties of the last class of fuzzy utility functions has been stated by Gil et al. (110). Finally, a criterion to compare random experiments in the framework of a decision problem with fuzzy utilities has been developed [Gil et al. (111)].

OTHER STUDIES ON FUZZY STATISTICS

The last two sections have been devoted to univariate fuzzy statistics. Multivariate fuzzy statistics refers to descriptive and inferential problems and procedures to manage situations including several variables and involving either fuzzy data or fuzzy dependences.
The problems of multivariate fuzzy statistics which have received the most attention are fuzzy data analysis and fuzzy regression analysis. We are now going to present a brief review of some of the best known problems and methods.

Studies on fuzzy data analysis are mainly focused on cluster analysis, which is a useful tool in dealing with a large amount of data. The aim of fuzzy cluster analysis is to group a collection of objects, each of them described by means of several variables, in a finite number of classes (clusters) which can overlap and allow graded membership of objects to clusters. Fuzzy clustering supplies solutions to some problems (like bridges, strays, and undetermined objects among the clusters) which could not be solved with classical techniques.

The first approach to fuzzy clustering was developed by Ruspini (45,112), and this approach is based on the concept of fuzzy partition and an optimization problem where the objective function tends to be small when close pairs of objects have nearly equal fuzzy cluster memberships. The optimal fuzzy partition is obtained by using an adapted gradient method.

Another well-known method of fuzzy clustering is the so-called fuzzy k-means, which has been developed by Dunn (113) and Bezdek (114), and it is based on a generalization of the within-groups sum of squares and the use of a norm (usually a Euclidean one) to compute distances between objects and "centres" of clusters. The solution of the optimization problem in this method is obtained by employing an algorithm, which has been recently modified [Wang et al. (115)] by considering a bi-objective function.

The classical clustering procedure based on the maximum likelihood method has been extended to fuzzy clustering by Trauwaert et al. (116) and Yang (117). These extensions do not force clusters to have a quite similar shape. Finally, we have to remark that a divisive fuzzy hierarchical clustering technique which does not require a previous specification of the number of clusters has also been developed [Dimitrescu (118)]. A general review of many techniques in fuzzy data analysis based on distances or similarities between objects and clusters can be found in Bandemer and Näther (15) [see also Bandemer and Gottwald (119)].

The aim of fuzzy regression analysis is to look for a suitable mathematical model relating a dependent variable with some independent ones, when some of the elements in the model can be fuzzy. Tanaka et al. (120–122) considered a possibilistic approach to linear regression analysis, which leads to the fuzzy linear regression in which experimental data are assumed to be real-valued, but parameters of the linear relation are assumed to be fuzzy-valued, and they are determined such that the fuzzy estimate contains the observed real value with more than a given degree, the problem being reduced to a linear programming one. Some additional studies on this problem have also been developed by Moskowitz and Kim (123). Bárdossy (124) extended the preceding study by considering the fuzzy general regression problem (the fuzzy linear regression being a special case), and also incorporating more general fuzzy numbers.
Another interesting approach to fuzzy regression is that based on extending the least squares procedure of the classical case by previously defining some suitable distances between fuzzy numbers. In this approach, we must refer to the Diamond (125,126) and the Bárdossy et al. (127) studies, which consider the fuzzy linear regression problem involving real- or vector-valued parameters and fuzzy set data. Salas (128) and Bertoluzza et al. (129) have studied fuzzy linear and polynomial regression based on some operational distances between fuzzy numbers [see Salas (128), Bertoluzza et al. (130)]. Näther (36) presents an attempt to develop a linear estimation theory based on the real-valued variance for random fuzzy sets mentioned in the previous two sections. Other valuable studies on fuzzy regression are due to Yager (131), Heshmaty and Kandel (132), Celminš (133), Wang and Li (134), Savic and Pedrycz (135,136), Ishibuchi and Tanaka (137,138), and Guo and Chen (139).
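To make the fuzzy k-means idea discussed earlier in this section more concrete, here is a minimal sketch of the usual alternating update: memberships are computed from inverse distances to the centres, then centres are recomputed as membership-weighted means. It is a plain textbook-style version and not the bi-objective modification of Wang et al. (115). The fuzzifier `m = 2`, the deterministic initialization, the toy data, and all function names are assumptions made for illustration; `math.dist` requires Python 3.8 or later.

```python
import math


def memberships(points, centers, m):
    """Membership degrees of every point in every cluster (each row sums to 1)."""
    rows = []
    for p in points:
        dists = [math.dist(p, v) for v in centers]
        if 0.0 in dists:                       # point coincides with a centre
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            rows.append([w / sum(inv) for w in inv])
    return rows


def fuzzy_c_means(points, c, m=2.0, iters=200, tol=1e-10):
    n, dim = len(points), len(points[0])
    # deterministic start: c data points spread through the list
    centers = [list(points[(i * (n - 1)) // max(c - 1, 1)]) for i in range(c)]
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]   # weights u^m
            new_centers.append(
                [sum(w[i] * points[i][j] for i in range(n)) / sum(w)
                 for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # memberships are recomputed so that they match the returned centres
    return centers, memberships(points, centers, m)


# two tight groups plus one "bridge" point halfway between them (last entry)
data = [(0.8,), (1.0,), (1.2,), (8.7,), (9.0,), (9.3,), (5.0,)]
centers, u = fuzzy_c_means(data, c=2)
print(centers)
print(u[-1])   # membership degrees of the bridge point in the two clusters
```

With this toy data the bridge point receives comparable membership in both clusters, which is the kind of situation a crisp partition cannot express.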
SOME EXAMPLES OF FUZZY STATISTICS

The models and methods of fuzzy statistics in this article can be applied to many problems. In this section we present two examples which illustrate the practical use of some of these methods.

Example. The time of attention (in minutes) to the same game of four-year-old children is supposed to have an exponential distribution with unknown parameter $\theta$. To get information and obtain conclusions about the parameter value, a psychologist considers the experiment in which the time of attention to a game chosen at random by a four-year-old child, $\omega$, is observed. The mathematical model for this random experiment is the probability space $X \equiv (\mathbb{X}, \mathcal{B}_{\mathbb{X}}, P_\theta^X)$, $\theta \in \Theta$, where $\mathbb{X} = \mathbb{R}^+$ and $P_\theta^X$ is the exponential distribution $\gamma(1, \theta)$.

Assume that, as the loss of interest in a game does not usually happen in an instantaneous way, the psychologist provides us with imprecise data like $\tilde{V}_1$ = a few minutes, $\tilde{V}_i$ = around $10i$ minutes ($i = 2, \ldots, 8$), and $\tilde{V}_9$ = much more than 1 hour. These data could easily be viewed as fuzzy information associated with the random experiment and can be described by means of the triangular/trapezoidal fuzzy numbers with support contained in $[0, 120]$ in Fig. 1.

Figure 1. Time of attention to the same game of four-year-old children.

The set $\mathcal{C} = \{\tilde{V}_1, \ldots, \tilde{V}_9\}$ determines a fuzzy information system, so that we can consider methods of statistics in the approach based on fuzzy information. In this way, if the psychologist wants to estimate the unknown value of $\theta$, and for that purpose he selects at random and independently a sample of $n = 600$ four-year-old children, and observes the time of attention to a given game, and the data reported to the statistician are $\tilde{V}_1, \ldots,$ and $\tilde{V}_9$, with respective absolute frequencies $n_1 = 314$, $n_2 = 114$, $n_3 = 71$, $n_4 = 43$, $n_5 = 24$, $n_6 = 18$, $n_7 = 10$, and $n_8 = 6$, then since the experimental distribution is $\gamma(1, \theta)$, the minimum inaccuracy estimate would be given by

$$\theta^* = \frac{600}{\sum_{i=1}^{8} n_i \int_{\mathbb{R}} x\, |\tilde{V}_i|(x)\, dx} = 0.05$$

Example. A neurologist has to classify his most serious patients as requiring exploratory brain surgery (action $a_1$), requiring a preventive treatment with drugs (action $a_2$), or not requiring either treatment or surgery (action $a_3$). From medical databases, it has been found that 50% of the people examined needed the operation (state $\theta_1$), 30% needed the preventive treatment (state $\theta_2$), while 20% did not need either treatment or surgery (state $\theta_3$). The utilities (intended as opposite to losses) of right classifications are null. The utilities of wrong classifications are diverse: an unnecessary operation means resources are wasted and the health of the patient may be prejudiced; a preventive treatment means superfluous expenses and possible side effects, if the patient does not require either preventive treatment or surgery, and may be insufficient if the surgery is really required; if a patient requiring surgery does not get it on time and no preventive treatment is applied, the time lost until clear symptoms appear may be crucial.

The preceding problem can be regarded as a single-stage decision problem in a Bayesian context, with state space $\Theta = \{\theta_1, \theta_2, \theta_3\}$, action space $A = \{a_1, a_2, a_3\}$, and prior distribution $\pi$ with $\pi(\theta_1) = 0.5$, $\pi(\theta_2) = 0.3$, and $\pi(\theta_3) = 0.2$. Problems of this type usually receive in the literature a real-valued assessment of utilities [see, for instance, Wonnacott and Wonnacott (140) for a review of similar problems]. However, a real-valued assessment seems to be extremely rigid in view of the nature of the elements in this problem, and a more realistic utility evaluation to describe the neurologist's preferences would be the following:

$U(\theta_1, a_1) = U(\theta_2, a_2) = U(\theta_3, a_3) = 0$
$U(\theta_1, a_2)$ = very dangerous, $U(\theta_1, a_3)$ = extremely dangerous
$U(\theta_2, a_1)$ = inconvenient, $U(\theta_2, a_3)$ = dangerous
$U(\theta_3, a_1)$ = excessive, $U(\theta_3, a_2)$ = unsuitable

The values assessed to the consequences of this decision problem cannot be represented on a numerical scale, but they could be expressed in terms of fuzzy numbers as, for instance, $U(\theta_2, a_1) = \Pi(0.1, -0.6)$, $U(\theta_3, a_1) = \Pi(0.1, -0.7)$ ($\Pi$ being the well-known Pi curve, cf. Zadeh (141), Cox (142)),

$$U(\theta_2, a_3)(t) = \begin{cases} 1 - 12(t + 1)^2 & \text{if } t \in [-1, -0.75] \\ 20t^2 + 24t + 7 & \text{if } t \in [-0.75, -0.7] \\ 0 & \text{otherwise} \end{cases}$$

$$U(\theta_3, a_2)(t) = \begin{cases} 5t^2 + 8t + 3 & \text{if } t \in [-0.6, -0.5] \\ 1 - 3t^2 & \text{if } t \in [-0.5, 0] \\ 0 & \text{otherwise} \end{cases}$$

and $U(\theta_1, a_2)$ and $U(\theta_1, a_3)$ are both obtained from $U(\theta_2, a_3)$ by applying the linguistic modifiers very and extremely [see Zadeh (141), Cox (142)], that is, $U(\theta_1, a_2) = [U(\theta_2, a_3)]^2$ and $U(\theta_1, a_3) = [U(\theta_2, a_3)]^3$ (Fig. 2).

Figure 2. Fuzzy utilities of wrong classifications.

Doubtless, the situation in this problem is one of those needing a crisp choice among actions $a_1$, $a_2$, and $a_3$. The model and extension of the prior Bayesian analysis developed by Gil and López-Díaz (101) is based on the $\lambda$-average ranking criterion of Campos and González (109), using the $\lambda$-average ranking function which is defined by

$$V_L^\lambda(\tilde{A}) = \int_{(0,1]} \left[\lambda \inf \tilde{A}_\alpha + (1 - \lambda) \sup \tilde{A}_\alpha\right] d\alpha$$

for all $\tilde{A} \in \mathcal{F}_c(\mathbb{R})$, $\lambda \in [0, 1]$ being a previously fixed optimism–pessimism parameter [see Campos and González (109) for a graphical interpretation of this function]. This model will lead us to conclude that if we apply the $V_L^{.5}$ ranking function we get that

Step 1:

$V_L^{.5}(U(\theta_1, a_1)) = V_L^{.5}(U(\theta_2, a_2)) = V_L^{.5}(U(\theta_3, a_3)) = 0$
$V_L^{.5}(U(\theta_1, a_2)) = -0.922967$, $V_L^{.5}(U(\theta_1, a_3)) = -0.930410$
$V_L^{.5}(U(\theta_2, a_1)) = -0.6$, $V_L^{.5}(U(\theta_2, a_3)) = -0.903334$
$V_L^{.5}(U(\theta_3, a_1)) = -0.7$, $V_L^{.5}(U(\theta_3, a_2)) = -0.193333$

Step 2: The values of $V_L^{.5}$ for the prior fuzzy expected utilities of $a_1$, $a_2$, and $a_3$ are given by

$V_L^{.5} \circ \tilde{E}(U_{a_1} \mid \pi) = -0.32$
$V_L^{.5} \circ \tilde{E}(U_{a_2} \mid \pi) = -0.50$
$V_L^{.5} \circ \tilde{E}(U_{a_3} \mid \pi) = -0.74$

whence $a_1$ is the Bayes action of the problem.

Step 3: The "value" of the decision problem in a prior Bayesian analysis is then given by $\tilde{E}(U_{a_1} \mid \pi)$, which is the fuzzy number given by the Pi curve $\Pi(0.05, -0.32)$ (Fig. 3).

Figure 3. Value of the decision problem.

ADDITIONAL REMARKS

The development of statistics involving fuzzy data or elements is often based on the extension of classical procedures from mathematical statistics. Several of these extensions do not keep the properties they have in the nonfuzzy case. Thus, the fuzzy chi-square test in the approach based on fuzzy random variables is in general not a test with significance level $\delta$. In the same way, the extended maximum-likelihood methods in the approach based on fuzzy information cannot be applied to obtain a maximum-likelihood estimator for a parameter of a random fuzzy set, since maximum-likelihood methods are tied to a density of the underlying random fuzzy set, and characterization by densities does not exist for random sets/fuzzy random sets.
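Returning to the neurologist example, some of its figures can be checked numerically. The sketch below evaluates the $\lambda$-average ranking function with a midpoint rule over $\alpha$, locating $\alpha$-cut endpoints by bisection, and then combines the Step 1 values with the prior to recover Step 2. The function names, the bisection and midpoint scheme, and the shortcut of ranking the fuzzy expected utility as the prior-weighted sum of the ranked utilities are assumptions of this sketch and not a description of how the cited authors carried out the computation.

```python
def u_theta2_a3(t):
    """Membership function of the fuzzy utility U(theta2, a3) ("dangerous")."""
    if -1.0 <= t <= -0.75:
        return 1.0 - 12.0 * (t + 1.0) ** 2
    if -0.75 < t <= -0.7:
        return 20.0 * t * t + 24.0 * t + 7.0
    return 0.0


def cut_endpoint(mu, inside, outside, alpha, iters=60):
    """Boundary of {mu >= alpha} between a point inside the cut and one outside it."""
    if mu(outside) >= alpha:
        return outside
    for _ in range(iters):
        mid = 0.5 * (inside + outside)
        if mu(mid) >= alpha:
            inside = mid
        else:
            outside = mid
    return inside


def lambda_average(mu, lo, mode, hi, lam=0.5, steps=2000):
    """Midpoint rule over alpha of lam * inf(cut) + (1 - lam) * sup(cut).

    Assumes mu is nondecreasing on [lo, mode], nonincreasing on [mode, hi],
    and equal to 1 at `mode`.
    """
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        inf_cut = cut_endpoint(mu, mode, lo, alpha)
        sup_cut = cut_endpoint(mu, mode, hi, alpha)
        total += lam * inf_cut + (1.0 - lam) * sup_cut
    return total / steps


dangerous = lambda_average(u_theta2_a3, -1.0, -1.0, -0.7)
very_dangerous = lambda_average(lambda t: u_theta2_a3(t) ** 2, -1.0, -1.0, -0.7)

prior = {"theta1": 0.5, "theta2": 0.3, "theta3": 0.2}
# Step 1 values of the ranking function, copied from the text
ranked = {
    "theta1": {"a1": 0.0, "a2": -0.922967, "a3": -0.930410},
    "theta2": {"a1": -0.6, "a2": 0.0, "a3": -0.903334},
    "theta3": {"a1": -0.7, "a2": -0.193333, "a3": 0.0},
}
expected = {
    a: sum(prior[s] * ranked[s][a] for s in prior) for a in ("a1", "a2", "a3")
}
bayes_action = max(expected, key=expected.get)

print(dangerous, very_dangerous)
print({a: round(v, 2) for a, v in expected.items()}, bayes_action)
```

Under these assumptions the first two numbers should agree with the Step 1 values for $U(\theta_2, a_3)$ and $U(\theta_1, a_2)$ to about four decimals, and the weighted sums should round to the Step 2 values, with $a_1$ as the maximizer.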
BIBLIOGRAPHY
1. H. Kwakernaak, Fuzzy random variables, Part I: Definitions and theorems, Inform. Sci., 15: 1–29, 1978.
2. H. Kwakernaak, Fuzzy random variables, Part II: Algorithms and examples for the discrete case, Inform. Sci., 17: 253–278, 1979.
3. R. Kruse and K. D. Meyer, Statistics with Vague Data, Dordrecht: Reidel Publ. Co., 1987.
4. T. Okuda, H. Tanaka, and K. Asai, A formulation of fuzzy decision problems with fuzzy information, using probability measures of fuzzy events, Inform. Contr., 38: 135–147, 1978.
5. H. Tanaka, T. Okuda, and K. Asai, Fuzzy information and decision in statistical model, in M. M. Gupta, R. K. Ragade and R. R. Yager (eds.), Advances in Fuzzy Sets Theory and Applications: 303–320, Amsterdam: North-Holland, 1979.
6. G. Matheron, Random Sets and Integral Geometry, New York: Wiley, 1975.
7. D. G. Kendall, Foundations of a theory of random sets, in E. F. Harding and D. G. Kendall (eds.), Stochastic Geometry: 322–376, New York: Wiley, 1974.
8. D. Stoyan, W. S. Kendall, and J. Mecke, Stochastic Geometry and Its Applications, New York: Wiley, 1987.
9. Z. Artstein and R. A. Vitale, A strong law of large numbers for random compact sets, Ann. Probab., 3: 879–882, 1975.
10. M. L. Puri and D. A. Ralescu, Fuzzy random variables, J. Math. Anal. Appl., 114: 409–422, 1986.
11. D. A. Ralescu, Fuzzy logic and statistical estimation, Proc. 2nd World Conf. Math. Service Man, Las Palmas: 605–606, 1982.
12. R. Kruse, The strong law of large numbers for fuzzy random variables, Inform. Sci., 12: 53–57, 1982.
13. M. Miyakoshi and M. Shimbo, A strong law of large numbers for fuzzy random variables, Fuzzy Sets Syst., 12: 133–142, 1984.
14. E. P. Klement, M. L. Puri, and D. A. Ralescu, Limit theorems for fuzzy random variables, Proc. Royal Soc. London, Series A, 19: 171–182, 1986.
15. H. Bandemer and W. Näther, Fuzzy Data Analysis, Boston: Kluwer Academic Publishers, 1992.
16. M. A.
Gil, On the loss of information due to fuzziness in experimental observations, Ann. Inst. Stat. Math., 40: 627–639, 1988.
17. E. Czogala and K. Hirota, Probabilistic Sets: Fuzzy and Stochastic Approach to Decision Control and Recognition Processes, Köln: Verlag TÜV Rheinland, 1986.
18. K. Hirota, Concepts of probabilistic sets, Fuzzy Sets Syst., 5: 31–46, 1981.
19. R. Viertl, On statistical inference based on non-precise data. In H. Bandemer (ed.), Modelling Uncertain Data, Series: Mathematical Research, Vol. 68, Berlin: Akademie-Verlag, 1992, pp. 121–130.
20. R. Viertl, Statistical Methods Based on Non-precise Data, Boca Raton: CRC Press, 1996.
21. H. Tanaka, Fuzzy data analysis by possibilistic linear models, Fuzzy Sets Syst., 24: 363–375, 1987.
22. A. Kandel, On fuzzy statistics. In M. M. Gupta, R. K. Ragade and R. R. Yager (eds.), Advances in Fuzzy Sets Theory and Applications, Amsterdam: North-Holland, 1979, pp. 181–200.
23. L. A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets Syst., 1: 3–28, 1978.
24. D. Dubois and H. Prade, Possibility Theory, New York: Plenum Press, 1988.
25. R. Kruse, On a software tool for statistics with linguistic data, Fuzzy Sets Syst., 24: 377–383, 1987.
26. R. Kruse and J. Gebhardt, On a dialog system for modelling and statistical analysis of linguistic data, Proc. 3rd IFSA Congress, Seattle: 157–160, 1989.
27. G. Shafer, A Mathematical Theory of Evidence, Princeton, NJ: Princeton University Press, 1976.
28. H. T. Nguyen, On random sets and belief functions, J. Math. Analysis Appl., 65: 531–542, 1978.
29. J. Kampé de Fériet, Interpretation of membership functions of fuzzy sets in terms of plausibility and belief. In M. M. Gupta and E. Sanchez (eds.), Fuzzy Information and Decision Processes, Amsterdam: North-Holland, 1982, pp. 13–98.
30. P. Z. Wang, From the fuzzy statistics to the falling random subsets. In P. P. Wang (ed.), Advances in Fuzzy Sets Possibility and Applications, New York: Plenum Press, 1983, pp. 81–96.
31. D. Dubois, S. Moral, and H. Prade, A semantics for possibility theory based on likelihoods, Tech. Rep. CEC-Esprit III BRA, 6156 DRUMS II, 1993.
32. J. Gebhardt, On the epistemic view of fuzzy statistics. In H. Bandemer (ed.), Modelling Uncertain Data, Series: Mathematical Research, Vol. 68, Berlin: Akademie-Verlag, 1992, pp. 136–141.
33. J. Gebhardt and R. Kruse, A new approach to semantic aspects of possibilistic reasoning. In M. Clarke, R. Kruse and S. Moral (eds.), Symbolic and Quantitative Approaches to Reasoning and Uncertainty, Lecture Notes in Computer Science 747, Berlin: Springer-Verlag, 1993, pp. 151–160.
34. J. Gebhardt and R. Kruse, Learning possibilistic networks from data, Proc. 5th Int. Workshop Artificial Intell. Stat., Fort Lauderdale, FL: 1995, pp. 233–244.
35. L. A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning, Inform. Sci., Part 1 8: 199–249; Part 2 8: 301–353; Part 3 9: 43–80, 1975.
36. W. Näther, Linear statistical inference for random fuzzy data, Statistics, 29: 221–240, 1997.
37. R. Körner, On the variance of fuzzy random variables, Fuzzy Sets Syst., 92: 83–93, 1997.
38. R. Kruse, J. Gebhardt, and F.
Klawonn, Foundations of Fuzzy Systems, New York: Wiley, 1994.
39. K. D. Meyer, Grenzwerte zum Schätzen von Parametern unscharfer Zufallsvariablen. PhD thesis, University of Braunschweig, 1987.
40. M. L. Puri and D. A. Ralescu, Différentielle d'une fonction floue, C.R. Acad. Sci. Paris, Série I, 293: 237–239, 1981.
41. R. Kruse and J. Gebhardt, Some new aspects of testing hypothesis in fuzzy statistics, Proc. NAFIPS'90 Conf., Toronto: 1990, pp. 185–187.
42. L. A. Zadeh, Probability measures of fuzzy events, J. Math. Anal. Appl., 23: 421–427, 1968.
43. L. Le Cam, Sufficiency and approximate sufficiency, Ann. Math. Stat., 35: 1419–1455, 1964.
44. L. Le Cam, Asymptotic Methods in Statistical Decision Theory, New York: Springer-Verlag, 1986.
45. E. H. Ruspini, A new approach to clustering, Inform. Control, 15: 22–32, 1969.
46. M. A. Gil and M. R. Casals, An operative extension of the likelihood ratio test from fuzzy data, Statist. Pap., 29: 191–203, 1988.
47. M. A. Gil, N. Corral, and P. Gil, The minimum inaccuracy estimates in χ² tests for goodness of fit with fuzzy observations, J. Stat. Plan. Infer., 19: 95–115, 1988.
48. M. A. Gil, N. Corral, and M. R. Casals, The likelihood ratio test for goodness of fit with fuzzy experimental observations, IEEE Trans. Syst., Man, Cybern., 19: 771–779, 1989.
49. N. Corral and M. A. Gil, The minimum inaccuracy fuzzy estimation: An extension of the maximum likelihood principle, Stochastica, VIII: 63–81, 1984.
50. T. L. Saaty, Measuring the fuzziness of sets, J. Cybern., 4: 53–61, 1974.
51. J. Gebhardt, M. A. Gil, and R. Kruse, Fuzzy set-theoretic methods in statistics (Chp. 10). In D. Dubois, H. Prade (Series eds.) and R. Slowinski (ed.), Handbook of Fuzzy Sets, Vol. 5, Methodology 3: Operations Research and Statistics (in press), Boston: Kluwer Academic Publishers, 1998.
52. N. Corral and M. A. Gil, A note on interval estimation with fuzzy data, Fuzzy Sets Syst., 28: 209–215, 1988.
53. M. R. Casals, M. A. Gil, and P. Gil, On the use of Zadeh's probabilistic definition for testing statistical hypotheses from fuzzy information, Fuzzy Sets Syst., 20: 175–190, 1986.
54. M. R. Casals and M. A. Gil, A note on the operativeness of Neyman–Pearson tests with fuzzy information, Fuzzy Sets Syst., 30: 215–220, 1988.
55. M. A. Gil, M. T. López, and P. Gil, Comparison between fuzzy information systems, Kybernetes, 13: 245–251, 1984.
56. M. A. Gil, Fuzziness and loss of information in statistical problems, IEEE Trans. Syst., Man, Cybern., 17: 1016–1025, 1987.
57. M. A. Gil, N. Corral, and P. Gil, The fuzzy decision problem: An approach to the point estimation problem with fuzzy information, Eur. J. Oper. Res., 22: 26–34, 1985.
58. M. R. Casals, M. A. Gil, and P. Gil, The fuzzy decision problem: An approach to the problem of testing statistical hypotheses with fuzzy information, Eur. J. Oper. Res., 27: 371–382, 1986.
59. M. R. Casals and A. Salas, Sequential Bayesian test from fuzzy experimental information, Uncertainty and Intelligent Systems - IPMU'88, Lecture Notes in Computer Science, 313: 314–321, 1988.
60. L. Pardo, M. L. Menéndez, and J. A. Pardo, A sequential selection method of a fixed number of fuzzy information systems based on the information energy gain, Fuzzy Sets Syst., 25: 97–105, 1988.
61. M. R. Casals, Bayesian testing of fuzzy parametric hypotheses from fuzzy information, R.A.I.R.O.-Rech. Opér., 27: 189–199, 1993.
62. M. R. Casals and P.
Gil, Bayesian sequential test for fuzzy parametric hypotheses from fuzzy information, Inform. Sci., 80: 283– 298, 1994. 63. H. Raiffa and R. Schlaifer, Applied Statistical Decision Theory, Boston: Harvard University, Graduate School of Business, 1961. 64. M. A. Gil, M. T. Lo´pez, and J. M. A. Garrido, An extensive-form analysis for comparing fuzzy information systems by means of the worth and quietness of information, Fuzzy Sets Syst., 23: 239–255, 1987. 65. M. A. Gil, M. T. Lo´pez, and P. Gil, Quantity of information: Comparison between information systems: 1. Nonfuzzy states. Fuzzy Sets Syst., 15: 65–78, 1985. 66. P. Gil et al., Connections between some criteria to compare fuzzy information systems, Fuzzy Sets Syst., 37: 183–192, 1990. 67. L. Pardo, M. L. Mene´ndez, and J. A. Pardo, The f*-divergence as a criterion of comparison between fuzzy information systems, Kybernetes, 15: 189–194, 1986. 68. M. A. Gil and P. Gil, Fuzziness in the experimental outcomes: Comparing experiments and removing the loss of information, J. Stat. Plan. Infer., 31: 93–111, 1992. 69. M. A. Gil and M. T. Lo´pez, Statistical management of fuzzy elements in random experiments. Part 2: The Fisher information associated with a fuzzy information system, Inform. Sci., 69: 243–257, 1993. 70. L. Pardo, M. L. Mene´ndez, and J. A. Pardo, Sufficient fuzzy information systems, Fuzzy Sets Syst., 32: 81–89, 1989. 71. T. Okuda, A statistical treatment of fuzzy obserations: Estimation problems, Proc. 2nd IFSA Congress, Tokyo: 51–55, 1987.
195
72. R. A. Fisher, On the mathematical foundations of theoretical statistics, Phil. Trans. Roy. Soc. London, Series A, 222: 309– 368, 1922. 73. R. A. Fisher, Theory of statistical estimation, Proc. Camb. Phil. Soc., 22: 700–725, 1925. 74. D. A. Ralescu, Fuzzy random variables revisited, Proc. IFES’95, Vol. 2: Yokohama, 1995, pp. 993–1000. 75. R J. Aumann, Integrals of set-valued functions, J. Math. Anal. Appl., 12: 1–12, 1965. 76. C. Zhong and G. Zhou, The equivalence of two definitions of fuzzy random variables, Proc. 2nd IFSA Congress, Tokyo: 1987, pp. 59–62. 77. Ph. Diamond and P. Kloeden, Metric Spaces of Fuzzy Sets, Singapore: World Scientific, 1994. 78. M. Lo´pez-Dı´az, Medibilidad e Integracio´n de Variables Aleatorias Difusas. Aplicacio´n a Problemas de Decisio´n, PhD Thesis, Universidad de Oviedo, 1996. 79. M. Lo´pez-Dı´az and M. A. Gil, Constructive definitions of fuzzy random variables, Stat. Prob. Lett., 36: 135–143, 1997. 80. M. Lo´pez-Dı´az and M. A. Gil, Approximating integrably bounded fuzzy random variables in terms of the ‘‘generalized’’ Hausdorff metric, Inform. Sci., 104: 279–291, 1998. 81. M. Lo´pez-Dı´az and M. A. Gil, An extension of Fubini’s Theorem for fuzzy random variables, Tech. Rep. University of Oviedo, January 1997. 82. M. Lo´pez-Dı´az and M. A. Gil, Reversing the order of integration in iterated expectations of fuzzy random variables, and some statistical applications, Tech. Rep. University of Oviedo, January 1997. 83. A. Ralescu A and D. A. Ralescu, Probability and fuzziness, Inform. Sci., 17: 85–92, 1984. 84. E. P. Klement, M. L. Puri, and D. A. Ralescu, Law of large numbers and central limit theorem for fuzzy random variables. In R. Trappl (ed.), Cybernetics and Systems Research, 2: Amsterdam: North-Holland, 1984, pp. 525–529. 85. C. V. Negoita and D. A. Ralescu, Simulation, Knowledge-based Computing, and Fuzzy Statistics, New York: Van Nostrand Reinhold, 1987. 86. S. Li and Y. 
Ogura, Fuzzy random variables, conditional expectations and fuzzy martingales, J. Fuzzy Math., 4: 905–927, 1996. 87. S. Li and Y. Ogura, Convergence of set valued and fuzzy valued martingales, Fuzzy Sets Syst., in press, 1997. 88. S. Li and Y. Ogura, Convergence of set valued sub- and supermartigales in the Kuratowski–Mosco sense, (preprint), 1997. 89. S. Li and Y. Ogura, A convergence theorem of fuzzy valued martingales, Proc. FUZZ-IEEE’96, New Orleans: 1996, pp. 290–294. 90. S. Li and Y. Ogura, An optional sampling theorem for fuzzy valued martingales, Proc. IFSA’97, Prague (in press), 1997. 91. D. A. Ralescu, Statistical decision-making without numbers, Proc. 27th Iranian Math. Conf., Shiraz: 1996, pp. 403–417. 92. D. A. Ralescu, Inequalities for fuzzy random variables, Proc. 26th Iranian Math. Conf., Kerman: 1995, pp. 333–335. 93. A. Ralescu and D. A. Ralescu, Fuzzy sets in statistical inference. In A. Di Nola and A. G. S. Ventre (eds.), The Mathematics of ¨ V Rheinland, 1986, pp. Fuzzy Systems: Ko¨ln: Verlag TU 271–283. 94. N. Corral, M. A. Gil, and H. Lo´pez-Garcı´a, The fuzzy hyperbolic inequality index of fuzzy random variables in finite populations, Mathware & Soft Comput., 3: 329–339, 1996. 95. H. Lo´pez-Garcı´a, Cuantificacio´n de la desigualdad asociada a conjuntos aleatorios y variables aleatorias difusas, PhD Thesis, Universidad de Oviedo, 1997. 96. M. A. Gil, M. Lo´pez-Dı´az, and H. Lo´pez-Garcı´a, The fuzzy hyperbolic inequality index associated with fuzzy random variables, Eur. J. Oper. Res., in press, 1998.
196
FUZZY STATISTICS
97. J. L. Gastwirth, The estimation of a family of measures of income inequality, J. Econometrics, 3: 61–70, 1975. 씮 98. A. Lubiano et al., The -mean square dispersion associated with a fuzzy random variable, Fuzzy Sets Syst., in press, 1998. 99. H. Lo´pez-Garcı´a et al., Estimating the fuzzy inequality associated with a fuzzy random variable in random samplings from finite populations, Kybernetika, in press, 1998. 100. M. Maresˇ, Computation over Fuzzy Quantities, Boca Raton: CRC Press, 1994. 101. M. A. Gil and M. Lo´pez-Dı´az, Fundamentals and bayesian analyses of decision problems with fuzzy-valued utilities, Int. J. Approx. Reason., 15: 203–224, 1996. 102. S. R. Watson, J. J. Weiss, and M. L. Donnell, Fuzzy decision analysis, IEEE Trans. Syst., Man Cybern., 9: 1–9, 1979. 103. A N .S. Freeling, Fuzzy sets and decision analysis, IEEE Trans. Syst. Man Cybern., 10: 341–354, 1980. 104. R. M. Tong and P. P. Bonissone, A linguistic approach to decision-making with fuzzy sets, IEEE Trans. Syst. Man Cybern., 10: 716–723, 1980. 105. D. Dubois and H. Prade, The use of fuzzy numbers in decision analysis, in M. M. Gupta and E. Sanchez (eds.), Fuzzy Information and Decision Processes: 309–321. Amsterdam: North-Holland, 1982. 106. T. Whalen, Decision making under uncertainty with various assumptions about available information, IEEE Trans. Syst. Man Cybern., 14: 888–900, 1984.
122. H. Tanaka and J. Watada, Possibilistic linear systems and their application to the linear regression model, Fuzzy Sets Syst., 27: 275–289, 1988. 123. H. Moskowitz and K. Kwangjae, On assesing the H value in fuzzy linear regression, Fuzzy Sets Syst., 58: 303–327, 1993. 124. A. Ba´rdossy, Note on fuzzy regression, Fuzzy Sets Syst., 37: 65– 75, 1990. 125. P. Diamond, Fuzzy least squares, Inform. Sci., 46: 315–332, 1988. 126. P. Diamond, Higher level fuzzy numbers arising from fuzzy regression models, Fuzzy Sets Syst., 36: 265–275, 1990. 127. A. Ba´rdossy et al., Fuzzy least squares regression and applications to earthquake data, in J. Kacprzyk and M. Fedrizzi (eds.), Fuzzy Regression Analysis: 181–193. Warsaw: Springer-Verlag, 1992. 128. A. Salas, Regresio´n lineal con observaciones difusas, PhD Thesis, Universidad de Oviedo, 1991. 129. C. Bertoluzza, N. Corral, and A. Salas, Polinomial regression in a fuzzy context, the least squares method. Proc. 6th IFSA Congress, 2: 431–434, Sao Paolo, 1995. 130. C. Bertoluzza, N. Corral, and A. Salas, On a new class of distances between fuzzy numbers, Mathware & Soft Comput., 3: 253–263, 1995. 131. R. R. Yager, Fuzzy prediction based on regression models, Inform. Sci., 26: 45–63, 1982.
107. M. A. Gil and P. Jain, Comparison of experiments in statistical decision problems with fuzzy utilities, IEEE Trans. Syst. Man Cybern., 22: 662–670, 1992.
132. B. Heshmaty and A. Kandel, Fuzzy linear regression and its applications to forecasting in uncertain environment, Fuzzy Sets Syst., 15: 159–191, 1985.
108. M. T. Lamata, A model of decision with linguistic knowledge, Mathware & Soft Comput., 1: 253–263, 1994.
133. A. Celminsˇ, Least squares model fitting to fuzzy vector data, Fuzzy Sets Syst., 22: 245–269, 1987.
109. L. M. de Campos and A. Gonza´lez, A subjective approach for ranking fuzzy numbers, Fuzzy Sets Syst., 29: 145–153, 1989.
134. Z. Y. Wang and S. M. Li, Fuzzy linear regression analysis of fuzzy valued variables, Fuzzy Sets Syst., 36: 125–136, 1990.
110. M. A. Gil, M. Lo´pez-Dı´z, and L. J. Rodrı´guez-Mun˜iz, Structure properties of the class of the fuzzy utility function, Proc. EURO XV/INFORMS XXXIV, Barcelona: 1997, p. 128.
135. D. A. Savic and W. Pedrycz, Evaluation of fuzzy linear regression models, Fuzzy Sets Syst., 39: 51–63, 1991.
111. M. A. Gil, M. Lo´pez-Dı´az, and L. J. Rodrı´guez-Mun˜ez, An improvement of a comparison of experiments in statistical decision problems with fuzzy utilities, IEEE Trans. Syst. Man Cybern., in press, 1998. 112. E. H. Ruspini, Numerical methods for fuzzy clustering, Inform. Sci., 2: 319–350, 1970. 113. J. C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, J. Cybern., 3: 32–57, 1974. 114. J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, New York: Plenum Press, 1987. 115. H. Wang, C. Wang, and G. Wu, Bi-criteria fuzzy c-means analysis, Fuzzy Sets Syst., 64: 311–319, 1994. 116. E. Trauwaert, L. Kaufman, and P. Rousseeuw, Fuzzy clustering algorithms based on the maximum likelihood principle, Fuzzy Sets Syst., 42: 213–227, 1991. 117. M. S. Yang, On a class of fuzzy classification maximum likelihood procedures, Fuzzy Sets Syst., 57: 365–375, 1993. 118. D. Dimitrescu, Hierarchical pattern recognition, Fuzzy Sets Syst., 28: 145–162, 1988. 119. H. Bandemer and S. Gottwald, Fuzzy Sets, Fuzzy Logic, Fuzzy Methods with Applications, New York: Wiley, 1995.
136. D. A. Savic and W. Pedrycz, Fuzzy linear regression models: Construction and evaluation, in J. Kacprzyk and M. Fedrizzi (eds.), Fuzzy Regression Analysis: 91–100, Warsaw: SpringerVerlag, 1992. 137. H. Ishibuchi and H. Tanaka, Fuzzy regression analysis using neural networks, Fuzzy Sets Syst., 50: 257–265, 1992. 138. H. Ishibuchi and H. Tanaka, An architecture of neural networks with interval weight and its applications to fuzzy regression analysis, Fuzzy Sets Syst., 57: 27–39, 1993. 139. S. Guo and S. Chen, An approach to monodic fuzzy regression, in J. Kacprzyk and M. Fedrizzi (eds.), Fuzzy Regression Analysis: 81–90, Warsaw: Springer-Verlag, 1992. 140. R. J. Wonnacott and T. H. Wonnacott, Introductory Statistics, New York: Wiley, 1985. 141. L. A. Zadeh, A fuzzy-algorithmic approach to the definition of complex or imprecise concepts, Int. J. Man-Machine Studies, 8: 249–291, 1976. 142. E. Cox, The Fuzzy Systems Handbook, Cambridge: Academic Press, 1994.
RUDOLF KRUSE Otto-von-Guericke University of Magdeburg
120. H. Tanaka, Fuzzy data analysis by possibilistic linear models, Fuzzy Sets Syst., 24: 263–275, 1987.
JO¨RG GEBHARDT
121. H. Tanaka, S. Uejima, and K. Asai, Linear regression analysis with fuzzy model, IEEE Trans. Syst., Man, Cybern., 12: 903– 907, 1982.
MARI´A ANGELES GIL
University of Braunschweig University of Oviedo
Wiley Encyclopedia of Electrical and Electronics Engineering
Fuzzy Systems
Standard Article
James J. Buckley (University of Alabama at Birmingham, Birmingham, AL), William Siler (Southern Dynamic Systems, Birmingham, AL), Thomas Feuring (University of Münster, Münster, Germany)
Copyright © 1999 by John Wiley & Sons, Inc. All rights reserved.
DOI: 10.1002/047134608X.W3503
Article Online Posting Date: December 27, 1999
The sections in this article are: Knowledge Acquisition; Uncertainty Representation; Rule-Based Reasoning; Optimization; Validation and Implementation; Hardware; Software.
FUZZY SYSTEMS

There are many ways of representing knowledge in general; we will consider here only the very basic aspects of knowledge representation in a fuzzy expert system. Most basic is the representation of data. Next is the representation of knowledge about reasoning processes, which fuzzy expert systems usually express as fuzzy production rules, discussed later in this article. Fuzzy expert systems add two major elements to knowledge representation in nonfuzzy expert systems: (1) basic data types not found in conventional systems and (2) an expanded rule syntax which facilitates reasoning in terms of words rather than numbers and permits approximate numerical comparisons. We will first consider the representation of data.

Before doing so, let us explain the basic differences between fuzzy expert systems and fuzzy control. Both have a fuzzy rule base, but fuzzy expert systems are designed to model human experts in decision making, whereas fuzzy control models human operators controlling a process. In fuzzy control the inputs to the fuzzy rule base are usually real numbers, representing measurements on the process, and the outputs are also real numbers, representing how to change certain variables in the process to achieve better performance. In fuzzy expert systems the inputs are real numbers, character strings, or fuzzy sets, representing data on the decision problem, and the outputs are real numbers, character strings, or fuzzy numbers representing possible actions by the decision makers. Fuzzy control answers the question of "how much" to change process variables, and fuzzy expert systems answer the question "what to do?" or "what is it?" Since both systems are based on a fuzzy rule base, some techniques from fuzzy control (discussed below) are presented because they are also useful in fuzzy expert systems.

Basic data types available for individual data items include numbers and character strings.
Some nonfuzzy expert system shells include with the basic data types a measure of the confidence that the value for the data is valid. Individual data items are usually collected into named groups. Groups may be simple structures of several data items, lists, frames, or classes.

Fuzzy expert systems include additional data types not found in conventional expert systems: discrete fuzzy sets and fuzzy numbers. A fuzzy set is similar to an ordinary set (a collection of objects drawn from some universe), with one major difference: to each member of the set is attached a grade of membership, a number between zero and one, which represents the degree to which the object is a member of the set. In probability theory, all the probabilities must add to one. In fuzzy set theory, however, the grades of membership need not add to one.

In general, the members of discrete fuzzy sets are words describing some real-world entity. Table 1 contains two examples of discrete fuzzy sets.

Table 1. Discrete Fuzzy Sets

Speed                            Fault
Member    Membership Degree      Member        Membership Degree
Stop      0.000                  Fuel          0.000
Slow      0.012                  Ignition      0.923
Medium    1.000                  Electrical    0.824
Fast      0.875                  Hydraulic     0.232

Fuzzy set speed describes a numeric quantity. In Table 1 we see that speed could certainly be described as Medium, but could also be described as Fast with almost equal certainty. This represents an ambiguity. Ambiguities such as these need not be resolved, because they add robustness to a fuzzy expert system. Also in Table 1, discrete fuzzy set "fault," whose members are words describing different possible faults, describes a nonnumeric categorical quantity. We are certain that the fault is not in the fuel system; it is probably in the ignition or electrical systems, although it just might be in the hydraulic system. Since these categories are mutually exclusive, we have not an ambiguity but a contradiction. Unlike ambiguities, contradictions must be resolved before our program is done.

Fuzzy numbers, like statistical distributions, represent uncertain numerical quantities. Fuzzy numbers may have any of several shapes; most common are piecewise linear (triangles and trapezoids), piecewise quadratic (s-shaped), and normal (Gaussian). A typical fuzzy number is shown in Fig. 1. From Fig. 1 we see that the confidence that 2.4 belongs to "fuzzy 2" is 0.6.

Figure 1. A fuzzy set describing uncertainty that a value x is equal to two. (Horizontal axis: x, with ticks at 1, 2, 2.4, 3, and 4. Vertical axis: degree of membership, with ticks at 0.0, 0.6, and 1.0.)

Fuzzy production rules are usually of the IF . . . THEN . . . type. The IF part of the rule, called the antecedent, specifies conditions which the data must satisfy for the rule to be fireable; when the rule is fired, the actions specified in the THEN part, the consequent, are executed. Some simple fuzzy antecedents follow.

IF size is Small AND class is Box, THEN . . .

In this antecedent, size and class are discrete fuzzy sets; Small is a member of size and Box is a member of class.

IF weight is about 20, THEN . . .

Here weight is a scalar number. "About" is an adjective (called a hedge) which modifies the scalar 20, converting it to a fuzzy number. The resulting fuzzy number "about 20" could be triangular (as in Fig.
1) with base on the interval [16, 24] and vertex at 20. The comparison between weight and "about 20" is an approximate one, which can hold with varying degrees of confidence depending on the precise value for weight.

J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering. Copyright © 1999 John Wiley & Sons, Inc.

IF speed is Fast AND distance is Short, THEN . . .

This apparently simple antecedent is a little more complex than it appears, since speed and distance are scalar numbers, and Fast and Short are members of discrete fuzzy sets describing speed and distance. Fuzzy numbers are used to define Fast and Short. When the truth value of the antecedent is evaluated, the number speed must be converted into a grade of membership of Fast, and similarly for distance and Short. This conversion is called fuzzification, and it requires that fuzzy numbers be assigned to each fuzzy set member. Membership functions for the members of the discrete fuzzy set distance are shown in Fig. 2. This type of rule antecedent is so common in fuzzy control that a shorthand notation has evolved: Short, Medium, and Long are considered simply fuzzy numbers, the clause "distance is Short" is considered a comparison for approximate equality between distance and Short, and the fuzzy set of which Short, Medium, and Long are members may not even be named.

In evaluating the truth value of an antecedent, the confidences that the individual clauses are true and that the rule itself is valid are combined. We are using "truth value" and "confidence" to mean the same thing; confidence is discussed further in the section on uncertainty. The most common way of combining confidences is to use the Zadeh operators:

truth value of A AND B = min(truth value of A, truth value of B)
truth value of A OR B = max(truth value of A, truth value of B)

There are many other combination rules. If the antecedent truth value exceeds an assigned threshold value and the rule is enabled for firing, the rule is said to be fireable.

The THEN part of a rule is called the consequent, and it consists of instructions to be executed when the rule is fired. The truth value with which the consequent is asserted is usually the truth value of the antecedent. Consequent actions may be for input or output of data, information to the user, and the like, but the most important consequent actions are those which modify data. Some consequent actions are described below.

action: write 'Hello, World!'
action: read x

The above two instructions need no explanation.

action: MA = (MA + X) / (T + DT)
Here a new value for MA is computed from values for variables assigned in the rule antecedent. The confidence in the new value for MA will be the antecedent confidence. The system may provide that the old confidence value not be overwritten unless the new value has confidence greater than (or possibly equal to) the old confidence value.

Figure 2. Three fuzzy sets describing uncertainty that a distance is Short, Medium or Long. (Horizontal axis: distance x, with ticks at 10, 20, 30, and 40. Vertical axis: degree of membership, from 0.0 to 1.0.)

action: class is Desk

Here the confidence in member Desk of fuzzy set class will be set to the antecedent confidence, assuming that the new confidence value exceeds any old confidence value.

action: z is Small

In fuzzy control systems, z would be a scalar number. In fuzzy reasoning systems, z is likely to be a discrete fuzzy set, of which Small is a member. One syntax or the other may be used, depending on the particular fuzzy expert system shell. In the latter case, the effect of this action is to store the antecedent confidence as the grade of membership of Small.

If z is a scalar number and Small is a fuzzy set member with its corresponding fuzzy number, as is the case in fuzzy control, this consequent operation is complex. To begin with, there are almost invariably other rules fired concurrently with other consequents such as z is Medium, z is Large, and so forth; the consequent z is Small cannot be applied except together with the other rule consequents. Now the inverse operation to fuzzification, called defuzzification, must be carried out. Fuzzy control engineers use many defuzzification methods, most of which are beyond the scope of this article; two are described here. If the fuzzy numbers corresponding to Small, Medium, and Large are singletons (i.e., only a single value has a nonzero grade of membership), defuzzification can be very simple: a weighted average of these values can be taken, using the confidences in Small, Medium, and Large assigned by the rules as weights. If the fuzzy numbers are not singletons, the conclusions of all rules which have fired are aggregated (unioned) to get the fuzzy conclusion of the whole system, which is then defuzzified to a real number.
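The chain from fuzzification through the Zadeh operators to singleton defuzzification can be sketched in a few lines of Python. This is only an illustrative sketch, not code from any particular shell: the function names are hypothetical, the triangle for "fuzzy 2" is assumed to sit on [1, 3] (which is consistent with the grade 0.6 at x = 2.4 read from Fig. 1), and the singleton positions and rule confidences are made-up values.

```python
def triangular(x, left, peak, right):
    """Grade of membership of x in a triangular fuzzy number.

    Assumes left < peak < right; the grade is 0 outside (left, right)
    and rises linearly to 1 at the peak.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_and(*truths):
    """Zadeh AND: the minimum of the clause truth values."""
    return min(truths)


def fuzzy_or(*truths):
    """Zadeh OR: the maximum of the clause truth values."""
    return max(truths)


def defuzzify_singletons(positions, confidences):
    """Weighted average of singleton positions, weighted by rule confidences."""
    total = sum(confidences[name] for name in positions)
    if total == 0:
        raise ValueError("no rule fired; nothing to defuzzify")
    return sum(positions[name] * confidences[name] for name in positions) / total


# Fuzzification: "fuzzy 2" (assumed triangle on [1, 3]) evaluated at x = 2.4,
# and "about 20" (triangle on [16, 24], vertex 20) evaluated at weight = 21.
grade_fuzzy_2 = triangular(2.4, 1, 2, 3)
grade_about_20 = triangular(21, 16, 20, 24)

# Antecedent "speed is Fast AND distance is Short" with hypothetical grades.
antecedent = fuzzy_and(0.875, 0.4)

# Singleton defuzzification with hypothetical positions and confidences.
positions = {"Small": 10.0, "Medium": 20.0, "Large": 30.0}
confidences = {"Small": 0.2, "Medium": 0.8, "Large": 0.0}
z = defuzzify_singletons(positions, confidences)

print(round(grade_fuzzy_2, 3), grade_about_20, antecedent, round(z, 3))
```

Using min and max keeps the sketch aligned with the Zadeh operators named above; any of the other combination rules could be substituted by changing the two one-line functions.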
In control applications, this number may then be output to the controller or be used to compute the new control value; in fuzzy reasoning applications, the number may be output to the user or used as input to succeeding reasoning stages.

We have only discussed one rule; a fuzzy expert system usually has multiple blocks of rules (rules grouped together to perform a certain job) and a network of blocks of rules. With many blocks of rules, the output from certain blocks becomes input to other blocks of rules.

KNOWLEDGE ACQUISITION

Knowledge acquisition is the translation of information about a system into fuzzy production rules and expert databases which are to model the system. Classically, this is done by a domain expert, thoroughly familiar with the application field, and a knowledge engineer, familiar with artificial intelligence techniques and languages. In one of the first fuzzy expert systems (1) the authors generated fuzzy rules like "If size is Large and vertical position is Low and region touches the border, then class is Lung." These directly represent how an expert classifies regions seen in a medical echocardiogram, with Lung a member of the fuzzy set of region classifications. The fuzzy expert system is to automatically classify the regions. However, the linguistic variables "Large," "Low," and so on, are all defined by fuzzy numbers. Unlike control applications, a well-written fuzzy expert system for classification is usually insensitive to the precise values for the membership functions.

An alternative method of rule generation, sometimes useful when there are masses of historical numeric input data, is to automatically generate the fuzzy rules from these data. These procedures may be useful in fuzzy control and also in what has become known as "data mining." Suppose we have data on input and output of a process which are in the form of real numbers or fuzzy numbers. The procedures to automatically generate the rules involve various techniques such as gradient descent methods (2–5), least squares (6,7), genetic algorithms (8,9), fuzzy c-means (10), fuzzy-neural methods (11–14), heuristics (15), and other methods (16). We will briefly discuss two of the procedures to automatically generate the fuzzy rules.

Perhaps the simplest method is the heuristic procedure presented in Ref. 15, since it does not require iterative learning. Suppose we have data $(x_p, y_p)$, $1 \le p \le m$, where $x_p = (x_{1p}, x_{2p})$, on the process to be modeled. The inputs are $x_p$ and the outputs are $y_p$. Assume that the $x_{1p}$, $x_{2p}$, $y_p$ are all in the interval [0, 1]. We wish to construct rules of the form
\[
\text{If } x_1 \text{ is } A_{1i} \text{ and } x_2 \text{ is } A_{2j}, \text{ then } y = b_{ij}
\]

for $1 \le i, j \le K$. In the consequent, $b_{ij}$ is a real number. The $A_{1i}$ ($A_{2j}$) are triangular fuzzy numbers which partition the interval [0, 1]. Given inputs $(x_{1p}, x_{2p})$, the output from this block of rules is computed as follows:

\[
y = \frac{\sum_{i=1}^{K} \sum_{j=1}^{K} A_{ij}(x_{1p}, x_{2p})\, b_{ij}}{\sum_{i=1}^{K} \sum_{j=1}^{K} A_{ij}(x_{1p}, x_{2p})}
\]

where $A_{ij}(x_{1p}, x_{2p}) = A_{1i}(x_{1p}) A_{2j}(x_{2p})$. The $A_{1i}(x_{1p})$ (and $A_{2j}(x_{2p})$) denote the membership value of the fuzzy set $A_{1i}$ ($A_{2j}$) at $x_{1p}$ ($x_{2p}$). The $b_{ij}$ are defined as

\[
b_{ij} = \frac{\sum_{p=1}^{m} W_{ij}(x_{1p}, x_{2p})\, y_p}{\sum_{p=1}^{m} W_{ij}(x_{1p}, x_{2p})}
\]

where $W_{ij}(x_{1p}, x_{2p}) = \left(A_{1i}(x_{1p}) A_{2j}(x_{2p})\right)^{\alpha}$, for some $\alpha > 0$. This simple heuristic appears to work well in the examples presented.

Another approach, also with no time-consuming iterative learning procedures, was presented in Ref. 16. Its authors proposed a five-step procedure for generating fuzzy rules by combining both numerical data on the process and linguistic information from domain experts. They argued that either type of information (numerical or linguistic) alone is usually incomplete. They illustrated their method on the truck backer-upper control problem.

UNCERTAINTY REPRESENTATION

The term "uncertainty" itself is not well defined, and may have several related definitions. Imprecision in a measurement causes uncertainty as to its accuracy. Vagueness is likely to refer to the uncertainty attached to the precise meaning of descriptive terms such as small or fast. We have lumped all these meanings together under the term "confidence." In fact, the confidence we place in a value is usually itself subject to uncertainty, as are the precise values we define for a membership function. Most fuzzy expert system shells can handle only one level of uncertainty; confidence, grades of membership, and membership functions are usually taken as accurate. While this is not intellectually satisfactory, in practice it poses few if any problems; providing for more than one level of uncertainty would cause a giant increase in system complexity, with little gain achieved.

Let us now have a look at the different methods that can be used for uncertainty modeling. We will not discuss the differences between probability and fuzzy set theory. For more details on the ongoing debate on probabilities versus fuzzy sets in uncertainty modeling see Refs. 17 and 18.

Whatever method of modeling uncertainty is selected, you will have the problem of deciding how the uncertainties are to be propagated through the system. For systems such as FLOPS, which do not routinely combine fuzzy numbers, uncertainty propagation is not usually a problem. If, however, fuzzy numbers are combined (as in fuzzy control), it may be a problem to be dealt with.

Probability Approach

Probability theory has been used from the very beginning to handle uncertainty in expert systems. The goal of this approach is to find suitable probability distributions. There are two main and different interpretations of probability. First, probability can be seen as a relative frequency. In this context, probability describes an objective uncertainty. On the other hand, probability can be interpreted as a measure of belief. This way of interpreting probability leads to subjective imprecision. The latter interpretation seems to be more suitable in knowledge-based systems. In order to define a probability distribution function, experts are asked about the numerical value of some parameters. By specifying these quantities a suitable subjective probability distribution function can be found. For combining individual uncertainties, Bayes' theorem has often been employed.
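As a small illustration of how Bayes' theorem, mentioned in connection with the probability approach, combines a subjective prior with a piece of evidence, consider the sketch below. It is not taken from any specific expert system: the function name `bayes_update`, the fault hypotheses, and all probabilities are hypothetical, and the hypotheses are assumed to be mutually exclusive and exhaustive.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(h | e) for each hypothesis h.

    prior:      dict mapping hypothesis -> P(h)
    likelihood: dict mapping hypothesis -> P(e | h) for the observed evidence e
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # P(e), by the law of total probability
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical subjective prior elicited from an expert.
prior = {"ignition": 0.5, "electrical": 0.3, "hydraulic": 0.2}

# Hypothetical likelihoods of one observed symptom under each fault.
likelihood = {"ignition": 0.8, "electrical": 0.5, "hydraulic": 0.1}

posterior = bayes_update(prior, likelihood)
for fault, p in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{fault}: {p:.3f}")
```

Several pieces of evidence could be folded in by calling `bayes_update` repeatedly, feeding each posterior back in as the next prior; that is valid only under the additional assumption that the pieces of evidence are conditionally independent given the hypothesis.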
Dempster–Shafer Approach

Dempster–Shafer's theory of evidence is another way of handling uncertainty. It is motivated by Dempster's theorem (19,20), and it is based on the assumption that a partial belief in a proposition is quantified by a single number in the unit interval. Furthermore, different beliefs can be combined into a new one. However, this method leads to exponential complexity (21), and we refer the reader to Refs. 22 and 23 for more details.

Fuzzy Set Approach

The use of fuzzy sets for designing knowledge-based systems was suggested by Zadeh (24). His intention was to (a) overcome the problem of vague concepts in knowledge-based systems being insufficiently described by only using zeros and ones, and (b) use fuzzy sets for representing incomplete knowledge tainted with imprecision and uncertainty. Using fuzzy sets, uncertainty can easily be described in a natural way by membership functions which can take values in the whole unit interval [0, 1].
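As an aside on the Dempster–Shafer approach described above, the step in which different beliefs are combined into a new one is commonly carried out with Dempster's rule of combination. The following is a minimal sketch of that rule, assuming each body of evidence is given as a mass function over subsets of a small frame of discernment; the function name `combine`, the fault frame, and the masses are hypothetical. Because masses are attached to subsets, a frame with n elements admits up to 2^n focal sets, which is one way to see where the exponential complexity comes from.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    m1, m2: dicts mapping frozenset (a subset of the frame) -> mass,
            each with masses summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("the two bodies of evidence are totally conflicting")
    # Renormalize so that the remaining masses again sum to 1.
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}


# Hypothetical frame of discernment and two hypothetical sources of evidence.
frame = frozenset({"ignition", "electrical", "hydraulic"})
m1 = {frozenset({"ignition"}): 0.6, frame: 0.4}
m2 = {frozenset({"ignition", "electrical"}): 0.7, frame: 0.3}

m12 = combine(m1, m2)
for subset, mass in m12.items():
    print(sorted(subset), round(mass, 3))
```

Mass left on the whole frame represents belief that is not committed to any particular fault, which is what distinguishes this representation from a single probability distribution.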
Let us take a closer look at the representation of uncertainty in a knowledge-based system using the fuzzy set approach.

Real Numbers. In this approach real numbers, instead of fuzzy sets, are used to describe uncertainty in the data. Let us assume that for the linguistic variable "human's height" we have five terms: very short, short, medium, tall, and very tall. Each of these terms is defined by a fuzzy number $A_i$, where $A_1$ defines very short, $A_2$ defines short, and so on. Let x be a number that can be the height of a person. If $A_1(x) = 0$, $A_2(x) = 0.3$, $A_3(x) = 0.8$, $A_4(x) = 0.2$, and $A_5(x) = 0$, then we can express x as (0, 0.3, 0.8, 0.2, 0), showing the uncertainty about the height of this person. Since the largest grade belongs to $A_3$, the person's height is perhaps best described as medium.

Fuzzy Numbers. Another way of modeling uncertainty is to use fuzzy sets where the membership functions have to be chosen according to an expert's knowledge. Fuzzy numbers with a limited menu of shapes can easily be used for this task because they can be represented by only three or four values. In order to minimize the number of parameters, Gaussian fuzzy sets are used. Each membership function is then a Gaussian function, which can be coded by a mean value (where the membership degree is one) and a variance value (25).

RULE-BASED REASONING

We first discuss a very popular method of rule-based reasoning called approximate reasoning. Then we present an alternative procedure used in the fuzzy expert system shell called FLOPS.

Consider a block of rules

If x is $A_i$, then y is $B_i$

for $1 \le i \le m$. All the fuzzy sets will be fuzzy subsets of the real numbers. One can easily handle more clauses in the rule's antecedent than simply "x is $A_i$," but for simplicity we will only consider the single clause. Given an input x = A, the rules are executed and we obtain a conclusion y = B. There are two methods of obtaining y = B (26): (1) first infer, then aggregate (FITA); and (2) first aggregate, then infer (FATI). Let us first consider FITA.
There are three steps in FITA: (1) first model the implication ($A_i \to B_i$) as a fuzzy relation $R_i$; (2) combine A with $R_i$, using the compositional rule of inference, as $A \circ R_i = B'_i$; and (3) then aggregate all the $B'_i$ into B. However, there are a tremendous number of different ways to get B from A. Numerous papers have been published comparing the various methods (see, for example, Refs. 27–34) and giving guidelines for picking a certain procedure to accomplish specific goals. In fact, 72 methods of modeling the implication have been studied (35). The methods of computing $A \circ R_i$ and aggregating the $B'_i$ usually employ t-norms and/or t-conorms (36), and there are many more choices to make for these operators.

FATI also has three steps: (1) model the implication ($A_i \to B_i$) as a fuzzy relation $R_i$; (2) aggregate the $R_i$ into one fuzzy relation R; (3) compose A with R to get B as $A \circ R = B$. Again, there are many choices on how to compute the $R_i$, R, and B from A and R.

One solution to this dilemma of choosing the right operators to perform approximate reasoning is to decide on what properties you want your inferencing system to possess and then choose the operators that will give those properties. For a single-rule system "If x is A, then y is B" we say the inferencing is consistent if, given input x = A, the conclusion is y = B. That is, $A \circ (A \to B) = B$. If all the fuzzy sets are normalized (maximum membership is one), then there are a number of ways to perform approximate reasoning so that the rule is consistent (37). For FITA to be consistent we require $B = B_k$ if $A = A_k$ for some k, $1 \le k \le m$. FATI is consistent if $A_k \circ R = B_k$. In Ref. 38 the authors give a sufficient condition for FATI to be consistent, and in Ref. 39 the authors argue that, in general, FATI is not consistent, but FITA can be consistent if you use a consistent implication from the single-rule case and you also use a special method of aggregating the $B'_i$ into B. Demanding consistency will greatly narrow down the operators you can use in approximate reasoning.

An alternative method of rule-based inferencing is used in FLOPS (discussed below in the section on software). Here the rules are used to construct discrete fuzzy sets, and approximate reasoning is not applicable. Initially, numeric input variables are fuzzified to create discrete fuzzy sets. From there on, reasoning can be done with fuzzy set member words such as Large (a member of discrete fuzzy set size) and Left (a member of discrete fuzzy set horizontal position) rather than in terms of numerical values. Consider the following slightly simplified FLOPS rule:

IF size is Small AND x-position is Center AND y-position is Very-High, THEN class is Artifact;

The confidence that each clause in the antecedent is valid is computed. For clauses such as "size is Small," the confidence is simply the grade of membership of Small in discrete fuzzy set size.
Other clauses might involve comparisons, Boolean or fuzzy, between two data items; in this case, the confidence that the clause is valid is the fuzzy AND (by default, minimum) of the confidences in the data items and the confidence that the comparison holds. The minimum of the confidences in the individual clauses (and the rule confidence, if less than 1) is the confidence that the entire antecedent is valid. This confidence is stored as the grade of membership of Artifact in discrete fuzzy set class. Since there is no fuzzy set in the consequent of the above rule (Artifact is not fuzzy), the implication cannot be modeled as a fuzzy relation and approximate reasoning is not applicable. FLOPS then reasons with confidences in discrete entities in which there is placed a single scalar confidence; many of these are discrete fuzzy sets.

OPTIMIZATION

Once a fuzzy expert system has been designed, it depends on a large set of parameters such as the weights (confidence values in the rules, etc.), the number of rules, the method of inference, and the position and shape of the fuzzy sets which define the linguistic variables in the rules. The optimization, also called the tuning or calibration, of the fuzzy expert system is a process of determining a best value for these parameters. “Best” is defined as those values of the parameters which maximize, or minimize, some objective function. At first, researchers suboptimized by tuning some of the parameters while holding the rest fixed. A number of papers (5,40–44)
presented various techniques to minimize the number of rules needed in the rule base. A basic method was the use of genetic algorithms. Another group of papers (8,9,45–50) was concerned with tuning the membership functions of the fuzzy sets with the use of a genetic algorithm, a popular technique. Gradient descent methods were also employed to tune the fuzzy sets. The next step, possibly using a genetic algorithm, will be to tune the whole fuzzy expert system. One would need to code a whole rule, its weights, and all the fuzzy sets in the rule, as part of one individual in a population of individuals evolving toward the optimal solution. Using binary coding, a single rule will produce a fairly long vector of zeros and ones. Add to this vector all the other rules so that an individual in the population is the whole fuzzy expert system. Append to this vector the types of rule inferencing methods you wish to investigate. If we are to have 2000 individuals in the population and we wish to go through 10,000 generations in the genetic algorithm, we see that the computation becomes enormous. Hence researchers have been content to attack only parts of the whole optimization problem. Let us now briefly discuss two of these methods of tuning a fuzzy expert system. In Ref. 42 the authors tune the rules in a fuzzy expert system designed for a classification problem. The problem has two objectives: (1) maximize the number of correctly classified patterns, and (2) minimize the number of rules. A set of fuzzy if-then rules is coded as one individual in the genetic algorithm. The fitness function for the algorithm is a convex combination of the two objectives (maximize the number of correctly classified patterns and minimize the number of rules). In Ref. 47 the authors optimize the fuzzy sets in a fuzzy expert system used for control.
They assume that there is a data set available on the process, and the objective is to minimize the squared error function defined from the input–output data. The fuzzy if-then rules, the method of inference, and the defuzzifier are all held fixed. All the fuzzy sets are trapezoidal fuzzy numbers. A member of the population is a coded vector containing all the trapezoidal fuzzy numbers in the fuzzy expert system. Their tuning method worked well in the application (the inverted pendulum problem) presented where the population size was small, and there were only seven if-then rules in the system.
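The two objective functions described above can be illustrated with a minimal Python sketch. It is not the implementation used in Ref. 42 or Ref. 47: the function names, the normalization of both objectives to the interval [0, 1], and the default weight are assumptions made here so that the convex combination is well defined.

```python
def rule_selection_fitness(n_correct, n_patterns, n_selected, n_candidates, weight=0.8):
    """Convex combination of two objectives, both scaled to [0, 1] (an assumption).

    accuracy    : fraction of correctly classified patterns (to be maximized)
    compactness : 1 minus the fraction of candidate rules kept, so that
                  fewer rules give a larger value (to be maximized)
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1] for a convex combination")
    accuracy = n_correct / n_patterns
    compactness = 1.0 - n_selected / n_candidates
    return weight * accuracy + (1.0 - weight) * compactness


def squared_error(predicted, observed):
    """Sum of squared differences between system outputs and recorded outputs."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


if __name__ == "__main__":
    # Two hypothetical individuals with the same accuracy but different rule counts.
    print(rule_selection_fitness(90, 100, 5, 20))
    print(rule_selection_fitness(90, 100, 15, 20))
    # Error of a hypothetical controller on three input-output samples.
    print(squared_error([0.1, 0.5, 0.9], [0.0, 0.5, 1.0]))
```

In a genetic algorithm the first function would be maximized over candidate rule sets, while the second would be minimized over coded membership-function parameters; the weight controls how strongly a smaller rule base is rewarded relative to accuracy.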
VALIDATION AND IMPLEMENTATION

An expert system is a model of how an expert thinks; like all models, it must be tested before routine use (validation). It is of the utmost importance to use different data sets for tuning and validation. If the entire data set is gathered at one time, it is common to split the data set into two: one for tuning, and the other for validation. You may also employ a domain expert to validate the fuzzy expert system. Suppose the system was designed to classify regions seen in a medical echocardiogram. To validate the system we compare how it classifies regions to how an expert classifies the same regions on a new series of echocardiograms. Once the fuzzy expert system has been validated, it is ready for use. If it is to be used for control, then it will usually run on-line and have to be very fast. To reach the speed required for on-line use, it may need to be implemented in hardware.
Fuzzy expert systems not used for control have not (in the past) usually run on-line and do not need to be very fast. However, with the dramatic increases in computer speed continually occurring, running on-line in real time is now possible. A tremendous increase in speed can also be realized by hard coding an expert system into (for example) the C or C++ languages. A further increase in speed can be achieved by the use of interval rather than fuzzy logic.
HARDWARE

For the development of knowledge-based systems, special hardware is seldom needed. For this task a suitable software tool is more important than fast hardware. However, as knowledge-based systems become larger, it is no longer suitable to use a single processor. Therefore, the data collection and the actions of the system should be logically and geographically distributed in order to reduce the computational expense (51). Special fuzzy hardware can overcome this problem. In Ref. 52 the authors present a fuzzy component network which consists of fuzzy sensors, fuzzy actuators, and fuzzy inference components. All these components can be configured. However, a special language is needed for this task. In order to take advantage of special hardware, this fuzzy hardware configuration language has to be integrated into a fuzzy expert system development tool. Another way is to transform the developed expert system into special hardware. However, this approach seems to be unsuitable because modifications of the system (which seem likely) can require changes in the hardware. Slowly, as fuzzy expert systems (and fuzzy controllers) were developed and became more sophisticated, special hardware was suggested to implement the various components of these systems. Today there is much more interest in obtaining hardware for fuzzy systems, as evidenced by the recent edited book devoted solely to fuzzy hardware (53).
SOFTWARE While there are many excellent software packages available for constructing fuzzy control systems, there are only a few designed for more general fuzzy reasoning applications. Notable for being based in Artificial Intelligence technology are FRIL, a fuzzy superset of PROLOG; FLOPS, a fuzzy superset of OPS5; and Fuzzy CLIPS, a fuzzy superset of CLIPS. While not directly based in AI technology, METUS is highly developed from a computer science viewpoint and has achieved considerable success in noncontrol problems. All these systems are powerful, and they embody facilities which are not possible even to enumerate let alone describe in detail. The descriptions furnished here are certainly incomplete. While all systems are capable of application in diverse fields, the precise facilities furnished depend somewhat on the fields in which they have had the most use. FRIL has probably had the most diverse applications, ranging from aircrew modeling to vision understanding. FLOPS has been applied primarily to medical and technical fields. Fuzzy CLIPS has found its greatest use in engineering, and METUS has been used primarily in the financial world.
FRIL Created originally by James F. Baldwin, FRIL offers the advantages of Prolog plus those of fuzzy systems theory. Prolog has been one of the two dominant AI languages (the other being LISP) and has been especially popular in Europe. FRIL provides the critical data types of discrete fuzzy sets, whose members may be symbols or numbers, and continuous fuzzy sets such as fuzzy numbers. FRIL’s fundamental data structure is a list, each term of which can be a data item or a list. Rules in FRIL reverse the IF (antecedent) THEN (consequent) construction discussed above. The first element of a FRIL rule corresponds to the consequent; the second element corresponds to the antecedent. For example, the rule ((illness of X is flu)(temp of X is high) (strength of X is weak)(throat of X is sore)): (0.9,1) corresponds to the following IF-THEN rule:
rule rconf 0.9 IF temp of X is high AND strength of X is weak AND throat of X is sore, THEN illness of X is flu In the first case, the symbols : (0.9,1) mean that if the antecedent clauses are true, we are at least 0.9 sure that the consequent clause (illness of X is flu) is true. Similarly, in the second case the symbols rconf 0.9 mean that we are 0.9 confident that the rule is valid—that is, that if the antecedent holds, the consequent is true. Being constructed as a superset of a powerful AI language, FRIL in turn can be very powerful, but is not an easy language to learn unless one has previous experience with Prolog. Fortunately there is excellent documentation in the form of a text which includes a demonstration diskette (54). FLOPS FLOPS was created by Douglas Tucker, William Siler, and James J. Buckley (55) to solve a pattern recognition problem involving very noisy images. FLOPS is a fuzzy superset of OPS5, a well-known AI production system for constructing expert systems. FLOPS added fuzzy sets, fuzzy numbers, and approximate numerical comparisons to OPS5’s capabilities. Most FLOPS applications have been medical or technical, as distinct from business or control applications. Two rule-firing modes are offered: sequential and parallel. Suppose (as is often the case) that more than one rule is concurrently fireable. In sequential mode, one rule is selected for firing; the rest are stacked for backtracking—that is, for firing if the path chosen does not work out. In parallel mode, all concurrently fireable rules are fired effectively in parallel; any resulting conflicts for memory modification are then arbitrated by the inference engine. Like OPS5, FLOPS is a forward-chaining system; however, backward chaining is easily emulated. As is usually the case with production systems, there is a lot of system overhead in checking which rules are newly fireable. FLOPS employs the popular RETE algorithm to reduce this overhead. 
The parallel mode of FLOPS also considerably reduces system overhead, since instead of checking for rule fireability after each
rule is fired we check only after each block of rules is fired. This typically reduces system overhead by roughly a factor of six. A simple basic blackboard system is employed to transfer data between external programs (for example, C++) and FLOPS. A standardized simple relational database format is used for this purpose. FLOPS can call external programs for special purposes when a rule-based system is inappropriate, or it can call other FLOPS programs. Internally, FLOPS programs are organized by rule blocks, with metarules to control rule block fireability and activation of external non-AI programs and other FLOPS programs. Recursion can be used for problems where its use is indicated, ranging from the toy problem Tower of Hanoi through solution of ordinary differential equations. A special command is furnished for real-time on-line applications. To reduce the number of rules in an application, FLOPS programs may shift expert knowledge from rules to a database of expert knowledge, with rules written to interpret that database, or to generate rules automatically from the expert knowledge database. Program learning may involve writing rules to generate other rules. A program development environment TFLOPS is furnished for creating FLOPS programs. Debugging facilities include (a) inspection of data and fireable rule stacks and (b) a simple explain facility for tracing data backwards through modification and creation and checking why rules are or are not fireable.

Fuzzy CLIPS

Inspired by Robert Lea of NASA and created by the National Research Council of Canada under the direction of Robert Orchard, this language is a fuzzy superset of CLIPS, a nonfuzzy expert system shell developed by the NASA Johnson Space Center. Its availability on the Internet without charge is certainly an added plus (56). A program development environment is furnished which permits editing, running, and viewing programs.
While discrete fuzzy sets are not available as data types, the use of certainty factors attached to character strings creates fuzzy facts, which can be used individually in much the same manner as members of discrete fuzzy sets. A simple control rule, written in Fuzzy CLIPS, is

(defrule rule_pos_pos (error positive) (rate positive) => (assert (output negative)))

where rule_pos_pos is the name of the rule. We require that the error and rate both be positive; if these are true, then the confidence that output is negative is set to the antecedent confidence. A debugging facility is provided by a flexible WATCH command, amounting to a sophisticated trace of a program run.

METUS

METUS, written by Earl Cox (57,58), is a powerful tool for fuzzy reasoning even though it is not based on an existing AI system. Its use has been primarily in business and financial applications, in a client–server environment. Its origin is in Reveal, a fuzzy expert system by Peter Llewellyn Jones of the United Kingdom. METUS provides both forward and backward
chaining, a sophisticated blackboard system, and program development facilities. METUS employs a flexible if-then-else rule syntax and is especially notable for its advanced use of hedges, which are modifying adjectives applied to fuzzy sets. Simple METUS rules are:
IF costs are High, THEN margins are Weak; else margins are Strong

and

if speed is very Fast, then stopping time is Increased.

Also provided are time lags for a time sequence of numerical data, as in

if sales [t − 1] are Low but inventory [t] is Moderate, then buying risk is Elevated.

METUS supplies a number of multivalued logics in addition to the well-known Zadeh max–min operators, along with a number of defuzzification techniques. Metarules permit executing external non-METUS programs, executing other METUS programs (called policies), and enabling rules or lists of rules.

BIBLIOGRAPHY

1. J. J. Buckley, W. Siler, and D. Tucker, Fuzzy expert system, Fuzzy Sets Syst., 20: 1–16, 1986.
2. H. Ichihashi and T. Watanabe, Learning control system by a simplified fuzzy reasoning model, Proc. IPUM’90, 1990, pp. 417–419.
3. H. Ishibuchi et al., Empirical study on learning in fuzzy systems by Rice test analysis, Fuzzy Sets Syst., 64: 129–144, 1994.
4. J. S. R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Trans. Syst. Man Cybern., 23: 665–685, 1993.
5. H. Nomura, I. Hayashi, and N. Wakami, A learning method of fuzzy inference rules by descent method, Proc. 1st IEEE Int. Conf. Fuzzy Syst., San Diego, 1992, pp. 203–210.
6. M. Sugeno and G. T. Kang, Structure identification of fuzzy model, Fuzzy Sets Syst., 28: 15–33, 1988.
7. T. Takagi and M. Sugeno, Fuzzy identification of systems and its application to modeling and control, IEEE Trans. Syst. Man Cybern., 15: 116–132, 1985.
8. C. L. Karr and E. J. Gentry, Fuzzy control of pH using genetic algorithms, IEEE Trans. Fuzzy Syst., 1: 46–53, 1993.
9. H. Nomura, I. Hayashi, and N. Wakami, A self-tuning method of fuzzy reasoning by genetic algorithm, Proc. Int. Fuzzy Syst. Control Conf., Louisville, KY, 1992, pp. 236–245.
10. M. Sugeno and T. Yasukawa, A fuzzy-logic-based approach to qualitative modeling, IEEE Trans. Fuzzy Syst., 1: 7–31, 1993.
11. C. M. Higgins and R. M. Goodman, Fuzzy rule-based networks for control, IEEE Trans. Fuzzy Syst., 2: 82–88, 1994.
12. Y. Lin and G. A. Cunningham, A new approach to fuzzy-neural system modeling, IEEE Trans. Fuzzy Syst., 3: 190–198, 1995.
13. M. Russo, Comments on “A new approach to fuzzy-neural system modeling,” IEEE Trans. Fuzzy Syst., 4: 209–210, 1996.
14. H. Takagi and I. Hayashi, NN-driven fuzzy reasoning, Int. J. Approximate Reason., 5: 191–212, 1991.
15. K. Nozaki, H. Ishibuchi, and H. Tanaka, A simple but powerful heuristic method for generating fuzzy rules from fuzzy data, Fuzzy Sets Syst., 86: 251–270, 1997.
16. L. X. Wang and J. M. Mendel, Generating fuzzy rules by learning from examples, IEEE Trans. Syst. Man Cybern., 22: 1414–1427, 1992.
17. D. Dubois and H. Prade, Fuzzy sets—A convenient fiction for modeling vagueness and possibility, IEEE Trans. Fuzzy Syst., 2: 16–21, 1994.
18. G. J. Klir, On the alleged superiority of probabilistic representation of uncertainty, IEEE Trans. Fuzzy Syst., 2: 27–31, 1994.
19. A. P. Dempster, Upper and lower probabilities induced by multivalued mapping, Ann. Math. Stat., 38: 325–339, 1967.
20. G. Shafer, Constructive probability, Synthese, 48: 1–60, 1981.
21. P. Orponen, Dempster’s rule of combination is #P-complete, Artif. Intell., 44: 245–253, 1990.
22. I. R. Goodman and H. T. Nguyen, Uncertainty Models for Knowledge-Based Systems, New York: Elsevier, 1985.
23. R. Kruse, E. Schwecke, and J. Heinsohn, Uncertainty and Vagueness in Knowledge Based Systems, Berlin: Springer-Verlag, 1991.
24. L. Zadeh, A theory of approximate reasoning, in D. Michie, J. Hayes, and L. Mickulich (eds.), Machine Intelligence, Amsterdam: Elsevier, 1979, Vol. 9, pp. 149–194.
25. D. Cayrac, D. Dubois, and H. Prade, Handling uncertainty with possibility theory and fuzzy sets in a satellite fault diagnosis application, IEEE Trans. Fuzzy Syst., 4: 251–269, 1996.
26. I. B. Turksen and Y. Tian, Constraints on membership functions of rules in fuzzy expert systems, Proc. 2nd IEEE Int. Conf. Fuzzy Syst., San Francisco, 1993, pp. 845–850.
27. Z. Cao and A. Kandel, Applicability of some implication operators, Fuzzy Sets Syst., 31: 151–186, 1989.
28. S. Fukami, M. Mizumoto, and K. Tanaka, Some considerations on fuzzy conditional inference, Fuzzy Sets Syst., 4: 243–273, 1980.
29. E. E. Kerre, A comparative study of the behavior of some popular fuzzy implication operators on the generalized modus ponens, in L. A. Zadeh and J. Kacprzyk (eds.), Fuzzy Logic for the Management of Uncertainty, New York: Wiley, 1992, pp. 281–295.
30. R. Martin-Clouaire, Semantics and computation of the generalized modus ponens: The long paper, Int. J. Approximate Reason., 3: 195–217, 1989.
31. M. Mizumoto and H.-J. Zimmermann, Comparison of fuzzy reasoning methods, Fuzzy Sets Syst., 8: 253–283, 1982.
32. M. Mizumoto, Comparison of various fuzzy reasoning methods, Proc. IFSA Congr., 2nd, Tokyo, 1987, pp. 2–7.
33. D. Park, Z. Cao, and A. Kandel, Investigations on the applicability of fuzzy inference, Fuzzy Sets Syst., 49: 151–169, 1992.
34. C. Romer and A. Kandel, Applicability of fuzzy inference by means of generalized Dempster–Shafer theory, IEEE Trans. Fuzzy Syst., 3: 448–453, 1995.
35. J. B. Kiszka et al., The influence of some fuzzy implication operators on the accuracy of a fuzzy model I, II, Fuzzy Sets Syst., 15: 111–128, 223–240, 1985.
36. G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic, Upper Saddle River, NJ: Prentice Hall, 1995.
37. P. Magrez and P. Smets, Fuzzy modus ponens: A new model suitable for applications in knowledge-based systems, Int. J. Intell. Syst., 4: 181–200, 1989.
38. A. DiNola, W. Pedrycz, and S. Sessa, An aspect of discrepancy in the implementation of modus ponens in the presence of fuzzy quantities, Int. J. Approximate Reason., 3: 259–265, 1989.
39. J. J. Buckley and Y. Hayashi, Can approximate reasoning be consistent?, Fuzzy Sets Syst., 65: 13–18, 1994.
40. F. Guely and P. Siarry, Gradient descent method for optimizing fuzzy rule bases, Proc. IEEE Int. Conf. Fuzzy Syst., 2nd, San Francisco, 1993, pp. 1241–1246.
41. C. C. Hung and B. R. Fernandez, Minimizing rules of fuzzy logic system by using a systematic approach, Proc. 2nd IEEE Int. Conf. Fuzzy Syst., San Francisco, 1993, pp. 38–44.
42. H. Ishibuchi et al., Selecting fuzzy if-then rules for classification problems using genetic algorithms, IEEE Trans. Fuzzy Syst., 3: 260–270, 1995.
43. R. Rovatti and R. Guerrieri, Fuzzy sets of rules for system identification, IEEE Trans. Fuzzy Syst., 4: 89–102, 1996.
44. R. Rovatti, R. Guerrieri, and G. Baccarani, An enhanced two-level Boolean synthesis methodology for fuzzy rules minimization, IEEE Trans. Fuzzy Syst., 3: 288–299, 1995.
45. S. Abe, M.-S. Lan, and R. Thawonmas, Tuning of a fuzzy classifier derived from data, Int. J. Approximate Reason., 14: 1–24, 1996.
46. H. Bersini, J. Nordvik, and A. Bornari, A simple direct fuzzy controller derived from its neural equivalent, Proc. 2nd IEEE Int. Conf. Fuzzy Syst., San Francisco, 1993, pp. 345–350.
47. F. Herrera, M. Lozano, and J. L. Verdegay, Tuning fuzzy logic controllers by genetic algorithm, Int. J. Approximate Reason., 12: 299–315, 1995.
48. C. L. Karr, Design of an adaptive fuzzy logic controller using a genetic algorithm, Proc. 4th Int. Conf. Genet. Algorithms, San Diego, 1991, pp. 450–457.
49. C. Perneel et al., Optimization of fuzzy expert systems using genetic algorithms and neural networks, IEEE Trans. Fuzzy Syst., 3: 300–312, 1995.
50. P. Thrift, Fuzzy logic synthesis with genetic algorithms, Proc. 4th Int. Conf. Genet. Algorithms, San Diego, 1991, pp. 509–513.
51. S. S. Iyengar and R. J. Kashyap, Distributed sensor networks: Introduction to the special section, IEEE Trans. Syst. Man Cybern., 21: 1027–1031, 1991.
52. J.-F. Josserand and L. Foulloy, Fuzzy component network for intelligent measurement and control, IEEE Trans. Fuzzy Syst., 4: 476–487, 1996.
53. A. Kandel and G. Langholz (eds.), Fuzzy Hardware Architectures and Applications, Boston: Kluwer Academic Publishers, 1998.
54. J. F. Baldwin, T. P. Martin, and B. Pilsworth, Fril: Fuzzy and Evidential Reasoning, New York: Wiley, 1995.
55. W. Siler, [Online]. Available www: FLOPS: http://users.aol.com/wsiler
56. Fuzzy Clips: http://ai.iit.nrc.ca/fuzzy
57. E. Cox, Fuzzy Systems Handbook, San Diego: Academic Press, 1994.
58. E. Cox, [Online]. Available www: METUS: http://www.metus.com
JAMES J. BUCKLEY University of Alabama at Birmingham
WILLIAM SILER Southern Dynamic Systems
THOMAS FEURING University of Münster
Wiley Encyclopedia of Electrical and Electronics Engineering
Possibility Theory (Standard Article)
Didier Dubois and Henri Prade, Paul Sabatier University, Toulouse, France
Copyright © 1999 by John Wiley & Sons, Inc. All rights reserved.
DOI: 10.1002/047134608X.W3502
Article Online Posting Date: December 27, 1999
The sections in this article are: Fundamentals of Possibility Theory; Possibility Theory Versus Other Uncertainty Frameworks; Approximate Reasoning; Concluding Remarks.
POSSIBILITY THEORY In common use, the word possibility conveys two meanings. One is physical and refers to the idea of feasibility. Then, possible means achievable, as in the sentence ‘‘it is possible for Hans to eat six eggs for breakfast.’’ The other is epistemic and refers to the idea of plausibility. There, possible means logically consistent with (i.e., not contradicting) the available information, as in the sentence ‘‘it is possible that it will rain tomorrow.’’ These two meanings correspond to the difference between realizability and plausibility and are mostly unrelated. Moreover, the idea of attainment often occurs together with the idea of preference: considering mutually exclusive alternatives, the most feasible one (in some sense) is usually preferred. Then we may view preference as subjective feasibility (physical realizability corresponds clearly to objective feasibility). The epistemic understanding of possible as plausible that was just mentioned is subjective and goes along with the idea that something is possible insofar as it is not surprising (because consistent) with respect to what is known. An interpretation in terms of objective plausibility can be also encountered when possibility refers to (upper bounds of) frequencies, as we shall see. Although many people consider, mainly by habit, that possibility is always a binary notion (things are possible or are not possible), it makes sense, both with the feasibility and the plausibility interpretations, to consider that possibility may be a matter of degree and that one thing may be estimated or perceived as being more possible than another. Possibility theory was coined by L. A. Zadeh (1) in the late seventies as an approach to modeling flexible restrictions on the value of variables of interest constructed from linguistic pieces of information, described by fuzzy sets and representing the available knowledge. This approach offers a graded modeling of the idea of possibility. 
Physical possibility has been advocated by Zadeh (1) to justify the axiomatic rule of possibility measures, expressing that the possibility of A OR B should be equated to the maximum of the possibility of A and of the possibility of B (because the degree of ease of some action that produces A OR B is given by the easiest of two actions which produce A and B, respectively). However, the intended use of possibility theory as a nonclassical theory of uncertainty, different from probability theory, and its application to approximate reasoning advocated by Zadeh rather agrees with the epistemic interpretation. In fact, as we advocate, possibility theory can be especially useful for modeling plausibility and preference.

J. Webster (ed.), Wiley Encyclopedia of Electrical and Electronics Engineering. Copyright © 1999 John Wiley & Sons, Inc.
This brief introduction is organized in three main parts. The basic elements of the theory are given first. Then the relationships and differences with the other uncertainty frameworks are discussed, and lastly applications to approximate reasoning are presented.

FUNDAMENTALS OF POSSIBILITY THEORY

Possibility Distribution

When a fuzzy set is used to express an incomplete piece of information $E$ about the value of a single-valued variable, the degree attached to a value expresses the level of possibility that this value is indeed the value of the variable. This is what happens if the available information is couched in words, for example, “Tom is young.” Here the fuzzy set young represents the set of possible values of the variable $x$ = “age of Tom.” The fuzzy set $E$ is then interpreted as a possibility distribution (1), which expresses the levels of plausibility of the possible values of the ill-known variable $x$. If, rather, we are stating a requirement under the form of a flexible constraint, for example, we are looking for a young person, then the possibility distribution would represent our preference profile on the age of the person to be recruited. If the only available knowledge about $x$ is that “$x$ lies in $E$” where $E \subseteq U$, then the possibility distribution of $x$ is defined by Eq. (1)

\[ \pi_x(u) = \mu_E(u), \quad \forall u \in U \tag{1} \]

where $E$ (with membership function $\mu_E$) is interpreted as the fuzzy set of (more or less) possible values of $x$ and where $\pi_x$ ranges on [0,1]. More generally, the range of a possibility distribution can be any bounded linearly ordered scale (which may be discrete with a finite number of levels). Fuzzy sets, viewed as possibility distributions, act as flexible constraints on the values of variables referred to in natural language sentences. Equation (1) represents a statement of the form “$x$ lies in $E$” or more informally “$x$ is $E$.” It does not mean, however, that possibility distributions are the same as membership functions. Equation (1) is an assignment statement because it means that given that the only available knowledge is “$x$ lies in $E$,” the degree of possibility that $x = u$ is evaluated by the degree of membership $\mu_E(u)$. Note that distinct values may simultaneously have a degree of possibility equal to 1. $\pi_x(u) = 0$ means that $u$ is completely impossible as a value for $x$. If two possibility distributions $\pi_x$ and $\pi'_x$ pertaining to the same variable $x$, are such that $\pi_x < \pi'_x$, $\pi_x$ is said to be more specific than $\pi'_x$ in the sense that no value $u$ is considered less possible for $x$ according to $\pi'_x$ than to $\pi_x$. This concept of specificity whose importance has been first stressed by Yager (2) underlies the idea that any possibility distribution $\pi_x$ is provisional in nature and likely to be improved by further information, when the available information is not complete. When $\pi_x < \pi'_x$, the information $\pi'_x$ is redundant and can be dropped. Numerical measures of (non) specificity have been introduced (see Refs. 3 and 4). In possibility theory, specificity plays a role similar to entropy in probability theory. When the available information stems from several sources that are considered reliable, the possibility distribution that accounts for it is the least specific possibility distribution to satisfy the set of constraints induced by the pieces
of information given by the different sources. This is the principle of minimum specificity. In particular, it means that given a statement "x is E," any possibility distribution π such that ∀u, π(u) ≤ μ_E(u) is in accordance with "x is E." However, choosing a particular π such that ∀u, π(u) ≤ μ_E(u) to represent our knowledge about x would be arbitrarily too precise. Hence Eq. (1) is naturally adopted if "x is E" is the only available knowledge, and it already embodies the principle of minimum specificity.

Possibility and Necessity Measures

The extent to which the information "x is E" is consistent with a statement like "the value of x is in subset A" is estimated by means of the possibility measure Π, defined from Eq. (1), ∀u, π_x(u) = μ_E(u), by

\[ \Pi(A) = \sup_{u \in A} \pi_x(u) \tag{2} \]
where A is a classical subset of U. The value of Π(A) corresponds to the element(s) of A having the greatest possibility degree according to π_x. In the finite case, 'sup' can be changed into 'max' in Eq. (2). Π(A) = 0 means that x ∈ A is impossible knowing that "x is E." Π(A) estimates the consistency of the statement 'x ∈ A' with what we know about the possible values of x, as emphasized in Refs. 5 and 7. It corresponds to the epistemic view of possibility. Indeed, if π_x models a nonfuzzy piece of incomplete information represented by an ordinary subset E, Eq. (2) reduces to

\[ \Pi_E(A) = \begin{cases} 1 & \text{if } A \cap E \neq \emptyset \quad (x \in A \text{ and } x \in E \text{ are consistent}) \\ 0 & \text{otherwise} \quad (A \text{ and } E \text{ are mutually exclusive}) \end{cases} \tag{3} \]
Any possibility measure Π satisfies the following characteristic max-decomposability property:

\[ \Pi(A \cup B) = \max[\Pi(A), \Pi(B)] \tag{4} \]
When U is not finite, the axiom in Eq. (4) is replaced by Π(∪_{i∈I} A_i) = sup_{i∈I} Π(A_i) for any index set I. Among the features of possibility measures that contrast with probability measures, let us point out the weak relationship between the possibility of an event A and that of its complement Ā ('not A'). Either A or Ā must be possible, that is, max[Π(A), Π(Ā)] = 1 due to A ∪ Ā = U and Π(U) = 1 (normalization of Π). In the case of total ignorance, both A and Ā are fully possible: Π(A) = Π(Ā) = 1. Note that this leads to a representation of ignorance (E = U and ∀A ≠ ∅, Π_E(A) = 1) which presupposes nothing about the number of elements in the reference set U (elementary events), whereas the latter aspect plays a crucial role in probabilistic modeling. The case when Π(A) = 1, Π(Ā) > 0 corresponds to partial ignorance about A. Besides, Π(∅) = 0 is a natural convention since Π(A) = Π(A ∪ ∅) = max[Π(A), Π(∅)] entails ∀A, Π(A) ≥ Π(∅). Note that we have only Π(A ∩ B) ≤ min[Π(A), Π(B)]. It agrees with the fact that in the case of total ignorance about A, for B = Ā, Π(A ∩ B) = 0 whereas Π(A) = Π(Ā) = 1.

The weak relationship between Π(A) and Π(Ā) forces us to consider both quantities to describe uncertainty about the occurrence of A. Π(Ā) tells us about the possibility of 'not A', hence about the certainty (or necessity) of the occurrence of A since, when 'not A' is impossible, then A is certain. Thus it is natural to use this duality and define the degree of necessity of A [Dubois and Prade (8), Zadeh (9)] as

\[ N(A) = 1 - \Pi(\bar{A}) = \inf_{u \notin A} [1 - \pi_x(u)] \tag{5} \]
The duality relationship Eq. (5) between Π and N expresses that A is all the more certain as Ā is less consistent with the available knowledge. In other words, A is all the more necessarily true as 'not A' is more impossible. This is a gradual version of the duality between possibility and necessity in modal logic. When π_x models a nonfuzzy piece of incomplete information represented by an ordinary subset E, Eq. (5) reduces to

\[ N_E(A) = \begin{cases} 1 & \text{if } E \subseteq A \quad (\text{information } x \in E \text{ logically entails } x \in A) \\ 0 & \text{otherwise} \end{cases} \tag{6} \]
In the case of complete knowledge, that is, E = {u_0} for some u_0, Π_E(A) = N_E(A) = 1 if and only if u_0 ∈ A. Complete ignorance corresponds to E = U, and then ∀A ≠ ∅, Π_E(A) = 1 and ∀A ≠ U, N_E(A) = 0 (everything is possible and nothing is certain). In the general case, N(A) estimates to what extent each value outside A, that is, in the complement Ā of A, has a low degree of possibility. Thus the values with the higher degrees of possibility should be included among the elements of A, which makes us somewhat certain that, indeed, x belongs to A. It is easy to verify that N(A) > 0 implies Π(A) = 1, that is, an event is completely possible (completely consistent with what is known) before being somewhat certain. This property ensures the natural inequality Π(A) ≥ N(A). N(A) > 0 corresponds to the idea of (provisionally) accepting A as a belief. The above definition of N from Π makes sense only if Π, and thus π_x, are normalized, that is, sup_{u∈U} π_x(u) = 1. It means that U is fully consistent with the available knowledge, meaning that this knowledge itself is consistent if U is exhaustive as a referential. Necessity measures satisfy an axiom dual to Eq. (4), namely,

\[ N(A \cap B) = \min[N(A), N(B)] \tag{7} \]
It expresses that the conjoint event 'A and B' is all the more certain as A is certain and B is certain. This is the characteristic axiom of necessity measures. Mind that we have only N(A ∪ B) ≥ max[N(A), N(B)]. Indeed, we may be somewhat certain that x lies in A ∪ B without knowing at all if x is in A or is rather in B (at least for A ∪ B = U, N(U) = 1).
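The definitions of Π and N on a finite universe can be illustrated with a short sketch. The universe, the possibility degrees, and all function names below are hypothetical choices made for illustration; only Eqs. (2), (4), (5), and (7) come from the text.

```python
from itertools import combinations

# Hypothetical normalized possibility distribution on a small universe U
# (illustrative values; at least one element has degree 1).
pi = {"a": 1.0, "b": 0.7, "c": 0.3, "d": 0.0}
U = frozenset(pi)


def possibility(A):
    """Pi(A) = max of pi over A (Eq. 2, finite case); Pi(empty set) = 0."""
    return max((pi[u] for u in A), default=0.0)


def necessity(A):
    """N(A) = 1 - Pi(complement of A) (Eq. 5)."""
    return 1.0 - possibility(U - frozenset(A))


def all_subsets(universe):
    """Enumerate every subset of a finite universe."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def check_properties():
    """Check Eqs. (4) and (7), Pi >= N, and N > 0 => Pi = 1 on all events."""
    events = all_subsets(U)
    for A in events:
        assert possibility(A) >= necessity(A)
        if necessity(A) > 0:
            assert possibility(A) == 1.0
        for B in events:
            assert possibility(A | B) == max(possibility(A), possibility(B))
            assert necessity(A & B) == min(necessity(A), necessity(B))
    return True
```

Calling `check_properties()` exhaustively tests the 16 events of this toy universe; it is a sanity check on one example, not a proof of the general properties.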
Fuzzy Events

Possibility and necessity measures naturally extend to fuzzy events. Using the ideas of consistency and entailment as the basis for possibility and necessity, the following extensions are obtained. If F is a fuzzy set with membership function μ_F,

\[ \Pi(F) = \sup_u \mu_F(u) * \pi_x(u) \tag{8} \]

where ∗ is a monotonic conjunctive-like operation (such that 1 ∗ t = t and 0 ∗ t = 0 to recover Eq. (2) when F = A is nonfuzzy), and

\[ N(F) = \inf_u I[\pi_x(u), \mu_F(u)] \tag{9} \]

where I is an implication operation of the form I(a, b) = 1 − [a ∗ (1 − b)], which reduces to Eq. (5) when F = A is nonfuzzy. This choice preserves the identity N(F) = 1 − Π(F̄) for the usual fuzzy set complementation [∀u, μ_F̄(u) = 1 − μ_F(u)]. Equation (8) was proposed in Ref. 1 with ∗ = min. Taking ∗ = min, the characteristic properties Eqs. (4) and (7) of possibility and necessity measures with respect to disjunction and conjunction still hold. Π(F) ≥ N(F), ∀F, still holds (if π_x is normalized). The possibility of a fuzzy set (with ∗ = min) is a remarkable example of a Sugeno integral (10) since

\[ \Pi(F) = \sup_{\alpha \in (0,1]} \min[\alpha, \Pi(F_\alpha)] \tag{10} \]

where F_α = {u, μ_F(u) ≥ α}. The same formula holds (6,11) for the necessity measure N in place of Π. It points out that these definitions are compatible with the α-cut view of a fuzzy set.

Other Set Functions: Certainty and Possibility Qualification

Apart from Π and N, two other set functions can be defined using sup or inf, namely, a measure of "guaranteed possibility" (12):

\[ \Delta(A) = \inf_{u \in A} \pi_x(u) \tag{11} \]

which estimates to what extent all the values in A are actually possible for x according to what is known, that is, each value in A is at least possible for x at the degree Δ(A). Clearly Δ is a stronger measure than Π, that is, Δ ≤ Π, since Π estimates only the existence of at least one value in A compatible with the available knowledge, whereas the evaluation provided by Δ concerns all the values in A. Note also that Δ and N are unrelated. A dual measure of potential certainty, ∇(A) = 1 − Δ(Ā), estimates to what extent there exists at least one value in the complement of A which has a low degree of possibility. Δ and ∇ are monotonically decreasing set functions (in the wide sense) with respect to set inclusion, for example Δ(A ∪ B) = min[Δ(A), Δ(B)]. This contrasts with Π and N, which are monotonically increasing. Δ agrees with the idea of explicit permission: if A or B is permitted, both A and B are permitted. The set function Δ plays an important role in the representation of possibility-qualified statements, as we are going to see.

A possibility distribution can be implicitly specified through the qualification of ordinary, or fuzzy, subsets of the referential U, in terms of either certainty or possibility. Let A be an ordinary subset of U (with characteristic function μ_A), namely,
1. The statement "it is certain at least to the degree α that the value of x is in A" will be interpreted as "any value outside A is at most possible at the complementary degree, namely 1 − α," that is, ∀u ∉ A, π_x(u) ≤ 1 − α, which leads to the following (7): "A is α-certain for x" is translated by

\[ \forall u \in U,\quad \pi_x(u) \le \max[\mu_A(u), 1 - \alpha] \tag{12} \]

It can be verified that this is equivalent to N(A) ≥ α. Note that for α = 1, the inequality ∀u, π_x(u) ≤ μ_A(u) is recovered. When α decreases from 1 to 0, our knowledge evolves from complete certainty in A to acknowledged ignorance about x.

2. The statement "A is a possible range for x at least at the degree α" will be understood as ∀u ∈ A, π_x(u) ≥ α, which leads to the following: "A is α-possible for x" is translated by

\[ \forall u \in U,\quad \min[\mu_A(u), \alpha] \le \pi_x(u) \tag{13} \]

It can be verified that this is equivalent to Δ(A) ≥ α. Note that for α = 1, Eq. (13) reduces to ∀u, π_x(u) ≥ μ_A(u). When α decreases from 1 to 0, our knowledge evolves from the certainty that A is the minimal range of our ignorance to a total lack of information whatsoever.

The representation of certainty-qualified and possibility-qualified statements by Eqs. (12) and (13), respectively, can still be justified when A becomes a fuzzy set; see (12).
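The two qualification schemes can be checked numerically on a toy example. The sketch below takes the bounds of Eqs. (12) and (13) as equalities, which is one admissible choice among the distributions satisfying each inequality; the universe, the set A, and the function names are assumptions made for illustration.

```python
# Hypothetical finite universe and crisp subset A.
U = ["u1", "u2", "u3", "u4"]
A = {"u1", "u2"}


def mu_A(u):
    """Characteristic function of the crisp set A."""
    return 1.0 if u in A else 0.0


def certainty_qualified(alpha):
    """Upper bound of Eq. (12): pi(u) = max(mu_A(u), 1 - alpha)."""
    return {u: max(mu_A(u), 1.0 - alpha) for u in U}


def possibility_qualified(alpha):
    """Lower bound of Eq. (13): pi(u) = min(mu_A(u), alpha)."""
    return {u: min(mu_A(u), alpha) for u in U}


def necessity(pi, S):
    """N(S) = 1 - max of pi outside S (Eq. 5, finite case)."""
    return 1.0 - max((pi[u] for u in U if u not in S), default=0.0)


def guaranteed_possibility(pi, S):
    """Delta(S) = min of pi over S (Eq. 11, finite case)."""
    return min(pi[u] for u in S)
```

With α = 0.75, `necessity(certainty_qualified(0.75), A)` and `guaranteed_possibility(possibility_qualified(0.75), A)` both evaluate to 0.75, matching the stated equivalences N(A) ≥ α and Δ(A) ≥ α at their boundary.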
Qualitative Versus Quantitative Possibility Theory: Conditioning

Besides, an ordinal uncertainty scale is sufficient for defining possibility (and necessity) measures, because possibility theory uses only max, min, and the order-reversing operation of the scale (1 − (·) on [0, 1]). This agrees with a rather qualitative view of uncertainty. Indeed it has been shown that possibility measures are the unique numerical counterparts of qualitative possibility relationships, defined as nontrivial, reflexive, complete, transitive binary relationships "B ⪰ A," expressing that an event B is at least as possible as another event A and satisfying the characteristic requirement ∀A, B, C, B ⪰ A ⇒ B ∪ C ⪰ A ∪ C (13). Such relationships were introduced by Lewis in Ref. 14. Possibility theory can be interpreted either as a model of ordinal uncertainty based on a linear ordering (where statements can be ranked only according to their levels of possibility and their level of necessity in a given scale), or as a numerical model of uncertainty which can then be related to probability theory and other uncertainty frameworks. This distinction affects how conditioning is defined. Conditioning in possibility theory can be defined similarly to that in probability theory, namely, through an equation of the form Π(A ∩ B) = Π(B|A) ∗ Π(A). Clearly, the choice for ∗ should be compatible with the nature of the scale which is used. A possible choice for ∗, in agreement with an ordinal scale, is min (15), namely,

\[ \forall B,\ B \cap A \neq \emptyset,\quad \Pi(A \cap B) = \min[\Pi(B \mid A), \Pi(A)] \tag{14} \]

This equation has more than one solution. Dubois and Prade (7) proposed selecting the least specific one, that is (for Π(A) > 0),

\[ \Pi(B \mid A) = \begin{cases} 1 & \text{if } \Pi(A \cap B) = \Pi(A) \\ \Pi(A \cap B) & \text{otherwise} \end{cases} \tag{15} \]

The conditional necessity function is defined by N(B|A) = 1 − Π(B̄|A), by duality. Note that N(B|A) > 0 ⇔ Π(A ∩ B) > Π(A ∩ B̄), which expresses that B is an accepted belief in the context A if and only if B is more plausible than B̄ when A is true. A notion of qualitative independence has been recently introduced (16), namely, N(B|A) > 0 and N(B|A ∩ C) > 0 hold simultaneously. Thus it expresses that B is believed in context A independently of C. Letting A = U, it can be verified that this notion is stronger than the condition Π(B ∩ C) = min[Π(B), Π(C)], which expresses a form of unrelatedness; see Ref. 17. The previous definitions require only a purely ordinal view of possibility theory. In the case of a numerical scale, we can also use the product instead of min in the conditioning Eq. (14). It leads to

\[ \forall B,\ B \cap A \neq \emptyset,\quad \Pi(B \mid A) = \frac{\Pi(A \cap B)}{\Pi(A)} \tag{16} \]

provided that Π(A) ≠ 0. This is the Dempster rule of conditioning specialized to possibility measures, that is, consonant plausibility measures of Shafer (18).

POSSIBILITY THEORY VERSUS OTHER UNCERTAINTY FRAMEWORKS

Relationships with Probabilities and Belief Functions

Formally speaking, possibility measures clearly depart from probability measures in various respects. The former are max-decomposable for the disjunction of events, whereas the latter are additive (for mutually exclusive events). Dually, necessity measures are min-decomposable for conjunction. However, possibility (respectively, necessity) measures are not compositional for conjunction (respectively, disjunction) or for negation (whereas probabilities are compositional only for negation). In possibility theory, the representation of the uncertainty of A requires two weakly related numbers, namely, Π(A) and N(A) = 1 − Π(Ā), which contrasts with probabilities. Thus a distinction can be made between the impossibility of A (Π(A) = 0 ⇔ N(Ā) = 1) and the total lack of certainty about A (N(A) = 0), which is entailed by (but not equivalent to) the impossibility of A (whereas in probability theory, Prob(Ā) = 1 ⇔ Prob(A) = 0).

Despite the difference between probability and possibility, it is noteworthy that possibility measures can be given a purely frequentist interpretation (19,20). Given any statistical experiment with outcomes in U, assume that a precise observation of outcomes is out of reach for some reason. For instance, U is a set of candidates in an election, and the experiment is an opinion poll where individuals have not yet made up their minds completely and are allowed to express their opinion by proposing a subset of candidates containing their future choice. Let ℱ ⊆ 2^U be the set of observed responses, and m(E), E ∈ ℱ, be the proportion of responses of the form of a crisp set E (Σ_{E∈ℱ} m(E) = 1, m(∅) = 0). (ℱ, m) defines a basic probability assignment in the sense of Shafer (18). It can also be viewed as a random set. In that case

\[ P^*(A) = \sum_{E \in \mathcal{F}} \Pi_E(A) \cdot m(E) = \sum_{A \cap E \neq \emptyset} m(E) \tag{17} \]

is the expected possibility of A in the sense of logical possibility. Formally, P^*(A) is mathematically identical to an upper probability in the sense of Dempster (21) or to a plausibility function in the sense of Shafer (18), and P_*(A) = 1 − P^*(Ā) = Σ_{E∈ℱ} N_E(A) · m(E) = Σ_{E⊆A} m(E) has the same property as a Shafer belief function. To recover the max-decomposability (1) for P^*, it is necessary and sufficient to assume that ℱ defines
a nested sequence of sets (18). Then P^* is a possibility measure. Moreover, the guaranteed possibility function is a special case of the Shafer commonality function. Hence, possibility measures correspond to imprecise but coherent (due to the nestedness property) statistical evidence, that is, an ideal situation opposite to the case of probability measures (ideal, too) where outcomes form a partition of Ω. More generally, necessity (and possibility) measures provide inner and outer approximations of belief (and plausibility) functions; see (22).

Possibility measures can also be viewed as a particular type of upper probability system. An upper probability P^* induced by a set of lower bounds {P(A_i) ≥ α_i, i = 1, ..., n} (i.e., P^*(B) = sup{P(B) | P ∈ 𝒫} where 𝒫 = {P | P(A_i) ≥ α_i, i = 1, ..., n}) is a possibility measure if the set {A_1, ..., A_n} is nested, that is, after a suitable renumbering, A_1 ⊆ A_2 ⊆ ··· ⊆ A_n. Conversely, any possibility measure on a finite set can be induced by such a set of lower bounds with nested A_i's; see (23). This view leads to a definition of conditioning different from Eqs. (15) or (16); see (24).

The problem of transforming a possibility distribution into a probability distribution and conversely is meaningful in the scope of uncertainty combination with heterogeneous sources (some supplying statistical data, others linguistic data, for instance). However, raising the issue means that some consistency exists between possibilistic and probabilistic representations of uncertainty. The basic question is whether this is a mere matter of translation between languages "neither of which is weaker or stronger than the other" (25). Adopting this assumption leads to transformations that respect a principle of uncertainty and information invariance. However, if we accept the fact that possibility distributions are weaker representations of uncertainty than probability distributions, the transformation problem must be stated otherwise.
Going from possibility to probability increases the informational content of the considered representation, whereas going the other way around means an informational loss. Hence the principles behind the two transformations are different, and asymmetric transformations are obtained (26): From possibility to probability, a generalized Laplacean indifference principle is adopted. From probability to possibility, the rationale is to preserve as much information as possible, which leads to selecting the most specific upper approximation of the probability.

Taking advantage of the inequalities inf{P(A|b), b ∈ B} ≤ P(A|B) ≤ sup{P(A|b), b ∈ B}, it is possible to see a possibility measure (respectively, a Δ function) as the upper (respectively, lower) envelope of a family of likelihood functions (27). Several authors have previously suggested viewing likelihood functions as possibility distributions [e.g., (28,29)].

Lastly, Spohn (30) has proposed a theory of epistemic states with strong similarities to possibility theory, with which it shares the idea of ordering between possible worlds. What he calls an ordinal conditional function is a mapping from a finite set of events to the set of nonnegative integers, denoted κ, such that κ(A ∪ B) = min[κ(A), κ(B)]. κ(A) expresses a degree of disbelief in A and grows as A becomes less plausible. Moreover, there is an elementary event {u} such that κ({u}) = 0. It is easy to check that for any real number c > 1, 1 − c^{−κ(A)} is a degree of necessity [see (31)]. A probabilistic interpretation of κ(A) has been suggested by Spohn (30). κ(A) = n is interpreted as a small probability of the form ε^n, that is, the probability of a rare event. Indeed, if A has a small probability with order of magnitude ε^n and B also has a small probability of the form ε^m, then P(A ∪ B) is of the order of magnitude of ε^{min(m,n)}. These remarks may lead to an interpretation of possibility and necessity measures in terms of probabilities of rare events.

Rough Sets

Rough set theory (32) captures the idea of indiscernibility. Indiscernibility means the lack of discriminatory power between elements in a set. At a very primitive level, this aspect can be captured by an equivalence relationship R on the set U, such that u R u′ means that u and u′ cannot be told apart. Then R induces a partition on U, made of the elements of the quotient space U/R. As a consequence, any subset A of U can be described only by means of clusters in U given here by the equivalence classes [u] of R, namely,
\[ \text{the lower image of } A:\ A_* = \{[u] \mid [u] \subseteq A\}, \qquad \text{the upper image of } A:\ A^* = \{[u] \mid A \cap [u] \neq \emptyset\} \tag{18} \]
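Equation (18) translates directly into set operations on a finite universe. The following is a minimal sketch; the universe, the equivalence relationship (integers sharing the same quotient by 3), and the function names are assumptions chosen for illustration.

```python
def equivalence_classes(universe, key):
    """Partition `universe` into classes of elements sharing the same key(u)."""
    classes = {}
    for u in universe:
        classes.setdefault(key(u), set()).add(u)
    return [frozenset(c) for c in classes.values()]


def lower_image(classes, A):
    """A_* : the classes entirely contained in A (Eq. 18)."""
    return {c for c in classes if c <= A}


def upper_image(classes, A):
    """A^* : the classes that intersect A (Eq. 18)."""
    return {c for c in classes if c & A}


# Hypothetical example: u R u' holds when u // 3 == u' // 3.
U = range(10)
classes = equivalence_classes(U, key=lambda u: u // 3)
A = frozenset({0, 1, 2, 3, 9})
```

Here the class {3, 4, 5} intersects A without being included in it, so it belongs to the upper image only; the lower image is always a subset of the upper image when A is approximated by nonempty classes.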
This aspect has been studied by Pawlak (32) under the name "rough set" and is also studied by Shafer (18) when he considers coarsening, refinements, and compatibility relationships between frames. Although possibility and fuzzy set theory are not directly concerned with indistinguishability, A_* and A^* can easily be interpreted in terms of necessity and possibility: [u] ∈ A_* ⇔ N_[u](A) = 1, and [u] ∈ A^* ⇔ Π_[u](A) = 1, where E = [u] (using the notations of Eqs. (3) and (6)). [u] belongs to A_* (respectively, A^*) if and only if it is certain (respectively, possible) that any (respectively, some) element close to u (in the sense of R) belongs to A. It is possible to extend the rough set framework with (fuzzy) similarity relationships or fuzzy partitions (33). Indiscernibility, which is also linked to Poincaré's paradox of the mathematical continuum, is clearly an important issue in knowledge representation, where information appears in a granular form whereas partial belief is measured on a continuous scale. This question is clearly orthogonal to that of modeling partial belief, because it affects the definition and the structure of frames of discernment. The coarsening of the referential U into the quotient space U/R (where R is a classical equivalence relationship) thus induces lower and upper approximations π_* and π^* for a possibility distribution π, defined by

\[ \forall \omega \in U/R,\quad \pi_*(\omega) = \inf_{u \in \omega} \pi(u), \qquad \pi^*(\omega) = \sup_{u \in \omega} \pi(u) \tag{19} \]
Here what is fuzzified is the subset A to be approximated, agreeing with Eq. (18). Indeed, letting μ_F(u) = π(u), Eq. (19) corresponds to the necessity and the possibility of the fuzzy event F based on the nonfuzzy possibility distribution μ_ω. These lower and upper approximations can be generalized by using fuzzy equivalence classes based on similarity relationships instead of the crisp equivalence classes μ_ω. See Ref. 33.

APPROXIMATE REASONING

The theory of approximate reasoning, whose basic principles have been formulated by Zadeh (34), can be viewed as a direct application of possibility theory. Indeed, it is essentially a methodology for representing fuzzy and incomplete information in terms of unary or joint possibility distributions and inferring the values of variables of interest by applying the rules of possibility theory. What Zadeh's representation and approximate reasoning theory mainly provides is a powerful tool for interfacing symbolic knowledge and numerical variables that has proved very useful in applications where qualitative knowledge pertains to numerical quantities (e.g., fuzzy logic controllers where precise input values are matched against the fuzzy conditions of rules). Neither classical rule-based systems nor classical logic is fully adapted to properly handling the interface between numbers and symbols without resorting to arbitrary thresholds for describing predicate extensions.

Joint Possibility Distributions

Let x and y be two variables (on U and V, respectively) linked together through a fuzzy restriction on the Cartesian product U × V, encoded by a possibility distribution π_{x,y}, called a joint possibility distribution (the following can be easily extended to more than two variables). The unary possibility distribution π_x, representing the induced restriction on the possible values of x, can be calculated as the projection of π_{x,y} on U, defined in Refs. 34 and 35 by

\[ \pi_x(u) = \Pi(\{u\} \times V) = \sup_v \pi_{x,y}(u, v) \tag{20} \]
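On finite universes the projection of Eq. (20) is a row-wise (or column-wise) maximum. The sketch below assumes a joint distribution stored as a nested dictionary; the numerical values and function names are hypothetical.

```python
# Hypothetical joint possibility distribution on U x V, stored as
# pi_xy[u][v]; the values are illustrative only.
pi_xy = {
    "u1": {"v1": 1.0, "v2": 0.4},
    "u2": {"v1": 0.6, "v2": 0.2},
}


def project_on_U(joint):
    """pi_x(u) = max over v of pi_xy(u, v) (Eq. 20, finite case)."""
    return {u: max(row.values()) for u, row in joint.items()}


def project_on_V(joint):
    """pi_y(v) = max over u of pi_xy(u, v), the symmetric projection."""
    vs = next(iter(joint.values())).keys()
    return {v: max(row[v] for row in joint.values()) for v in vs}


pi_x = project_on_U(pi_xy)
pi_y = project_on_V(pi_xy)

# By construction, a joint distribution never exceeds the minimum
# of its two projections.
bounded = all(pi_xy[u][v] <= min(pi_x[u], pi_y[v])
              for u in pi_xy for v in pi_xy[u])
```

In this example π_{x,y}(u2, v2) = 0.2 is strictly below min(π_x(u2), π_y(v2)) = 0.4, so the joint distribution carries more information than its two projections taken together.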
Generally, π_{x,y} ≤ min(π_x, π_y). When equality holds, π_{x,y} is said to be min-separable, and the variables are said to be noninteractive (35). It is in accordance with the principle of minimal specificity, since π_x(u) is calculated from the highest possibility value of pairs (x, y) where x = u. When modeling incomplete information, noninteractivity expresses a lack of knowledge about the links between x and y. If we start with the two pieces of knowledge represented by π_x and π_y and if we do not know whether or not x and y are interactive, that is, π_{x,y} is not known, we can use the upper bound min(π_x, π_y) instead, which is less informative (but which agrees with the available knowledge). This is minimal specificity again. Note that in probability theory, independence plays the same role as noninteractivity in possibility theory (probabilistic variables can be assumed to be stochastically independent by virtue of the principle of maximum entropy). However, stochastic independence does not lead to bounding properties as noninteractivity does, because stochastic independence assumes an actual absence of correlation whereas noninteractivity expresses a lack of knowledge. The noninteractivity of variables implies that the possibility and necessity measures become decomposable with respect to Cartesian products (A × B) and the associated coproducts (A + B, the complement of Ā × B̄), respectively:

\[ \Pi(A \times B) = \min[\Pi_x(A), \Pi_y(B)] \tag{21} \]

\[ N(A + B) = \max[N_x(A), N_y(B)] \tag{22} \]

where A and B are subsets of the universes U and V (the ranges of x and y) and Π_x and N_x are possibility and necessity measures based on the normalized possibility distribution π_x. This is useful in fuzzy pattern matching evaluation where
compound requirements of the form 'A and B' or 'A or B' are matched against fuzzy data pertaining to different attributes.

Representing Fuzzy Rules

A general approach to modeling fuzzy statements has been outlined by Zadeh (36) via the representation language PRUF. Of particular interest are fuzzy rules, which are rules whose conditions and/or conclusions have the form "x is F." Simple fuzzy rules have the generic form "if x is F, then y is G" and express a fuzzy link between x and y. Several interpretations of fuzzy rules exist (37). An immediate application of possibility and certainty qualification is the representation of two kinds of fuzzy rules, called certainty and possibility rules (12). The fuzzy rule "the more x is F, the more certain y is G" can be represented by

\[ \pi_{x,y}(u, v) \le \max[\mu_G(v), 1 - \mu_F(u)] \tag{23} \]

and the fuzzy rule "the more x is F, the more possible y is G" by

\[ \pi_{x,y}(u, v) \ge \min[\mu_G(v), \mu_F(u)] \tag{24} \]
Indeed, letting α = μ_F(u) and changing μ_A(u) into μ_G(v) in the expressions of certainty qualification [Eq. (12)] and possibility qualification [Eq. (13)], these rules can be understood as "if x = u, then y is G is μ_F(u)-certain" (respectively, μ_F(u)-possible). Note that Eq. (24) is the representation of a fuzzy rule originally proposed by Mamdani (38) in fuzzy logic controllers. The principle of minimal specificity leads to representing the certainty rules by π_{x,y}(u, v) = μ_F(u) → μ_G(v), where a → b = max(1 − a, b) is known as the Dienes implication. The principle of minimal specificity cannot be applied to Eq. (24). On the contrary, the rule expresses that the values v, for instance, in the core of G (such that μ_G(v) = 1) are possible at least at the degree α = μ_F(u). This does not forbid other values outside G from being possible also. The inequality in Eq. (24) explains why conclusions are combined disjunctively in Mamdani's treatment of fuzzy rules (and not conjunctively as with implication-based representations of fuzzy rules). Here a maximal informativeness principle consists in considering that only the values in G are possible and only at the degree μ_F(u) (for the values in the core of G). This leads to π_{x,y}(u, v) = min[μ_G(v), μ_F(u)], that is, only what is explicitly stated as possible is assumed possible.

Another type of fuzzy rule, which is of interest in interpolative reasoning, is the gradual rule. Gradual rules are of the form "the more x is F, the more y is G." This translates into a constraint on π_{x,y} acknowledging that the image of 'x is F' by π_{x,y} (defined by combination and projection) is included in G, that is,

\[ \forall v,\quad \sup_u \min[\mu_F(u), \pi_{x,y}(u, v)] \le \mu_G(v) \tag{25} \]
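The certainty and possibility rule representations discussed above can be tabulated on discretized universes. The sketch below builds the upper bound of Eq. (23) (Dienes implication) and the lower bound of Eq. (24) (Mamdani's min) as matrices indexed by (u, v); the membership degrees and function names are hypothetical.

```python
def certainty_rule(mu_F, mu_G):
    """Upper bound of Eq. (23): max(mu_G(v), 1 - mu_F(u)) for each (u, v)."""
    return [[max(g, 1.0 - f) for g in mu_G] for f in mu_F]


def possibility_rule(mu_F, mu_G):
    """Lower bound of Eq. (24): min(mu_G(v), mu_F(u)) for each (u, v)."""
    return [[min(g, f) for g in mu_G] for f in mu_F]


# Hypothetical membership degrees on U = {u0, u1, u2} and V = {v0, v1}.
mu_F = [0.0, 0.5, 1.0]
mu_G = [1.0, 0.25]

upper = certainty_rule(mu_F, mu_G)
lower = possibility_rule(mu_F, mu_G)

# The possibility-rule bound never exceeds the certainty-rule bound.
compatible = all(lo <= up
                 for lo_row, up_row in zip(lower, upper)
                 for lo, up in zip(lo_row, up_row))
```

When μ_F(u) = 0 the certainty rule leaves every v fully possible (no constraint), while the possibility rule guarantees nothing; when μ_F(u) = 1 both bounds coincide with μ_G.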
Then the principle of minimal specificity leads to a representation by π_{x,y}(u, v) = 1 if μ_F(u) ≤ μ_G(v). When μ_F(u) > μ_G(v), π_{x,y}(u, v) can be taken equal to 0 or μ_G(v), depending on whether or not the relationship between x and y is fuzzy.

The Approximate Reasoning Methodology

Inference in the framework of possibility theory is based on a combination/projection principle stated by Zadeh (35) for fuzzy constraints, namely, given a set of n statements S_1, ..., S_n that form a knowledge base, inference proceeds in three steps:

1. Translate S_1, ..., S_n into possibility distributions restricting the values of the involved variables. Facts of the form 'x is F' translate into π_x = μ_F, and rules of the form "if x is F, then y is G" translate into possibility distributions π_{x,y} = μ_R, where μ_R derives from the semantics of the rule, as discussed previously.
2. Combine the various possibility distributions conjunctively to build a joint possibility distribution expressing the meaning of the knowledge base, that is, π = min(π_1, ..., π_n).
3. Project π on the universe corresponding to some variable of interest.

The combination/projection principle is an extension of classical deduction. For instance, if S_1 = "x is F′" and S_2 = "if x is F, then y is G", then π_{x,y}(u, v) = min[μ_F′(u), μ_R(u, v)], where μ_R represents the rule S_2. Then, the fact "y is G′" is inferred such that μ_G′(v) = sup_u min[μ_F′(u), μ_R(u, v)]. This is called the generalized modus ponens and was proposed by Zadeh (39). This approach has found applications in many systems implementing fuzzy logic and in possibilistic belief networks, such as POSSINFER (40).

Possibilistic Logic and Nonmonotonic Reasoning

An important particular case of approximate reasoning considers only necessity-qualified classical propositions. It gives birth to possibilistic logic (41). A possibilistic knowledge base K is a set of pairs (p, s) where p is a classical logic formula and s is a lower bound of a degree of necessity (N(p) ≥ s). It can be viewed as a stratified deductive database where the higher s is, the safer is the piece of knowledge p. Reasoning from K means using the safest part of K to make inferences, whenever possible. Denoting K_α = {p, (p, s) ∈ K, s ≥ α}, the entailment K ⊢ (p, α) means that K_α ⊢ p. K can be inconsistent and its inconsistency degree is inc(K) = sup{α, K ⊢ (⊥, α)}, where ⊥ denotes the contradiction.
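For a small propositional base, the α-cuts K_α and the inconsistency degree just defined can be computed by enumerating truth assignments. The sketch below substitutes brute-force model checking for syntactic deduction (the two agree in classical propositional logic); the knowledge base, the atom names, and all function names are hypothetical.

```python
from itertools import product


def satisfiable(formulas, atoms):
    """Brute-force check: does some truth assignment satisfy all formulas?"""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if all(f(world) for f in formulas):
            return True
    return False


def cut(K, alpha):
    """K_alpha: formulas of K whose necessity lower bound is at least alpha."""
    return [p for p, s in K if s >= alpha]


def inconsistency_degree(K, atoms):
    """inc(K): the largest level alpha whose cut K_alpha is unsatisfiable."""
    for alpha in sorted({s for _, s in K}, reverse=True):
        if not satisfiable(cut(K, alpha), atoms):
            return alpha
    return 0.0  # K is consistent


def entails(K, p, alpha, atoms):
    """K |- (p, alpha): K_alpha together with 'not p' has no model."""
    return not satisfiable(cut(K, alpha) + [lambda w: not p(w)], atoms)


# Hypothetical base: (p, 0.9), (p -> q, 0.6), (not q, 0.3).
atoms = ["p", "q"]
K = [
    (lambda w: w["p"], 0.9),
    (lambda w: (not w["p"]) or w["q"], 0.6),
    (lambda w: not w["q"], 0.3),
]
```

For this base the cut at level 0.6 is consistent and entails q, while adding the least certain formula at level 0.3 makes the whole base inconsistent, so `inconsistency_degree(K, atoms)` returns 0.3.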
In contrast with classical logic, inference in the presence of inconsistency becomes nontrivial. This is the case when K ⊢ (p, α) where α > inc(K). Then it means that p follows from a consistent and safe part of K (at least at level α). Moreover, adding p to K and nontrivially entailing q from K ∪ {p} corresponds to revising K upon learning p and having q as a consequence of the revised knowledge base. This notion of revision is exactly that studied by Gärdenfors (42) at the axiomatic level (31). This kind of syntactic nontrivial inference is sound and complete with respect to a so-called preferential entailment at the semantic level. p is said to preferentially entail q if all the preferred interpretations which make p true make q true also. A preferred interpretation is one which maximizes the possibility distribution π_K, the least specific possibility distribution satisfying the set of constraints N(p) ≥ s where (p, s) ∈ K; see (43).

Possibilistic logic does not allow for directly encoding pieces of generic knowledge, such as "birds fly." However, it provides a target language in which plausible inference from generic knowledge can be achieved in the face of incomplete evidence. In possibility theory, "p generally entails q" is understood as "p ∧ q is a more plausible situation than p ∧ ¬q." It defines a constraint of the form Π(p ∧ q) > Π(p ∧ ¬q) that restricts a set of possibility distributions. Given a set S of generic knowledge statements of the form "p_i generally entails q_i," a possibilistic base can be computed as follows. For each interpretation ω of the language, the maximal possibility degree π(ω) is computed that obeys the set of constraints in S. This is done by virtue of the principle of minimal specificity (or commitment), which assumes each situation as possible insofar as it has not been ruled out. Then each generic statement is turned into a material implication ¬p_i ∨ q_i, to which N(¬p_i ∨ q_i) is attached. It comes down, as shown by Benferhat et al. (44), to rank ordering the generic rules giving priority to the most specific ones, as done in Pearl's system Z (45). A very important property of this approach is that it is exception-tolerant. It offers a convenient framework for implementing a basic form of nonmonotonic system called rational closure (46) and addresses a basic problem in the expert system literature, that is, handling exceptions in uncertain rules.

Computing with Fuzzy Numbers

Another important application of the combination/projection principle is computation with ill-known quantities. Given two ill-known quantities represented by the possibility distributions π_x and π_y on the real line, the possibility distribution restricting the possible values of f(x, y), where f is some function (e.g., f(x, y) = x + y), is given by
\[ \pi_{f(x,y)}(w) = \Pi[f^{-1}(w)] = \sup_{(u,v) \in f^{-1}(w)} \min[\pi_x(u), \pi_y(v)] \tag{26} \]
where f ⫺1(w) ⫽ 兵(u, v)兩f(u, v) ⫽ w其 and x and y are assumed to be noninteractive. This is the basis for extending arithmetic operations to fuzzy numbers and their application to mathematical programming. Simple formulas exist for practically computing the sum of parameterized fuzzy intervals and other arithmetic operations; see (47–49). It also applies in flexible constraint satisfaction problems. Constraint satisfaction problems, which are a basic paradigm in artificial intelligence, can be extended to flexible and prioritized constraints in the setting of possibility theory. In this case, possibility distributions model preferences about the way constraints have to be satisfied. Ill-known parameters in the description of the problems can be also dealt with. Thus preferences and uncertainty are handled in the same framework (50). Defuzzification Fuzzy-set-based approximate reasoning usually yields conclusions in the form of possibility distributions. Then to make them more easy to interpret for the end user, we may approximate them linguistically in terms of fuzzy sets which represent terms in a prescribed vocabulary. We may also defuzzify these conclusions. A well-known method, often used in fuzzy control applications, is the center of gravity method which computes the scalar value defuzz(π ) = u · π (u) du/ π (u) du (27)
if π is the result to be defuzzified. However, this method has no justification in the framework of possibility theory. Another method, which agrees with this framework, is the selection of the value(s) that maximize(s) π, but it generally yields a set larger than a singleton. Yet another method, which is especially of interest for fuzzy intervals of the real line, consists of computing lower and upper expectations of the variable restricted by the fuzzy interval. The fuzzy interval is thus summarized by a pair of numbers interpretable as an ordinary interval. For example, let π = μ_F be a fuzzy interval. Its α-cut F_α = {u | μ_F(u) ≥ α} is by definition an interval [inf F_α, sup F_α]. Then the lower and upper expectations can be defined by
$$E_*(F) = \int_0^1 \inf F_\alpha \, d\alpha \quad \text{and} \quad E^*(F) = \int_0^1 \sup F_\alpha \, d\alpha \tag{28}$$

See (51) for a justification. Thus, the interval [E_*(F), E^*(F)] summarizes the imprecision of the fuzzy interval F. It can also be verified that

$$\frac{E_*(F) + E^*(F)}{2} = \int_{-\infty}^{+\infty} x \, p_F(x) \, dx \quad \text{with} \quad p_F(x) = \int_0^1 \frac{\mu_{F_\alpha}(x)}{|F_\alpha|} \, d\alpha \tag{29}$$

which provides a justification of the middle of the interval as the result of the defuzzification of F, since the density is based on a uniform probability on each α-cut F_α (|F_α| is the length of F_α), just applying a generalized Laplacean indifference principle. With this definition, the defuzzification of the sum of two fuzzy intervals [in the sense of Eq. (26)] yields the sum of the values obtained by defuzzifying each fuzzy interval.

CONCLUDING REMARKS

This article provides a brief overview of the basic features of possibility theory, of its relationships with other uncertainty calculi, and of its applications to various types of approximate reasoning. However, some applications are not reviewed here. These include data fusion, in which possibility theory offers a variety of combination modes (including weighted, prioritized, and adaptive aggregation rules) in poorly informed environments (52,53), and decision under uncertainty, for which a (necessity-based) pessimistic and a (possibility-based) optimistic qualitative counterpart of classical expectation (in terms of Sugeno integrals) have been proposed in (54). Recent theoretical developments of possibility theory can be found in (55), and a thorough presentation and discussion of possibility theory can be found in (56).

BIBLIOGRAPHY

1. L. A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets Syst., 1: 3–28, 1978.
2. R. R. Yager, An introduction to applications of possibility theory, Human Syst. Manage., 3: 246–269, 1983.
3. R. R. Yager, On the specificity of a possibility distribution, Fuzzy Sets Syst., 50 (3): 279–292, 1992.
4. G. J. Klir and Y. Bo, Fuzzy Sets and Fuzzy Logic: Theory and Applications, Upper Saddle River, NJ: Prentice-Hall, 1995.
5. R. R. Yager, A foundation for a theory of possibility, J. Cybernetics, 10: 177–204, 1980.
6. H. Prade, Modal semantics and fuzzy set theory. In R. R. Yager (ed.), Fuzzy Set and Possibility Theory: Recent Developments, New York: Pergamon Press, 1982, pp. 232–246.
7. D. Dubois and H. Prade (with the collaboration of H. Farreny, R. Martin-Clouaire, and C. Testemale), Possibility Theory: An Approach to Computerized Processing of Uncertainty, New York: Plenum Press, 1988.
8. D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications, New York: Academic Press, 1980.
9. L. A. Zadeh, Fuzzy sets and information granularity. In M. M. Gupta, R. K. Ragade, and R. R. Yager (eds.), Advances in Fuzzy Set Theory and Applications, Amsterdam: North-Holland, 1979, pp. 3–18.
10. M. Sugeno, Fuzzy measures and fuzzy integrals: A survey. In M. M. Gupta, G. N. Saridis, and B. R. Gaines (eds.), Fuzzy Automata and Decision Processes, Amsterdam: North-Holland, 1977, pp. 89–102.
11. M. Inuiguchi, H. Ichihashi, and H. Tanaka, Possibilistic linear programming with measurable multiattribute value functions, ORSA J. Comput., 1 (3): 146–158, 1989.
12. D. Dubois and H. Prade, Fuzzy rules in knowledge-based systems: Modelling gradedness, uncertainty and preference. In R. R. Yager and L. A. Zadeh (eds.), An Introduction to Fuzzy Logic Applications in Intelligent Systems, Dordrecht, The Netherlands: Kluwer Academic Publ., 1992, pp. 45–68.
13. D. Dubois, Belief structures, possibility theory and decomposable confidence measures on finite sets, Comput. Artificial Intell., 5 (5): 403–416, 1986.
14. D. K. Lewis, Counterfactuals, Oxford: Basil Blackwell, 1973. 2nd edition, Worcester, UK: Billing and Sons, 1986.
15. E. Hisdal, Conditional possibilities: Independence and non-interactivity, Fuzzy Sets Syst., 1: 283–297, 1978.
16. D. Dubois et al., An ordinal view of independence with application to plausible reasoning. In R. Lopez de Mantaras and D. Poole (eds.), Proc. 10th Conf. Uncertainty Artificial Intell., Seattle, WA, July 29–31, 1994, pp. 195–203.
17. S. Nahmias, Fuzzy variables, Fuzzy Sets Syst., 1: 97–110, 1978.
18. G. Shafer, A Mathematical Theory of Evidence, Princeton, NJ: Princeton University Press, 1976.
19. P. Z. Wang and E. Sanchez, Treating a fuzzy subset as a projectable random subset. In M. M. Gupta and E. Sanchez (eds.), Fuzzy Information and Decision Processes, Amsterdam: North-Holland, 1982, pp. 213–219.
20. D. Dubois and H. Prade, Fuzzy sets, probability and measurement, Eur. J. Operations Res., 40: 135–154, 1989.
21. A. P. Dempster, Upper and lower probabilities induced by a multiple-valued mapping, Ann. Math. Statistics, 38: 325–339, 1967.
22. D. Dubois and H. Prade, Consonant approximations of belief functions, Int. J. Approximate Reasoning, 4 (5/6): 419–449, 1990.
23. D. Dubois and H. Prade, When upper probabilities are possibility measures, Fuzzy Sets Syst., 49: 65–74, 1992.
24. D. Dubois and H. Prade, Focusing vs. revision in possibility theory, in Proc. 5th Int. Conf. Fuzzy Syst. (FUZZ-IEEE), New Orleans, Sept. 8–11, 1996, pp. 1700–1705.
25. G. J. Klir and B. Parviz, Probability-possibility transformation: A comparison, Int. J. General Syst., 21: 291–310, 1992.
26. D. Dubois, H. Prade, and S. Sandri, On possibility/probability transformations. In R. Lowen and M. Lowen (eds.), Fuzzy Logic: State of the Art, Kluwer Academic Publ., 1993, pp. 103–112.
27. D. Dubois, S. Moral, and H. Prade, A semantics for possibility theory based on likelihoods, J. Math. Anal. Appl., 205: 359–380, 1997.
28. B. Natvig, Possibility versus probability, Fuzzy Sets Syst., 10: 31–36, 1983.
29. S. F. Thomas, Fuzziness and Probability, Wichita, KS: ACG Press, 1995.
30. W. Spohn, Ordinal conditional functions: A dynamic theory of epistemic states. In W. Harper and B. Skyrms (eds.), Causation in Decision, Belief Change and Statistics, Kluwer Acad. Publishers, Dordrecht, Netherlands, 1988, pp. 105–134.
31. D. Dubois and H. Prade, Epistemic entrenchment and possibilistic logic, Artificial Intell., 50: 223–239, 1991.
32. Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Dordrecht: Kluwer Academic Publ., 1991.
33. D. Dubois and H. Prade, Rough fuzzy sets and fuzzy rough sets, Int. J. General Syst., 17 (2–3): 191–209, 1990.
34. L. A. Zadeh, A theory of approximate reasoning. In J. E. Hayes, D. Michie, and L. I. Mikulich (eds.), Machine Intelligence, New York: Elsevier, 1979, Vol. 9, pp. 149–194.
35. L. A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning, Information Sciences, Part 1: 8: 199–249; Part 2: 8: 301–357; Part 3: 9: 43–80, 1975.
36. L. A. Zadeh, PRUF: A meaning representation language for natural languages, Int. J. Man-Machine Studies, 10: 395–460, 1978.
37. D. Dubois and H. Prade, What are fuzzy rules and how to use them, Fuzzy Sets Syst., 84: 169–185, 1996.
38. E. H. Mamdani, Application of fuzzy logic to approximate reasoning using linguistic systems, IEEE Trans. Comput., 26: 1182–1191, 1977.
39. L. A. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Systems, Man Cybernetics, 3: 28–44, 1973.
40. J. Gebhardt and R. Kruse, POSSINFER: A software tool for possibilistic inference. In D. Dubois, H. Prade, and R. R. Yager (eds.), Fuzzy Information Engineering: A Guided Tour of Applications, New York: Wiley, 1997, pp. 405–415.
41. D. Dubois, J. Lang, and H. Prade, Automated reasoning using possibilistic logic: Semantics, belief revision and variable certainty weights, IEEE Trans. Knowl. Data Eng., 6 (1): 64–71, 1994.
42. P. Gärdenfors, Knowledge in Flux: Modeling the Dynamics of Epistemic States, Cambridge, MA: The MIT Press, 1988.
43. D. Dubois and H. Prade, Possibilistic logic, preferential models, non-monotonicity and related issues, Proc. Int. Joint Conf. Artificial Intell. (IJCAI'91), Sydney, Australia, Aug. 24–30, 1991, pp. 419–424.
44. S. Benferhat, D. Dubois, and H. Prade, Representing default rules in possibilistic logic, Proc. 3rd Int. Conf. Principles Knowl. Representation Reasoning (KR'92), Cambridge, MA, Oct. 26–29, 1992, pp. 673–684.
45. J. Pearl, System Z: A natural ordering of defaults with tractable applications to default reasoning, Proc. 3rd Conf. Theoretical Aspects Reasoning About Knowl. (TARK'90) (R. Parikh, ed.), Morgan and Kaufmann, 1990, pp. 121–135.
46. D. Lehmann and M. Magidor, What does a conditional knowledge base entail?, Artificial Intell., 55 (1): 1–60, 1992.
47. A. Kaufmann and M. M. Gupta, Introduction to Fuzzy Arithmetic: Theory and Applications, New York: Van Nostrand Reinhold, 1985.
48. D. Dubois and H. Prade, Fuzzy numbers: An overview. In J. C. Bezdek (ed.), The Analysis of Fuzzy Information, Vol. 1: Mathematics and Logic, Boca Raton, FL: CRC Press, 1987, pp. 3–39.
49. R. Slowinski and J. Teghem (eds.), Stochastic Versus Fuzzy Approaches to Multiobjective Mathematical Programming Under Uncertainty, Dordrecht: Kluwer Academic Publ., 1990.
50. D. Dubois, H. Fargier, and H. Prade, Possibility theory in constraint satisfaction problems: Handling priority, preference and uncertainty, Appl. Intell., 6: 287–309, 1996.
51. D. Dubois and H. Prade, The mean value of a fuzzy number, Fuzzy Sets Syst., 24: 279–300, 1987.
52. D. Dubois and H. Prade, Possibility theory and data fusion in poorly informed environments, Control Eng. Practice, 2 (5): 811–823, 1994.
53. R. R. Yager, A general approach to the fusion of imprecise information, Int. J. Intell. Syst., 12: 1–29, 1997.
54. D. Dubois and H. Prade, Possibility theory as a basis for qualitative decision theory, Proc. 14th Int. Joint Conf. Artificial Intell. (IJCAI'95), Montréal, Canada, Aug. 20–25, 1995, pp. 1924–1930.
55. G. De Cooman, Possibility theory, Part I: Measure- and integral-theoretic groundwork; Part II: Conditional possibility; Part III: Possibilistic independence, Int. J. General Syst., 25: 291–371, 1997.
56. D. Dubois and H. Prade, Possibility theory: Qualitative and quantitative aspects, in Handbook of Defeasible Reasoning and Uncertainty Management. Volume I: Quantified Representation of Uncertainty and Imprecision, Dordrecht: Kluwer Academic Publ., 1998.
DIDIER DUBOIS HENRI PRADE Paul Sabatier University
POSSIBILITY THEORY DATABASES. See FUZZY INFORMATION RETRIEVAL AND DATABASES.