Pattern Recognition 33 (2000) 741–754
3-D object recognition using a new invariant relationship by single-view

Kyoung Sig Roh^a,*, In So Kweon^b

^a System and Control Sector, Samsung Advanced Institute of Technology, P.O. Box 111, Suwon 440-600, South Korea
^b Department of Automation and Design Engineering, Korea Advanced Institute of Science and Technology, 207-43, Cheongryangri-dong, Dongdaemoon-gu, Seoul, South Korea

Received 7 October 1998; accepted 18 March 1999
Abstract

We propose a new method for recognizing three-dimensional objects from a single view, using a three-dimensional invariant relationship and geometric hashing. We develop a special structure consisting of four coplanar points and any two points that are non-coplanar with respect to that plane, and derive an invariant relationship for the structure, which is represented by a plane equation. For the recognition of three-dimensional objects using geometric hashing, a set of points on the plane satisfying the invariant relationship is mapped into the set of points where the plane intersects the unit sphere. Since the structure is much more general than the previous structures proposed by Rothwell et al. (Oxford University TR-OUEL 1927/92, 1992) and Zhu et al. (Proceedings of the 12th International Conference on Robotics and Automation, Nagoya, Japan, 1995, pp. 1726–1731), it yields enough votes to generate hypotheses. We also show that, from the proposed invariant relationship, a two-view invariant for the structure and the invariant for the structure proposed by Zhu et al. can both be derived. Experiments using three-dimensional polyhedral objects demonstrate the feasibility of our method. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: 3-D object recognition; One-viewed invariant relationship; Geometric hashing
* Corresponding author. Tel.: +82-331-280-9275; fax: +82-331-280-9257. E-mail addresses: [email protected] (K.S. Roh), [email protected] (I.S. Kweon).

1. Introduction

Most of the invariants used so far in computer vision applications are based on plane-to-plane mappings. These invariants of the plane projective group have been studied extensively, many of their forms are known, and they have been applied successfully in working vision systems [1-3,14]. Constructing invariants for 3D structures from their 2D perspective images is much more difficult and represents the major goal of current research in the application of invariant theory to vision. Burns et al. [4] showed that invariants cannot be measured for 3D point sets in general position from a single view, that is, for sets that contain absolutely no structure. Approaches to this problem fall into three categories. In the first, space projective invariants are computed from two images, provided that the epipolar geometry of the two images is determined a priori [5-7]. In the second, space projective invariants are determined from three images without computing the epipolar geometry [8,9]. In the third, certain special structures provide projective invariants from a single view [10-12]. Among the three categories, the third approach does not need correspondence information between features in each image. Rothwell et al. [12] proposed two
special structures from which a single-view projective invariant can be derived. One is for points that lie on the vertices of a polyhedron, from which invariants are computed using an algebraic framework of constraints between points and planes. The other is for objects that are bilaterally symmetric. For the first class of objects, a minimum of seven points lying on the vertices of a six-sided polyhedron is required in order to recover the projective structure. For the second class, a minimum of eight points, or four points and two lines, that are bilaterally symmetric is needed. Zhu et al. [11] proposed an algorithm to compute an invariant based on a structure of six points on adjacent planes, which provides two sets of four coplanar points. This invariant is less constrained than the one proposed by Rothwell et al. [12], because it needs only six points instead of seven. In this paper, we propose a new invariant relationship for a structure that is even more general than the one used by Zhu et al. [11]. The structure consists of a set of six points: four coplanar points and two points that are non-coplanar with respect to the plane. In general, this structure provides an invariant from two views, for which a priori epipolar geometry is not required [7]. However, we derive an invariant relationship for the structure using just one view. The relationship can be represented as the plane orthogonal to a vector that is computed uniquely from the structure. To recognize three-dimensional objects, we propose a model-base using geometric hashing based on the invariant relationship.
2. A new invariant relationship

In this section, we present a three-dimensional projective invariant relationship from a single view, based on a structure of six points: four coplanar points and two other non-coplanar points. We also present a new model-base using the invariant relationship. We derive the invariant for the structure using the canonical frame concept [13].

Theorem 1. Let X_i, i = 1–6, be six points on an object and x_i, i = 1–6, be the corresponding image points, where X_1, X_2, X_3 and X_4 are coplanar points and X_5, X_6 are two other non-coplanar points, as shown in Fig. 1. Then an invariant relationship among the object points and the corresponding image points takes the form of a plane equation:

(V_1 × V_2)/‖V_1 × V_2‖ · V_3/‖V_3‖ = 0,

where V_1 = (u_5, v_5, w_5), V_2 = (u_6, v_6, w_6), V_3 = (ā, b̄, c̄), and all of these are represented in canonical coordinates.
Fig. 1. Projection of a set of six points: four coplanar points X_1, X_2, X_3, X_4 and two other non-coplanar points X_5, X_6.
Proof. Let us assign canonical projective coordinates to the six points as follows:

X_1 = (X_1, Y_1, Z_1, 1) → (1, 0, 0, 0),
X_2 = (X_2, Y_2, Z_2, 1) → (0, 1, 0, 0),
X_3 = (X_3, Y_3, Z_3, 1) → (0, 0, 1, 0),
X_4 = (X_4, Y_4, Z_4, 1) → (a, b, c, 0),   (1)
X_5 = (X_5, Y_5, Z_5, 1) → (0, 0, 0, 1),
X_6 = (X_6, Y_6, Z_6, 1) → (1, 1, 1, 1).

Thus X_i, i = 1–3, and X_5, X_6 form a canonical basis. We can obtain a unique space collineation A_4×4, with det(A_4×4) ≠ 0, which transforms the original five points into the canonical basis. The fourth point is transformed into its projective coordinates (a, b, c, 0)^T by A_4×4. For the projections of these six points onto an image, we take x_i, i = 1–4, as the canonical projective coordinates in the image plane. Then we can obtain a unique plane collineation A_3×3, with det(A_3×3) ≠ 0, which transforms the fifth and sixth points to (u_5, v_5, w_5)^T and (u_6, v_6, w_6)^T. Let us assign canonical projective coordinates to the six image points as follows:

x_1 = (x_1, y_1, 1) → (1, 0, 0),
x_2 = (x_2, y_2, 1) → (0, 1, 0),
x_3 = (x_3, y_3, 1) → (0, 0, 1),
x_4 = (x_4, y_4, 1) → (1, 1, 1),   (2)
x_5 = (x_5, y_5, 1) → (u_5, v_5, w_5),
x_6 = (x_6, y_6, 1) → (u_6, v_6, w_6).
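To make the canonical-frame construction concrete, the following sketch (in Python with NumPy; the function name and the example point values are our illustrative choices, not the paper's) computes the canonical image coordinates of Eq. (2) for six image points. The usage below exploits the fact that the canonical coordinates of x_5 and x_6 are unchanged, up to an overall scale, under any plane collineation applied to the image.

```python
import numpy as np

def canonical_image_coords(pts):
    """Canonical projective coordinates of Eq. (2) for six image points.

    pts: (6, 2) array of image points x_1..x_6. A plane collineation A_3x3
    is fixed so that x_1..x_4 map to (1,0,0), (0,1,0), (0,0,1), (1,1,1);
    the function returns the images (u_5,v_5,w_5) and (u_6,v_6,w_6) of x_5
    and x_6 under A_3x3. Assumes no three of x_1..x_4 are collinear.
    """
    h = np.hstack([pts, np.ones((6, 1))])   # homogeneous coordinates
    B = h[:3].T                             # columns x_1, x_2, x_3
    lam = np.linalg.solve(B, h[3])          # scales making B @ lam = x_4
    A = np.linalg.inv(B * lam)              # the unique collineation A_3x3
    return A @ h[4], A @ h[5]

# Canonical coordinates are projectively invariant: warping the image by an
# arbitrary homography H changes them only by an overall scale.
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3], [0.4, 0.7]])
v5, v6 = canonical_image_coords(pts)
H = np.array([[1, 0.2, 3], [0.1, 0.9, -1], [0.01, 0.02, 1.0]])
warped = np.hstack([pts, np.ones((6, 1))]) @ H.T
w5, w6 = canonical_image_coords(warped[:, :2] / warped[:, 2:])
print(w5 / v5)  # all three ratios equal: the same point up to scale
```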
The relationship between the object points and the corresponding image points is
[1 0 0 1 u_5 u_6]       [ρ_1   0    0   ρ_4·a   0    ρ_6]
[0 1 0 1 v_5 v_6]  =  T [ 0   ρ_2   0   ρ_4·b   0    ρ_6]   (3)
[0 0 1 1 w_5 w_6]       [ 0    0   ρ_3  ρ_4·c   0    ρ_6]
                        [ 0    0    0     0    ρ_5   ρ_6]

where

    [t_11 t_12 t_13 t_14]
T = [t_21 t_22 t_23 t_24].
    [t_31 t_32 t_33 t_34]

The right-hand side of Eq. (3) is rearranged and becomes

[ρ_1·t_11  ρ_2·t_12  ρ_3·t_13  ρ_4(a·t_11 + b·t_12 + c·t_13)  ρ_5·t_14  ρ_6(t_11 + t_12 + t_13 + t_14)]
[ρ_1·t_21  ρ_2·t_22  ρ_3·t_23  ρ_4(a·t_21 + b·t_22 + c·t_23)  ρ_5·t_24  ρ_6(t_21 + t_22 + t_23 + t_24)]   (4)
[ρ_1·t_31  ρ_2·t_32  ρ_3·t_33  ρ_4(a·t_31 + b·t_32 + c·t_33)  ρ_5·t_34  ρ_6(t_31 + t_32 + t_33 + t_34)].

Therefore, from Eqs. (3) and (4), we can obtain each element of the transformation matrix T as follows:

t_11 = 1/ρ_1, t_22 = 1/ρ_2, t_33 = 1/ρ_3,
t_12 = t_13 = t_21 = t_23 = t_31 = t_32 = 0,
t_14 = u_5/ρ_5, t_24 = v_5/ρ_5, t_34 = w_5/ρ_5,   (5)
1/ρ_1 = ā/ρ_4, 1/ρ_2 = b̄/ρ_4, 1/ρ_3 = c̄/ρ_4,

where ā = 1/a, b̄ = 1/b, c̄ = 1/c.

We can define the invariant relationship from the sixth column in Eq. (4) and the elements computed in Eq. (5):

[ā  u_5  −u_6] [1/ρ_4]
[b̄  v_5  −v_6] [1/ρ_5] = 0.   (6)
[c̄  w_5  −w_6] [1/ρ_6]

From the condition for a non-trivial solution of this equation, we obtain the relationship

|ā  u_5  −u_6|
|b̄  v_5  −v_6| = −(V_1 × V_2) · V_3 = 0, or
|c̄  w_5  −w_6|

(V_1 × V_2)/‖V_1 × V_2‖ · V_3/‖V_3‖ = 0,   (7)

where V_1 = (u_5, v_5, w_5), V_2 = (u_6, v_6, w_6), V_3 = (ā, b̄, c̄).

From the invariant relationship defined by Eq. (7), we can observe that V_3, constructed from the structured object points, is orthogonal to the cross product of V_1 and V_2, which are extracted from the image. Therefore, all vectors on the plane orthogonal to V_3 satisfy the above relationship.

If the sixth point X_6 lies on the plane constructed by (X_3, X_4, X_5), the structure becomes the same one proposed by Zhu et al. [11]. We can easily derive the invariant for that structure by adding the coplanar condition to the invariant relationship. □

Theorem 2 (Coplanar case including the fourth point). Let X_i, i = 1–6, be six points on adjacent planes of an object, (X_1, X_2, X_3, X_4) and (X_1, X_4, X_5, X_6), and let x_i, i = 1–6, be the corresponding image points. Then the invariant is represented uniquely as

b̄ = c̄ = ± V_41/√(2 − (V_42 + V_43)²),
ā = ∓ (V_42 − V_43)/√(2 − (V_42 + V_43)²),

where V_4 = V_1 × V_2 and ā = 1/a, b̄ = 1/b, c̄ = 1/c.

Proof. The coplanar condition becomes

                    |1  a  0  1|
|X_1 X_4 X_5 X_6| = |0  b  0  1| = 0, or c = b.   (8)
                    |0  c  0  1|
                    |0  0  1  1|

By substituting Eq. (8) into Eq. (7), the invariant is represented as

b̄ = c̄ = ± V_41/√(2 − (V_42 + V_43)²),
ā = ∓ (V_42 − V_43)/√(2 − (V_42 + V_43)²).   (9)
Likewise, if (X_2, X_4, X_5, X_6) form a plane, then the invariant is represented as

ā = c̄ = ± V_42/√(2 − (V_41 + V_43)²),
b̄ = ∓ (V_41 − V_43)/√(2 − (V_41 + V_43)²),   (10)

and if (X_3, X_4, X_5, X_6) form a plane, then the invariant is represented as

ā = b̄ = ± V_43/√(2 − (V_41 + V_42)²),
c̄ = ∓ (V_41 − V_42)/√(2 − (V_41 + V_42)²).   (11)  □

Theorem 3 (Coplanar case not including the fourth point). Let X_i, i = 1–6, be six points on adjacent planes of an object, (X_1, X_2, X_3, X_4) and (X_1, X_2, X_5, X_6), and let x_i, i = 1–6, be the corresponding image points. Then the invariant relationship is represented uniquely as

(V_1 × V_2)/‖V_1 × V_2‖ · V_3/‖V_3‖ = 0,

where V_1 = (u_5, v_5, w_5), V_2 = (u_6, v_6, w_6), V_3 = (ā, b̄, 0), and all of these are represented in canonical coordinates.

Proof. Let us assign canonical projective coordinates to the six points as follows:

X_1 = (X_1, Y_1, Z_1, 1) → (1, 0, 0, 0),
X_2 = (X_2, Y_2, Z_2, 1) → (0, 1, 0, 0),
X_3 = (X_3, Y_3, Z_3, 1) → (0, 0, 1, 0),
X_4 = (X_4, Y_4, Z_4, 1) → (a, b, c, 0),   (12)
X_5 = (X_5, Y_5, Z_5, 1) → (0, 0, 0, 1),
X_6 = (X_6, Y_6, Z_6, 1) → (1, 1, 0, 1).

Thus X_i, i = 1–3, and X_5, X_6 form a canonical basis. We can obtain a unique space collineation A_4×4, with det(A_4×4) ≠ 0, which transforms the original five points into the canonical basis; the fourth point is transformed into its projective coordinates (a, b, c, 0)^T by A_4×4. For the projections of these six points onto an image, we take x_i, i = 1–4, as the canonical projective coordinates in the image plane. Then we can obtain a unique plane collineation A_3×3, with det(A_3×3) ≠ 0, which transforms the fifth and sixth points to (u_5, v_5, w_5)^T and (u_6, v_6, w_6)^T:

x_1 = (x_1, y_1, 1) → (1, 0, 0),
x_2 = (x_2, y_2, 1) → (0, 1, 0),
x_3 = (x_3, y_3, 1) → (0, 0, 1),
x_4 = (x_4, y_4, 1) → (1, 1, 1),   (13)
x_5 = (x_5, y_5, 1) → (u_5, v_5, w_5),
x_6 = (x_6, y_6, 1) → (u_6, v_6, w_6).

The relationship between the object points and the corresponding image points is

[1 0 0 1 u_5 u_6]       [ρ_1   0    0   ρ_4·a   0    ρ_6]
[0 1 0 1 v_5 v_6]  =  T [ 0   ρ_2   0   ρ_4·b   0    ρ_6]   (14)
[0 0 1 1 w_5 w_6]       [ 0    0   ρ_3  ρ_4·c   0     0 ]
                        [ 0    0    0     0    ρ_5   ρ_6]

where

    [t_11 t_12 t_13 t_14]
T = [t_21 t_22 t_23 t_24].
    [t_31 t_32 t_33 t_34]

The right-hand side of Eq. (14) is arranged to

[ρ_1·t_11  ρ_2·t_12  ρ_3·t_13  ρ_4(a·t_11 + b·t_12 + c·t_13)  ρ_5·t_14  ρ_6(t_11 + t_12 + t_14)]
[ρ_1·t_21  ρ_2·t_22  ρ_3·t_23  ρ_4(a·t_21 + b·t_22 + c·t_23)  ρ_5·t_24  ρ_6(t_21 + t_22 + t_24)]   (15)
[ρ_1·t_31  ρ_2·t_32  ρ_3·t_33  ρ_4(a·t_31 + b·t_32 + c·t_33)  ρ_5·t_34  ρ_6(t_31 + t_32 + t_34)].

Therefore, from Eqs. (14) and (15), we can obtain each element of the transformation matrix T as follows:

t_11 = 1/ρ_1, t_22 = 1/ρ_2, t_33 = 1/ρ_3,
t_12 = t_13 = t_21 = t_23 = t_31 = t_32 = 0,
t_14 = u_5/ρ_5, t_24 = v_5/ρ_5, t_34 = w_5/ρ_5,   (16)
1/ρ_1 = ā/ρ_4, 1/ρ_2 = b̄/ρ_4, 1/ρ_3 = c̄/ρ_4,

where ā = 1/a, b̄ = 1/b, c̄ = 1/c.

We can define the invariant relationship from the sixth column in Eq. (15) and the elements computed in Eq. (16):

[ā  u_5  −u_6] [1/ρ_4]
[b̄  v_5  −v_6] [1/ρ_5] = 0.   (17)
[0  w_5  −w_6] [1/ρ_6]
From the condition for a non-trivial solution for the equation, we obtain the relationship,
|ā  u_5  −u_6|
|b̄  v_5  −v_6| = −(V_1 × V_2) · V_3 = 0, or
|0  w_5  −w_6|

(V_1 × V_2)/‖V_1 × V_2‖ · V_3/‖V_3‖ = 0,   (18)

where V_1 = (u_5, v_5, w_5), V_2 = (u_6, v_6, w_6), V_3 = (ā, b̄, 0).

Likewise, if (X_1, X_3, X_5, X_6) form a plane, then the invariant is represented as

|ā  u_5  −u_6|
|0  v_5  −v_6| = −(V_1 × V_2) · V_3 = 0, or
|c̄  w_5  −w_6|

(V_1 × V_2)/‖V_1 × V_2‖ · V_3/‖V_3‖ = 0,   (19)

where V_3 = (ā, 0, c̄), and if (X_2, X_3, X_5, X_6) form a plane, then the invariant is represented as

|0  u_5  −u_6|
|b̄  v_5  −v_6| = −(V_1 × V_2) · V_3 = 0, or
|c̄  w_5  −w_6|

(V_1 × V_2)/‖V_1 × V_2‖ · V_3/‖V_3‖ = 0,   (20)

where V_3 = (0, b̄, c̄). □

Fig. 2. A unit sphere as the structure of a model-base.

Fig. 3. The coordinate system.
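The invariant relationship of Theorem 1 can be checked numerically with a short sketch (Python/NumPy). The synthetic coordinates, the pinhole camera and the helper name `collineation` below are our illustrative choices, not the paper's: we build the space and plane collineations of Eqs. (1) and (2) for a structure with four coplanar points and two off-plane points, and verify that Eq. (7) holds for the projected points.

```python
import numpy as np

def collineation(pts_h):
    """Unique collineation mapping the first n rows of pts_h to the unit
    vectors e_1..e_n and the last row to (1, ..., 1)."""
    B = pts_h[:-1].T
    lam = np.linalg.solve(B, pts_h[-1])
    return np.linalg.inv(B * lam)

# Synthetic structure: X_1..X_4 lie in the plane z = 0, X_5 and X_6 do not.
X = np.array([[0, 0, 0], [4, 0, 0], [4, 3, 0], [0, 3, 0],
              [1, 1, 2], [3, 2, 5]], float)
Xh = np.hstack([X, np.ones((6, 1))])

# Space collineation A_4x4 of Eq. (1): (X_1, X_2, X_3, X_5) -> e_1..e_4,
# X_6 -> (1, 1, 1, 1); then X_4 -> (a, b, c, 0) in the canonical frame.
a, b, c, d = collineation(Xh[[0, 1, 2, 4, 5]]) @ Xh[3]
V3 = np.array([1 / a, 1 / b, 1 / c])         # (a-bar, b-bar, c-bar)

# Project with a pinhole camera and build the plane collineation of Eq. (2).
P = np.array([[800, 0, 320, 50], [0, 800, 240, 30], [0, 0, 1, 10]], float)
xh = Xh @ P.T
x = np.hstack([xh[:, :2] / xh[:, 2:], np.ones((6, 1))])
A3 = collineation(x[:4])                     # x_1..x_3 -> e_i, x_4 -> ones
V1, V2 = A3 @ x[4], A3 @ x[5]

n = np.cross(V1, V2)
print(abs(n @ V3) / (np.linalg.norm(n) * np.linalg.norm(V3)))  # ~0: Eq. (7)
```

The printed value stays at machine precision for any non-degenerate camera, since V_3 lies in the span of V_1 and V_2 by the derivation of Eq. (7).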
3. A new structure for the model-base

To use the invariant relationship obtained in the previous section for the recognition of three-dimensional polyhedral objects, we must construct an efficient database, or model-base. Given the invariant (ā, b̄, c̄)^T for a set of points on a structured object, we must record the information about the structure: a model number, a plane number, and another two points on the plane orthogonal to (ā, b̄, c̄)^T. But it is very inefficient to consider all positions on the plane. Thus, we consider a surface on the unit sphere as the structure of the model-base. Fig. 2 shows the proposed model-base structure, where (ā, b̄, c̄)^T is the normalized invariant vector of the object points and the invariant circle (s) represents the group of vectors orthogonal to (ā, b̄, c̄)^T. A vector in the model-base structure can be represented by two parameters (θ, φ) as follows:

(ā, b̄, c̄) = (sin φ cos θ, sin φ sin θ, cos φ)

or

θ = tan⁻¹(b̄/ā), φ = cos⁻¹(c̄).   (21)

We can compute the vectors on the invariant circle by a coordinate transformation: the Z′-axis of the new coordinate system is aligned with (ā, b̄, c̄) and the X′-axis is placed on the X-Y plane of the old coordinate system. We then obtain

[X]   [cos φ cos θ   −sin θ   sin φ cos θ] [X′]
[Y] = [cos φ sin θ    cos θ   sin φ sin θ] [Y′]   (22)
[Z]   [  −sin φ         0       cos φ    ] [Z′]

Fig. 3 shows the coordinate systems, where (X, Y, Z) is the old coordinate system and (X′, Y′, Z′) is the new one. Vectors on the invariant circle are (X′, Y′, Z′) = (cos ψ, sin ψ, 0), where ψ = 0–180°, or

X = (cos φ cos θ)(cos ψ) − (sin θ)(sin ψ),
Y = (cos φ sin θ)(cos ψ) + (cos θ)(sin ψ),   (23)
Z = (−sin φ)(cos ψ).
Here, we only consider ψ = 0–180° because of the symmetric property of Eq. (7). These vectors are represented in (θ, φ)-space as

θ = tan⁻¹(Y/X), φ = cos⁻¹(Z).   (24)
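The mapping of Eqs. (21)-(24) can be sketched as follows (Python/NumPy; the function name, the sampling density and the example vector, taken from Table 1, are illustrative choices):

```python
import numpy as np

def invariant_circle(v3, n=180):
    """Sample the circle of unit vectors orthogonal to v3 = (a-bar, b-bar,
    c-bar) and return their (theta, phi) indices in degrees.

    Follows Eqs. (21)-(24): v3 fixes (theta, phi); psi parametrizes the
    invariant circle and runs over 0-180 degrees because of the symmetry
    of Eq. (7).
    """
    a, b, c = v3 / np.linalg.norm(v3)
    theta, phi = np.arctan2(b, a), np.arccos(c)                # Eq. (21)
    psi = np.radians(np.arange(0, 180, 180 / n))
    # Eq. (23): rotate (cos psi, sin psi, 0) back into the old frame.
    X = np.cos(phi) * np.cos(theta) * np.cos(psi) - np.sin(theta) * np.sin(psi)
    Y = np.cos(phi) * np.sin(theta) * np.cos(psi) + np.cos(theta) * np.sin(psi)
    Z = np.clip(-np.sin(phi) * np.cos(psi), -1.0, 1.0)
    return np.degrees(np.arctan2(Y, X)), np.degrees(np.arccos(Z))  # Eq. (24)

# Every sampled direction is orthogonal to the invariant vector.
theta, phi = invariant_circle(np.array([-0.8966, -0.3472, 0.2747]))
```

Each returned (θ, φ) pair indexes one cell of the model-base; storing the structure's label at all of them is what makes single-view indexing possible later.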
4. Preliminary test

We use a simple three-dimensional object to test the feasibility of our method for 3-D object recognition. Fig. 11(a) shows the object, and Table 8(a) presents its coordinates in Euclidean space and the sets of points forming each plane. Fig. 4(a) shows the (X, Y, Z)-space of the model-base constructed for the structure consisting of four coplanar points (1, 2, 3, 4) and two non-coplanar points (9, 12).
Fig. 4. The model-base for a structure consisting of four coplanar points (1, 2, 3, 4) and two non-coplanar points (9, 12): (a) in (X, Y, Z)-space; (b) in (θ, φ)-space.
Fig. 6. Indexing by the invariant vector.
Fig. 5. Seven images of the object from different views.
Table 1
Extracted indexing vectors

Known: V_3 = (ā, b̄, c̄) = (−0.8966, −0.3472, 0.2747)

Image   Computed V_4 = V_1 × V_2   Index values (θ, φ) (deg)   cos⁻¹(V_3 · V_4) (deg)   Error (deg)
a       (−0.42, 0.57, −0.70)       (126.66, 45.25)             90.65                    0.65
b       (−0.43, 0.67, −0.60)       (122.87, 53.45)             90.50                    0.50
c       (−0.43, 0.60, −0.67)       (125.55, 47.68)             90.48                    0.48
d       (−0.45, 0.71, −0.53)       (122.55, 57.70)             89.23                    0.77
e       (−0.43, 0.56, −0.71)       (127.76, 44.80)             90.09                    0.09
f       (−0.44, 0.49, −0.75)       (131.50, 41.06)             89.29                    0.71
g       (−0.45, 0.68, −0.58)       (123.67, 54.64)             89.39                    0.61

Table 2
Invariant for the structure consisting of points (1, 4, 8, 5) and (9, 10)

Known: V_3 = (ā, b̄, c̄) = (0.6667, 0.6667, 0.3333)

Image   Computed Ṽ_3                Error cos⁻¹(V_3 · Ṽ_3) (deg)
b       (0.6703, 0.6703, 0.3183)    0.9339
d       (0.6802, 0.6802, −0.2733)   3.5581

Table 4
Pseudo-code for model-base construction

for model i
    for plane j (consisting of four points)
        for point k (excluding the four points on plane j)
            for point l (excluding the four points on plane j and point k)
                COMPUTE (a, b, c, 0)
                STORE {i, j, k} into the entries of the hash table indicated by Eq. (24)
            end for
        end for
    end for
end for
In Table 1, the error denotes the angle difference between the computed and the true (ā, b̄, c̄)^T. For a structure in which points (3, 4, 5, 6) are coplanar, we can extract the invariant from Eq. (9). Tables 2 and 3 present the known and computed invariants for such structures.
5. Experiments

5.1. Geometric hashing
Table 3
Invariant for the structure consisting of points (5, 11, 9, 10) and (12, 13)

Known: V_3 = (ā, b̄, c̄) = (0.0990, 0.0990, 0.9901)

Image   Computed Ṽ_3                Error cos⁻¹(V_3 · Ṽ_3) (deg)
a       (0.0598, 0.0598, 0.9964)    3.2658
b       (0.0518, 0.0518, 0.9973)    3.9010
Fig. 4(b) shows the corresponding (θ, φ)-space. For this structure, (ā, b̄, c̄)^T is (−0.8966, −0.3472, 0.2747). Fig. 5 shows seven images of the same object from different viewing directions, and Fig. 6 shows the indexing by the invariant vector computed from the corresponding points in each image. Even though the apparent views of the object are very different in each of the seven images, the invariant values computed from the images correspond exactly to the pre-computed invariant curve, as shown in Fig. 6. Table 1 presents the cross product of the two canonical coordinate vectors computed in each image, and the dot product between the cross-product vector and (ā, b̄, c̄)^T, which is computed in advance using the stored 3-D coordinate values of the object.
Geometric invariants provide an indexing function for efficient model-based object recognition, in which the time complexity is hardly affected by the number of models. The approach has two stages. The first is an intensive model-preprocessing stage, done off-line, in which transformation-invariant features of the models are indexed into a hash table. The second is the actual recognition stage, which employs the index built in the first stage. Table 4 presents the pseudo-code for model-base construction, and Table 5 the pseudo-code for object recognition. In Table 5, the condition that a set of five points is feasible is as follows:

5.1.1. Feasible condition
A set of five points is feasible if four of the five points are in convex position (their convex hull has four vertices) and the remaining point lies outside the quadrilateral constructed by the four points. Fig. 7 shows an example of a feasible set.

5.2. Image processing and hypotheses generation
To reduce the time complexity of hypotheses generation, we search for corner points as well as closed polygons during image processing. We use an algorithm proposed by Etemadi [14] to extract corners and polygons.
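The feasibility test of Section 5.1.1 can be sketched as follows (Python/NumPy; the function name and the reading of the condition as a convex-position plus point-in-quadrilateral test are our interpretation of the text):

```python
import numpy as np

def _cross2(a, b):
    # z-component of the 2-D cross product, row-wise
    return a[..., 0] * b[..., 1] - a[..., 1] * b[..., 0]

def is_feasible(pts):
    """Feasibility check for a set of five image points.

    pts: (5, 2) array-like; the first four points are the plane candidates.
    Feasible if the four points are in convex position (their convex hull
    has four vertices) and the fifth point lies outside the quadrilateral
    they form.
    """
    quad = np.asarray(pts[:4], float)
    extra = np.asarray(pts[4], float)
    d = quad - quad.mean(axis=0)                 # order around the centroid
    poly = quad[np.argsort(np.arctan2(d[:, 1], d[:, 0]))]
    edges = np.roll(poly, -1, axis=0) - poly
    turns = _cross2(edges, np.roll(edges, -1, axis=0))
    if not (np.all(turns > 0) or np.all(turns < 0)):
        return False                             # hull has fewer than 4 vertices
    side = _cross2(edges, extra - poly)          # point-in-polygon test
    return not (np.all(side > 0) or np.all(side < 0))

print(is_feasible([[0, 0], [1, 0], [1, 1], [0, 1], [2, 2]]))      # True
print(is_feasible([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]]))  # False
```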
Table 5
Pseudo-code for object recognition

Given a scene with N point features,
for point i = 1–N
    for point j = 1–N (except i)
        for point k = 1–N (except i, j)
            for point l = 1–N (except i, j, k)
                for point m = 1–N (except i, j, k, l)
                    CHECK whether the set of five points is feasible
                    if the set is feasible
                        for point n = 1–N (except i, j, k, l, m)
                            COMPUTE V_1 = (u_m, v_m, w_m), V_2 = (u_n, v_n, w_n), and V_4 = V_1 × V_2
                            INDEX into the entry of the hash table indicated by V_4
                            VOTE {model #, plane #, point #} in the entry
                        end for
                        if # of votes > threshold
                            HYPOTHESIS GENERATION & VERIFICATION
                            if VERIFICATION == successful, EXIT
                        end if
                    end if
                end for
            end for
        end for
    end for
end for

In Fig. 8, if we select point features 1, 2, 5, 4 and 7 as a feasible set, it forms the structure proposed by Rothwell [12], consisting of the three adjacent planes (1, 2, 5, 4), (4, 5, 8, 11) and (1, 4, 11, 7). The structure proposed by Zhu [11] can also be constructed from the two adjacent planes (1, 2, 5, 4) and (1, 4, 11, 7). Unfortunately, these alone do not provide sufficient invariants for object recognition. For this particular scene, however, our proposed invariant can be defined for up to nine different structures, which can be used to generate many hypotheses for object recognition.
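A minimal sketch of the hashing scheme behind Tables 4 and 5 (Python; the bin size, key layout and entry tuples are our illustrative choices): each model structure stores its {model, plane, point} label at every (θ, φ) bin its invariant circle passes through, and recognition tallies the entries hit by the indices measured in an image.

```python
from collections import defaultdict

def quantize(theta, phi, bin_deg=2.0):
    """Map a (theta, phi) index pair (in degrees) to a hash-table bin."""
    return (round(theta / bin_deg), round(phi / bin_deg))

def store(table, samples, entry):
    """Table 4, inner step: record a {model, plane, point} entry at every
    bin that the structure's invariant circle passes through."""
    for theta, phi in samples:
        table[quantize(theta, phi)].append(entry)

def vote(table, samples):
    """Table 5, voting step: tally stored entries hit by the (theta, phi)
    indices computed from an image, and return the best-supported one."""
    counts = defaultdict(int)
    for theta, phi in samples:
        for entry in table.get(quantize(theta, phi), []):
            counts[entry] += 1
    return max(counts, key=counts.get) if counts else None

table = defaultdict(list)
store(table, [(10.0, 20.0), (12.0, 22.0)], ("model 1", "plane 2", "point 9"))
store(table, [(100.0, 40.0)], ("model 2", "plane 1", "point 5"))
print(vote(table, [(10.3, 20.2), (50.0, 50.0)]))  # the "model 1" entry wins
```

Because lookup cost is per-bin, the time spent voting depends on the number of structures sharing a bin, not on the total number of models, which is the point of the geometric hashing formulation.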
Fig. 7. A feasible point set to compute the invariant relationship.
Fig. 8. Image processing.
Table 6
The result of the hypotheses generation

Hypothesis   Plane   Point   Vote
1st          1       5       8
2nd          1       9       7
3rd          1       13      7
4th          3       12      7
5th          3       13      7
6th          5       1       7
7th          5       2       7
8th          5       3       7
9th          6       1       6
10th         6       2       6

Table 7
The result of verification

Hypothesis   # of matching points
1st          13
2nd          8
3rd          6
4th          6
5th          7
6th          6
7th          7
8th          6
9th          6
10th         6
Fig. 9. The result of verification for the 1st and 2nd hypotheses among the 10 hypotheses in Table 6.

Fig. 10. The registration of the 3-D object onto the image.
We compute invariants for point sets consisting of the plane (1, 2, 5, 4), point 7, and one of the points 3, 6, 8, 9, …, 15. We then vote for the information in the model-base indexed by these invariants, which includes the plane number and one additional point. Hypotheses are generated when the vote count exceeds a predefined threshold. Table 6 presents the ten generated hypotheses for scene features 1, 2, 5, 4 and 7. The plane is the plane number defined in Table 8(a), and the point is the point stored in the model-base as a basis, as explained in Section 5.1.

5.3. Verification and registration

For each generated hypothesis, we compute a transformation between the image and the model, and project the model onto the image plane. Then we count the points within an error bound, i.e. the matching points, and select the hypothesis with the maximum number of matching points. Fig. 9 shows the results of the transformation for the 1st and 2nd of the 10 hypotheses given in Table 6. The stars (*) represent detected corner points and the circles (○) represent the transformed model corners. Table 7 shows the number of matching points obtained by verification. From this result, the first hypothesis is selected as the true hypothesis, with 13 matching points.
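The verification step can be sketched as follows (Python/NumPy). The paper does not spell out how the image-to-model transformation is computed; here we assume a linear DLT fit of a 3×4 projection matrix from the hypothesized correspondences, which is one standard choice, and count reprojected model points near detected corners. The function name and the synthetic data are ours.

```python
import numpy as np

def count_matches(model_pts, corr_3d, corr_2d, corners, tol=3.0):
    """Fit a projection from hypothesized 3-D/2-D correspondences by DLT,
    reproject the whole model, and count detected corners within tol
    pixels. Needs at least six correspondences in general position."""
    A = []
    for Xw, (u, v) in zip(corr_3d, corr_2d):
        Xh = np.append(Xw, 1.0)
        # each correspondence gives two linear constraints on P's 12 entries
        A.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        A.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    P = np.linalg.svd(np.asarray(A))[2][-1].reshape(3, 4)
    ph = np.hstack([model_pts, np.ones((len(model_pts), 1))]) @ P.T
    proj = ph[:, :2] / ph[:, 2:]
    dist = np.linalg.norm(proj[:, None, :] - np.asarray(corners)[None], axis=2)
    return int(np.sum(dist.min(axis=1) < tol))

# Synthetic check: a cube imaged by a known camera, verified from six of
# its corners as the hypothesized correspondences.
model = np.array([[2, 2, 4], [3, 2, 4], [3, 3, 4], [2, 3, 4],
                  [2, 2, 5], [3, 2, 5], [3, 3, 5], [2, 3, 5]], float)
P_true = np.array([[500, 0, 320, 0], [0, 500, 240, 0], [0, 0, 1, 0]], float)
img = np.hstack([model, np.ones((8, 1))]) @ P_true.T
corners = img[:, :2] / img[:, 2:]
print(count_matches(model, model[:6], corners[:6], corners))  # 8
```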
Fig. 10 shows the registration of the three-dimensional model overlaid onto the third image.

5.4. Experiments

Fig. 11 shows the eight models used to test our algorithm. The numbers in the figure are the point numbers, and Table 8 presents the 3-D coordinates of the points for each model. Fig. 12 shows the input images, obtained from arbitrary camera views. Fig. 13 shows the results of preprocessing, hypotheses generation and verification for each of the eight input images.
6. Conclusion

In this paper, we proposed a new 3-D invariant relationship for a special 3-D structure consisting of four coplanar points and any two non-coplanar points, using only a single view. For some structures, Zisserman and Maybank [7] showed that an invariant can be constructed from two views without computing the epipolar geometry. However, we derived an invariant relationship
Fig. 11. 3-D drawing of each model.
from a single view, which is represented in the form of a plane equation. Based on this plane equation, we proposed a method combining the relationship with the geometric hashing concept for recognizing three-dimensional objects. We showed that the invariant for the structure proposed by Zhu et al. [11] can be easily derived from the invariant relationship. With two views of the structure, we can also derive the invariant from the relationship. Since the structure is more general than the previously proposed structures, a hashing-based method is feasible for 3-D object recognition. Experiments using 3-D polyhedral objects demonstrate that the proposed invariant relationship can be extended to real 3-D object recognition.
Fig. 11. (Continued.)
Table 8
3-D coordinates and the planes of each model

(a) Model No. 1

No.  Coordinate (X, Y, Z)        No.  Coordinate (X, Y, Z)
1.   (46.50, 25.00, 67.86)       9.   (0.00, 0.00, 24.00)
2.   (67.04, 25.00, 55.14)       10.  (50.00, 0.00, 24.00)
3.   (67.04, 50.00, 55.14)       11.  (0.00, 50.00, 24.00)
4.   (46.50, 50.00, 67.86)       12.  (0.00, 0.00, 0.00)
5.   (22.50, 25.00, 24.00)       13.  (50.00, 0.00, 0.00)
6.   (50.00, 25.00, 24.00)       14.  (50.00, 50.00, 0.00)
7.   (50.00, 50.00, 24.00)       15.  (0.00, 50.00, 0.00)
8.   (22.50, 50.00, 24.00)

No.  Points on plane             No.  Points on plane
1    1, 2, 3, 4                  5    9, 11, 15, 12
2    1, 4, 8, 5                  6    2, 6, 7, 3
3    1, 2, 5, 6                  7    10, 13, 14, 7
4    9, 10, 12, 13

(b) Model No. 2

No.  Coordinate (X, Y, Z)        No.  Coordinate (X, Y, Z)
1.   (33.00, 15.50, 59.50)       9.   (0.00, 0.00, 24.00)
2.   (50.00, 33.00, 74.00)       10.  (50.00, 0.00, 24.00)
3.   (33.00, 50.00, 74.00)       11.  (50.00, 50.00, 24.00)
4.   (15.50, 33.00, 59.00)       12.  (0.00, 50.00, 0.00)
5.   (33.00, 15.50, 24.00)       13.  (0.00, 0.00, 0.00)
6.   (50.00, 33.00, 24.00)       14.  (50.00, 0.00, 0.00)
7.   (33.00, 50.00, 24.00)       15.  (50.00, 50.00, 0.00)
8.   (15.50, 33.00, 24.00)       16.  (0.00, 50.00, 0.00)

No.  Points on plane             No.  Points on plane
1    1, 2, 3, 4                  5    9, 13, 14, 10
2    1, 4, 8, 5                  6    10, 14, 15, 11
3    1, 5, 6, 2                  7    11, 15, 16, 12
4    4, 3, 7, 8                  8    12, 16, 13, 9
Table 8 (Continued)

(c) Model No. 3

No.  Coordinate (X, Y, Z)        No.  Coordinate (X, Y, Z)
1.   (14.00, 12.50, 50.00)       9.   (0.00, 0.00, 24.00)
2.   (50.00, 12.50, 50.00)       10.  (50.00, 0.00, 24.00)
3.   (50.00, 37.50, 50.00)       11.  (50.00, 0.00, 24.00)
4.   (14.00, 37.50, 50.00)       12.  (0.00, 50.00, 0.00)
5.   (0.00, 12.50, 24.00)        13.  (0.00, 0.00, 0.00)
6.   (50.00, 12.50, 24.00)       14.  (50.00, 0.00, 0.00)
7.   (50.00, 37.50, 24.00)       15.  (50.00, 50.00, 0.00)
8.   (0.00, 37.50, 24.00)        16.  (0.00, 50.00, 0.00)

No.  Points on plane             No.  Points on plane
1    1, 2, 3, 4                  5    9, 13, 14, 10
2    1, 4, 8, 5                  6    10, 14, 15, 11
3    1, 5, 6, 2                  7    11, 15, 16, 12
4    4, 3, 7, 8                  8    12, 16, 13, 9

(d) Model No. 4

No.  Coordinate (X, Y, Z)        No.  Coordinate (X, Y, Z)
1.   (27.50, 12.50, 67.86)       9.   (0.00, 0.00, 24.00)
2.   (44.54, 12.50, 55.14)       10.  (50.00, 0.00, 24.00)
3.   (44.54, 37.50, 55.14)       11.  (50.00, 50.00, 24.00)
4.   (27.50, 37.50, 67.86)       12.  (0.00, 50.00, 0.00)
5.   (0.00, 12.50, 24.00)        13.  (0.00, 0.00, 0.00)
6.   (27.50, 12.50, 24.00)       14.  (50.00, 0.00, 0.00)
7.   (27.50, 37.50, 24.00)       15.  (50.00, 50.00, 0.00)
8.   (0.00, 37.50, 24.00)        16.  (0.00, 50.00, 0.00)

No.  Points on plane             No.  Points on plane
1    1, 2, 3, 4                  5    9, 13, 14, 10
2    1, 4, 8, 5                  6    10, 14, 15, 11
3    1, 5, 6, 2                  7    11, 15, 16, 12
4    4, 3, 7, 8                  8    12, 16, 13, 9

(e) Model No. 5

No.  Coordinate (X, Y, Z)        No.  Coordinate (X, Y, Z)
1.   (25.00, 25.00, 59.50)       9.   (0.00, 0.00, 24.00)
2.   (50.00, 25.00, 74.00)       10.  (50.00, 0.00, 24.00)
3.   (50.00, 50.00, 74.00)       11.  (0.00, 50.00, 0.00)
4.   (25.00, 50.00, 59.00)       12.  (0.00, 0.00, 0.00)
5.   (25.00, 25.00, 24.00)       13.  (50.00, 0.00, 0.00)
6.   (50.00, 25.00, 24.00)       14.  (50.00, 0.00, 0.00)
7.   (50.00, 50.00, 24.00)       15.
8.   (25.00, 50.00, 24.00)

No.  Points on plane             No.  Points on plane
1    1, 2, 3, 4                  4    9, 12, 13, 10
2    1, 4, 8, 5                  5    11, 15, 12, 9
3    1, 5, 6, 2

(f) Model No. 6

No.  Coordinate (X, Y, Z)        No.  Coordinate (X, Y, Z)
1.   (14.00, 50.00, 50.00)       9.   (0.00, 0.00, 24.00)
2.   (14.00, 50.00, 50.00)       10.  (50.00, 0.00, 24.00)
3.   (50.00, 25.00, 50.00)       11.  (0.00, 0.00, 0.00)
4.   (25.00, 50.00, 50.00)       12.  (50.00, 0.00, 0.00)
5.   (0.00, 50.00, 24.00)        13.  (50.00, 50.00, 0.00)
6.   (0.00, 25.00, 24.00)        14.  (0.00, 50.00, 0.00)
7.   (50.00, 25.00, 24.00)
8.   (50.00, 50.00, 24.00)

No.  Points on plane             No.  Points on plane
1    1, 2, 3, 4                  3    1, 5, 6, 2
2    2, 6, 3, 7                  4    9, 11, 12, 10

(g) Model No. 7

No.  Coordinate (X, Y, Z)        No.  Coordinate (X, Y, Z)
1.   (14.00, 25.00, 50.00)       9.   (0.00, 0.00, 24.00)
2.   (50.00, 25.00, 50.00)       10.  (50.00, 0.00, 24.00)
3.   (50.00, 50.00, 50.00)       11.  (0.00, 0.00, 0.00)
4.   (0.00, 50.00, 50.00)        12.  (50.00, 0.00, 0.00)
5.   (14.00, 25.00, 24.00)       13.  (50.00, 50.00, 0.00)
6.   (50.00, 25.00, 24.00)       14.  (0.00, 50.00, 0.00)
7.   (50.00, 50.00, 24.00)
8.   (0.00, 50.00, 24.00)

No.  Points on plane             No.  Points on plane
1    1, 2, 3, 4                  3    1, 5, 6, 2
2    1, 4, 8, 5                  4    9, 11, 12, 10

(h) Model No. 8

No.  Coordinate (X, Y, Z)        No.  Coordinate (X, Y, Z)
1.   (14.00, 12.50, 50.00)       9.   (0.00, 0.00, 24.00)
2.   (50.00, 12.50, 50.00)       10.  (50.00, 0.00, 24.00)
3.   (50.00, 37.50, 50.00)       11.  (50.00, 50.00, 24.00)
4.   (0.00, 37.50, 50.00)        12.  (0.00, 50.00, 0.00)
5.   (14.00, 12.50, 24.00)       13.  (0.00, 0.00, 0.00)
6.   (50.00, 12.50, 24.00)       14.  (50.00, 0.00, 0.00)
7.   (50.00, 37.50, 24.00)       15.  (50.00, 50.00, 0.00)
8.   (0.00, 37.50, 24.00)        16.  (0.00, 50.00, 0.00)

No.  Points on plane             No.  Points on plane
1    1, 2, 3, 4                  5    9, 13, 14, 10
2    1, 4, 8, 5                  6    10, 14, 15, 11
3    1, 5, 6, 2                  7    11, 15, 16, 12
4    4, 3, 7, 8                  8    12, 16, 13, 9
Fig. 12. The input images.
Fig. 13. The result of recognition for each input image.
References

[1] M.H. Brill, A.B. Barrett, Closed-form extension of the anharmonic ratio to N-space, Comput. Vision Graphics Image Process. 23 (1983) 92–98.
[2] D. Forsyth, J.L. Mundy, A. Zisserman, C. Coelho, C. Rothwell, Invariant descriptors for 3-D object recognition and pose, IEEE Trans. Pattern Anal. Mach. Intell. 13 (10) (1991) 971–991.
[3] J.L. Mundy, A. Zisserman (Eds.), Geometric Invariance in Computer Vision, MIT Press, Cambridge, MA, USA, 1992.
[4] J.B. Burns, R.S. Weiss, E.M. Riseman, The non-existence of general-case view invariants, in: J.L. Mundy, A. Zisserman (Eds.), Geometric Invariance in Computer Vision, MIT Press, Cambridge, MA, USA, 1992.
[5] E.B. Barrett, G. Gheen, P. Payton, Representation of three-dimensional object structure as cross-ratios of determinants of stereo image points, in: J.L. Mundy, A. Zisserman, D. Forsyth (Eds.), Applications of Invariance in Computer Vision, Springer, Berlin, 1993, pp. 47–68.
[6] O. Faugeras, What can be seen in three dimensions with an uncalibrated stereo rig?, in: G. Sandini (Ed.), Proceedings of the Second European Conference on Computer Vision, Santa Margherita, Italy, Springer, Berlin, 1992, pp. 563–578.
[7] A. Zisserman, S.J. Maybank, A case against epipolar geometry, in: J.L. Mundy, A. Zisserman, D. Forsyth (Eds.), Applications of Invariance in Computer Vision, Springer, Berlin, 1993, pp. 69–88.
[8] L. Quan, Invariants of six points from 3 uncalibrated images, in: Proceedings of the Fourth European Conference on Computer Vision, Stockholm, Sweden, 1994, pp. 459–470.
[9] L. Quan, Invariants of six points and projective reconstruction from three uncalibrated images, IEEE Trans. Pattern Anal. Mach. Intell. 17 (1) (1995) 34–46.
[10] S. Zhang, G.D. Sullivan, K.D. Baker, The automatic construction of a view-independent relational model for 3-D object recognition, IEEE Trans. Pattern Anal. Mach. Intell. 15 (6) (1993) 531–544.
[11] Y. Zhu, L.D. Seneviratne, S.W.E. Earles, A new structure of invariant for 3D point sets from a single view, in: Proceedings of the 12th International Conference on Robotics and Automation, Nagoya, Japan, 1995, pp. 1726–1731.
[12] C.A. Rothwell, D.A. Forsyth, A. Zisserman, J.L. Mundy, Extracting projective invariant from single views of 3D point sets, Oxford University TR-OUEL 1927/92, April 1992.
[13] J.G. Semple, G.T. Kneebone, Algebraic Projective Geometry, Oxford Science Publications, Oxford, 1952.
[14] F.C.D. Tsai, Geometric hashing with line features, Pattern Recognition 27 (3) (1994) 377–389.
About the Author: KYOUNG SIG ROH received the B.S. degree in mechanical engineering from Yonsei University, Seoul, Korea, in 1987, and the M.E. degree in mechanical engineering and the Ph.D. degree in automation engineering from the Korea Advanced Institute of Science and Technology (KAIST), Seoul, Korea, in 1989 and 1998, respectively. He worked as a research engineer from 1989 to 1993 at the Samsung Advanced Institute of Technology (SAIT), and he is currently a research staff member of the System and Control Sector at SAIT. His current research interests include object recognition and geometric invariants for intelligent systems.

About the Author: IN SO KWEON received the B.S. and M.E. degrees in mechanical design and production engineering from Seoul National University, Seoul, Korea, in 1981 and 1983, respectively, and the Ph.D. degree in robotics from Carnegie Mellon University, Pittsburgh, PA, in 1990. During 1991 and 1992, he was a visiting scientist in the Information Systems Laboratory at the Toshiba Research and Development Center, where he worked on behavior-based mobile robots and motion vision research. Since 1992 he has been an Associate Professor of Electrical Engineering at the Korea Advanced Institute of Science and Technology (KAIST). His current research interests include image sequence analysis, physics-based vision, invariants and geometry, and 3D range image analysis. He is a member of the IEEE and the IEEE Computer Society.
Pattern Recognition 33 (2000) 755–765

A chain code for representing 3D curves
Ernesto Bribiesca*
Department of Computer Science, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Apdo. 20-726, México, D.F., 01000, Mexico
Received 14 October 1998; accepted 25 February 1999
Abstract
A chain code for representing three-dimensional (3D) curves is defined. Any 3D continuous curve can be digitalized and represented as a 3D discrete curve. This 3D discrete curve is composed of constant straight-line segments. Thus, the chain elements represent the orthogonal direction changes of the constant straight-line segments of the discrete curve. The proposed chain code only considers relative direction changes, which allows us to have a curve descriptor invariant under translation and rotation. Also, this curve descriptor may be starting-point normalized for open and closed curves and invariant under mirroring transformation. The main characteristics of this chain code are presented. This chain code is inspired by the work of Guzmán (MCC Technical Report Number: ACA-254-87, July 13, 1987) for representing 3D stick bodies. Finally, we present some results of using this chain code to represent and process 3D discrete curves as linear features over the terrain by means of digital elevation model (DEM) data. Also, we use this chain code for representing solids composed of voxels; thus, each solid represents a DEM which is described by only one chain. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Chain code; 3D discrete curves; 3D shape description; 3D digitalization scheme; 3D curve representation
1. Introduction
The study of 3D curve representations is an important part of computer vision. This work deals with 3D shape representation based on chain coding. Chain-code techniques are widely used because they preserve information and allow considerable data reduction; chain codes are also the standard input format for numerous shape analysis algorithms. The first approach for representing digital curves using chain code was introduced by Freeman in 1961 [2]. Many authors have since used chain-coding techniques, since various shape features may be computed directly from this representation [3–9]. The representation of 3D discrete curves by means of chain coding is an important challenge in computer
* Tel.: +525-622-3617; fax: +525-622-3620.
E-mail address: [email protected] (E. Bribiesca)
vision. A method for representing 3D digital curves using chain code was introduced by Freeman in 1974 [10]. Guzmán defines a canonical shape description for 3D stick bodies, which are those 3D bodies characterized by a juxtaposition of more or less elongated limbs meeting at more or less corners [1]. Digital representation schemes for 3D curves have been presented by Jonas et al. [11]. A method for reconstructing 3D rigid curves using epipolar parameterization was presented by Zhao [12]. Other authors have used different techniques related to 3D shape description [13–17]. In this work, we present a chain code for representing 3D discrete curves. Discrete curves are composed of constant straight-line segments; two contiguous straight-line segments define a direction change, and two direction changes define a chain element. There are only five possible orthogonal direction changes for representing any 3D discrete curve. The proposed chain code only considers relative direction changes, which allows us to have a curve description invariant under translation and rotation. Also, it may be starting-point normalized and invariant under mirroring transformation. This paper is
organized as follows. In Section 2 we present the concepts and definitions of the proposed chain code. In Section 3 we describe some results of the proposed notation using real data. Finally, in Section 4 we give some conclusions.
2. Concepts and definitions
Our purpose in this section is to present the proposed chain code for representing 3D discrete curves and its main characteristics. An important simplification in this work is the assumption that discrete curves have been isolated from the real world, and are defined as a result of previous processing. Fig. 1(a) shows an example of a 3D continuous curve and (b) illustrates the discrete representation of the curve shown in (a). Notice that the 3D discrete curve in Fig. 1(b) is composed of straight-line segments of the same length. In this work, the length l of each straight-line segment is considered equal to one. The boundaries or contours of any 3D discrete shape composed of constant straight-line segments can be represented by chains. In order to introduce the proposed chain code, a number of definitions are presented below:

Definition 1. An element a_i of a chain indicates the orthogonal direction change of the contiguous straight-line segments of the discrete curve at that element position. There are only five possible direction changes for representing any 3D discrete curve, which indicate relative direction changes, such as shape numbers [8], but specified in three dimensions. Freeman chains [10] use absolute directions for representing discrete curves.
Fig. 2. The five possible direction changes for representing 3D discrete curves: (a) the element "0"; (b) the element "1"; (c) the element "2"; (d) the element "3"; (e) the element "4"; (f) an example of a discrete curve; (g) the first element of the chain; (h)–(m) the next elements of the chain.
Fig. 2 illustrates the five possible relative direction changes (which are represented by numbers) for representing 3D curves: in (a) the element "0" represents the direction change which goes straight through the contiguous straight-line segments, following the direction of the last segment; (b) shows the element "1", which indicates a direction change to the right; (c) illustrates the element "2", which represents a direction change upward (staircase fashion); in (d) the element "3" indicates a direction change to the left; finally, the element "4" shown in (e) means that the direction change is going back. Therefore, two contiguous straight-line segments define a direction change, and two direction changes define a chain element. The definition of these direction changes was based on Guzmán's notation for turns [1]. However, in order to improve the proposed chain code, we have changed Guzmán's proposed digits for turns.

Definition 2. A chain A is an ordered sequence of elements, and is represented by

A = a_1 a_2 … a_n = {a_i : 1 ≤ i ≤ n},    (1)

where n indicates the number of chain elements.
Fig. 1. An example of a 3D curve: (a) a 3D continuous curve; (b) the discrete representation of the curve shown in (a).

2.1. How to obtain the chain of a given curve
The chain of a curve is obtained by calculating the relative direction changes around the curve. Thus, the
obtained chain will be composed of a finite number of elements represented by the base-five digits mentioned above. Fig. 2(f) shows an example of a discrete curve; the origin of this curve is considered at the lower side and is represented by a point. Fig. 2(g) illustrates the first element of the chain, which corresponds to the element "2"; note that the first direction change (which is composed of two contiguous straight-line segments) is used only for reference. Fig. 2(h) shows the next element obtained of the chain, which is based on the last direction change of the first element; this second element corresponds to the number "3", which indicates a direction change to the left. Fig. 2(i)–(m) illustrate the next elements obtained of the chain step by step. Fig. 2(m) shows the discrete curve and its corresponding chain, which is composed of seven elements.

Definition 3. The length L of a chain is the sum of the lengths of its elements, i.e. L may be expressed as

L = (n + 2) l,    (2)

where l is the length of each straight-line segment, which is considered equal to one. The length L of the chain shown in Fig. 2(m) is 9.
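The stepwise procedure above can be sketched in a few lines of Python. The digit assignment below (0 straight on, 1 toward the side axis, 2 upward, 3 away from the side axis, 4 the remaining orthogonal change) and the frame-update rules are an assumed convention for illustration only; the paper's exact correspondence between turns and digits is the one defined by Fig. 2.

```python
def cross(a, b):
    """Cross product of two integer 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def neg(v):
    return (-v[0], -v[1], -v[2])

def chain_code(dirs, f, r):
    """Classify each unit grid step in `dirs` relative to a running
    orthonormal frame: f = forward direction of the last segment,
    r = side direction fixed by the reference direction change."""
    chain = []
    for n in dirs:
        u = cross(f, r)          # "up" axis of the current frame
        if n == f:               # straight on
            chain.append(0)
        elif n == r:             # turn toward the side axis
            chain.append(1)
            f, r = n, neg(f)
        elif n == u:             # turn upward (staircase fashion)
            chain.append(2)
            f = n
        elif n == neg(r):        # turn away from the side axis
            chain.append(3)
            f, r = n, f
        elif n == neg(u):        # the remaining orthogonal change
            chain.append(4)
            f = n
        else:
            raise ValueError("not an orthogonal unit step")
    return chain
```

Under this convention a planar square loop started with forward (1, 0, 0) and side (0, 1, 0) yields four identical turn elements, reflecting that the code depends only on relative direction changes.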
Fig. 3. Independence of rotation: (a) a discrete curve and its chain; (b)–(d) rotations of the curve shown in (a) about the axis "X"; (e)–(g) rotations about the axis "Y"; (h)–(j) rotations about the axis "Z".
2.2. Independence of rotation
The chain code proposed here is invariant under rotation; this is due to the fact that relative direction changes are used. Fig. 3 illustrates the invariance under rotation using this chain code. Fig. 3(a) shows the discrete curve presented in Fig. 2(m) and its corresponding chain. Fig. 3(b)–(d) show some rotations of the discrete curve shown in Fig. 3(a) as rigid transformations of R3 about the axis of rotation "X". Fig. 3(e)–(g) show rotations of the curve performed about the axis of rotation "Y". Finally, Fig. 3(h)–(j) illustrate rotations of the curve performed about the axis of rotation "Z". Note that all chains are equal. Therefore, they are invariant under rotation.

2.3. The inverse of a chain
The inverse of a chain is another chain formed of the elements of the first chain arranged in reverse order, i.e., the chain obtained by traveling the discrete curve in one direction is just the reverse of the chain obtained by traveling the same discrete curve in the opposite direction [1]. Fig. 4(b) shows the inverse of the chain presented in (a); notice that the elements of the inverse of the chain shown in (a) are arranged in reverse order. Fig. 4(c) shows a discrete curve and its chain, which has some zero elements. When we are traveling a curve in order to obtain its chain elements and find zero elements, we need to know what non-zero element was the last one in order to define the next element. In the case shown in Fig. 4(c), the
first found element "4" was obtained with reference to the previous element ("2"), which is not a "0". In this manner orientation is not lost. So, the inverse of the chain shown in Fig. 4(c) corresponds to the chain presented in (d); notice that the order of one element of the inverse of the chain is shifted when there are zero elements.

2.4. Independence of starting point for open curves
Using the concept of the inverse of a chain, this notation may be starting-point normalized by choosing the starting point so that the resulting sequence of elements forms an integer of minimum magnitude [18]. For instance, the chain of the open curve shown in Fig. 4(a) represents the integer number 2334123, and the chain shown in (b) represents the number 3214332. Thus, the integer of minimum magnitude corresponds to the chain shown in Fig. 4(a). Therefore, this chain is starting-point normalized.

2.5. Independence of starting point for closed curves
The closed curves described via the proposed chain code may be made invariant under starting point by choosing the starting point so that the resulting sequence of elements forms an integer of minimum magnitude. Therefore, the chain of the 3D discrete curve presented in Fig. 5(a) may be invariant under starting point by
Fig. 4. The inverse of a chain: (a) a discrete curve and its chain; (b) the inverse of the chain presented in (a); (c) a discrete curve and its chain, which has some zero elements; (d) the inverse of the chain shown in (c).
rotating the digits until the number is minimum. Finally, Fig. 5(b) shows the chain of the 3D discrete curve shown in Fig. 5(a), which is already invariant under starting point.
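Treating chains as digit sequences, both starting-point rules can be sketched directly. This is a simplified illustration: the inverse of an open chain is taken as plain element reversal, and the zero-element shift discussed for the inverse of a chain is not handled.

```python
def as_int(chain):
    """Read a chain of base-five digits as an integer."""
    return int("".join(str(e) for e in chain))

def normalize_open(chain):
    """Open curves: pick the traversal direction whose digit
    sequence forms the integer of minimum magnitude."""
    return min(chain, list(reversed(chain)), key=as_int)

def normalize_closed(chain):
    """Closed curves: rotate the digits until the integer is minimum."""
    rotations = (chain[i:] + chain[:i] for i in range(len(chain)))
    return min(rotations, key=as_int)
```

For the chains of Fig. 4(a) and (b), 2334123 < 3214332, so the forward traversal is the normalized one.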
2.7. Curve comparison

Definition 5. Two discrete curves in R3 are isoperimetric if they have the same chain length, or perimeter.
2.6. Invariance under mirroring transformation
The proposed curve representation may be made invariant under mirroring transformation by means of the following definition.

Definition 4. The chain of the mirror of a 3D discrete curve is another chain (termed the mirroring chain) whose elements "1" are replaced by elements "3" and vice versa. Fig. 6 illustrates the mirroring transformation. In Fig. 6(a) the mirroring plane is aligned with the standard plane "XY"; notice that the elements "1" and "3" of the mirroring chain were changed. In Fig. 6(b) the mirroring plane is aligned with the plane "XZ" and in (c) with the plane "YZ", respectively.
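Definition 4 translates directly into code: exchange the elements "1" and "3" and leave the rest unchanged.

```python
def mirror(chain):
    """Chain of the mirrored curve: elements 1 and 3 are exchanged."""
    swap = {1: 3, 3: 1}
    return [swap.get(e, e) for e in chain]
```

Applying the transformation twice recovers the original chain, as expected of a mirroring.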
Using the above-mentioned invariants, we already have a unique curve descriptor based on the proposed chain code. Thus, to determine whether two isoperimetric curves have the same shape, it is only necessary to see if their chains are equal. Also, it is possible to decide whether or not a certain given local shape occurs within another shape by comparing their chains or parts of them.
3. Results
In this section, we present some examples of the representation of 3D curves by means of the proposed chain code using Digital Elevation Model (DEM) data. DEMs
Fig. 6. The invariance under mirroring transformation: (a) the mirroring plane is aligned with the standard plane "XY", (b) with the plane "XZ", and (c) with the plane "YZ", respectively.
Fig. 5. Independence of starting point for closed curves: (a) an example of a 3D discrete curve and its corresponding chain; (b) the chain of the closed curve shown in (a), which is already invariant under starting point.
are digital representations of the Earth's surface. Generally speaking, a DEM is generated as a uniform rectangular grid organized in profiles. In this case, DEMs are represented as binary solids composed of a large number of voxels. The digitalization of these models is based on 1 : 250,000 scale contours. In the presented examples, we use DEM data provided by the Instituto Nacional de Estadística, Geografía e Informática, México (INEGI). Fig. 7 shows the DEM of the volcano "Iztaccihuatl" (which means "sleeping woman"). This volcano is to the east of the Valley of México. In Fig. 7(a) this volcano is represented by a 3D mesh of 150×150 elements. Fig. 7(b) illustrates the volcano "Iztaccihuatl" as a binary solid composed of 428292 voxels. The method for transforming DEM data into voxels was presented in [19] and is as follows: "(1) calculate the minimum elevation of the given DEM; (2) subtract the minimum elevation from all elevations of the model and increase them by one; and (3) generate a 3D array of voxels considering the same resolution of the model, where each elevation value is equivalent
to the number of voxels in that position, which are located at spatial coordinates (row, column, slide). Thus, each profile of a given DEM corresponds to a slide of its 3D array of voxels." In order to plot our DEM data efficiently, we use the concept of contact surface area for binary solids composed of a large number of voxels, which was presented in Ref. [19]. There is a relation between the areas of the enclosing surface and the contact surface, which is as follows:

2A_c + A = Fn,    (3)

where A_c is the contact surface area, A is the area of the enclosing surface, F is the number of faces of the voxel times the area of a face (in this case, the area of the face is considered equal to one), and n is the number of voxels. Thus, the contact surfaces correspond to the hidden faces of the solid and the enclosing surface area to the sum of the areas of the visible faces, respectively. Therefore, when a solid is plotted, the contact surfaces must be eliminated from the plotting; this greatly decreases the computation. Voxels have a structural problem: there are three ways of connecting voxels, by edges, vertices, and faces (these forms of connectivity are shown in Fig. 8(a), (b), and (c), respectively); the combination of these forms of connectivity produces the twenty-six connectivity, which is shown in Fig. 8(e). Fig. 8(d) illustrates the six connectivity, i.e. face-connected voxels. In this paper we use voxels with six connectivity.
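The quoted three-step conversion and the surface relation 2A_c + A = Fn can be sketched for a set of face-connected (six-connectivity) voxels. The function names are ours, for illustration only.

```python
def dem_to_voxels(grid):
    """Steps (1)-(3) quoted above: take the minimum elevation,
    subtract it, increase by one, and stack that many unit voxels
    per grid cell."""
    lo = min(min(row) for row in grid)
    voxels = set()
    for i, row in enumerate(grid):
        for j, h in enumerate(row):
            for k in range(h - lo + 1):
                voxels.add((i, j, k))
    return voxels

def surface_areas(voxels):
    """Return (A, A_c): enclosing (visible) and contact (hidden)
    surface areas of a 6-connected voxel solid, face area = 1."""
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    # A face is visible exactly when no voxel occupies the cell
    # on its other side.
    enclosing = sum((x + dx, y + dy, z + dz) not in voxels
                    for x, y, z in voxels for dx, dy, dz in steps)
    contact = (6 * len(voxels) - enclosing) // 2   # from 2*A_c + A = 6n
    return enclosing, contact
```

For two face-adjacent voxels, A = 10 and A_c = 1, and indeed 2·1 + 10 = 6·2, consistent with Eq. (3) for F = 6.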
Fig. 7. The DEM of the volcano "Iztaccihuatl": (a) the volcano represented by a 3D mesh of 150×150 elements; (b) the volcano represented by a binary solid composed of 428292 voxels.
3.1. 3D curve description as linear features over the terrain
Fig. 8. The structural problems of voxels: (a) connectivity by edges; (b) connectivity by vertices; (c) connectivity by faces; (d) six connectivity; (e) twenty-six connectivity.
Many linear features over the terrain may be described using the proposed chain-code notation. These linear features are described as 3D discrete curves, which are represented by the only five possible direction changes mentioned above. Fig. 9(a) illustrates the DEM of the volcano "Iztaccihuatl" as a binary solid composed of voxels, and a 3D discrete curve as an example of a linear feature over the terrain. This 3D discrete curve is composed of 211 straight-line segments and is marked with bold lines. Fig. 9(b) shows this 3D discrete curve and its corresponding chain elements. Notice that in order to observe the chain elements the curve was scaled up. Thus, this discrete curve is represented by a chain composed of 209 elements, is invariant under translation and rotation, and is starting-point normalized. Furthermore, this curve representation preserves information and allows considerable data reduction.
Fig. 9. 3D curve description: (a) the DEM of the volcano "Iztaccihuatl" as a binary solid and a 3D discrete curve as an example; (b) the 3D discrete curve composed of 209 elements.
3.2. How to represent binary solids composed of voxels via the proposed chain code
When 3D objects are represented by means of spatial occupancy arrays, much storage is used if resolution is high, since space requirements increase as the cube of the linear resolution [18]. In order to have a better representation for binary solids, we describe binary solids composed of voxels by means of the proposed chain code. Most binary solids composed of voxels require one or more chains to describe them. In this paper we present solids which may be
described by only one chain. Fig. 10(a) presents an example of a binary solid composed of voxels. Fig. 10(b) illustrates the 3D discrete curve which represents the solid shown in Fig. 10(a); the chain of this curve was obtained using the concepts of the proposed chain code. Fig. 10(c) illustrates another orientation to obtain the discrete curve which encloses the solid. Notice that curves representing solids depend on the selected orientation; this produces different curves and therefore different chains. Fig. 11(a) shows the solid presented in Fig. 10(a). Fig. 11(b) illustrates the selected orientation and (c)
Fig. 10. An example: (a) a binary solid; (b) the 3D discrete curve which encloses the solid shown in (a); (c) another orientation to obtain the 3D discrete curve.
Fig. 11. The 3D discrete curve already invariant under rotation: (a) the binary solid presented in Fig. 10(a); (b) the selected orientation; (c) the visible lines of the 3D discrete curve; (d)–(i) different rotations of the curve.
Fig. 12. The DEM of the volcano "Popocatepetl": (a) the volcano represented by a 3D mesh of 70×100 elements; (b) the volcano represented by a binary solid composed of 149691 voxels; (c) the volcano represented by only one 3D discrete curve.
Fig. 13. The 3D discrete curve of the volcano "Popocatepetl": (a)–(d) different rotations of the discrete curve.
presents the visible lines of the discrete curve. Finally, Fig. 11(d)–(i) present different rotations of the curve, which is already invariant under rotation. DEMs may be represented by only one chain. First, we have to select the appropriate orientation: if we select the same orientation as the contours (this orientation corresponds to the orientation of the standard plane "XY"), then several chains may be produced, depending on the number of hills of the terrain. On the contrary, if we select the orientation which corresponds to the orientation of the standard plane "XZ", then we can represent the model by means of only one chain. This is due to the fact that this orientation has no protruding voxels. Fig. 12 illustrates the DEM of the volcano "Popocatepetl", which is to the east of the Valley of México. Fig. 12(a) shows this volcano represented by a 3D mesh of 70×100 elements. Fig. 12(b) illustrates the volcano as a binary solid composed of 149691 voxels. Fig. 12(c) illustrates the 3D discrete curve which encloses the solid. Notice that the model is represented by only one discrete curve, which has no inner crossings. This curve is now represented by the proposed chain code and is composed of 18883 elements. A large number of the chain elements are zero elements, which may be compacted; this allows considerable data reduction.
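The compaction of zero runs mentioned above can be sketched as a simple run-length encoding. The paper does not specify a storage format; the pair representation below is an illustrative choice.

```python
def compact_zeros(chain):
    """Replace each run of k consecutive zero elements by a pair (0, k);
    non-zero elements are kept as-is."""
    out, i = [], 0
    while i < len(chain):
        if chain[i] == 0:
            j = i
            while j < len(chain) and chain[j] == 0:
                j += 1                  # extend the run of zeros
            out.append((0, j - i))      # encode the run once
            i = j
        else:
            out.append(chain[i])
            i += 1
    return out
```

For a chain dominated by long straight stretches, such as the 18883-element curve above, this yields a considerably shorter description.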
Fig. 13 illustrates some capabilities of the proposed chain code, such as its invariance under translation and rotation. Fig. 13(a)–(d) show different rotations of the discrete curve which represents the volcano "Popocatepetl". Finally, in order to observe the discrete curve in detail, Fig. 14 shows a zoom and inner view of the 3D discrete curve. Notice that this view is presented in perspective.
4. Conclusions
In this work, a chain code for representing 3D discrete curves is defined. The proposed chain code is invariant under translation and rotation and, optionally, under starting point and mirroring transformation. Thus, a unique curve descriptor is generated, which allows us to perform curve comparison easily. A number of concepts, definitions, and examples are presented, which allow us to find some interesting properties of curves, such as: curve comparison, discrete curve representation, and object representation for binary solids composed of voxels. We use the proposed chain code for representing a DEM as a binary solid by means of only one chain. This may be extended to represent range images.
Fig. 14. A zoom and inner view of the 3D discrete curve, which encloses the DEM of the volcano "Popocatepetl".
Acknowledgements
This work was in part supported by the REDII CONACYT. I thank Dr. Adolfo Guzmán for his valuable comments. Also, I wish to express my gratitude to Dr. Richard G. Wilson for his help in reviewing this work. DEM data used in this study was provided by INEGI.
References
[1] A. Guzmán, Canonical shape description for 3-D stick bodies, MCC Technical Report Number: ACA-254-87, Austin, TX 78759, 1987.
[2] H. Freeman, On the encoding of arbitrary geometric configurations, IRE Trans. Electron. Comput. EC-10 (1961) 260–268.
[3] J.W. McKee, J.K. Aggarwal, Computer recognition of partial views of curved objects, IEEE Trans. Comput. C-26 (1977) 790–800.
[4] M.D. Levine, Vision in Man and Machine, McGraw-Hill, New York, 1985.
[5] F. Kuhl, Classification and recognition of hand-printed characters, IEEE Int. Conv. Record Part 4 (1963) 75–93.
[6] R.D. Merrill, Representation of contours and regions for efficient computer search, Commun. ACM 16 (1969) 534–549.
[7] G.S. Sidhu, R.T. Boute, Property encoding: applications in binary picture encoding and boundary following, IEEE Trans. Comput. C-21 (1972) 1206–1216.
[8] E. Bribiesca, A. Guzmán, How to describe pure form and how to measure differences in shapes using shape numbers, Pattern Recognition 12 (1980) 101–112.
[9] A. Blumenkrans, Two-dimensional object recognition using a two-dimensional polar transform, Pattern Recognition 24 (1991) 879–890.
[10] H. Freeman, Computer processing of line drawing images, ACM Comput. Surveys 6 (1974) 57–97.
[11] A. Jonas, N. Kiryati, Digital representation schemes for 3D curves, Pattern Recognition 30 (1997) 1803–1816.
[12] C.S. Zhao, Epipolar parameterization for reconstructing 3D rigid curve, Pattern Recognition 30 (1997) 1817–1827.
[13] C.E. Kim, Three-dimensional digital segments, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-5 (1983) 231–234.
[14] A. Rosenfeld, Three-dimensional digital topology, Inform. Control 50 (1981) 119–127.
[15] R. Vaillant, O. Faugeras, Using extremal boundaries for 3D object modeling, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-14 (2) (1992) 157–173.
[16] B. Bascle, R. Deriche, Stereo matching, reconstruction and refinement of 3D curves using deformable contours, in: Proceedings of the Fourth International Conference on Computer Vision, Berlin, Germany, May 1993, pp. 421–430.
[17] F. Cohen, J. Wang, Part I: Modeling image curves using invariant 3D object curve models - a path to 3D recognition and shape estimation from image contours, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-16 (1) (1994) 1–12.
[18] D.H. Ballard, C.M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
[19] E. Bribiesca, Digital elevation model data analysis using the contact surface area, Graphical Models Image Process. 60 (1998) 166–172.
About the Author: ERNESTO BRIBIESCA received the B.Sc. degree in electronics engineering from the Instituto Politécnico Nacional in 1976 and the Ph.D. degree in mathematics from the Universidad Autónoma Metropolitana in 1996. He was a researcher at the IBM Latin American Scientific Center and at the Dirección General de Estudios del Territorio Nacional (DETENAL). He is associate editor of the Pattern Recognition journal. He has twice been chosen Honorable Mention winner of the Annual Pattern Recognition Society Award. Currently, he is Professor at the Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas (IIMAS) at the Universidad Nacional Autónoma de México (UNAM), where he teaches graduate courses in Pattern Recognition.
Pattern Recognition 33 (2000) 767–785

Hybrid stereo matching with a new relaxation scheme of preserving disparity discontinuity
Kyu-Phil Han, Tae-Min Bae, Yeong-Ho Ha*
School of Electronic and Electrical Engineering, Kyungpook National University, Taegu 702-701, South Korea
Received 5 November 1998; accepted 29 March 1999
Abstract
A hybrid stereo matching algorithm using a combined edge- and region-based method is proposed to take advantage of each technique, i.e. exactly matched points and a full-resolution disparity map. Region-based matching is typically more efficient than edge-based matching; however, a region-based matcher lacks the capability of generating an accurate fine-resolution disparity map. The generation of such a map can be better accomplished by using edge-based techniques. Accordingly, regions and edges both play important and complementary roles in a binocular stereo process. Since it is crucial that an efficient and robust stereo system utilizes the most appropriate set of primitives, a nonlinear Laplacian filter is modified to extract proper primitives. Since each pixel value of a second-order differentiated image includes important information on the intensity profile, information such as edge, signed, and zero pixels obtained by the modified nonlinear Laplacian filter is used to determine the matching strategy. Consequently, the proposed matching algorithm consists of edge-, signed-, and zero- or residual-pixel matching. Different matching strategies are adopted in each matching step. Adaptive windows with variable sizes and shapes are also used to consider the local information of the pixels. In addition, a new relaxation scheme, based on the statistical distribution of matched errors and constraint functions which contain disparity smoothness, uniqueness, and discontinuity preservation, is proposed to efficiently reduce mismatched points in unfavorable conditions. Unlike conventional relaxation schemes, the erosion in an abrupt area of a disparity map is considerably reduced because a discontinuity preservation factor based on a survival possibility function is added to the proposed relaxation.
The relaxation scheme can be applied to various methods, such as block-, feature-, region-, and object-based matching methods, by modifying the excitatory set of the smoothness constraint function. Experimental results show that the proposed matching algorithm is effective for various images, even if the image has a high content of noise and repeated patterns. The convergence rate of the relaxation and the output quality are both improved. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Stereo matching; Edge- and region-based matching; Adaptive window; Relaxation; Smoothness and uniqueness constraints; Excitatory and inhibitory inputs; Disparity discontinuity preservation
1. Introduction
In a pair of eyes, each eye receives slightly different images of the world due to its distinct position. Differences between the left and right images create binocular
* Corresponding author. Tel.: +82-53-950-5535; fax: +82-53-957-1194.
E-mail address: [email protected] (Y-H. Ha)
disparities. The Human Visual System (HVS) can detect and use these disparities to recover information about the three-dimensional structure of the scene being viewed. The stereo correspondence problem is to make explicit the disparities of all points common to both images. A great deal of computer vision research has addressed this problem, because disparity contains useful information for various applications such as object recognition, inspection, and manipulation. A range-sensing system for these tasks is often required for the accurate and efficient provision of a complete
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00095-3
K-P. Han et al. / Pattern Recognition 33 (2000) 767–785
depth or disparity map for an entire field of view. Range-finding techniques can be loosely classified as either active or passive. Active techniques utilize artificial sources of energy, such as ultrasonic and laser, to illuminate the workspace, whereas passive techniques do not require such energy sources. Popular active techniques include contrived-lighting approaches and direct range finders based on time-of-flight measurements. Common examples of passive techniques include stereo, both binocular and photometric; shape from shading; shape from texture; and focusing methods. The contrived-lighting approach involves illuminating the scene with a controlled lighting pattern and interpreting the projection of the pattern to derive a depth profile of the scene [1,2]. Such active illumination can be disadvantageous in an outdoor or hostile environment. In these situations, this method may also fail because of the specular reflectivity of the objects appearing in the scene. Passive techniques for range sensing typically require a simpler and less expensive setup than active approaches. The binocular stereo approach falls into this category. Yet the disadvantages of this approach are that it requires many photometric assumptions and has a high computational cost. In the binocular stereo approach, the difference between the relative positions of two digital or digitized images taken from different viewpoints is measured to find range information, as in the HVS. The existing techniques for stereo matching are grouped into two categories: feature-based methods and intensity- or area-based methods [3]. The feature-based methods use zero-crossing points [4], edges [5], line segments, corner points, conics [6], etc.
Since these types of primitives are relatively sparse in images, a complicated interpolation process including occlusion modeling and disparity continuity should be taken into account to obtain a full-resolution disparity map; they also require more careful and explicit matching rules to eliminate false targets. However, they do provide accurate disparity values at the feature points. Marr and Poggio [7], Grimson [4], and Frisby and Pollard [8], among others, used these primitives. Since intensity-based methods use dense low-level features and the intensity values themselves, a feature extraction and an interpolation process are not necessary and a dense disparity map can be obtained; however, they are sensitive to noise and small intensity differences. Consequently, recently proposed enhancements of stereo approaches include a coarse-to-fine strategy [4,9,10] and constraints [4,8,11,12] such as uniqueness, ordering, and smoothness. Other matching strategies using a windowed Fourier phase [13], segmented regions [14], wavelet-transformed images [12], chromatic information [15], neural networks [11,16], and a multiple-baseline [17] have been studied. In this paper, a hybrid approach including an edge- and region-based matching method is proposed. The
proposed method includes the advantages of both edge-based methods, which give accurate matched points, and region-based methods, which can produce a full-resolution disparity map [14]. In order to extract the proper features for stereo matching, the nonlinear Laplacian filter [18] is modified and used. The nonlinear Laplacian filter is more efficient than the family of Gaussian filters, because it involves no multiplication and can be easily implemented by mathematical morphology operators such as dilation and erosion. The Modified Nonlinear Laplacian (MNL) operator includes an odd Hierarchical Discrete Correlation (HDC) for fast filtering [19], thresholding for weak-edge elimination, region growing for strong-edge linking, and an edge refinement process. After MNL filtering, zero-crossing points and positive, negative, and zero regions are used as the matching primitives in the proposed algorithm. Then, three matching strategies are carried out according to the type of pixel, i.e. zero-crossing, signed, and zero pixels. Since the primitives are obtained using a second-order differentiation, they include important topological information: edge (zero-crossing point), signed, and zero pixels imply a transition point, a convex or concave area, and a smooth area of the intensity profile, respectively. The size and shape of the windows are also important factors for signal matching [12,20]; thus locally adaptive windows are used in each matching step. In addition, a relaxation algorithm is proposed which can reduce false matches based on a distribution of matched errors and a possibility value subject to various constraints including uniqueness, smoothness, and a discontinuity preservation factor of the disparity. The general scheme of stereo matching is outlined in Section 2. The proposed feature extraction filter and the proposed stereo matching algorithm are illustrated in Section 3.
Section 4 presents the feasibility of the proposed algorithm demonstrated through experimental results for synthetic and real scene images, and the efficiency of the proposed relaxation scheme is evaluated. Finally, the conclusion is given in Section 5.
2. Stereo matching scheme Stereo matching is typically achieved by five steps which include (1) image formation, (2) feature extraction, (3) feature identification under some criteria such as similarity and consistency, (4) disparity calculation, and (5) calculation of the actual range according to camera geometry. The third step, which deals with matching or correspondence, is the most important part of the binocular stereo approach. All approaches for image matching follow these procedures but use different image features, matching measures, and strategies. In general, since matching measures and matching strategies strongly depend on the attributes of features, the selection of
a matching strategy according to a feature is important [3]. There are some additional schemes such as interpolation, relaxation [21], and dynamic programming [22]. An interpolation scheme based on a consistency criterion can obtain a dense disparity field from a sparse feature map. Relaxation schemes are commonly applied to acquire more flexible solutions in complex optimization problems and take both similarity and consistency into account. However, since most relaxation methods are iterative, dynamic programming either assists a relaxation method to speed up convergence or optimizes the cost function. Therefore, possible multi-schemes and both intensity- and feature-based methods may be considered to obtain stable and accurate matching results.
3. The proposed matching algorithm The proposed stereo matching algorithm consists of feature extraction, three matching steps, and relaxation, as shown in Fig. 1. First, in order to extract features suitable for stereo matching, some processes that decrease matching ambiguities are added to the nonlinear Laplacian filter. Since the characteristics of edge, signed, and residual pixels extracted by the MNL are different, varying strategies are applied in each matching step. Locally adaptive windows varying in size and shape are also considered. If a point with a minimum matching error is determined as the disparity of the pixel, a result that only considers similarity is achieved. Accordingly, to acquire stable results, a relaxation scheme based on some constraints is inserted to consider both similarity and
Fig. 1. Block diagram of the proposed stereo matching algorithm.
consistency. The Mean of the Absolute Differences (MADs) of intensity, obtained in each pixel matching, is normalized according to the size of the matching window and then transformed into a possibility value based on the statistical distribution of the MADs. Finally, a disparity is determined by the reciprocal action between the possibility of the current point and its neighbor possibility values. 3.1. Feature extraction using a modified nonlinear Laplacian filter The matching primitives used as features, including the edge, positive, negative, and zero pixels for a 1-D signal, are illustrated in Fig. 2. They all contain topological information on the intensity profile, such as smooth and transition areas. A new filter has thus been designed to extract these matching primitives from an image. The filter consists of four parts: low-pass filtering using an odd HDC, second-order differentiation with a nonlinear Laplacian filter, weak-edge elimination using local variance and strong-edge linking by region growing, and edge and region determination.
Fig. 2. The relation between an intensity and a feature profile. (a) Original intensity profile, (b) after low-pass filtering, (c) after first-order differentiation, (d) after second-order differentiation.
3.1.1. Odd hierarchical discrete correlation A mathematical problem is defined to be well-posed in the Hadamard sense if its solution exists and satisfies uniqueness and continuity for the initial data. However, the differentiation operator is ill-posed in this sense. Accordingly, regularization has been studied to minimize this differentiation problem. Torre and Poggio [23] found that a regularized differentiation of image data could be performed by convolving the data with the first derivative of a cubic spline filter, which is very similar to the Gaussian function. Generally, a stabilizing operator, such as the Gaussian or some other low-pass filter, is convolved with the original image before the first- or the second-order differentiation. There is a method for computing correlations which is particularly well suited for image processing [19]. This method, called Hierarchical Discrete Correlation, or HDC, is computationally efficient, typically requiring one or two orders of magnitude fewer computational steps than direct correlation or correlation computed in the spatial frequency domain using the Fast Fourier Transform (FFT) [19]. In addition, the method simultaneously generates correlations for kernels of many sizes. Some of these kernels closely approximate the Gaussian probability distribution, so that the correlation is equivalent to low-pass filtering. The principle underlying the HDC is that the correlation of a function with certain large kernels can be computed as a weighted sum of correlations with smaller kernels, and these in turn can be computed as weighted sums of correlations with still smaller kernels. The kernels at each iteration of the HDC computation differ in size by a factor r, the order of the hierarchical correlation. Let f(x) be a function defined only at integer values of x. Also let w(x) be a discrete weighting function defined at integral x and nonzero for −m ≤ x ≤ m. The odd hierarchical discrete correlation is defined as a set of correlation functions g_l(x) which are obtained from f and w as follows:

g_0(x) = f(x),
g_l(x) = Σ_{i=−m}^{m} w(i) g_{l−1}(x + i·r^{l−1})  for l ≥ 1.   (1)

Function g_l is obtained from f through l recursions of a correlation-like operation using the weighting function w(x). Thus l is the level of g_l(x) in the HDC, and g_l(x) is defined as the sum of k = 2m + 1 values of g_{l−1}(x) which are separated by multiples of the distance r^{l−1}. This sample distance grows geometrically by the factor r from level to level, so r is the order of the HDC and k is called the width of the generating kernel. This odd HDC is illustrated graphically in Fig. 3. In order to ensure convergence and low-pass filtering, the generating kernel must satisfy four constraints: unimodality, symmetry, normalization, and equal distribution. When a ≈ 0.4, the best-fit Gaussian is obtained [19]. Therefore, a = 0.4, b = 0.25, and c = 0.05 are used as the weights of the generating kernel in this paper. 3.1.2. A nonlinear Laplacian filter A discrete version of a one-dimensional differentiation can generally be represented as

ΔI(m, n)/Δm = I(m + 1, n) − I(m, n)  or  I(m, n) − I(m − 1, n),   (2)
Fig. 3. Graphical representation of an odd HDC. The generating kernel is shown as a pattern of arrows between successive levels; sample values at level l are weighted by a, b, c and summed to obtain the value of a single sample at level l + 1. The order, r, is 2 in this example.
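The recursion of Eq. (1) can be sketched for a 1-D signal as follows; this is only an illustrative sketch, with the wrap-around border handling being our own choice (the paper does not specify one):

```python
import numpy as np

def odd_hdc(f, levels, r=2, kernel=(0.05, 0.25, 0.4, 0.25, 0.05)):
    """Odd hierarchical discrete correlation, Eq. (1), for a 1-D signal.

    The generating kernel (c, b, a, b, c) with a = 0.4, b = 0.25, c = 0.05
    is the Gaussian-like choice quoted in the text; samples at level l-1
    are spaced r**(l-1) apart, so the effective kernel doubles each level.
    """
    g = np.asarray(f, dtype=float)          # g_0(x) = f(x)
    m = len(kernel) // 2
    for l in range(1, levels + 1):
        step = r ** (l - 1)                 # sample distance grows as r^(l-1)
        out = np.zeros_like(g)
        for i in range(-m, m + 1):
            # g_{l-1}(x + i*step), with a wrap-around border (an assumption)
            out += kernel[i + m] * np.roll(g, -i * step)
        g = out
    return g
```

Because the kernel is normalized, a constant signal passes through unchanged, which is a quick sanity check of the equal-distribution constraint mentioned above.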
where I(m, n) denotes the gray level at point (m, n) of an image. A nonlinear gradient [18] is defined by

NG[I(m, n)] = max_{(k,l)∈W} [I(m + k, n + l)] − I(m, n)  or  I(m, n) − min_{(k,l)∈W} [I(m + k, n + l)],   (3)

where W denotes an M × M window, and k and l represent the search range in the row and column directions, respectively. Its characteristics include insensitivity to noise and granularity, and it detects valleys as well as edges. A nonlinear Laplacian [18] can be defined by

NL[I(m, n)] = max_{(k,l)∈W} [I(m + k, n + l)] − I(m, n) − {I(m, n) − min_{(k,l)∈W} [I(m + k, n + l)]}.   (4)

Its implementation is very simple because there is no multiplication and its responses are integer values. It also has a close relation to the mathematical morphological gradient operator and can, therefore, detect a correct edge point due to its unbiased characteristics [18]. 3.1.3. Weak edge elimination and region growing When an edge operator is convolved with an image, its response relates to the window size of the operator. In general, a very sensitive and noisy response occurs with a small window, as shown in Fig. 4, so that matching
ambiguities are increased. With a large window, even if it is insensitive to noise and small intensity differences, the edge pixel is shifted, as shown in Fig. 5. Consequently, the intensity profile will not match the edge image. This problem is critical in signal matching. In order to prevent edge pixels from shifting, a differentiation operator with a small window size should be used, and weak edges must be eliminated to reduce matching ambiguities. Since a strong edge point exhibits large edgeness and a notable variation of intensity, the elimination process can be conducted using the local variance. Accordingly, a simple threshold technique is adopted to eliminate weak edges that have a small local variance. However, one edge contour can be separated into several segments by thresholding, as shown in Fig. 6, and thus it is difficult to find a proper threshold. Therefore, to reduce the influence of the threshold value and satisfy the connectivity of an edge, a region-growing process is inserted, starting from the pixels remaining after thresholding. The space-efficient two-pass labeling algorithm [24] is used as the region-growing method. In this paper, a region is defined as a blob that has the same sign or value after second-order differentiation. Since zero-crossing points are typically detected by a sign change in the differentiated image, a region that includes signed or zero pixels is as important as an edge. Several experiments were conducted to find a proper threshold. It was shown that the mean is appropriate as the threshold because the variance is quite diverse according to the image characteristics. Fig. 4 shows the
Fig. 4. Second-order differentiated images of (a) "girl", (b) "lenna", and (c) "pentagon" processed by a conventional nonlinear Laplacian operator with a 5×5 mask. White, gray, and black blobs denote the zero, negative, and positive regions, respectively. (d), (e), and (f) are edge images extracted from (a), (b), and (c), respectively. The size of all three images is 256×256.
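The second-order differentiation shown in Figs. 4 and 5 uses the nonlinear Laplacian of Eq. (4), which reduces to a dilation plus an erosion minus twice the image. A minimal sketch (edge-replicating borders are our assumption):

```python
import numpy as np

def nonlinear_laplacian(img, size=5):
    """Nonlinear Laplacian, Eq. (4): max_W(I) - I - (I - min_W(I)).

    The window max/min are the morphological dilation and erosion, so the
    operator needs no multiplications on integer images, as the text notes.
    """
    img = np.asarray(img, dtype=int)
    pad = size // 2
    p = np.pad(img, pad, mode='edge')        # replicate-border assumption
    h, w = img.shape
    dil = np.full_like(img, np.iinfo(int).min)
    ero = np.full_like(img, np.iinfo(int).max)
    for k in range(-pad, pad + 1):
        for l in range(-pad, pad + 1):
            shifted = p[pad + k: pad + k + h, pad + l: pad + l + w]
            dil = np.maximum(dil, shifted)   # window maximum (dilation)
            ero = np.minimum(ero, shifted)   # window minimum (erosion)
    return dil + ero - 2 * img               # 0 in smooth areas, signed near edges
```

On a step edge the response is positive on the dark side, negative on the bright side, and zero in flat regions, matching the zero-/signed-region description above.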
Fig. 5. Region and edge images processed by a conventional nonlinear Laplacian operator with a 15×15 mask.
Fig. 6. Region and edge images after the elimination of weak edge points using the mean of local variances.
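The elimination-plus-linking procedure described above can be sketched as follows. This is only a sketch under assumptions: the local-variance window size is our choice, and a breadth-first re-admission of connected weak edges stands in for the paper's two-pass labeling [24]:

```python
import numpy as np
from collections import deque

def prune_weak_edges(img, edge_mask, size=5):
    """Weak-edge elimination with the mean local variance as threshold,
    followed by growing: weak edge pixels 8-connected to a surviving
    strong edge are re-admitted to keep contours unbroken."""
    img = np.asarray(img, dtype=float)
    pad = size // 2
    p = np.pad(img, pad, mode='edge')
    h, w = img.shape
    s = np.zeros_like(img)
    s2 = np.zeros_like(img)
    for k in range(-pad, pad + 1):           # running sums for E[x], E[x^2]
        for l in range(-pad, pad + 1):
            sh = p[pad + k: pad + k + h, pad + l: pad + l + w]
            s += sh
            s2 += sh * sh
    n = size * size
    var = s2 / n - (s / n) ** 2              # local variance
    strong = edge_mask & (var >= var[edge_mask].mean())
    keep = strong.copy()
    q = deque(zip(*np.nonzero(strong)))
    while q:                                 # grow along the original edge set
        i, j = q.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                a, b = i + di, j + dj
                if 0 <= a < h and 0 <= b < w and edge_mask[a, b] and not keep[a, b]:
                    keep[a, b] = True
                    q.append((a, b))
    return keep
```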
edge image obtained by a conventional NL operator with a 5×5 mask. Figs. 6 and 7 show images with weak edge points eliminated and after the region-growing step, respectively. 3.1.4. Merging of isolated zero-crossing points and edge determination After second-order differentiation, edges are typically formed at the intersection of two regions that have a different sign. However, after nonlinear Laplacian filtering,
about 10% of all edge pixels are correctly quantized to zero between different regions. Though these pixels are correct zero-crossing points and have accurate matching points, they rarely appear in the image. Therefore, these isolated points do not act as dominant pixels in the relaxation process, because they have no neighboring homogeneous pixels. In order to preserve these points in the later relaxation process, these pixels are merged into the neighboring region that has the value nearest to zero. Fig. 8 shows the merging process. As
Fig. 7. Region and edge images after region growing.
Fig. 8. An example of merging isolated zero-crossing points. The signs '+', '−', and '0' are the signs of the pixels after second-order differentiation. The circles denote the pixel which has the value nearest to zero among the 8-neighbor pixels: (a) before merging, (b) after merging.
a result, there are no isolated zero-crossing points on the entire edge map. In conventional edge operators, edge determination depends only on the sign of the filtered image. In this paper, the pixel response as well as the sign is considered for edge determination. Thus, the pixel with the minimum value between two pixels with opposite signs is determined as the edge. An example of the feature images obtained by an MNL filter for matching is shown in Fig. 9. 3.2. Matching 3.2.1. Edge pixel matching General edge features such as direction and intensity are not the only features used in edge-based methods,
Fig. 9. Feature images of the "pentagon" pair extracted by an MNL operator: (a) left edge, (b) right edge, (c) left region, and (d) right region image, respectively.
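The edge-determination rule described above (the pixel with the minimum value between two opposite-signed pixels becomes the edge) can be sketched as follows, reading "minimum value" as minimum magnitude of the second-order response and scanning one direction only, both our assumptions:

```python
import numpy as np

def mark_edges(nl):
    """Edge determination over a second-order differentiated image `nl`:
    where two horizontally adjacent pixels have opposite signs, the one
    with the smaller absolute response is marked as the edge pixel, so
    both sign and response take part in the decision."""
    nl = np.asarray(nl)
    edges = np.zeros(nl.shape, dtype=bool)
    sign_change = nl[:, :-1] * nl[:, 1:] < 0            # opposite signs
    left_smaller = np.abs(nl[:, :-1]) <= np.abs(nl[:, 1:])
    edges[:, :-1] |= sign_change & left_smaller         # left pixel wins
    edges[:, 1:] |= sign_change & ~left_smaller         # right pixel wins
    return edges
```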
as variable windows are also considered in edge pixel matching. The sign change of an edge has been used as a good feature; however, it often changes at occlusion boundaries. Consequently, the sign is excluded from the feature set. Eight-directional compass operators based on a Sobel operator, as shown in Fig. 10, are used to find the direction of an edge pixel. The angle with the maximum response among the eight masks is determined as the direction. Then, edge pixel matching is performed. Several
Fig. 10. Eight-directional compass operator.
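The compass scheme above can be sketched by rotating the outer ring of a 3×3 Sobel kernel in 45-degree steps and taking the argmax response. The exact masks of Fig. 10 and the index-to-angle mapping are assumptions here, not a transcription of the figure:

```python
import numpy as np

def edge_direction(patch):
    """Direction of an edge pixel via eight-directional compass masks:
    the base 3x3 Sobel kernel is rotated through 45-degree steps and the
    angle of the maximum response is returned (in degrees)."""
    base = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]], dtype=float)

    def rotate45(k):
        # shifting the outer ring of a 3x3 kernel one place = 45 degrees
        r = k.copy()
        ring = [(0, 0), (0, 1), (0, 2), (1, 2),
                (2, 2), (2, 1), (2, 0), (1, 0)]
        vals = [k[p] for p in ring]
        for p, v in zip(ring, vals[-1:] + vals[:-1]):
            r[p] = v
        return r

    masks = [base]
    for _ in range(7):
        masks.append(rotate45(masks[-1]))
    responses = [float((m * patch).sum()) for m in masks]
    return int(np.argmax(responses)) * 45
```

In the matching step, candidates whose direction index differs by at most one (the −1 to +1 tolerance mentioned below) would be retained.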
Fig. 12. An example of making a matching mask in a signed region. '0', 'n', 'p', 'e', and '+' denote zero, negative, positive, edge pixels, and the center of the mask, respectively: (a) feature image, (b) the generated mask. Fig. 11. Windows used in edge pixel matching. The sign '+' indicates the center of the window and the degree shows the direction of the edge.
points of the target image, which lie within the search range and have a −1 to +1 difference in direction, are selected as matching candidates. Okutomi and Kanade [20] simulated the relation between mask size and the signal-variance-to-noise ratio in a matching environment. They concluded that a small window is more appropriate for a disparity-change region and a large window for a disparity-smooth region. In general, since disparity changes are detected by intensity changes, it is assumed that disparity changes may or may not occur at an intensity edge. Thus, a small window is more efficient than a large one in edge pixel matching. The 3×5 windows shown in Fig. 11 are used in this matching step. 3.2.2. Signed pixel matching There are signed pixels around an edge after second-order differentiation, and the intensity slopes of these pixels are either monotonous or flat. Though a pixel may be located in a flat zone, the sign of the pixel filtered by a Laplacian operator may not be zero, since the window of the differential operator can include an inhomogeneous pixel beyond the edge. Since occluded areas generally exist in a disparity discontinuity region and the disparity discontinuity matches the edge, a mask composed of pixels homogeneous to the center pixel is efficient in this matching step. Fig. 12 shows a region shape as considered in mask generation. Since the intensity of a zero region adjacent to a signed region is similar to that of the signed one, pixels situated in a zero region are included in the mask generation. A 9×9 window whose origin is at the center is used in the generation. Fig. 12(b) shows an example of the generated mask, which includes the pixels of the feature image that have the same sign as the center pixel or a zero value. When the size of the generated mask is smaller than 20 pixels, a 7×7 square mask is used in the
Fig. 13. Three-dimensional relaxation structure.
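The signed-pixel mask generation of Section 3.2.2 (same-sign or zero pixels inside a 9×9 window, with a 7×7 square fallback when fewer than 20 pixels survive) can be sketched as:

```python
import numpy as np

def signed_matching_mask(feat, i, j, win=9, min_px=20):
    """Adaptive mask for signed-pixel matching, a sketch.

    `feat` holds the sign of the MNL response (-1, 0, +1). Within a
    win x win window centred on (i, j), pixels whose sign equals the
    centre pixel's sign, or is zero, enter the mask; if fewer than
    min_px survive, a plain 7x7 square is used instead, as prescribed."""
    half = win // 2
    h, w = feat.shape
    mask = np.zeros_like(feat, dtype=bool)
    c = feat[i, j]
    for a in range(max(0, i - half), min(h, i + half + 1)):
        for b in range(max(0, j - half), min(w, j + half + 1)):
            if feat[a, b] == c or feat[a, b] == 0:
                mask[a, b] = True
    if mask.sum() < min_px:                 # fall back to a 7x7 square
        mask[:] = False
        mask[max(0, i - 3): i + 4, max(0, j - 3): j + 4] = True
    return mask
```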
pixel matching to accommodate the insensitivity to small variances. 3.2.3. Residual pixel matching After edge and signed pixel matching, the residuals are zero pixels. They exist far away from the edge and their intensity figures are very smooth. If small windows are used in this matching step, the matching response will be sensitive to small differences. Thus, the size of the matching window should be large enough [20]. Square windows varying only in size, from 7×7 to 11×11, are used. The minimum MAD among the windows is selected as the matched error for each disparity. 3.3. Relaxation The MADs selected in each matching step are normalized by the window size and stored in the relaxation structure shown in Fig. 13. In the relaxation stage, the normalized MADs are transformed into initial possibilities and the possibilities are updated by three constraint functions. Finally, the point with the maximum possibility is determined as the disparity value after several iterations. 3.3.1. Transformation In order to assign a possibility to each MAD, which denotes its grade as the correct matching point, the distribution of the matched differences should be considered. However, since the distribution varies with the image characteristics, analyzing the distribution for each matching is laborious work. Thus, the MADs obtained over several images are approximated by a certain distribution for a fast transformation. If MADs and squared MADs are accumulated while the matching proceeds, the approximate distribution of the MADs can be calculated. Let X be the random variable of the MAD; the variance of the MADs becomes E[X²] − (E[X])². Then the distribution is represented by the variance. From experiments, it is approximated by the Rayleigh distribution. Fig. 14 shows examples of actual MADs, where the bold curves are the approximated Probability Distribution Functions (PDF). From these examples, it can be observed that the approximation of the distribution is a reasonable one. The Rayleigh distribution also has the advantage that it can represent Laplacian-like to Gaussian-like probability distribution functions by varying its variance. The probability distribution function and the Cumulative Distribution Function (CDF) of the Rayleigh distribution are expressed as

f_X(x) = (x/σ²) exp(−x²/(2σ²)),  x > 0,   (5)

and

F_X(x) = 1 − exp(−x²/(2σ²)),  x > 0,   (6)

respectively. The CDF is used to transform the MADs into possibilities. Since a value with a smaller difference must be mapped to a higher possibility, the transformation function h_d(x) can be defined as

h_d(x) = 1 − F_X(x)
       = 1 − [1 − exp(−x²/(2σ²))]
       = exp(−x²/(2σ²)).   (7)
All MADs, saved in the 3-D relaxation structure shown in Fig. 13, are transformed into possibilities by Eq. (7). Fig. 15 shows the curve-fitted PDF using the Rayleigh distribution, the CDF, and the transformation function for "pentagon", respectively. 3.3.2. Update possibility As mentioned above, the possibilities transformed by Eq. (7) are recursively updated by neighbor values. The
Fig. 14. Examples of the distribution of MADs using the proposed matching algorithm. (a) "Pentagon", (b) "Stripe", (c) 30% random dot stereogram which has '0' or '255' gray levels. The bold lines are a curve-fitted graph of the actual distribution.
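The transformation of Eq. (7) can be sketched as follows. The paper only states that running sums of MADs and squared MADs are accumulated; the particular moment estimator σ² = E[X²]/2 (exact for a Rayleigh variable) is our choice:

```python
import math
import random

def fit_rayleigh_sigma2(mads):
    """Moment estimate of the Rayleigh parameter from accumulated MADs:
    for Rayleigh, E[X^2] = 2*sigma^2, hence sigma^2 = E[X^2] / 2."""
    return sum(x * x for x in mads) / (2.0 * len(mads))

def possibility(mad, sigma2):
    """Eq. (7): h_d(x) = 1 - F(x) = exp(-x^2 / (2*sigma^2)).
    Small matching errors map to possibilities near 1."""
    return math.exp(-mad * mad / (2.0 * sigma2))

# sanity check on synthetic Rayleigh samples (inverse-CDF sampling)
random.seed(0)
sigma = 3.0
samples = [sigma * math.sqrt(-2.0 * math.log(1.0 - random.random()))
           for _ in range(20000)]
s2 = fit_rayleigh_sigma2(samples)   # should be close to sigma**2 = 9
```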
possibility of a current node is updated by three constraint functions: smoothness, uniqueness, and discontinuity preservation. The updating rule is similar to the cooperative algorithm of Marr and Poggio [7]. However, in the proposed method, a disparity discontinuity term is added to the updating function to preserve the edges of the disparity map. Let Π, F_s, F_u, and F_d denote the possibility, smoothness, uniqueness, and discontinuity preservation functions, respectively. The possibility of each node in the next iteration is represented as

Π_{t+1}(i, j, k) = Π_t(i, j, k) + F_s(i, j, k) + F_u(i, j, k) + F_d(i, j, k),   (8)

where i, j, k, and t represent the row, column, and disparity axes of the relaxation structure and the iteration number, respectively. Therefore, a possibility value in the next iteration is determined by the sum of the previous possibility and the three constraint functions.

Fig. 15. An example of PDF, CDF, and a transformation function: (a) the PDF and CDF of Fig. 14(a), (b) its transformation function.

Since disparities in a region are similar to one another, the smoothness function is strongly dependent on the region map, or the intensity profile. Thus, it has to be excited by the possibilities of the pixels located in the same region as the center pixel on the image plane. If the mean value of the neighbor possibilities included in the excitation set is large, the possibility of the current node will be increased according to the amplitude of the mean value. If the center pixel is located in a region R_c, the smoothness function is represented as

F_s(i, j, k) = w_s [ (1/N_e) Σ_{i=−m}^{m} Σ_{j=−m}^{m} Π_t(i, j, k) ],  (i, j) ≠ (0, 0) and (i, j) ∈ R_c,   (9)

where w_s, N_e, and m are the weighting constant of the smoothness, the number of excitatory inputs, and the search range, respectively. Uniqueness implies that a pixel must be matched with one point. Therefore, the other nodes on the disparity axis have to act exclusively with one another, so that only the point with the maximum possibility will remain. The uniqueness function is defined as

F_u(i, j, k) = −w_u [ (1/N_i) Σ_{k=−m}^{m} Π_t(i, j, k) ],  k ≠ 0,   (10)

where w_u and N_i are the weighting factor and the number of inhibitory inputs, respectively. This reduces the possibility of the current node and relates only to the disparity axis. The last term of Eq. (8) is the discontinuity preservation function. It assigns a survival possibility to a node according to the appearance of the possibilities in the current state, to preserve edge points on a disparity map from the erosion caused by consecutive iterations. Since this term has not been used in conventional relaxation algorithms, it is difficult to determine the weights of each constraint function and a proper stopping condition that avoids over-smoothing; thus, heuristic approaches are used in the determination. Generally, if many nodes surrounding the current node have the maximum possibility on the disparity axis, the node must receive a positive survival possibility. A simple threshold function,

F_d(i, j, k) = { w_d,  N^l_k(i, j, k) ≥ T;  −w_d,  N^l_k(i, j, k) < T },   (11)

is used as the discontinuity preservation function, where w_d, N^l_k, and T are the weight, the number of nodes which belong to the excitatory set and have the maximum possibility in the k-direction at iteration t, and the threshold value, respectively. When the center pixel of an odd window is at a corner point, as shown in Fig. 16(a), in order to preserve the center point the threshold is set to

T = {(W − 1)/2 + 1}² = (m + 1)²,   (12)

where W represents the length of the window. If there are fewer such pixels than the threshold, the possibility of the center node will decrease. Therefore, F_d(i, j, k) acts as either an excitatory or an inhibitory input according to the appearance of the neighbor possibilities. In addition, N^l_k should be counted within the same region as that of the
Fig. 16. Examples of excitation sets having a positive survival possibility when W is 3. The black boxes are the pixels which have the maximum possibility in each disparity direction.
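One iteration of the update of Eqs. (8)-(11) can be sketched as follows. This is a sketch under assumptions: the normalizations by N_e and N_i, the synchronous update, and the boundary handling are our reading of the equations, not a transcription of the authors' implementation:

```python
import numpy as np

def relax_step(P, region, ws=0.1, wu=0.1, wd=0.1, m=1):
    """One possibility-update iteration, Eqs. (8)-(11), as a sketch.

    P[i, j, k] is the possibility volume (row, column, disparity), `region`
    is the region-label map that drives the smoothness excitation set, m is
    the neighbourhood half-width, and (ws, wu, wd) are the 0.1/0.1/0.1
    weights of the experiments. T = (m + 1)**2 as in Eq. (12)."""
    h, w, d = P.shape
    T = (m + 1) ** 2
    best = P.argmax(axis=2)                    # current max-possibility disparity
    out = P.copy()
    for i in range(h):
        for j in range(w):
            ii = slice(max(0, i - m), min(h, i + m + 1))
            jj = slice(max(0, j - m), min(w, j + m + 1))
            same = region[ii, jj] == region[i, j]   # excitatory set
            ne = same.sum() - 1                     # exclude the centre
            for k in range(d):
                # smoothness, Eq. (9): mean neighbour possibility, same region
                Fs = ws * ((P[ii, jj, k][same].sum() - P[i, j, k]) / ne) if ne else 0.0
                # uniqueness, Eq. (10): inhibition along the disparity axis
                Fu = -wu * (P[i, j, :].sum() - P[i, j, k]) / (d - 1)
                # discontinuity preservation, Eq. (11): survival possibility
                nlk = ((best[ii, jj] == k) & same).sum()
                Fd = wd if nlk >= T else -wd
                out[i, j, k] = P[i, j, k] + Fs + Fu + Fd
    return out
```

A disparity plane that is already dominant is reinforced by F_s and F_d, while rival disparities at the same pixel are suppressed by F_u.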
center pixel. As shown in Fig. 16, it is the number of maximum-possibility nodes that is important, not their pattern. Since both the smoothness and the discontinuity preservation functions, unlike uniqueness, relate to excitatory inputs, the connection diagram of a current node is depicted as shown in Fig. 17.

Fig. 17. A region map and connection diagram where the search range is 2: (a) region map, (b) connection diagram of (a).

The excitatory inputs can be modified according to the matching method, so the proposed relaxation algorithm can be applied to various matching algorithms such as block-, edge-, and region-based methods. For instance, if a block matching algorithm is used in the matching, the excitation set includes all the pixels within the block. An edge segment, a region, an object, etc. can also compose the excitatory inputs. Temporary disparity maps are calculated in each iteration to check the termination of the recursive process. The maps consist of the disparity points which have the maximum possibility along the disparity axis in that iteration. When there are no isolated spike pulses on the map, the iteration process is terminated.

4. Experimental results The proposed relaxation scheme and matching algorithm were tested on artificial and real-scene stereoscopic images. Artificial images with random and repeating patterns, as well as real-scene images, were used. The relaxation scheme was compared with the cooperative algorithm proposed by Marr and Poggio [7]. In the experiments with the matching algorithm, three methods, classified by the type of matching window and the presence of the relaxation scheme, were compared. 4.1. The relaxation scheme Figs. 18 and 19 are Random Dot Stereograms (RDS) and Fig. 20 is the "stripe" image pair. Table 1 shows the properties of these stereograms and the matching information. The updating rule of the cooperative algorithm was defined as

C^{(n+1)}_{xyd} = σ[ Σ_{x'y'd' ∈ S(xyd)} C^{(n)}_{x'y'd'} − ε Σ_{x'y'd' ∈ O(xyd)} C^{(n)}_{x'y'd'} + C^{(0)}_{xyd} ],   (13)

where C^{(n+1)}_{xyd} represents the state of the node at position (x, y) with disparity d at iteration n + 1, S and O denote the excitation and inhibition sets, ε is the inhibition constant, and σ is a sigmoid function. The results of the cooperative algorithm, with ε = 2.0, are shown in Figs. 21–23. The results are plotted before over-smoothing. If
σ, ε, C^{(0)}_{xyd}, and both S and O are carefully selected, the outputs become better than those in Figs. 21–23. However, the erosion due to smoothing at the disparity edge areas is not eliminated. For the comparison of the relaxation scheme with Marr and Poggio's algorithm, the matching strategies were not used; instead, a simple Block Matching Algorithm (BMA) using a 3×3 window was used to find the corresponding points in this experiment. The results for the artificial images are shown in Figs. 24–26. The disparities in each iteration were determined by the possibility which had the maximum value along the disparity axis and were displayed
Fig. 18. 30% random dot stereogram. 10% of the dots of the right image are randomly decorrelated.
Fig. 19. 50% random dot stereogram. 20% of the dots of the right image are randomly decorrelated.
Fig. 20. "Stripe" image pair with white Gaussian noise (σ = 50).
Fig. 21. Results of the cooperative algorithm for a 30% random dot stereogram. Iteration number is (a) 0, (b) 1, (c) 6, and (d) 9.
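The cooperative updating rule of Eq. (13) used as the baseline can be sketched as follows. The particular excitation set S (spatial neighbours at the same disparity), inhibition set O (rival disparities at the same pixel), and logistic sigmoid are our assumptions; the paper does not fix them:

```python
import numpy as np

def cooperative_step(C, C0, eps=2.0, m=1):
    """One iteration of the Marr-Poggio cooperative rule, Eq. (13).

    C is the current state volume (rows, columns, disparities), C0 the
    initial state, and eps the inhibition constant (2.0 in the text)."""
    h, w, d = C.shape
    new = np.empty_like(C)
    for x in range(h):
        for y in range(w):
            xs = slice(max(0, x - m), min(h, x + m + 1))
            ys = slice(max(0, y - m), min(w, y + m + 1))
            for k in range(d):
                excite = C[xs, ys, k].sum() - C[x, y, k]   # S: same-disparity support
                inhibit = C[x, y, :].sum() - C[x, y, k]    # O: rival disparities
                u = excite - eps * inhibit + C0[x, y, k]
                new[x, y, k] = 1.0 / (1.0 + np.exp(-u))    # sigmoid sigma
    return new
```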
Table 1
The properties of artificial stereograms and matching information

Item                      30% RDS                 50% RDS                 "Stripe" image
Size                      128×128                 128×128                 128×128
Noise type                Random noise (10%)      Random noise (20%)      Gaussian noise (σ = 50)
Actual disparity          0–3                     0–3                     0–2
Searching range           −4–8                    −4–8                    −4–8
Matching method           BMA (3×3)               BMA (3×3)               BMA (3×3)
Excitatory set            All pixels within the considered range (for all three images)
Weights (w_s, w_u, w_d)   0.1, 0.1, 0.1           0.1, 0.1, 0.1           0.1, 0.1, 0.1
Fig. 22. Results of the cooperative algorithm for a 50% random dot stereogram. Iteration number is (a) 0, (b) 1, (c) 6, and (d) 8.
Fig. 23. Results of the cooperative algorithm for the "stripe" image pair. Iteration number is (a) 0, (b) 10, (c) 11, and (d) 12.
Fig. 24. The proposed relaxation results for a 30% random dot stereogram. Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30.
with an intensity and height map. Since a small window was intentionally used in the matching, the initial results were very noisy, as shown in Figs. 24(a), 25(a), and 26(a). The line patterns of the intensity map in Fig. 26(a) represent mismatched points which were matched to the next stripe, but they gradually disappear from the map through repeated iteration. Some experiments were executed to check the influence of the discontinuity preservation function. Figs. 27 and 28 show the results for a 30%
random dot stereogram without the discontinuity function. When this function is not used, the weight of the function and the iteration number must be carefully determined, because oscillation or over-smoothing may occur in the disparity map according to the amplitude of the weight. However, if the discontinuity preservation function is inserted into the updating rule, the output is insensitive to the weights and over-smoothing does not occur even after more than 100 iterations. When both w_s and w_u were between 0.1 and 0.3, there were few differences in the results.

Fig. 25. The proposed relaxation results for a 50% random dot stereogram. Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30.

Fig. 26. The proposed relaxation results for the "stripe" image. Iteration number is (a) 0, (b) 2, (c) 5, and (d) 20.

4.2. The proposed matching algorithm

In the experiments with the artificial images, three methods were compared to evaluate the proposed matching algorithm. The methods were as follows.

Method 1: Using a fixed-size window (3×3) with relaxation,
Method 2: Using a variable-size window (3×3 to 11×11) without relaxation,
Method 3: Using a variable-size window with relaxation (the proposed algorithm).

In general, intensity-based methods are similar to method 1 except for the difference of the relaxation scheme. Method 1 is the same as the proposed relaxation scheme described in the previous section (Figs. 24-26). Since it is impossible to use edge and region information for a random dot stereogram, variable square windows, changing only in size, were used in the random dot stereogram matching. The results for methods 2 and 3 are shown in Figs. 29-32. To compare the matching results numerically, the Mean of the Squared Error (MSE) and the Sum of the Squared Error (SSE) between the true
Fig. 27. The results for a 30% random dot stereogram without the discontinuity preservation function (3×3 window). Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30 (w_s = w_u = 0.05).
Fig. 28. The results for a 30% random dot stereogram without the discontinuity preservation function (3×3 window). Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30 (w_s = w_u = 0.4).
disparity and the estimated one were used as the distance measures:

SSE = \sum_{i=1}^{N} (d_i - \hat{d}_i)^2   (14)

and

MSE = \frac{1}{N} \sum_{i=1}^{N} (d_i - \hat{d}_i)^2   (15)

where N, d_i, and \hat{d}_i denote the number of disparities, the true disparity, and the estimated one, respectively. The MSEs and SSEs of the three methods are shown in Table 2. The initial disparity maps of method 3 are the same as those of method 2. The "bear" and "pentagon" image pairs were used for real-scene stereo image matching. The images are shown in Figs. 33 and 34. Table 3 shows the parameters of the images, and Figs. 35 and 36 are the results of the proposed matching method. There are some mismatched points, but stable outputs were obtained to a certain degree. Comparing with Lee's [11] and Kim's [12] methods, both the bookstand in the "bear" image and the bridge in the "pentagon" image disappeared using Lee's method and
Fig. 29. The results using method 2: (a) 30% random dot stereogram, (b) 50% random dot stereogram, (c) "stripe" image pair.
Fig. 30. The results for a 30% random dot stereogram using method 3. Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30.
the bookstand was not seen using Kim's method. The bookstand, ball, and bear appeared in the result of the "bear" image pair. On the top-right side of the "pentagon" image, the bridge became visible.
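The error measures of Eqs. (14) and (15) used for Table 2 are straightforward to compute; a minimal sketch (the function name is ours):

```python
import numpy as np

def disparity_errors(d_true, d_est):
    """SSE and MSE between true and estimated disparity maps, as in
    Eqs. (14) and (15)."""
    diff = np.asarray(d_true, dtype=float) - np.asarray(d_est, dtype=float)
    sse = float(np.sum(diff ** 2))   # Eq. (14)
    mse = sse / diff.size            # Eq. (15): SSE divided by N
    return sse, mse
```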
5. Conclusion

A hybrid approach to stereo matching based on edge and region information was proposed. A modified NL operator, including an HDC, an NL filter, weak-edge elimination, etc., was used to extract proper matching primitives. According to the type of the current pixel in the feature image, different matching strategies using variable windows were applied to the pixel matching. The local information of the input images was thus considered. To acquire more stable results under the similarity and consistency constraints, the normalized MADs obtained in the matching step were transformed into possibilities. Final disparities were determined by the reciprocal actions of neighbouring possibilities in the relaxation step. Unlike conventional relaxation schemes, the proposed relaxation algorithm not only used disparity smoothness and uniqueness, but also introduced a disparity preservation factor. Because of the preservation factor, the erosion in abrupt areas of the disparity map was considerably reduced. In addition, the proposed relaxation can be applied to block-, feature-, region-, and
Fig. 31. The results for a 50% random dot stereogram using method 3. Iteration number is (a) 0, (b) 5, (c) 10, and (d) 30.
Fig. 32. The results for the "stripe" image. Iteration number is (a) 0, (b) 2, (c) 5, and (d) 20.

Table 2
Matched errors of the three methods for synthetic images

                       Method 1        Method 2        Method 3
Image                  SSE     MSE     SSE     MSE     SSE     MSE
30% RDS                1094    0.067   2149    0.131   703     0.043
50% RDS                1375    0.084   3012    0.184   754     0.046
"Stripe" image pair    1048    0.064   1220    0.074   682     0.042

Table 3
Parameters of real-scene stereograms and the matching information

Item                     "Bear" image                      "Pentagon" image
Size                     200×200                           512×512
Noise type               None                              None
Actual disparity         About 0 to 10                     About −15 to 15
Searching range          −15 to 25                         −25 to 25
Matching method          The proposed                      The proposed
Excitatory set           Within the same region and        Within the same region and
                         the considering window            the considering window
Weights (w_s, w_u, w_d)  0.1, 0.1, 0.1                     0.1, 0.1, 0.1

segment-based matching methods by modifying the excitation set. In experiments using the proposed matching algorithm for random dot stereograms with a random pattern, the "stripe" image with a repeating pattern, and
indoor and outdoor images, stable outputs were obtained.
Acknowledgements

This work was partially supported by the Korea Research Foundation under grant number 1997-001E00374.

Fig. 33. "Bear" image pair.
Fig. 34. "Pentagon" image pair.

Fig. 35. The result map of the "bear" image.

Fig. 36. The result map of the "pentagon" image.
References

[1] A.C. Kak, Handbook of Industrial Robotics, Chapter on Depth Perception for Robots, Wiley, New York, 1985.
[2] G. Stockman, S. Chen, G. Hu, N. Shrikhande, Recognition of rigid objects using structured light, in: Proceedings of the 1987 IEEE International Conference on Systems, Man and Cybernetics, 1987, pp. 877-883.
[3] R.M. Haralick, L.G. Shapiro, Computer and Robot Vision, Part 2, 1992 (Chapter 16).
[4] W.E.L. Grimson, Computational experiments with a feature based stereo algorithm, IEEE Trans. Pattern Anal. Mach. Intell. 7 (1) (1985) 17-34.
[5] G. Medioni, R. Nevatia, Segment-based stereo matching, Comput. Vision Graphics Image Process. 31 (1985) 2-18.
[6] S.D. Ma, Conics-based stereo, motion estimation, and pose determination, Int. J. Comput. Vision 10 (1) (1993) 7-25.
[7] D. Marr, T. Poggio, Cooperative computation of stereo disparity, Science 194 (1976) 283-287.
[8] J.P. Frisby, S.B. Pollard, Computational issues in solving the stereo correspondence problem, in: Computational Models of Visual Processing, Part 7, 1990, pp. 331-357 (Chapter 22).
[9] D. Marr, T. Poggio, A computational theory of human stereo vision, Proc. Roy. Soc. London B204 (1979) 301-328.
[10] D. De Vleeschauwer, An intensity-based, coarse-to-fine approach to reliably measure binocular disparity, CVGIP: Image Understanding 57 (2) (1993) 204-218.
[11] J.-J. Lee, J.-C. Shim, Y.-H. Ha, Stereo correspondence using Hopfield neural network of new energy function, Pattern Recognition 27 (1994) 1513-1522.
[12] Y.-S. Kim, J.-J. Lee, Y.-H. Ha, Stereo matching algorithm based on modified wavelet decomposition process, Pattern Recognition 30 (1997) 929-952.
[13] J. Weng, Image matching using the windowed Fourier phase, Int. J. Comput. Vision 11 (3) (1993) 211-236.
[14] S.B. Marapane, M.M. Trivedi, Region-based stereo analysis for robotic applications, IEEE Trans. Systems Man Cybernet. 19 (1989) 1447-1464.
[15] J.R. Jordan, A.C. Bovik, Using chromatic information in edge-based stereo correspondence, CVGIP: Image Understanding 54 (1) (1991) 98-118.
[16] A. Khotanzad, A. Bokil, Y.W. Lee, Stereopsis by constraint learning feed-forward neural networks, IEEE Trans. Neural Networks 4 (1993) 332-342.
[17] M. Okutomi, T. Kanade, A multiple-baseline stereo, IEEE Trans. Pattern Anal. Mach. Intell. 15 (4) (1993) 353-363.
[18] L.J. van Vliet, I.T. Young, A nonlinear Laplace operator as edge detector in noisy images, Comput. Vision Graphics Image Process. 45 (1989) 167-195.
[19] P.J. Burt, Fast filter transforms for image processing, Comput. Graphics Image Process. 16 (1981) 20-51.
[20] M. Okutomi, T. Kanade, A locally adaptive window for signal matching, Int. J. Comput. Vision 7 (2) (1992) 143-162.
[21] K.-H. Do, Y.-S. Kim, T.-U. Uam, Y.-H. Ha, Iterative relaxational stereo matching based on adaptive support between disparities, Pattern Recognition 31 (8) (1998) 1049-1059.
[22] S.-H. Lee, J.-J. Leou, A dynamic programming approach to line segment matching in stereo vision, Pattern Recognition 27 (8) (1994) 961-986.
[23] V. Torre, T. Poggio, On edge detection, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-8 (1986) 147-163.
[24] R.M. Haralick, L.G. Shapiro, Computer and Robot Vision, Part 1, 1992, pp. 37-48.
About the Author: KYU-PHIL HAN received the B.S. and M.S. degrees in Electronic Engineering from Kyungpook National University, Taegu, Korea, in 1993 and 1995, respectively, and is currently a Ph.D. student in the Department of Electronic Engineering of Kyungpook National University. He was a Researcher at the SindoRicoh Advanced Institute of Technology from 1995 to 1996. He was awarded a bronze prize in the 5th Samsung Humantech Thesis competition in February 1999. His main interests are in digital image processing, 3-D image compression, and computer vision.

About the Author: TAE-MIN BAE received the B.S. and M.S. degrees in Electronic Engineering from Kyungpook National University, Taegu, Korea, in 1996 and 1998, respectively, and is currently a Ph.D. student in the Department of Electronic Engineering of Kyungpook National University. His main interests are in 3-D image compression and computer vision.

About the Author: YEONG-HO HA received the B.S. and M.S. degrees in Electronic Engineering from Kyungpook National University, Taegu, Korea, in 1976 and 1978, respectively, and the Ph.D. degree in Electrical and Computer Engineering from the University of Texas at Austin, TX, in 1985. In March 1986, he joined the Department of Electronic Engineering of Kyungpook National University as an Assistant Professor, and is currently a Professor. He served as TPC co-chair of the 1994 IEEE International Conference on Intelligent Signal Processing and Communication Systems and is now chairman of the IEEE Taegu section. His main research interests are in image processing, computer vision, and video signal processing. He is a member of IEEE, the Pattern Recognition Society, IS&T, the Institute of Electronics Engineers of Korea, and the Korean Institute of Communication Sciences.
Pattern Recognition 33 (2000) 787-807
An adaptive logical method for binarization of degraded document images

Yibing Yang*, Hong Yan

School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia

Received 29 October 1998; accepted 29 March 1999
Abstract

This paper describes a modified logical thresholding method for binarization of seriously degraded and very poor quality gray-scale document images. The method can deal with complex signal-dependent noise, variable background intensity caused by nonuniform illumination, shadow, smear or smudge, and very low contrast. The output image has no obvious loss of useful information. Firstly, we analyse the clustering and connection characteristics of the character strokes from the run-length histogram for selected image regions and various inhomogeneous gray-scale backgrounds. Then, we propose a modified logical thresholding method to extract the binary image adaptively from the degraded gray-scale document image with a complex and inhomogeneous background. It can adjust the size of the local area and the logical thresholding level adaptively, according to the local run-length histogram and the local gray-scale inhomogeneity. Our method can threshold various poor quality gray-scale document images automatically, without the need for any prior knowledge of the document image or manual fine-tuning of parameters. It keeps useful information more accurately, without overconnected or broken character strokes, and thus has a wider range of applications compared with other methods. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Document images; Image thresholding; Image segmentation; Image binarization; Adaptive logical thresholding
1. Introduction

Document images, as a substitute for paper documents, mainly consist of common symbols such as handwritten or machine-printed characters, symbols and graphics. In many practical applications, we only need to keep the content of the document, so it is sufficient to represent text and diagrams in binary format, which is more efficient to transmit and process than the original gray-scale image. It is essential to threshold the document image reliably in order to extract useful information and allow further processing such as character recognition and feature extraction, especially for poor quality document images with shadows, nonuniform illumination, low contrast, large signal-dependent noise,
* Corresponding author. Tel.: +61-2-9351-6210; fax: +61-2-9351-3847. E-mail address: [email protected] (Y. Yang)
smear and smudge. Therefore, thresholding a scanned gray-scale image into two levels is the first step, and also a critical part, in most document image analysis systems, since any error in this stage propagates to all later phases. Although many thresholding techniques, such as global [1-4] and local thresholding [5-7] algorithms, multi-thresholding methods [8-11] and adaptive thresholding techniques [12,13], have been developed in the past, it is still difficult to deal with images of very low quality. The most common problems in poor quality document images are: (1) variable background intensity due to nonuniform illumination and unfit storage, (2) very low local contrast due to smear or smudge and shadows introduced when capturing the document image, (3) poor writing or printing quality, (4) serious signal-dependent noise, and (5) gray-scale changes in highlight and color areas. It is essential to find thresholding methods which can correctly keep all useful information while removing noise and background. Meanwhile, most document
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00094-1
Y. Yang, H. Yan / Pattern Recognition 33 (2000) 787-807
processing systems need to process a large number of documents with different styles and layouts every day; thus, they require that the whole processing procedure is carried out automatically and adaptively, without prior knowledge or pre-specified parameters. Global thresholding methods cannot meet these requirements, and local or adaptive thresholding methods, which need to be tuned with different parameters according to different image classes, cannot be used for automated processing either. In this paper, we propose a thresholding method based on an adaptive logical level technique to binarize seriously degraded and very poor quality gray-scale document images. Our method can deal with complex signal-dependent noise, variable background intensity caused by nonuniform illumination, shadow, smear or smudge, and very low contrast, without obvious loss of useful information.

The paper is organized as follows. Section 2 briefly reviews related work on image thresholding techniques, with an emphasis on document image binarization based on local analysis and adaptive thresholding. Section 3 analyses various factors which can cause poor quality and an inhomogeneous gray-level background in an image, and proposes a rule to select the local area for analysis, to produce run-length histograms, and to extract stroke width information of a document image. Section 4 describes the principle and implementation of our modified adaptive logical level technique for thresholding various degraded and poor quality document images, together with a simple and effective method for post-processing of the binary image. Section 5 discusses and evaluates the experimental results of the proposed method by comparison with some related thresholding techniques, according to implementation complexity, character size and stroke width restrictions, the number of pre-specified parameters and their meanings and settings, and human subjective evaluation of the thresholded images, with experiments on some typical poor quality document images under bad illumination conditions (Fig. 1), and with shadows and signal-dependent noise (Figs. 2 and 3). The last section gives the summary and conclusions of our work.
2. Related work

We briefly review some related work on image thresholding, particularly for poor quality document image binarization, which will be evaluated and compared with our thresholding method later. More complete reviews of image thresholding techniques can be found in [2,4,14-16]. Image binarization methods can be divided into two classes: global and local thresholding techniques. The simplest and earliest method is the global thresholding technique. The most commonly used global thresholding
Fig. 1. A 768×576×8 original document image under bad illuminating condition.

Fig. 2. A 768×576×8 original document image under bad illuminating condition and signal-dependent noise.
techniques are based on histogram analysis [1,3,4]. The threshold is determined from the measure that best separates the levels corresponding to the peaks of the histogram, each of which corresponds to the image pixels of a different part of the image, such as background or objects. Some global multi-threshold techniques are based on edge analysis [9,10] and the histogram distribution function [8,11]. Sahoo et al. [2] analysed and evaluated the performance of over 20 popular global thresholding algorithms. All these algorithms need a priori knowledge of the processed image, namely the number of peaks in the gray-level histogram. The modality of the document image histogram, however, may change from image to image. Thus, an obvious drawback of these global techniques is that they cannot separate those areas which
Fig. 3. A 768×576×8 original document image under bad illuminating condition and noise and shadow.

have the same gray level but do not belong to the same part. These methods do not work well for document images with shadows, inhomogeneous backgrounds, complex background patterns and different types of fonts and typesettings, which may have a histogram that contains a single peak (Fig. 4). In this case, a single threshold or some multilevel thresholds cannot result in an accurate binary document image, as shown in Figs. 5-7, no matter how the threshold parameters are tuned.

In local and adaptive thresholding techniques, local threshold levels are determined by optimizing some local statistical measure of separation. The criterion function may include the local intensity change (max/min and contrast) [13], the stroke width of the characters [17], spatial measures like connectivity and clustering [18,19], and some gradient and edge information [12,20,21]. For complex document image analysis, Kamel and Zhao [22] compared four local adaptive thresholding algorithms for document images with shadows and complex background patterns and proposed two new thresholding techniques: the logical level technique and the mask-based subtraction technique. Trier and Jain [14,15] evaluated 11 popular local thresholding methods and four global thresholding techniques. For all local thresholding techniques, it appears that none could threshold all images well with a single set of operating parameters. In the following, we review a few related local thresholding algorithms, particularly for poor quality document images with shadow, signal-dependent noise and inhomogeneous background, together with their results, which will be compared with our method later.

2.1. Connectivity-based thresholding algorithm

This algorithm was proposed in Ref. [18]. It uses local connectivity as an information measure. The objective of thresholding is to preserve connectivity within local regions. The algorithm is implemented in three steps.

(1) Determine a histogram of the number of horizontal and vertical runs that result from thresholding the original image at each intensity level. This is equivalent to counting all black and white runs along all rows and columns for all binary images corresponding to each intensity level.
(2) Calculate the "sliding profile" from the run histogram; by finding plateaus, or lack of variation of runs, some ranges around each intensity level can be determined.
(3) Determine the number of thresholds as the number of peaks on the sliding profile. The thresholds are chosen at the peaks where the sliding profile has local maximum values. The image is thresholded into n+1 intensity levels by the n thresholds.

This algorithm produces global thresholds, but uses local connectivity information. It can be used for local thresholding if multiple thresholds are used in different areas of the image. It cannot segment well those document images which are badly illuminated, especially when they contain both shadows and noise, as the shadow itself can be regarded as a connected part and the noise can affect the run histogram. We tested this algorithm on some poor quality document images. Some results are shown in Section 5 (Figs. 19-21).

2.2. Local intensity gradient method (LIG)

This method, as presented in Ref. [20] and evaluated and slightly modified in Ref. [21], is based on the principle that objects in an image provide high spatial frequency components, while illumination consists mainly of lower spatial frequencies. It first detects edges, and then the interior of objects between edges is filled. First, for each pixel (x, y) in the input image f(x, y), calculate

d(x, y) = min_{i=1,...,8} [f(x, y) - f(x_i, y_i)],

where (x_i, y_i), i = 1,...,8, are the 8-connected neighbours of (x, y). Then the image d(x, y) of minimum local difference is broken up into regions of size N×N. For each region, the mean m and the standard deviation σ are computed.
Both values are smoothed by a weighted mean and then bilinearly interpolated to produce two new images M and S from m and σ, respectively. Then, for each pixel (x, y): if M(x, y) ≥ m_0 or S(x, y) < σ_0, the pixel is regarded as part of a flat region and remains unlabeled; else, if d(x, y) < M(x, y) + kS(x, y), then (x, y) is labeled as print; else (x, y) remains unlabeled. The resulting binary image highlights the edges. This is followed by pixel aggregation and region growing steps to locate the remaining parts of the print objects. This method needs three predetermined
Fig. 4. Some local histograms. (a), (b) and (c) correspond to the local histograms of Figs. 1-3, respectively.
Fig. 5. Binary document image extracted using the global method from the original image of Fig. 1.

Fig. 6. Binary document image extracted using the global method from the original image of Fig. 2.

Fig. 7. Binary document image extracted using the global method from the original image of Fig. 3.

Fig. 8. Binary document image extracted using the local intensity gradient method with N = 16, m_0 = −1.0, σ_0 = 1.0 and k = −1.0 from the original image in Fig. 1.
parameters m_0, σ_0 and k, and the block size N. We tested this method on several images with N = 16, m_0 = −1.0, σ_0 = 1.0 and k = −1.0. The results are shown in Figs. 8-10. The method can deal with a slowly changing background under bad illumination. Due to its gradient-based analysis, however, it intensifies some noise effects and does not work well for a fast-changing background under bad illumination.

2.3. Integrated function algorithm and its improvement

This technique, as described in Ref. [12] and as improved and evaluated in Refs. [14,21], applies a gradient-like operator, defined as the activity A(x, y), which is the absolute sum of approximated derivatives for both scan
and raster directions taken over a small area of the image. Pixels with activity below a predetermined threshold T_a are labelled '0'. The other pixels are further tested by a Laplacian edge operator: a pixel is labelled '+' if its Laplacian is positive, and '−' otherwise. Thus, a three-level label image with pixel labels '+', '0' and '−' is produced. In a sequence of labels along some straight line passing through the currently processed point (x, y), edges are identified as '−+' or '+−' transitions. Object pixels are assumed to be the '+' and '0' labelled pixels between a '−+' and '+−' pair. The distance between this pair can be regarded as the "stroke width" along this line for document images. Background pixels tend not to be included between such a pair.
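The label-image construction described above can be sketched as follows. This is a simplified illustration: it uses per-pixel derivative approximations rather than sums over a small area, a 4-neighbour Laplacian, and a purely illustrative activity threshold T_a (the paper leaves it as a predetermined parameter):

```python
import numpy as np

def label_image(f, T_a=20.0):
    """Three-level ('+', '0', '-') label image for the integrated function
    algorithm (Section 2.3).  Returns an int8 array with values
    +1 ('+'), 0 ('0') and -1 ('-')."""
    f = np.asarray(f, dtype=float)
    # Activity: absolute sum of approximated derivatives in the scan (x)
    # and raster (y) directions.
    gx = np.abs(np.gradient(f, axis=1))
    gy = np.abs(np.gradient(f, axis=0))
    activity = gx + gy
    # Discrete 4-neighbour Laplacian used for the '+' / '-' decision.
    lap = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
           np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f)
    labels = np.where(lap > 0, 1, -1).astype(np.int8)
    labels[activity < T_a] = 0   # low-activity pixels get label '0'
    return labels
```

Scanning any line of this label image for '−+' and '+−' transitions then yields the edge pairs and stroke widths discussed above.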
792
Y. Yang, H. Yan / Pattern Recognition 33 (2000) 787}807
2.4. Local contrast technique

Giuliano et al. [23] presented the local contrast technique to extract a binary image in their patent for a character recognition system. The technique is implemented in a 9×9 window for an input image f(x, y). Each pixel in the output binary image b(x, y) is determined from the 3×3×5 local pixels within a 9×9 window, as shown in Fig. 11. We use gray level 1 to represent foreground (print) and 0 to represent background (no print) in the output binary image. The method can be implemented as follows:
Fig. 9. Binary document image extracted using the local intensity gradient method with N = 16, m_0 = −1.0, σ_0 = 1.0 and k = −1.0 from the original image in Fig. 2.
if f(x, y) < T_1, then b(x, y) = 1;
otherwise:
  A_2t = {(x, y) | (x, y) ∈ A_2 and f(x, y) > T_2};
  a_1 = mean of the 9 pixels in area A_1;
  a_2 = mean of the pixels in area A_2t;
  if T_3 a_2 + T_5 > T_4 a_1, then b(x, y) = 1;
  otherwise, b(x, y) = 0;

where T_1-T_5 are five predetermined parameters. T_1 is equivalent to the threshold in the global technique; T_2 is used to detect all pixels in A_2 with gray levels over T_2 itself; the other parameters are used to compare the mean a_1 of the central region A_1 of the processed pixel with the mean a_2 of the pixels over T_2 in the four corner regions. The biggest difficulty of this method is how to choose the predetermined parameters: different parameter settings can produce quite different results. The method is sensitive to inhomogeneous background, large shadows and noise. Figs. 25-27 in Section 5 show some test results produced by this method. The above four methods will be compared with our thresholding method.
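The decision rule above might be sketched as follows. The parameter values T_1-T_5 here are illustrative placeholders, not values from the patent, and the geometry (central 3×3 region A_1 plus four 3×3 corner regions forming A_2) is our reading of Fig. 11:

```python
import numpy as np

def local_contrast_binarize(f, T1=30.0, T2=100.0, T3=1.0, T4=1.1, T5=5.0):
    """Sketch of the local contrast technique (Section 2.4).
    Returns b(x, y) with 1 = print (foreground), 0 = background."""
    f = np.asarray(f, dtype=float)
    H, W = f.shape
    b = np.zeros((H, W), dtype=np.uint8)
    for y in range(4, H - 4):          # keep the 9x9 window inside the image
        for x in range(4, W - 4):
            if f[y, x] < T1:           # dark enough: print outright
                b[y, x] = 1
                continue
            a1 = f[y-1:y+2, x-1:x+2].mean()   # central 3x3 region A_1
            # The four 3x3 corner regions of the 9x9 window form A_2.
            corners = np.concatenate([
                f[y-4:y-1, x-4:x-1].ravel(), f[y-4:y-1, x+2:x+5].ravel(),
                f[y+2:y+5, x-4:x-1].ravel(), f[y+2:y+5, x+2:x+5].ravel()])
            a2t = corners[corners > T2]       # A_2t: corner pixels above T2
            a2 = a2t.mean() if a2t.size else corners.mean()
            if T3 * a2 + T5 > T4 * a1:
                b[y, x] = 1
    return b
```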
Fig. 10. Binary document image extracted using the local intensity gradient method with N = 16, m_0 = −1.0, σ_0 = 1.0 and k = −1.0 from the original image in Fig. 3.
According to this analysis, a 2×2 region is classified at a time; it is required that all four pixels are inside either horizontal or vertical object pixel sequences. Trier [21] improved this algorithm mainly in that all '+' marked regions are labelled print and all '−' marked regions are labelled background; a '0' marked region is labelled print if a majority of its 4-connected pixels are '+' marked, and background otherwise. The method is sensitive to noise and fast-changing background due to the Laplacian edge operator. Some thresholding results using this algorithm are shown in the experimental results section of this paper (Figs. 22-24).
Fig. 11. Neighbour analysis in the local contrast technique.
3. Document image background and stroke width analysis

For poor quality document images with variable or inhomogeneous background intensity like shadows, smear or smudge, complex background patterns and signal-dependent noise, a practical problem is that no single thresholding algorithm works well for all kinds of document images. Most commonly, some methods, or some parameter settings, applied to a document image with variable or inhomogeneous background intensity and noise will result in a thresholded image in which printed characters have nonuniform stroke width, and possibly even lost strokes or false characters and connections caused by background noise, as shown in Figs. 8 and 9, and 28(a) and 29(a) in Section 5. This results in low character recognition rates and low document image compression rates in later processing. Background and stroke width analysis of the characters in the document image can overcome or reduce this problem and improve thresholding accuracy
and robustness. Here, we present a simple and efficient method for the background and character stroke width analysis.
3.1. Background analysis

When an image consists of only objects and a background, the best way to pick a threshold is to search the histogram, assuming it is bimodal, and find a gray level which separates the two peaks. However, problems arise when the object area is small compared to the background area, or when both the object and the background assume some broad range of gray levels, as in the background and character gray-level distributions shown in Figs. 12 and 13. In these cases, the histogram is no longer bimodal, as shown in Fig. 4. But in some local areas, the bimodality of the local histogram can be more obvious if the area
Fig. 12. Examples of gray-scale distributions of document image backgrounds under bad illumination conditions and signal-dependent noise. (a), (b) and (c) correspond to the backgrounds of the document images in Figs. 1-3, respectively.
Fig. 13. Examples of gray-scale distributions of document image foregrounds under bad illumination conditions and signal-dependent noise. (a), (b) and (c) correspond to the gray-scale distributions for the character lines of the document images in Figs. 1-3, respectively.
contains separable background and objects/characters. We divide an image into N × N (N = 4, …, 8) regions in order to find local areas with quasi-bimodal local histograms or higher local contrast, and then perform local histogram analysis for the regions in the two diagonal directions in Fig. 14(a) if N is even, and for the regions in the two diagonal, horizontal and vertical directions in Fig. 14(b) if N is odd. We gradually increase the number of local regions or the directions of analysis, up to N = 8, if no quasi-bimodal local histogram is found for N < 8. For smaller regions, those regions with the same pattern in Fig. 14 are analysed simultaneously in each pass. Some local region histograms with the quasi-bimodal property obtained from this analysis are shown in Fig. 15. With a local quasi-bimodal histogram, the character stroke widths and background changes can be analysed using run-length histograms from these areas.
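As an illustration, the N × N partition and the search for quasi-bimodal regions could be sketched as follows. The paper does not give an explicit bimodality criterion, so the peak-separation test below, its smoothing width, and the mass ratio are our own assumptions, not the authors' method:

```python
import numpy as np

def local_histograms(image, n):
    """Split a gray-scale image into an n x n grid and return the
    256-bin histogram of each region (row-major order)."""
    h, w = image.shape
    hists = []
    for r in range(n):
        for c in range(n):
            region = image[r * h // n:(r + 1) * h // n,
                           c * w // n:(c + 1) * w // n]
            hists.append(np.bincount(region.ravel(), minlength=256))
    return hists

def is_quasi_bimodal(hist, min_sep=30, smooth=5):
    """Crude bimodality test: smooth the histogram, locate the dominant
    peak, and require a second substantial peak at least `min_sep`
    gray levels away from it."""
    kernel = np.ones(smooth) / smooth
    s = np.convolve(hist, kernel, mode="same")
    p1 = int(np.argmax(s))                       # dominant mode
    far = [i for i in range(256) if abs(i - p1) >= min_sep]
    p2 = max(far, key=lambda i: s[i])            # best distant mode
    # both modes must carry non-trivial mass
    return s[p2] > 0.1 * s[p1] > 0
```

In a full implementation the grid size n would be increased from 4 to 8, as in the text, until some region passes the test.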
3.2. Stroke width and noise analysis

A document image can be thresholded accurately if the average or maximum stroke width of its characters can be determined, because highly structured stroke units appear frequently in most document images. Having found regions with quasi-bimodal local histograms in a poor quality document image by local region analysis, we extract local run-length information from those regions to form a run-length histogram. The stroke width information and background noise can then be obtained by analysing this run-length histogram. Here, we analyse only the selected image regions, and we consider only the black runs, which relate to the characters or other objects. We denote a run-length histogram as a one-dimensional array
Fig. 14. Local region histogram analysis to find regions with quasi-bimodal local histograms or higher local contrast: (a) local region analysis in the two diagonal directions; (b) fine local region analysis in the two diagonal, horizontal and vertical directions.
R(i), i ∈ I, I = {1, 2, …, L}, where L is the longest run to be counted and R(i) is the frequency of runs of length i. Black run lengths are counted from the one-dimensional gray-level distributions across the selected local regions, as shown in Fig. 16, in the horizontal and vertical directions. The number of directions across the character/object regions in the selected image regions with quasi-bimodal local histograms can be increased to four (horizontal, vertical and the two diagonals) if the document image contains complex symbol patterns. The stroke width (SW) is defined as the run length with the highest frequency in the run-length histogram, excluding the unit run length, that is,

SW = i, if R(i) = max_{i ∈ I} R(i), i ≠ 1.

It reflects the average width of the strokes in a document image. Fig. 17 illustrates run-length histograms from which the stroke width can easily be determined. If an image contains complex background patterns or noise, the highest peak may be formed by these factors instead of by the characters. In this case, restricting the analysis to the selected regions and to black runs becomes necessary to prevent a wrong stroke width. Statistical study shows that the mean stroke width is usually more than one pixel; accordingly, all unit runs should be removed as background in the resulting binary image, whether they are produced by noise or by other background changes. We use the unit-run noise (URN) [17] to measure background noise and changes:

URN = R(1) / max_{i ∈ I} R(i), i ≠ 1.
A large URN value means that the document image contains a high-noise and/or fast-changing background.
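To make the definitions above concrete, here is a minimal sketch of the black run-length histogram and of the SW and URN statistics. The horizontal-plus-vertical scan and the histogram size are illustrative choices; function names are ours:

```python
import numpy as np

def black_run_length_histogram(binary, max_run=64):
    """Histogram R(i) of black (foreground = 1) run lengths, counted
    along both horizontal and vertical scan lines."""
    R = np.zeros(max_run + 1, dtype=int)
    for lines in (binary, binary.T):       # rows, then columns
        for line in lines:
            run = 0
            for px in line:
                if px:
                    run += 1
                else:
                    if 0 < run <= max_run:
                        R[run] += 1
                    run = 0
            if 0 < run <= max_run:         # flush a run ending at the border
                R[run] += 1
    return R

def stroke_width(R):
    """SW = run length with the highest frequency, excluding unit runs."""
    return int(np.argmax(R[2:]) + 2)

def unit_run_noise(R):
    """URN = R(1) / max_{i != 1} R(i)."""
    return R[1] / max(np.max(R[2:]), 1)
```

For a clean image with three-pixel-wide strokes, the histogram peaks at run length 3 and URN is 0.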
4. Adaptive logical level thresholding technique

4.1. Logical level technique

The logical level technique proposed by Kamel and Zhao [22] was developed from an analysis of the integrated function algorithm [12]. It is based on the idea of comparing the gray level of the processed pixel, or its smoothed gray level, with local averages in the neighbourhoods of a few neighbouring pixels. The comparison results are treated as derivatives, so pixel labeling, detection and extraction can use these derivatives, logical bounds on the ordered sequences, and the stroke width range. The technique processes each pixel by simultaneously comparing its gray level, or its smoothed gray level, with four local averages over the (2SW+1) × (2SW+1) windows centered at the four points P_i, P'_i, P_{i+1}, P'_{i+1} shown in Fig. 18. We use 1 to represent character/object and 0 to represent background in the resulting binary image. Mathematically, the technique can be described as follows:

b(x, y) = 1 if ∨_{i=0}^{3} [L(P_i) ∧ L(P'_i) ∧ L(P_{i+1}) ∧ L(P'_{i+1})] is true, and b(x, y) = 0 otherwise,
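A hedged sketch of this decision rule, using L(P) = [ave(P) − g(x, y) > T] and P'_i = P_{(i+4) mod 8} as defined in the text. The exact placement of the points P_0 … P_7 in Fig. 18 cannot be recovered from the text, so placing them at distance SW along the eight compass directions is our assumption:

```python
import numpy as np

# Eight compass directions for P_0 ... P_7 around the processed pixel;
# the distance SW from the centre is an assumption (cf. Fig. 18).
DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def local_average(img, y, x, sw):
    """ave(P): mean gray level of the (2SW+1) x (2SW+1) window at (y, x)."""
    h, w = img.shape
    win = img[max(0, y - sw):min(h, y + sw + 1),
              max(0, x - sw):min(w, x + sw + 1)]
    return win.mean()

def logical_level(img, y, x, sw, T):
    """b(x, y): foreground when, for some i in 0..3, L(P_i), L(P'_i),
    L(P_{i+1}) and L(P'_{i+1}) all hold, with L(P) = [ave(P) - g > T]."""
    g = img[y, x]
    L = []
    for dy, dx in DIRS:
        py, px = y + dy * sw, x + dx * sw
        if 0 <= py < img.shape[0] and 0 <= px < img.shape[1]:
            L.append(local_average(img, py, px, sw) - g > T)
        else:
            L.append(False)
    return any(L[i] and L[(i + 4) % 8] and L[(i + 1) % 8] and L[(i + 5) % 8]
               for i in range(4))
```

A dark stroke pixel is surrounded in all directions by windows whose averages exceed its own gray level by more than T, so the conjunctions fire; a background pixel sees averages close to its own value.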
Fig. 15. Some local region histograms with the quasi-bimodal property, or regions with larger contrast, in document images with poor illumination and signal-dependent noise. (a) and (b) correspond to local region histograms of Fig. 1; (c) and (d) correspond to local region histograms of Fig. 2.
where SW is the predetermined maximal stroke width, P'_i = P_{(i+4) mod 8} for i = 0, …, 7, L(P) = [ave(P) − g(x, y) > T], T is a predetermined parameter,

ave(P) = Σ_{−SW ≤ i ≤ SW} Σ_{−SW ≤ j ≤ SW} f(P_x − i, P_y − j) / (2SW + 1)²,

P_x, P_y are the coordinates of P, and g(x, y) = f(x, y) or its smoothed value. To reduce the computation, fast algorithms are used to calculate the local averages and logical levels.

4.2. Adaptive improvement of the logical level technique

We propose some improvements to the original logical level technique in order to achieve automatic and
adaptive thresholding and accurate binary images for various poor quality document images. Our modification has two aspects. The first is to determine the average maximal stroke width SW automatically from the run-length histograms of the selected local regions of the image, as described in the preceding section. This stroke width can be tuned automatically for different document images. As usual, we use the run length at the highest peak of the run-length histogram of the selected regions, SW = i if R(i) = max_{i ∈ I} R(i), i ≠ 1, as the stroke width. In some cases we may instead use the run length SW_2nd = j of the second-highest peak R_2nd-right-peak(j) immediately to the right of the highest peak in the run-length histogram, provided that 1 ≤ (j − i) ≤ 2, i, j ≠ 1 and R_2nd-right-peak(j)/R_max(i) ≥ 0.8.
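The stroke-width selection rule, including the second-peak variant, could be coded as follows; the histogram layout (R[1] holding the unit runs, which are excluded) follows Section 3.2, and the helper name is ours:

```python
import numpy as np

def select_stroke_width(R):
    """Pick SW from a run-length histogram R. Prefer the peak just
    right of the highest non-unit peak when it is close
    (1 <= j - i <= 2) and nearly as tall (ratio >= 0.8)."""
    i = int(np.argmax(R[2:]) + 2)           # highest peak, unit runs excluded
    best_j, best_v = None, 0
    for j in range(i + 1, min(i + 3, len(R))):
        if R[j] > best_v:
            best_j, best_v = j, R[j]
    if best_j is not None and best_v >= 0.8 * R[i]:
        return best_j                       # second-highest right peak
    return i
```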
Fig. 16. Some gray-scale distributions across the characters in the selected regions of the document images. The average stroke width information can be obtained from the run lengths of the gray-level changes. (a) corresponds to gray-scale distributions across the selected region of Fig. 1; (b) corresponds to gray-scale distributions across the selected region of Fig. 2.
Fig. 17. Local run-length histograms. (a), (b) and (c) correspond to the run-length histograms of the selected regions from the document images in Figs. 1–3, respectively.
The other improvement is to produce the local parameter T automatically and adaptively instead of using a predetermined global parameter. This overcomes the uneven thresholding effect and the false thresholding results that the original logical level technique gives for document images with bad illumination, inhomogeneous and fast-changing backgrounds, and heavy noise. The parameter T is produced adaptively and automatically as follows:

1. Calculate f_SW-max(x, y) = max_{x_i, y_i ∈ window} f(x_i, y_i) and f_SW-min(x, y) = min_{x_i, y_i ∈ window} f(x_i, y_i) in the (2SW+1) × (2SW+1) window centered at the processed point P.
2. Calculate |f_SW-max(x, y) − ave(P)| and |f_SW-min(x, y) − ave(P)|.
3. If |f_SW-max(x, y) − ave(P)| > |f_SW-min(x, y) − ave(P)|, the local (2SW+1) × (2SW+1) window region tends to contain more local low gray levels; then T = α((2/3) f_SW-min(x, y) + (1/3) ave(P)). Here α can be a fixed value between 0.3 and 0.8. It can be taken as 1/3 for very poor quality images with high noise and low contrast, as in our examples; in most cases it can be taken as 2/3.
4. If |f_SW-max(x, y) − ave(P)| < |f_SW-min(x, y) − ave(P)|, the local window region tends to contain more local high gray levels; then T = α((1/3) f_SW-min(x, y) + (2/3) ave(P)).
5. If |f_SW-max(x, y) − ave(P)| = |f_SW-min(x, y) − ave(P)|:
   (a) If f_SW-max(x, y) = f_SW-min(x, y), expand the window size to (2SW+3) × (2SW+3) and repeat from step 1 with the new window size. If still f_SW-max(x, y) = f_SW-min(x, y) in the new window, then P is regarded as a background pixel (or T = α · ave(P)).
   (b) If f_SW-max(x, y) ≠ f_SW-min(x, y), the local window region tends to contain equal shares of low and high gray levels; expand the window size to (2SW+3) × (2SW+3) and repeat from step 1 with the new window size. If |f_SW-max(x, y) − ave(P)| = |f_SW-min(x, y) − ave(P)| and f_SW-max(x, y) ≠ f_SW-min(x, y) in the new window, then T = α · ave(P).

Fig. 18. Processing neighbourhood of the logical level thresholding technique.

4.3. Postprocessing of the binary image

The aim of postprocessing the binary image is to remove binary noise and false print information so as to improve the binary quality. In our method, we use run-length information to identify false information. First, the run-length histograms of the print information of the binary image in the horizontal and vertical directions are extracted and compared with the local run-length histograms from the original document image, and the unit-run parts in both the horizontal and vertical directions are removed. Then, runs of only one or two pixels width combined in both the horizontal and vertical directions are removed. Furthermore, we analyse possibly large false print information caused by fast-changing backgrounds, which we call long-run noise. A run is considered long if it is substantially longer than the maximum run length of the characters. The number of long runs should be quite small even if underlines, tables and graphics exist in the document image. We use the long-run noise (LRN) feature [17] to describe whether there is long-run noise in the resulting binary image:

LRN = Σ_{i > L_0} R(i) / max_{i ∈ I} R(i), i ≠ 1,

where L_0 is a constant that may be set larger than the average character size. We use LRN to measure whether the binary image contains much long-run noise. If it is larger than or close to 1, we reduce the width parameter in thresholding so that the resulting binary image becomes cleaner. This process is automatic.

5. Experimental results and evaluation
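As a recap of the method evaluated in this section, the adaptive selection of T from Section 4.2 can be sketched as follows. The value of α, the single window expansion, and the treatment of flat windows as background follow the steps in the text, but the function shape and names are our own illustration:

```python
import numpy as np

def adaptive_T(img, y, x, sw, alpha=2/3):
    """Sketch of the adaptive local parameter T (Section 4.2).
    Returns None when the pixel is judged pure background."""
    h, w = img.shape
    for s in (sw, sw + 1):                       # (2SW+1), then (2SW+3) window
        win = img[max(0, y - s):min(h, y + s + 1),
                  max(0, x - s):min(w, x + s + 1)]
        ave = float(win.mean())
        fmax, fmin = float(win.max()), float(win.min())
        dmax, dmin = abs(fmax - ave), abs(fmin - ave)
        if dmax > dmin:                          # more low gray levels locally
            return alpha * (2 * fmin / 3 + ave / 3)
        if dmax < dmin:                          # more high gray levels locally
            return alpha * (fmin / 3 + 2 * ave / 3)
        # tie: expand the window once and decide again
    if fmax == fmin:
        return None                              # flat after expansion: background
    return alpha * ave
```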
We have tested six local adaptive thresholding algorithms, including the logical level technique and our modified logical level technique, on a number of poor quality document images with bad illumination, shadows, signal-dependent noise and various variable backgrounds, under different parameters. All algorithms were implemented in the C programming language under UNIX on a Sun Sparc Station IPX. Figs. 8–10 and 19–38 illustrate the experimental results using, respectively, the Local Intensity Gradient Method, the Connectivity-Based Thresholding Algorithm, the Integrated Function Algorithm, the Local Contrast Technique, the Logical Level Technique and our Adaptive Logical Thresholding Algorithm. Table 1 gives the average computation time (CPU time in seconds) for these algorithms. All test images have a width of 768, a height of 576 and a gray-level range of [0, 255]. The Connectivity-Based Thresholding Algorithm could not segment badly illuminated document images well, especially when they contain both shadows and noise, as in Figs. 19–21, since a shadow itself can be regarded as a connected part and the noise strongly affects the run histogram. Moreover, its implementation efficiency is limited by a large number of calculations and decisions, since it needs to calculate the run-length
Fig. 19. Binary document image extracted using the connectivity-based thresholding algorithm from the original document image in Fig. 1.
Fig. 20. Binary document image extracted using the connectivity-based thresholding algorithm from the original document image in Fig. 2.
Fig. 22. Binary document image extracted using the integrated function algorithm from the image in Fig. 1.
Fig. 23. Binary document image extracted using the integrated function algorithm from the image in Fig. 2.

Fig. 21. Binary document image extracted using the connectivity-based thresholding algorithm from the original document image in Fig. 3.
histograms in two directions (horizontal and vertical) at each intensity level. The Local Intensity Gradient Method works well with slowly changing backgrounds and bad illumination. However, it intensifies some noise effects and cannot cope with fast-changing backgrounds under bad illumination, as shown in Figs. 8–10, because of its gradient-based analysis. Besides, the calculation of the local minimum difference image (for each pixel) and of the local mean m and standard deviation σ, particularly as the block region size N increases, is quite time-consuming. The selection of the pre-specified parameters is image- and region-dependent, and different pre-specified parameters and region sizes can produce quite different result images.
Fig. 24. Binary document image extracted using the integrated function algorithm from the image in Fig. 3.
The Integrated Function Algorithm fully exploits the stroke width information, so it can remove all large dark areas completely. The labeling and logical detection ensure that every large dark area is a connected black blob, which is removed from the image in the final extraction phase. The resulting images therefore have no unwanted edges of large dark areas, but the method is sensitive to noise and fast-changing backgrounds because it uses a Laplacian edge operator. It can produce some small noise
Fig. 27. Binary document image extracted using local contrast analysis from the image in Fig. 3.
Fig. 25. Binary document image extracted using local contrast analysis from the image in Fig. 1.
Fig. 26. Binary document image extracted using local contrast analysis from the image in Fig. 2.
Fig. 28. Binary document image extracted using the original logical level technique from the document image in Fig. 1. The results in (a) and (b) are quite different due to different predetermined parameters.
Table 1. Implementation results and evaluation. Execution time is the average processing time for several images of size 768 × 576 on a Sun Sparc Station IPX.

Method | Average CPU time (s) | Subjective evaluation
Connectivity-based | 46.583 | Shadows
Local intensity gradient (postprocessing) | 51.166 | Noise, unwanted edges
Integrated function | 18.743 | Noise
Local contrast | 35.383 | Noise, over-removal of shadow area
Logical level (fast algorithm [22]) | 12.166 | Good, a little over-removal of shadow area
Adaptive logical level (fast algorithm [22] and postprocessing) | 15.533 | Best
Fig. 30. Binary document image extracted using the modified logical level thresholding method from the original image in Fig. 1. SW is taken as the run length of the highest peak in the run-length histogram.

Fig. 31. Binary document image extracted using the modified logical level thresholding method from the original image in Fig. 2. SW is taken as the run length of the highest peak in the run-length histogram.

Fig. 29. Binary document image extracted using the original logical level technique from the document image in Fig. 2. The results in (a) and (b) are quite different due to different predetermined parameters.

Fig. 32. Binary document image extracted using the modified logical level thresholding method from the original image in Fig. 3. SW is taken as the run length of the highest peak in the run-length histogram.
Fig. 33. Binary document image extracted using the modified logical level thresholding method from the original image in Fig. 1. SW is taken as the run length of the second-highest peak to the right of the highest peak in the run-length histogram.
Fig. 34. Binary document image extracted using our modified logical level thresholding method from the original image in Fig. 2. SW is taken as the run length of the second-highest peak to the right of the highest peak in the run-length histogram.
Fig. 35. An original gray-scale document image with some line graphics and shadows.
prints for noisy images and complex background images, as shown in Figs. 20–24. The biggest difficulty with the Local Contrast Technique is choosing the predetermined parameters, because five parameters need to be set manually. Only two of the three parameters T_3, T_4 and T_5 are independent; only T_1 and T_2 have a clear physical meaning, as in global thresholding or multi-threshold techniques, and can be set easily. For a given image there appear to be no rules for setting the other parameters T_3, T_4 and T_5, and different parameters can produce quite different results. This method is also sensitive to inhomogeneous backgrounds and large shadows, as shown in Figs. 25–27. The Logical Level Technique appears to work well for a wide range of document images; even though it also uses some derivatives in its comparisons, the comparison is made against local averages and is not sensitive to noise. The result, however, can change from image to image, as shown in Figs. 28 and 29, when the image contains a complex background or large illumination changes, because it uses a global predetermined parameter T. Our method improves its adaptivity and robustness by replacing the predetermined global parameter with a local one. It can be implemented and tuned automatically from image to image, and is less sensitive to local noise in images. The stroke width can be selected and adjusted automatically according to different document images and later pattern recognition requirements; that is, under some conditions we can select as stroke width either the highest peak of the run-length histogram or the second-highest peak to its right. The method therefore has a wider range of applications. Figs. 30–34 show the experimental results obtained using our method.
6. Conclusions

In this paper, we have presented a modified logical thresholding method, based on an adaptive logical level technique, to binarize seriously degraded and very poor quality gray-scale document images. Our method can threshold gray-scale document images with complex signal-dependent noise, variable background intensity caused by nonuniform illumination, shadows, smears or smudges, and very low contrast, without obvious loss of useful information. It adaptively tunes the size of the local analysis area and the logical thresholding level according to the local run-length histograms of the selected regions
Fig. 36. Binary document image extracted using our modified logical level thresholding method from the original image in Fig. 35.
with quasi-bimodal local histograms and to the analysis of the gray-scale inhomogeneity of the background. For test images with various noise and different inhomogeneous backgrounds, experiments and evaluations have shown that our method can automatically threshold various poor quality gray-scale document images without any prior knowledge of the document image or manual fine-tuning of parameters. It is nonparametric and automatic. It keeps useful information accurately, without over-connected or broken character strokes; thus it has a wider range of applications and is more robust for document images than other thresholding methods based on connectivity and background analysis. It is worth noting that, because our method is based on stroke width analysis, it can also process document images with tables and line or block graphics, and works well on them. Figs. 35 and 36 show an example of thresholding a document image with line graphics using our method. It may, however, not be suitable for thresholding gray-level images such as scanned human or scenic
photographs. Our method is a local adaptive technique, a modified logical level method. Its computational efficiency is much higher than that of the connectivity-based thresholding method, since our method only needs to calculate a run-length histogram directly from the gray levels in the selected regions, instead of thresholding the whole original image at each intensity level to obtain its run-length histogram, as the connectivity-based method does. Experimental results show that the user-defined parameter in our method is robust across various document images. Although our method is designed to process document images of very poor quality, it performs equally well, and works more efficiently, on document images of good or normal quality, because the background analysis, the run-length histogram construction and the postprocessing are then simpler. The average processing time for document images of good or normal quality can be reduced by 20–30%. Figs. 37 and 38 give an example of thresholding a document image of normal quality using our method.
Fig. 37. An original gray-scale document image with normal quality.
Fig. 38. Binary document image extracted using our modified logical level thresholding method from the original image in Fig. 37.
Acknowledgements This work is supported by the Australian Research Council.
References

[1] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Systems Man Cybernet. SMC-9 (1979) 62–66.
[2] P.K. Sahoo, S. Soltani, A.K.C. Wong, A survey of thresholding techniques, Comput. Vision Graphics Image Process. 41 (1988) 233–260. [3] J.N. Kapur, P.K. Sahoo, A.K.C. Wong, A new method for gray-level picture thresholding using the entropy of the histogram, Computer Vision Graphics Image Process. 29 (1985) 273–285. [4] S.U. Lee, S.Y. Chung, R.H. Park, A comparative performance study of several global thresholding techniques for segmentation, CVGIP 52 (1990) 171–190. [5] F. Deravi, S.K. Pal, Gray level thresholding using second-order statistics, Pattern Recognition Lett. 1 (1983) 417–422.
[6] J. Kittler, J. Illingworth, Threshold selection based on a simple image statistic, CVGIP 30 (1985) 125–147. [7] Y. Nakagawa, A. Rosenfeld, Some experiments on variable thresholding, Pattern Recognition 11 (1979) 191–204. [8] S. Boukharouba, J.M. Rebordao, P.L. Wendel, An amplitude segmentation method based on the distribution function of an image, Computer Vision Graphics Image Process. 29 (1985) 47–59. [9] S. Wang, R.M. Haralick, Automatic multithreshold selection, Computer Vision Graphics Image Process. 25 (1984) 46–67. [10] R. Kohler, A segmentation system based on thresholding, Computer Graphics Image Process. 15 (1981) 319–338. [11] N. Papamarkos, B. Gatos, A new approach for multilevel threshold selection, CVGIP: Graphical Models Image Process. 56 (5) (1994) 357–370. [12] J.M. White, G.D. Rohrer, Image thresholding for optical character recognition and other applications requiring character image extraction, IBM J. Res. Dev. 27 (4) (1983) 400–411. [13] Y. Yasuda, M. Dubois, T.S. Huang, Data compression for check processing machine, Proc. IEEE 68 (7) (1980) 874–885. [14] O.D. Trier, A.K. Jain, Goal-directed evaluation of binarization methods, IEEE Trans. Pattern Anal. Mach. Intell. 17 (12) (1995) 1191–1201.
[15] O.D. Trier, T. Taxt, Evaluation of binarization methods for document images, IEEE Trans. Pattern Anal. Mach. Intell. 17 (3) (1995) 312–315. [16] J.S. Weszka, A. Rosenfeld, Threshold evaluation techniques, IEEE Trans. Systems Man Cybernet. SMC-8 (8) (1978) 622–629. [17] Y. Liu, S.N. Srihari, Document image binarization based on texture features, IEEE Trans. Pattern Anal. Mach. Intell. 19 (5) (1997) 540–544. [18] L. O'Gorman, Binarization and multithresholding of document images using connectivity, CVGIP: Graphical Models Image Process. 56 (6) (1994) 494–506. [19] T. Taxt, P.J. Flynn, A.K. Jain, Segmentation of document images, IEEE Trans. Pattern Anal. Mach. Intell. 11 (12) (1989) 1322–1329. [20] J.R. Parker, Gray level thresholding in badly illuminated images, IEEE Trans. Pattern Anal. Mach. Intell. 13 (8) (1991) 813–819. [21] O.D. Trier, T. Taxt, Improvement of 'integrated function algorithm' for binarization of document images, Pattern Recognition Lett. 16 (3) (1995) 277–283. [22] M. Kamel, A. Zhao, Extraction of binary character/graphics images from grayscale document images, CVGIP: Graphical Models Image Process. 55 (3) (1993) 203–217. [23] E. Giuliano, O. Paitra, L. Stringa, Electronic character reading system, U.S. Patent 4,047,15, 6 September 1977.
About the Author: YIBING YANG received her B.S., M.S. and Ph.D. degrees from Nanjing University of Aeronautics and Astronautics, China, in 1983, 1986 and 1991 respectively, all in electrical engineering. From 1986 to 1988, she worked as an assistant professor at Nanjing University of Aeronautics and Astronautics, China. From 1992 to 1993 she was a postdoctoral fellow, and since 1994 she has been an associate professor, both in the Department of Radio Engineering at Southeast University, China. Meanwhile, she was on leave as a research associate in the Electronics Department of The Chinese University of Hong Kong from 1995 to 1996. She is currently a visiting scholar in the Department of Electrical Engineering, The University of Sydney, Australia. Her research interests include image and signal analysis, processing and compression, pattern recognition, medical and optical image processing, and computer vision applications. About the Author: HONG YAN received his B.E. degree from Nanking Institute of Posts and Telecommunications in 1982, M.S.E. degree from the University of Michigan in 1984, and Ph.D. degree from Yale University in 1989, all in electrical engineering. From 1986 to 1989 he was a research scientist at General Network Corporation, New Haven, CT, USA, where he worked on developing a CAD system for optimizing telecommunication systems. Since 1989 he has been with the University of Sydney, where he is currently a Professor in Electrical Engineering. His research interests include medical imaging, signal and image processing, neural networks and pattern recognition. He is an author or co-author of one book and more than 200 technical papers in these areas. Dr. Yan is a fellow of the Institution of Engineers, Australia (IEAust), a senior member of the IEEE, and a member of the SPIE, the International Neural Network Society, the Pattern Recognition Society, and the International Society for Magnetic Resonance in Medicine.
Pattern Recognition 33 (2000) 809–819
A novel fuzzy logic approach to contrast enhancement H.D. Cheng*, Huijuan Xu Department of Computer Science, Utah State University, 401b Old Main Hall, Logan, UT 84322-4205, USA Received 1 February 1999; accepted 23 March 1999
Abstract

Contrast enhancement is one of the most important issues in image processing, pattern recognition and computer vision. The commonly used techniques for contrast enhancement fall into two categories: (1) indirect methods and (2) direct methods of contrast enhancement. Indirect approaches mainly modify the histogram by assigning new values to the original intensity levels. Histogram specification and histogram equalization are two popular indirect contrast enhancement methods. However, histogram modification only stretches the global distribution of the intensity. The basic idea of direct contrast enhancement methods is to establish a criterion of contrast measurement and to enhance the image by improving the contrast measure. The contrast can be measured globally and locally. It is more reasonable to define a local contrast when an image contains textual information. Fuzzy logic has found many applications in image processing, pattern recognition, etc. Fuzzy set theory is a useful tool for handling the uncertainty in images associated with vagueness and/or imprecision. In this paper, we propose a novel adaptive direct fuzzy contrast enhancement method based on the fuzzy entropy principle and fuzzy set theory. We have conducted experiments on many images. The experimental results demonstrate that the proposed algorithm is very effective in contrast enhancement as well as in preventing over-enhancement. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Fuzzy logic; Fuzzy entropy; Contrast; Contrast enhancement; Adaptiveness; Over-enhancement; Under-enhancement
1. Introduction

Contrast enhancement is one of the most important issues in image processing and analysis. It is believed that contrast enhancement is a fundamental step in image segmentation. Image enhancement is employed to transform an image on the basis of the psychophysical characteristics of the human visual system [1]. The commonly used techniques for contrast enhancement fall into two categories: (1) indirect methods and (2) direct methods of contrast enhancement [2]. The indirect approach modifies the histogram. In a poor contrast image, the intensities occupy only a small portion of the available intensity range. Through histogram modification, the original gray levels are assigned new values; as a result, the intensity span of the pixels is expanded. Histogram specification and histogram equalization are two popular indirect contrast enhancement methods [3]. However, histogram modification only stretches the global distribution of the intensity. To fit an image to human eyes, the modification of the intensity distribution inside small regions of the image should be conducted. The basic idea of the direct contrast enhancement method is to establish a criterion of contrast measurement and enhance the image by improving the contrast measure. Contrast can be measured globally and locally. It is more appropriate to define a local contrast when an image contains textural information. Dhawan et al. [4] defined a local contrast function in terms of the relative difference between a central region and a larger surrounding region of a given pixel. The contrast values are then enhanced by some contrast enhancement

* Corresponding author. Tel.: +1-435-797-2054; fax: +1-435-797-3265. E-mail address: [email protected] (H.D. Cheng)
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00096-5
functions, such as the square root, the exponential, the logarithm and the trigonometric functions. This method is more efficient and powerful than the indirect method. However, it may enhance noise and digitization effects for a small neighborhood, and may lose details for a large neighborhood [5]. It is well known that perception mechanisms are very sensitive to contours [6,7]. Beghdad and Negrate [5] improved the method of Ref. [4] by taking edge detection operators into account and defining the contrast with consideration of edge information. Although this adaptive contrast enhancement method succeeds in enhancing the major components of an image, the noise may be amplified too, especially in relatively flat regions. Laxmikant Dash and Chatterji [2] proposed an adaptive contrast enhancement scheme which enhances contrast with a lower degree of noise amplification. The idea of this method is that the degree of contrast amplification may vary with the severity of the brightness change. The brightness variation is estimated by local image statistics. As a result, when the brightness change in a region is severe, the degree of enhancement is high; conversely, the enhancement is relatively low. Therefore, the noise in flat regions is reduced. However, over-enhancement and under-enhancement still occur sometimes. Fuzzy set theory has been successfully applied to image processing and pattern recognition [8]. It is believed that fuzzy set theory is a useful tool for handling the uncertainty associated with vagueness and/or imprecision. Image processing bears some fuzziness in nature due to the following factors: (a) information loss while mapping 3-D objects into 2-D images; (b) ambiguity and vagueness in some definitions, such as edges, boundaries, regions and features; (c) ambiguity and vagueness in interpreting low-level image processing results [9,10]. Moreover, the definition of the contrast of an image is fuzzy as well.
Therefore, it is reasonable to apply fuzzy set theory to contrast enhancement. Pal and King [11] used a smoothing method with fuzzy sets to enhance images. They applied contrast intensification operations on pixels to modify their membership values. Li and Yang [12] used a fuzzy relaxation technique to enhance images; at each iteration, the histogram was modified. Both Refs. [11,12] are indirect contrast enhancement approaches. In this paper, we use the maximum fuzzy entropy principle to map an image from the space domain to the fuzzy domain by a membership function, and then apply a novel, adaptive, direct, fuzzy contrast enhancement algorithm to conduct contrast enhancement.
2. Fuzzy entropy and membership function

In this section, the definition of an image using the fuzzy set notation will be explained and the fuzzy entropy, a measure of the fuzziness, will be defined.

2.1. Image representation in fuzzy set notation

An image X of size M×N having gray levels ranging from L_min to L_max can be modeled as an array of fuzzy singletons [8,11]. Each element in the array is the membership value representing the degree of brightness of the gray level l (l = L_min, L_min + 1, …, L_max). In the fuzzy set notation, we can write

X = {μ_X(x_ks)/x_ks, k = 1, 2, …, M, s = 1, 2, …, N},   (1)

where μ_X(x_ks) denotes the degree of brightness possessed by the gray level intensity x_ks of the (k, s)th pixel.

2.2. Entropy of a fuzzy set

The degree of ambiguity of an image X can be measured by the entropy of the fuzzy set, which is defined as [8,11]:

H(X) = (1/MN) Σ_{k=1}^{M} Σ_{l=1}^{N} S_n(μ_X(x_kl)),   (2)

where S_n(·) is the Shannon function

S_n(μ_X(x_kl)) = −μ_X(x_kl) log₂ μ_X(x_kl) − (1 − μ_X(x_kl)) log₂ (1 − μ_X(x_kl)),
k = 1, 2, …, M, l = 1, 2, …, N.   (3)

H(X) (0 ≤ H(X) ≤ 1) measures the fuzzy uncertainty, caused by the inherent variability and/or fuzziness rather than by randomness. Shannon's function S_n(·) increases monotonically on [0, 0.5] and decreases monotonically on [0.5, 1], with a maximum at μ_X(x) = 0.5.

2.3. Membership function

The membership function characterizes the fuzziness in a fuzzy set. It essentially embodies all the fuzziness of a particular fuzzy set, and its description is the essence of a fuzzy property or operation. The membership function of a fuzzy set maps all the elements of the set into real numbers in [0, 1]. Larger membership values represent higher degrees of belonging; that is, the membership value represents how closely an element resembles an ideal element. The most commonly used membership function for a gray level image is the S-function, defined as [13]

μ_X(x_mn) = S(x_mn, a, b, c)
          = 0,                                    0 ≤ x_mn ≤ a,
            (x_mn − a)² / ((b − a)(c − a)),       a ≤ x_mn ≤ b,
            1 − (x_mn − c)² / ((c − b)(c − a)),   b ≤ x_mn ≤ c,
            1,                                    x_mn ≥ c,   (4)
where a, b, and c are the parameters which determine the shape of the S-function. Notice that in this definition, b is not necessarily the midpoint of the interval [a, c], and can be any point between a and c.
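As a concrete illustration, the S-function of Eq. (4) and the fuzzy entropy of Eqs. (2)–(3) can be sketched in a few lines of NumPy. This is a minimal sketch: the function names and the small clipping constant used to avoid log(0) are our own choices, not part of the paper.

```python
import numpy as np

def s_function(x, a, b, c):
    """Generalized S-function of Eq. (4); b need not be the midpoint of [a, c]."""
    x = np.asarray(x, dtype=float)
    mu = np.zeros_like(x)
    left = (x > a) & (x <= b)
    right = (x > b) & (x < c)
    mu[left] = (x[left] - a) ** 2 / ((b - a) * (c - a))
    mu[right] = 1.0 - (x[right] - c) ** 2 / ((c - b) * (c - a))
    mu[x >= c] = 1.0
    return mu

def fuzzy_entropy(mu):
    """H(X) of Eqs. (2)-(3): mean Shannon function over all membership values."""
    mu = np.clip(mu, 1e-12, 1 - 1e-12)   # avoid log(0) at mu = 0 or 1
    s = -mu * np.log2(mu) - (1 - mu) * np.log2(1 - mu)
    return float(s.mean())
```

Note that the two branches of Eq. (4) meet continuously at x = b, where both give (b − a)/(c − a), and that the entropy reaches its maximum of 1 when every membership equals 0.5.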
3. Proposed method

The main purpose of this paper is to enhance the contrast in the fuzzy domain effectively and adaptively. The first step is to map an image from the space domain to the fuzzy domain using the S-function as the membership function. Then we propose a fuzzy contrast enhancement method that is more powerful and adaptive than the adaptive contrast enhancement (ACE) method with adaptive power variation and interpolation techniques [2]. The proposed approach employs the fuzzy entropy principle and fuzzy set theory. It can automatically determine the related parameters according to the nature of the image.

3.1. Mapping an image to the fuzzy domain

As mentioned before, the performance of fuzzy enhancement depends on the membership function. The selection of the parameters a, b and c for the S-function becomes an important issue, since these parameters decide the shape of the membership function. The criterion used to determine the membership function in this paper is to reduce noise and minimize the information loss. Furthermore, the determination of the membership function should be based on the characteristics of the image.

Algorithm 1. Assume the image has gray levels from L_min to L_max. The detailed procedure to determine parameters a and c is described as follows.

1. Obtain the histogram His(g).
2. Find the local maxima of the histogram, His_max(g_1), His_max(g_2), …, His_max(g_k).
3. Calculate the average height of the local maxima:
   His̄_max(g) = (1/k) Σ_{i=1}^{k} His_max(g_i).
4. Select a local maximum as a peak if its height is greater than the average height His̄_max(g); otherwise, ignore it.
5. Select the first peak P(g_1) and the last peak P(g_k).
6. Determine the gray levels B_1 and B_2 such that the information loss in the ranges [L_min, B_1] and [B_2, L_max] equals f_1 (0 < f_1 < 1), that is,
   Σ_{i=L_min}^{B_1} His(i) = f_1,   Σ_{i=B_2}^{L_max} His(i) = f_1.
7. Determine parameters a and c as below. Let f_2 = constant (f_2 < 1):
   (a) a = (1 − f_2)(g_1 − L_min) + L_min; if a > B_1, set a = B_1.
   (b) c = f_2(L_max − g_k) + g_k; if c < B_2, set c = B_2.

In our experiments, f_1 and f_2 are set to 0.01 and 0.5, respectively. The gray levels less than the first peak of the histogram may correspond to the background, while the gray levels greater than the last peak may relate to noise. The idea behind the above algorithm is to reduce noise and maintain enough information of the image. Since the peaks of the histogram contain essential information, we cover the range between the two limits to avoid important information loss.

According to information theory [8,11–13], entropy measures the uncertainty of an information system; a larger value of the entropy of a system indicates more information in the system. The selection of parameter b is based on the maximum fuzzy entropy principle. That is, we compute the fuzzy entropy for each b, b ∈ [a + 1, c − 1], and find an optimum value b_opt such that

H_max(X; a, b_opt, c) = max{H(X; a, b, c) | L_min ≤ a < b < c ≤ L_max}.

After b_opt is determined, the S-function is decided and will be used to map the image to the fuzzy domain.

3.2. Adaptive fuzzy contrast enhancement with adaptive power variation

ACE combines local contrast measurement with a contour detection operator; therefore, it is very efficient for contrast enhancement. Also, the improved version, the adaptive power variation method, uses local statistics and successfully reduces noise amplification. However, some parameters, such as the minimum and maximum amplification constants, were not determined automatically. Furthermore, they were not determined according to the characteristics of the image. Thus, the method may over-enhance some images while it may under-enhance others. Moreover, in regions that are flat, it may need de-enhancement of the contrast instead of enhancement, since these regions usually are associated with background or noise.
The goal of our proposed method is to take care of the fuzzy nature of an image and the fuzziness in the definition of the contrast, to make the contrast enhancement more adaptive and more effective, and to avoid over-enhancement/under-enhancement.
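The parameter selection of Section 3.1 can be sketched as follows. This is a hedged sketch, not the authors' code: the histogram is assumed normalized to sum to 1, and the exact peak-detection comparisons and helper names are our own choices.

```python
import numpy as np

def select_a_c(hist, l_min=0, f1=0.01, f2=0.5):
    """Sketch of Algorithm 1: pick a and c from the peaks of a normalized histogram."""
    g = np.arange(l_min, l_min + len(hist))
    l_max = g[-1]
    inner = (hist[1:-1] > hist[:-2]) & (hist[1:-1] >= hist[2:])
    maxima = np.where(inner)[0] + 1                       # local maxima of His(g)
    peaks = maxima[hist[maxima] > hist[maxima].mean()]    # keep peaks above average height
    g1, gk = g[peaks[0]], g[peaks[-1]]                    # first and last peak
    cdf = np.cumsum(hist)
    b1 = g[np.searchsorted(cdf, f1)]                      # tail information loss f1, left
    b2 = g[len(g) - 1 - np.searchsorted(np.cumsum(hist[::-1]), f1)]  # and right
    a = min((1 - f2) * (g1 - l_min) + l_min, b1)          # step 7(a)
    c = max(f2 * (l_max - gk) + gk, b2)                   # step 7(b)
    return a, c

def select_b_opt(image, a, c):
    """Maximum fuzzy entropy principle: scan b over (a, c) and keep the best."""
    x = np.asarray(image, dtype=float).ravel()
    best_b, best_h = None, -1.0
    for b in range(int(a) + 1, int(c)):
        # generalized S-function, Eq. (4)
        mu = np.where(x <= a, 0.0,
             np.where(x <= b, (x - a) ** 2 / ((b - a) * (c - a)),
             np.where(x < c, 1 - (x - c) ** 2 / ((c - b) * (c - a)), 1.0)))
        mu = np.clip(mu, 1e-12, 1 - 1e-12)
        h = float(np.mean(-mu * np.log2(mu) - (1 - mu) * np.log2(1 - mu)))  # Eqs. (2)-(3)
        if h > best_h:
            best_b, best_h = b, h
    return best_b
```

The exhaustive scan over b mirrors the paper's statement that the fuzzy entropy is computed for each b ∈ [a + 1, c − 1]; for a 256-level image this is at most 254 entropy evaluations.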
Algorithm 2. Given an M×N image X with L different gray levels, and parameters a, b_opt and c selected by the above method, the adaptive fuzzy contrast enhancement can be described as follows.

Step 1. Construct the membership μ_X which measures the fuzziness of an image X:

μ_X(x_mn) = S(x_mn, a, b_opt, c),  m = 0, 1, …, M − 1,  n = 0, 1, …, N − 1.
Step 2. For each pixel (m, n) with μ_X(x_mn), apply an edge gradient operator, such as the Laplacian or Sobel operator, and find the edge value δ_μ(x_mn) of the image in the fuzzy domain. Here, we use the Sobel operator.

Step 3. Compute the mean edge value E_μ(x_mn), within a window W_mn centered on pixel (m, n), using the formula:

E_μ(x_mn) = Σ_{(m,n)∈W_mn} (μ(x_mn) δ_μ(x_mn)) / Σ_{(m,n)∈W_mn} δ_μ(x_mn).

Step 4. Evaluate the contrast related to the membership value μ(x_mn):

C_μ(x_mn) = |μ(x_mn) − E_μ(x_mn)| / |μ(x_mn) + E_μ(x_mn)|.

Step 5. Transform the contrast C_μ(x_mn) to C′_μ(x_mn):

C′_μ(x_mn) = (C_μ(x_mn))^{p_mn},

where p_mn is the amplification constant, 0 < p_mn < 1 for enhancement, and p_mn > 1 for de-enhancement.

Step 6. Obtain the modified membership value μ′(x_mn) using the transformed contrast C′_μ(x_mn):

μ′(x_mn) = E_μ(x_mn)(1 − C′_μ(x_mn))/(1 + C′_μ(x_mn))   if μ(x_mn) ≤ E_μ(x_mn),
           E_μ(x_mn)(1 + C′_μ(x_mn))/(1 − C′_μ(x_mn))   if μ(x_mn) > E_μ(x_mn).   (5)

Step 7. Defuzzification: transform the modified membership value μ′(x_mn) to the gray level by the formula:

x′_mn = L_min,                                                                  μ′(x_mn) = 0,
        L_min + ((L_max − L_min)/(c − a)) √(μ′(x_mn)(b − a)(c − a)),             0 < μ′(x_mn) ≤ (b − a)/(c − a),
        L_min + ((L_max − L_min)/(c − a)) (c − a − √((1 − μ′(x_mn))(c − b)(c − a))),  (b − a)/(c − a) < μ′(x_mn) < 1,
        L_max,                                                                  μ′(x_mn) = 1.   (6)

For the proposed algorithm, the determination of the amplification constant p_mn in Step 5 is quite critical. We improve the performance by the following considerations: (1) make the determination of the constant p_mn more adaptive and automatic; (2) decrease the degree of enhancement in the regions which are either too dark or too bright; (3) enhance/de-enhance the images based on the nature of the local regions of the images.

The amplification constant can be determined by the brightness variation, which is estimated by local image statistics [2]. We use fuzzy logic to perform contrast enhancement. Given a window W_mn with size S_m × S_n, the fuzzy entropy of the brightness in the region W_mn is calculated by

ρ_mn = − Σ_{(i,j)∈W_mn} (P_ij log₂ P_ij) / log₂(S_m S_n),   (7)

where P_ij = b_ij / Σ_{(u,v)∈W_mn} b_uv and b_uv = μ(x_uv) δ_μ(x_uv); μ(x_uv) is the membership, and δ_μ(x_uv) is the edge value.

To obtain the amplification constant p_mn for contrast enhancement, the following algorithm is proposed.

Algorithm 3. Let His(g), g = L_min, …, L_max, be the histogram of a given image.

1. Determine the ranges of the low degree of contrast enhancement, [μ(a), μ(g_l)] and [μ(g_h), μ(c)]; g_l and g_h are the gray levels that meet the following conditions: Σ_{g_i=a}^{g_l} His(g_i) ≤ f and Σ_{g_i=g_h}^{c} His(g_i) ≤ f, where f < 1 indicates the percentage of pixels in the range of the low degree of contrast enhancement. We use 0.005 for f here.
2. Compute the fuzzy entropy ρ_mn for each window centered on the pixel (m, n) under consideration. Then find the maximum and minimum fuzzy entropy, ρ_max and ρ_min, respectively, through the entire image.
3. The power value p_mn is computed by

p_mn = (μ(g_l)/μ(x_mn)) [p_min + (ρ_mn − ρ_min)(p_max − p_min)/(ρ_max − ρ_min)],   μ(x_mn) < μ(g_l),
       p_min + (ρ_mn − ρ_min)(p_max − p_min)/(ρ_max − ρ_min),                      μ(g_l) ≤ μ(x_mn) ≤ μ(g_h),
       (μ(x_mn)/μ(g_h)) [p_min + (ρ_mn − ρ_min)(p_max − p_min)/(ρ_max − ρ_min)],   μ(x_mn) > μ(g_h),   (8)

where p_min = (c − a)/(2(L_max − L_min)), p_max = 1, and ρ_max and ρ_min are the maximum and minimum values of the entropy through all sub-regions of the entire image, respectively.

Since the value of p_min significantly affects the degree of enhancement of an image, the determination of p_min should relate to the contrast of the given image. If the contrast of the original image is relatively low, p_min should be small. Conversely, p_min should be large to avoid over-enhancement. We exploit the width of the histogram to estimate the relative global contrast of an image. If the contrast is low, the width of the histogram is
narrow; therefore, p_min will be small and the degree of contrast enhancement will be high. In a homogeneous region, the amplification constant p_mn should be large and close to p_max. That is, there should be no enhancement, or even de-enhancement should be performed on the homogeneous regions, according to the requirements of the applications.

The basic idea behind the amplification constant is that if ρ_mn is low, which implies that the brightness variation is severe and the degree of enhancement should be high, the amplification constant p_mn should be small. Conversely, if ρ_mn is high, the respective region is relatively flat, or μ(x_mn) is inside the range of low degree enhancement, and then p_mn should be large.
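The per-pixel loop of Algorithm 2 (Steps 2–7) can be sketched as follows. To stay self-contained, this sketch uses the magnitude of np.gradient as a stand-in for the Sobel edge value and takes a fixed amplification constant p as input, whereas the full method computes p_mn per pixel from Eq. (8); the function names and the small epsilon terms are our own choices.

```python
import numpy as np

def enhance_membership(mu, p=0.5, win=3):
    """Sketch of Algorithm 2, Steps 2-6, on a membership image `mu` in [0, 1]."""
    gy, gx = np.gradient(mu)
    delta = np.hypot(gx, gy) + 1e-12            # edge value (stand-in for Sobel)
    M, N = mu.shape
    out = np.empty_like(mu)
    r = win // 2
    for m in range(M):
        for n in range(N):
            w = (slice(max(m - r, 0), m + r + 1), slice(max(n - r, 0), n + r + 1))
            E = np.sum(mu[w] * delta[w]) / np.sum(delta[w])        # Step 3, mean edge value
            C = abs(mu[m, n] - E) / (abs(mu[m, n] + E) + 1e-12)    # Step 4, contrast
            Cp = C ** p                                            # Step 5, power transform
            if mu[m, n] <= E:                                      # Step 6, Eq. (5)
                out[m, n] = E * (1 - Cp) / (1 + Cp)
            else:
                out[m, n] = E * (1 + Cp) / (1 - Cp)
    return np.clip(out, 0.0, 1.0)

def defuzzify(mu_p, a, b, c, l_min=0, l_max=255):
    """Step 7, Eq. (6): map modified memberships back to gray levels."""
    mu_p = np.asarray(mu_p, dtype=float)
    scale = (l_max - l_min) / (c - a)
    return np.where(mu_p <= (b - a) / (c - a),
                    l_min + scale * np.sqrt(mu_p * (b - a) * (c - a)),
                    l_min + scale * (c - a - np.sqrt((1 - mu_p) * (c - b) * (c - a))))
```

Note that Eq. (5) leaves a perfectly flat region unchanged (C = 0 gives μ′ = E = μ), and that Eq. (6) is exactly the inverse of the S-function rescaled to [L_min, L_max].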
Fig. 1. Sample points (*) and resultant point (·).
3.3. Speed up by interpolation

The adaptive fuzzy contrast enhancement method discussed in the previous subsection requires extensive computation when the window becomes large, since the modified gray level is obtained by convolving the window pixel by pixel. A significant speed up can be obtained by interpolating the desired intensity values from the surrounding sample mappings [2,14]. The idea of the interpolation technique is that the original image is divided into sub-images and the adaptive fuzzy contrast enhancement method is applied to each sub-image to obtain the enhanced sample mapping; the resultant mapping of any pixel is then interpolated from the four surrounding sample mappings. In this way, we only need to calculate the sample mappings using the proposed algorithm, which requires more computation time, and the values of the other pixels can be obtained by interpolation, which requires much less time. Given a pixel at location (m, n) with membership value μ_X(x_mn), the interpolated result is (Fig. 1)

f(μ_X(x_mn)) = αβ f_{−−}(μ_X(x_mn)) + α(1 − β) f_{−+}(μ_X(x_mn))
             + (1 − α)β f_{+−}(μ_X(x_mn)) + (1 − α)(1 − β) f_{++}(μ_X(x_mn)),   (9)

where α = (m_+ − m)/(m_+ − m_−) and β = (n_+ − n)/(n_+ − n_−); f_{+−} is the sample mapping at location (m_+, n_−), which is the upper right of (m, n). Similarly, the subscripts ++, −+, and −− are for the locations of the pixels to the lower right, lower left and upper left of (m, n), respectively.

In the interpolative technique, the original image is divided into non-overlapping regions CR_ij (i = 0, 1, …, N_x, j = 0, 1, …, N_y), called contextual regions (CR). Every resultant pixel is derived by interpolating four surrounding mappings, each associated with a contextual region. Thus, the result of each pixel is affected by a region which is the union of the four surrounding contextual regions, called the equivalent contextual region (ECR) (Fig. 2). The mean edge membership value E_μ(x_mn) and the fuzzy entropy ρ_mn are calculated for each contextual region. The region which is made up of the mapping points is a rectangle concentric with the contextual region CR_ij, but twice its size in each dimension. This region is termed the mapping region, MR_ij (Fig. 2). The resultant mean edge value and fuzzy entropy are used to calculate the contrast C_μ(x_mn), the amplification constant p_mn and the modified membership value μ′(x_mn) with respect to one of the four mapping regions. After all four mappings have been obtained, the final result is calculated by taking the bilinearly weighted average of these four results.

Consider an image with contextual regions CR_ij (i = 0, 1, …, N_x, j = 0, 1, …, N_y) of size S_x × S_y. Every mapping for the pixels through the whole image will form a subset of the original image. Four mappings will form four sub-images that consist of alternate contextual regions (Fig. 3). These four sub-images are named intermediate images IM_kl(x, y), where k = 0, 1 and l = 0, 1, which correspond to CR_ij with odd or even i and j, respectively. Notice that only in the central area is every pixel involved in all four intermediate images; a pixel located on the border or in a corner may be in only one or two intermediate images.

The bilinear weights, which are used to obtain the resultant membership value, form a cyclic function of x and y with a period in each dimension equal to 2S_x and 2S_y, respectively. The two-dimensional periodic function is defined by

W(x, y) = W_x(x) W_y(y),   (10)

W_x(x) = x/S_x,            0 ≤ x ≤ S_x,
         (2S_x − x)/S_x,   S_x < x < 2S_x,   (11)

W_y(y) = y/S_y,            0 ≤ y ≤ S_y,
         (2S_y − y)/S_y,   S_y < y < 2S_y.   (12)

The detailed algorithm for adaptive fuzzy contrast enhancement with power variation and interpolation techniques is described as follows.
Fig. 2. Contextual regions (CR), mapping regions (MR), and equivalent contextual regions (ECR).

Fig. 3. The image is divided into four subimages, IM_00 (—·—·—), IM_01 (------), IM_10 (····), IM_11 (— — —), their common central area, the border regions, HB,

Algorithm 4. Input an M×N image with gray levels L_min to L_max, the parameters for the S-function, a, b_opt, and c, the low degree of enhancement ranges [L_min, g_l] and [g_h, L_max], and the amplification constants p_max and p_min.

/* Initialize μ′(x, y), which stores the enhanced membership value */
for i = 0 to M − 1
  for j = 0 to N − 1
    μ′[i][j] = 0;

/* Map the image into the fuzzy domain */
for i = 0 to M − 1
  for j = 0 to N − 1
    μ_X(x_ij) = S(x_ij, a, b_opt, c);

/* Compute the mean edge value and fuzzy entropy */
for i = 0 to M − 1
  for j = 0 to N − 1 {
    E_μ(x_ij) = Σ_{(i,j)∈CR_ij} (μ(x_ij) δ_μ(x_ij)) / Σ_{(i,j)∈CR_ij} δ_μ(x_ij);
    ρ_ij = − Σ_{(i,j)∈CR_ij} (P_ij log₂ P_ij) / log₂(S_x S_y);
  }

/* Compute the values of the contextual regions */
for k = 0 to 1
  for l = 0 to 1 {
    for i = k to N_x − 1 with step size 2
      for j = l to N_y − 1 with step size 2 {
        compute C_μ(x_ij) = |μ(x_ij) − E_μ(x_ij)| / |μ(x_ij) + E_μ(x_ij)|;
        compute C′_μ(x_ij) = (C_μ(x_ij))^{p_ij}, where p_ij is computed by Eq. (8);
        compute the modified membership value M(i, j);
      } /* end of i and j loop */
    /* Weight the intermediate images and sum the results */
    for all (x, y) in the central area
      μ′(x, y) = μ′(x, y) + M(x, y) W_x(x + kS_x + S_x/2) W_y(y + lS_y + S_y/2);
    for all (x, y) in the horizontal borders
      μ′(x, y) = μ′(x, y) + M(x, y) W_x(x + kS_x + S_x/2);
    for all (x, y) in the vertical borders
      μ′(x, y) = μ′(x, y) + M(x, y) W_y(y + lS_y + S_y/2);
    for all (x, y) in the corners
      μ′(x, y) = M(x, y);
  } /* end of k, l loop */

/* Defuzzification */
for i = 0 to M − 1
  for j = 0 to N − 1
    compute the modified gray level x′_ij using Eq. (6).
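The periodic weight of Eqs. (10)–(12) and the bilinear blend of Eq. (9) used above can be sketched as follows; coordinates are assumed to be taken relative to the sample-mapping grid, and the function names are our own.

```python
def w_tri(x, s):
    """Cyclic triangular weight of Eqs. (11)-(12): period 2s, peak 1 at x = s."""
    x = x % (2.0 * s)   # the weights form a cyclic function, as described in the text
    return x / s if x <= s else (2.0 * s - x) / s

def blend(f_mm, f_mp, f_pm, f_pp, alpha, beta):
    """Bilinear combination of Eq. (9); the f's are the four surrounding sample mappings."""
    return (alpha * beta * f_mm + alpha * (1 - beta) * f_mp
            + (1 - alpha) * beta * f_pm + (1 - alpha) * (1 - beta) * f_pp)
```

Since the four weights in Eq. (9) sum to 1, a constant mapping is reproduced exactly, which is why the interpolation introduces no bias in homogeneous areas.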
4. Experimental results and discussions

We have applied the proposed algorithm to a variety of images. As mentioned before, the commonly used techniques for contrast enhancement can be categorized as (1) indirect methods of contrast enhancement and (2) direct methods. Histogram specification and histogram equalization are the two most popular indirect contrast enhancement methods [3]. Laxmikant Dash and Chatterji [2], Dhawan et al. [4], and Beghdad and Negrate [5] have discussed and shown that direct contrast enhancement approaches are better than indirect ones. The newly developed approach, ACE with adaptive power variation,
Table 1
The parameters for the images

Image                 Size     a    b    c    Low enhancement range   p_min  p_max  CR size
Fig. 4 (ed100)        352×240  13   14   146  [1, 18], [118, 255]     0.26   1      8×8
Fig. 5 (airplane)     512×512  107  237  238  [0, 142], [223, 255]    0.26   1      16×16
Fig. 6 (couple)       256×256  0    1    158  [0, 2], [148, 255]      0.31   1      16×16
Fig. 7 (light-tower)  480×320  48   141  201  [0, 82], [194, 255]     0.30   1      8×8
Fig. 8 (window)       384×256  27   138  204  [0, 30], [188, 255]     0.35   1      16×16
Fig. 9 (lena)         512×512  24   214  215  [0, 36], [212, 255]     0.37   1      16×16

Fig. 4. (a) The original image ed100 with size 352×240. (b) The image enhanced by the ACE method. (c) The image enhanced by the proposed method.
produces better results [2]. In order to demonstrate the performance of the proposed method, we compared the experimental results of the proposed approach with those of the ACE with adaptive power variation approach. Here, we present a few of the experimental results. The sizes of the images, the parameters a, b and c, the ranges of low degree enhancement, the low and high limits of the amplification constant, p_min and p_max, and the sizes of the contextual regions (CR) are listed in Table 1. Figs. 4(a)–9(a) are the original images; Figs. 4(b)–9(b) and 4(c)–9(c) are the results obtained by the method in Ref. [2] and the proposed method, respectively. Unless mentioned otherwise, the amplification constants p_min and p_max used for the method in Ref. [2] are the same as those used for the proposed method, as shown in Table 1, for comparison purposes.

Fig. 4(a) is a low-contrast, dark image. After employing the proposed method, the three major components of the image are well enhanced, as shown in Fig. 4(c). The contours of the cage, and the yarn on the right of the basket, are distinct in (c), but their counterparts in (a) and (b) are quite vague. Moreover, the pet in the cage, the basket and the statue are more apparent than in (a) and (b). While 4(b) under-enhanced most parts of the image, it over-enhanced the rings on the right-hand side of the image.

The original image in Fig. 5(a) is relatively bright and blurred. Comparing (c) with (b), the contrast enhancement is evident: the letters and the emblem on the airplane, the shadow on the surface of the mountain and the ridges of the mountains are better enhanced. On the other hand, several areas of image (b) were over-enhanced.

Comparing the images in Figs. 6(a)–(c), we can see that the contrast was enhanced significantly through the entire image in (c). The contour of the girl becomes clear, the face of the man, the sofa and the carpet are well outlined, and the background is much more distinct in (c).
Based on the results shown in Figs. 4–6, we can see that the proposed method is more effective than the method in Ref. [2]. Now, we will show that our proposed method not only effectively enhances the contrast, but also significantly reduces or even eliminates the over-enhancement. In the light tower image series, the result of Fig. 7(c) is better than 7(b) due to the following facts: (1) the details
Fig. 5. (a) The original image airplane with size 512×512. (b) The image enhanced by the ACE method. (c) The image enhanced by the proposed method.
Fig. 6. (a) The original image couple with size 256×256. (b) The image enhanced by the ACE method. (c) The image enhanced by the proposed method.
Fig. 7. (a) The original image with size 480×320. (b) The image enhanced by the ACE method. (c) The image enhanced by the proposed method.
Fig. 8. (a) The original image with size 384×256. (b) The image enhanced by the ACE method. (c) The image enhanced by the proposed method.
of (c), such as the houses, rocks, water, clouds, etc., are more distinct than those of (a) and (b); (2) apparent distortions appear on the upper part of the tower and on the roofs of the houses in (b). These are caused by over-enhancement.

For Fig. 8(c), the background is uniform after enhancement, and the major components of the image are much clearer than in (a) and (b). However, in Fig. 8(b), the background becomes very noisy and the contours of
the leaves of the trees are non-uniform and unnatural. This is again caused by over-enhancement.

The over-enhancement problem of the ACE method can also be found in Fig. 9. In (b), the amplification constants p_min and p_max are 0.4 and 0.85, respectively, which are relatively higher than the values used in Ref. [2]. Even though these values are relatively high, the over-enhancement is still very strong. The whole image in (b) is noisy. Some homogeneous areas (face and shoulder)
become non-homogeneous and unnatural, and the background has some unnatural, noisy spots as well. However, in (c), the main features of the image are enhanced without amplifying noise.

From the above experiments, we can conclude that the proposed approach outperforms the ACE method with the following advantages: (1) the contrast enhancement is more effective, with better adaptability; (2) the over-enhancement is significantly decreased or eliminated. The superior performance of the proposed approach is due to the following reasons: (a) the proposed approach takes care of the fuzziness in the images by using fuzzy set theory and the fuzzy entropy principle; (b) the necessary parameters are determined automatically based on the nature of the images; (c) the proposed approach uses global and local information to decide enhancement/de-enhancement, and therefore it can prevent over-enhancement effectively; (d) the proposed approach has better adaptive capability by employing p_mn (refer to Eq. (8)).

5. Conclusions

Contrast enhancement is a fundamental step in image segmentation, and plays an important role in image processing, pattern recognition, and computer vision. The commonly used techniques for contrast enhancement fall into two categories: (1) indirect methods and (2) direct methods. The direct approach to contrast enhancement is more useful because it considers both global and local information of the image. Fuzzy logic has found many applications in image processing, pattern recognition, etc. In this paper, we propose a novel, adaptive, direct, fuzzy contrast enhancement method based on fuzzy entropy and fuzzy set theory. The experimental results have demonstrated that the proposed algorithm is more adaptive and effective for contrast enhancement compared to the ACE method. Moreover, it significantly reduces over-enhancement/under-enhancement due to its better adaptive capability.
The proposed approach may find wide applications in image processing, pattern recognition and computer vision.

References
Fig. 9. (a) The original image with size 512×512. (b) The image enhanced by the ACE method. (c) The image enhanced by the proposed method.
[1] F. Neycenssac, Contrast enhancement using the Laplacian-of-a-Gaussian filter, Graph. Models Image Process. 55 (1993) 447–463.
[2] Laxmikant Dash, B.N. Chatterji, Adaptive contrast enhancement and de-enhancement, Pattern Recognition 24 (1991) 289–302.
[3] R.C. Gonzalez, R.E. Woods, Digital Image Processing, 3rd Edition, Addison-Wesley, Reading, MA, 1992.
[4] A.P. Dhawan, G. Buelloni, R. Gordon, Enhancement of mammographic features by optimal adaptive neighborhood image processing, IEEE Trans. Med. Imaging 5 (1986) 8–15.
[5] A. Beghdad, A.L. Negrate, Contrast enhancement technique based on local detection of edges, Comput. Vision Graphics Image Process. 46 (1989) 162–174.
[6] A. Rosenfeld, A. Kak, Digital Picture Processing, 2nd Edition, Academic Press, New York, 1982.
[7] T.N. Cornsweet, Visual Perception, Academic Press, New York, 1970.
[8] S.K. Pal, D.K.D. Majumder, Fuzzy Mathematical Approach to Pattern Recognition, Wiley, New York, 1986.
[9] H.D. Cheng, C.H. Chen, Hui-Hua Chiu, Huijuan Xu, Fuzzy homogeneity approach to image thresholding and segmentation, IEEE Trans. Image Process. 7 (7) (1998) 1084–1088.
[10] H.D. Cheng, J.R. Chen, J. Li, Threshold selection based on fuzzy c-partition entropy approach, Pattern Recognition 31 (7) (1998) 857–870.
[11] S.K. Pal, R.A. King, Image enhancement using smoothing with fuzzy sets, IEEE Trans. Systems Man Cybernet. 11 (7) (1981) 494–501.
[12] H. Li, H.S. Yang, Fast and reliable image enhancement using fuzzy relaxation technique, IEEE Trans. Systems Man Cybernet. 19 (5) (1989) 1276–1281.
[13] N.R. Pal, S.K. Pal, Entropy: a new definition and its applications, IEEE Trans. Systems Man Cybernet. 21 (5) (1991) 1260–1270.
[14] S.M. Pizer, E.P. Amburn, J.D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B.H. Romeny, J.B. Zimmerman, K. Zuiderveld, Adaptive histogram equalization and its variations, Comput. Vision Graphics Image Process. 39 (1987) 355–368.
About the Author: HENG-DA CHENG received the Ph.D. degree in Electrical Engineering from Purdue University, West Lafayette, IN, in 1985. He is now a Full Professor in the Department of Computer Science, Utah State University, Logan, Utah. Dr. Cheng has published over 170 technical papers and is the co-editor of the book Pattern Recognition: Algorithms, Architectures and Applications (World Scientific Publishing Co., 1991). His research interests include image processing, pattern recognition, computer vision, artificial intelligence, medical information processing, fuzzy logic, genetic algorithms, neural networks, parallel processing, parallel algorithms, and VLSI architectures. Dr. Cheng is the chairman of the First International Conference on Computer Vision, Pattern Recognition and Image Processing (CVPRIP'2000), 2000, was the chairman of the First International Workshop on Computer Vision, Pattern Recognition and Image Processing (CVPRIP'98), 1998, and was the Program Co-Chairman of Vision Interface '90, 1990. He has served as a program committee member and session chair for many conferences, and as a reviewer for many scientific journals and conferences. Dr. Cheng has been listed in Who's Who in the World, Who's Who in America, Who's Who in Communications and Media, Who's Who in Finance and Industry, Who's Who in Science and Engineering, Men of Achievement, 2000 Notable American Men, International Leaders in Achievement, Five Hundred Leaders of Influence, International Dictionary of Distinguished Leadership, etc. He is appointed as a Member of the Advisory Council, the International Biographical Center, England, and a Member of the Board of Advisors, the American Biographical Institute, USA. Dr. Cheng is a Senior Member of the IEEE Society and a Member of the Association for Computing Machinery. Dr. Cheng is also an Associate Editor of Pattern Recognition and an Associate Editor of Information Sciences.
About the Author*HUIJUAN XU received BS and MS degrees in Chemistry from Peking University, China, in 1990 and 1993, respectively. She is pursuing her Master's degree in the Department of Computer Science, Utah State University. Her research interests include image processing and pattern recognition. Currently she is working in a software company.
Pattern Recognition 33 (2000) 821–832
Applying deformable templates for cell image segmentation

A. Garrido*, N. Pérez de la Blanca

Departamento de Ciencias de la Computación e I.A., ETS Ingeniería Informática, Universidad de Granada, 18071 Granada, Spain

Received 24 June 1998; accepted 29 March 1999
Abstract

This paper presents an automatic method, based on the deformable template approach, for cell image segmentation under severe noise conditions. We define a new methodology, dividing the process into three parts: (1) obtain evidence from the image about the location of the cells; (2) use this evidence to calculate an elliptical approximation of these locations; (3) refine cell boundaries using locally deforming models. We have designed a new algorithm to locate cells and propose an energy function to be used together with a stochastic deformable template model. Experimental results show that this approach for segmenting cell images is both fast and robust, and that this methodology may be used for automatic classification as part of a computer-aided medical decision making technique. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Deformable template; Automatic image segmentation; Hough transform; Stochastic deformable template
1. Introduction

Automatic cell segmentation is one of the most interesting segmentation problems, due both to the complex nature of cell tissues and to problems inherent to video microscopy. Object multiplicity, a short range of grey levels, clutter, occlusion and non-random noise are some examples of the difficulties present in this kind of image.

One common segmentation scheme is image thresholding [1,2], which can be regarded as pixel classification. A feature value, such as grey level intensity, is associated with each pixel. This value is compared to the threshold in order to classify the pixel as object or background. The threshold may be obtained globally, as a unique value used for the entire image, or locally, as a surface which provides a different threshold value to be applied to each pixel. Here, these approaches are not applicable because the grey level intensity of a cell image does not vary only on the boundary, but also within cells and throughout the background (see Fig. 1).
* Corresponding author. Tel.: +34-958-242837; fax: +34-958-243317. E-mail addresses:
[email protected] (A. Garrido),
[email protected] (N. Pérez de la Blanca)
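The global and local (threshold-surface) schemes described above can be sketched as follows. This is an illustrative sketch only: the window size, the offset term and the integral-image implementation of the local mean are our own choices, not part of the paper.

```python
import numpy as np

def global_threshold(img, t):
    """One threshold for the entire image: pixel -> object (True) or background."""
    return img > t

def local_threshold(img, win=15, offset=0.0):
    """Threshold surface: compare each pixel to its win x win neighborhood mean."""
    img = np.asarray(img, dtype=float)
    pad = win // 2
    p = np.pad(img, pad, mode="edge")
    ii = np.cumsum(np.cumsum(p, axis=0), axis=1)   # integral image for fast box sums
    ii = np.pad(ii, ((1, 0), (1, 0)))
    M, N = img.shape
    s = (ii[win:win + M, win:win + N] - ii[:M, win:win + N]
         - ii[win:win + M, :N] + ii[:M, :N])
    return img > s / (win * win) + offset
```

As the paper notes, neither variant is adequate for these cell images, since intensity varies within cells and across the background rather than only at boundaries.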
There exist other methods that use region-based information, but they are not applicable here because not all of the parts of the same tissue are equally stained. Darker background regions may be misclassified as cells and lighter cell regions may be misclassified as background. On the other hand, edge information is also available for use. Traditionally, edge-based segmentation has been divided into two independent stages: edge detection and edge linking. Under Marr's paradigm, boundary extraction is conventionally treated as a set of independent problems, where each of them has input information, a method to process it, and output information. This one-way flow of the information (from low to high semantic levels) may yield wrong results because of error propagation:

- If any information is lost at one step, it cannot be recovered and used in the next steps.
- If false information is obtained at one step, there will consequently be more (and probably larger) errors in the next steps.

To improve this situation, every step of the process can be studied in depth. Obviously, many methods to refine each of them have been described in the literature. The
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00091-6
A. Garrido, N. Pérez de la Blanca / Pattern Recognition 33 (2000) 821-832
methodology proposed in this paper attempts to reduce the problem of error propagation by reusing low-level information by means of a contour model (we propose to use this model because contour extraction without such a model in the presence of noise, clutter and occlusion is an ill-posed problem [3]). By using a deformable template, we can evaluate the validity of a solution from the gradient vectors determined by the contour (see for example Refs. [4,5]). Then, it is not necessary to postprocess the gradient image (thresholding and linking) to select edge points. However, as seen below, when we consider the output of an edge detector (considering the points with a high gradient), very stable information about object locations can be obtained (arriving at an approximate solution). This information allows us to use new and more accurate information (adding points with a low gradient) to refine this approximation (see Fig. 2). Thus, good edges are used to determine the object location and the contour model is used to select the set of edges involved in this object location. The result of this automatic process is the accurate location of the contour. Our images (see Fig. 1) represent cytologies (acquired through a CCD camera adapted to an optical microscope) stained with the Papanicolaou technique. These images share the following characteristics:

Fig. 1. Cell image.

Fig. 2. General scheme of the process. The numbers indicate the different stages. Dashed lines indicate that different semantic levels of information are used to obtain the final solution.

- An absence of high contrast. It is well known that microscopical biomedical images have a short range of grey levels.
- Many cluttered objects in a single scene. A high number of overlapping objects makes image segmentation difficult.
- Low quality. Traditional staining techniques like that of Papanicolaou introduce a lot of inhomogeneities into the images, where not all of the parts of the same tissue are equally stained.

Many deformable models have been proposed in the literature in which only one object is present and/or initialization by hand is needed. When dealing with images of cells great care must be taken in how the model is initialized. The Hough transform has been a very popular research topic for many years [6-11] because it is able to automatically segment images with problems such as a short range of grey levels, clutter, occlusion and non-random noise. However, high computational cost and the need for a very precise parametric description of the shape impose severe restrictions on its applicability to biomedical objects. We show how the classical Hough transform may be reformulated to be used for automatic deformable template initialization in biomedical images. In this paper, we present the main components of the proposed methodology and the results from cell images that show the validity of the algorithms. Section 2 describes a method to obtain the edge information that will be used to obtain a first approximation of the cell contour. The algorithm yielding this approximation, using a reformulated Hough transform, is discussed in Section 3. Section 4 presents an energy function used as external
information from the image to get local deformations. Finally, Sections 5 and 6 show experimental results (with discussion) and draw some conclusions.
2. Detecting and postprocessing edges

From a semantic point of view, several levels of edge information can be distinguished (pixel, pixel and its neighbourhood, straight line segment, etc.). Because of image noise and the deformation of biological objects, pixel information is very unstable and, therefore, it is desirable to use a higher level in which much of the noise has been taken out. In this work, we have selected straight line segments obtained from a set of good edges (high gradient) because this information is stable, easy to obtain, and follows the proposed object model. Note that the method presented in this paper may be generalized in order to use other kinds of information (for example, using another level, several levels, or several sources of information). Obviously, if we use only good edges and simplify the contour model in order to guarantee object detection, the solution obtained will be approximate. We propose to calculate this first approximation and then to improve it by adding other unused edge information (small chains or low gradient points). Thus, information from both the image and the object model contributes to the desired segmentation (see Fig. 2). Because of the image characteristics, the contours of the cells may appear as split chains of edge points and false chains may be obtained. To obtain better results, we can use a smoothing operation (for example, a Gaussian filter, which characteristically is optimal in terms of smoothing and location in both the spatial and frequency domains) and thresholding with hysteresis. This helps to avoid the breaking-up of the edge contour [12]. Obviously, other edge detectors can be used to improve the edge map (see for example Ref. [13]) or even several edge maps (as shown in Section 5). We use the Canny edge detector [12], a well-known algorithm, which allows us to obtain good results and to show the robustness of our algorithms. Before starting the locating process, we have to postprocess the edges.
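The thresholding-with-hysteresis step mentioned above can be sketched as follows. This is a minimal illustrative fragment, not the paper's implementation: the function name `hysteresis` and the plain-list representation of the gradient-magnitude image are our own assumptions. Pixels above the high threshold seed edges, and pixels above the low threshold are kept only if connected to a seed, which helps avoid breaking up edge contours.

```python
from collections import deque

def hysteresis(mag, low, high):
    """Double-threshold hysteresis on a gradient-magnitude grid `mag`:
    pixels >= `high` seed edges; pixels >= `low` are kept only when
    8-connected to a seed (a simplified sketch of the final step of
    the Canny detector [12])."""
    rows, cols = len(mag), len(mag[0])
    edge = [[False] * cols for _ in range(rows)]
    queue = deque((i, j) for i in range(rows) for j in range(cols)
                  if mag[i][j] >= high)
    for i, j in queue:              # mark all strong pixels first
        edge[i][j] = True
    while queue:                    # grow edges through weak pixels
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols
                        and not edge[ni][nj] and mag[ni][nj] >= low):
                    edge[ni][nj] = True
                    queue.append((ni, nj))
    return edge
```

An isolated weak pixel (above `low` but with no strong neighbour anywhere in its connected component) is discarded, which is exactly the behaviour that suppresses noise-induced chain fragments.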
The straight line segment was selected as the input information to obtain the first location (although the method may be generalized to other possibilities). The postprocessing stage consists of:

- Preparing the chains (this stage consists of processing the multiple points and the chain orientation).
- Determining the location of the straight line segments.

Multiple points correspond to different chains crossing the same point on the image (that is, three chains joined at the same point). When we process the chains to search for their extremities, arriving at a multiple point, we do not know how to select the next point, and so the multiple points are first deleted. To do this, we have developed an algorithm that basically studies the length of the three chains arriving at a multiple point; we then remove the branch lacking sufficient length (probably caused by noise). If the three chains are long we delete the point and its neighbourhood, thus obtaining different chains in this stage of the process (see Fig. 3). Once these points are suppressed, the chains can be codified; however, all chains must be oriented in a run direction according to the mode described. In our experiments, a counterclockwise direction is used. The correct run direction of the chains is determined by the sign of the vectorial product between the tangent line to the chain and the gradient vector at this point. The two vectors form a right angle except in precision or noise problems. The run direction can be calculated without an exact estimation of the tangent vector. Once the chains have been obtained, we have to decide when a chain corresponds to a straight line segment. We have selected the simplest method, where the maximum distance between each of the points along the chain and the straight line segment is less than a given threshold. Obviously, this threshold depends on both the noise of the image and the object that is being looked for. Experimental results with several different objects show that the threshold value may be selected easily in a robust way.

3. Locating cells

Our goal is to find a set of parameters defining the location of the objects being looked for. Because of problems in the images (noise, overlapping objects, etc.), a natural way of doing this is to use a powerful technique
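The straight-line-segment test (maximum distance from any chain point to the segment stays below a threshold) can be sketched as follows; the function name `is_straight` and the point-list chain representation are illustrative assumptions, not the paper's code.

```python
import math

def is_straight(chain, threshold):
    """Decide whether an edge chain is a straight line segment: the
    maximum perpendicular distance from any chain point to the line
    through the chain's endpoints must be less than `threshold`."""
    (x0, y0), (x1, y1) = chain[0], chain[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        return True  # degenerate chain: endpoints coincide
    max_d = max(abs(dy * (x - x0) - dx * (y - y0)) / length
                for x, y in chain)
    return max_d < threshold
```

As the text notes, the threshold trades off image noise against the shape being sought: a nearly collinear chain such as `[(0, 0), (1, 0.1), (2, -0.1), (3, 0)]` passes with a threshold of 0.5, while a sharply bent one does not.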
Fig. 3. Processing multiple points.
called the Hough transform (HT) [9]. Unfortunately, it is impossible to apply the HT directly in this context because the level of noise may be very high and the cells characteristically present local deformations. Thus, the generalized Hough transform (GHT) [6] must be reformulated. Consider the curve $C(t) = f(a_1, \ldots, a_n, t)$ $(0 \le t \le 2\pi)$, which defines the set of shapes to be located ($n$ unknown parameters must be determined). The definition of the HT assumes an $n$-dimensional discretized Hough space $H(A_1, \ldots, A_n)$ that is composed of
Fig. 4. Uncertainty region.
Fig. 5. Segments to de"ne a circle.
that, from a practical point of view, it is not necessary to determine the uncertainty region exactly because a larger region in which it is included can be used. Obviously, a larger region implies a deterioration of the precision, but we only need an approximate location which will be optimized in the next stage. Cell segmentation is an interesting problem that illustrates this methodology. A circle can be used as the shape to locate the cells. Obviously, an elliptical representation is more precise but we will include this possibility as a deformed circle. Fig. 5 shows that: (a) The circle may be defined using 8 segments. Let R be the radius and l the length of each segment. If we consider this object model, it might seem that the cells are equally sized; however, note that our method detects deformed instances of the object model by using an uncertainty region, and therefore elliptical shapes may be located. In this approximation, the object model is simplified in order to use a two-dimensional parameter space, and thus the algorithm used to obtain the first approximation (see Fig. 2, number 4) is very fast. (b) Detection of a segment in the image implies determining a possible location of the centre. We will use a circular uncertainty region of radius r. Another problem to solve is how to accumulate the evidence. Suppose we have a shape defined by n segments
of length $l_i$ $(1 \le i \le n)$. If we detect $m_i$ segments of length $L_i^j$ $(1 \le j \le m_i)$ corresponding to tendence $i$ and referencing position $p$ in the accumulator, we can calculate the value $E(p)$ as follows:

$E(p) = \sum_{i=1}^{n} a_i \max_{1 \le j \le m_i} \frac{\min\{L_i^j, l_i\}}{l_i},$   (3)

where

$\sum_{i=1}^{n} a_i = 1.$   (4)
Thus, $a_i$ weights each tendence, that is, it weights each part of the shape. In the example of cells, we will obviously use $a_i = 1/n$, ∀i. In Eq. (3), note that (1) due to the minimum operator, we do not accumulate more evidence than $l_i$, that is, the algorithm calculates a value between 0 and 1 (it indicates the percentage of evidence $i$ detected); (2) the maximum operator selects the best detected evidence about the existence of tendence $i$; (3) the algorithm accumulates the evidences from each tendence by using the weights $a_i$. Thus, the value $E(p)$ lies between 0 and 1 and corresponds to the contour ratio that has been detected. This enables us to determine a threshold to locate maxima from the parameter space. Results from this algorithm are shown in Fig. 10 (see Section 5).

3.1. Accuracy and sampling of the parameter space

The problem of segmenting cells becomes more difficult if we consider cells with high variability (see Fig. 1) in images with non-random noise. In this case, a circle, the shape to be located, is not the best choice because it would be necessary to increase the size of the uncertainty region. If we use a very large uncertainty region with very noisy images, it would be more difficult to process the parameter space (see Fig. 10(c)) primarily because of the
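The evidence accumulation of Eq. (3) can be sketched as follows; `evidence` and its argument layout are illustrative assumptions. Each model segment (tendence) contributes its best detected match, capped at the model length, with uniform weights $a_i = 1/n$ as used for cells.

```python
def evidence(detected, model_lengths):
    """Accumulated evidence E(p) for one accumulator position (Eq. (3)).

    `model_lengths` holds l_i for each of the n tendences of the shape;
    `detected[i]` is the list of lengths L_i^j of the image segments
    detected for tendence i at this position.  Uniform weights
    a_i = 1/n are used, as in the cell example."""
    n = len(model_lengths)
    total = 0.0
    for i, l_i in enumerate(model_lengths):
        if detected[i]:
            # best match for tendence i, capped so it never exceeds l_i
            total += max(min(L, l_i) / l_i for L in detected[i]) / n
    return total
```

For a two-segment model with lengths 10 and 10, detections of length 5 and 20 give $E(p) = (0.5 + 1.0)/2 = 0.75$, i.e. 75% of the contour is supported, which is what the thresholding of the parameter space operates on.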
higher number of maxima (the values of some regions in the parameter space will be increased) and secondly because the approximated solution would be less accurate than when using a smaller region (moreover, several maxima from different objects might be joined). Therefore, if the variability of the shape is very high, it may be useful to improve the cell location algorithm. Since we are using a reformulated Hough transform, the method to segment cells with high variability may be easily designed by considering a sampling of the parameter space. In this case, an elliptical approximation of the cell is used, and thus we can define a parameter space with 5 axes: for instance, translation (two), rotation, and two axes to handle non-rigid deformations. In order to illustrate the results with an improved parameter space, we have used in our experiments 36×3×2 as the rotation and non-rigid deformation axes sampling (respectively), together with an uncertainty region that will avoid problems from both a poorer sampling and local deformations. Because of the improved parameter space, we can use a smaller region to locate the first approximation. If we consider a smaller parameter space, that is, a small number of samples and/or axes, the uncertainty region will become larger. In fact, segmenting cells by using a circle as the locator shape (considering only translation) is a particular case of sampling. Fig. 6 shows results obtained with a cell image by using a larger parameter space. It is apparent that the ellipse approximation stage¹ can be avoided if the sampling is good enough. Note that the GHT cannot be applied to these images because the level of noise and local deformations generate a large number of local maxima in the parameter space. By using the reformulated Hough transform these problems can be avoided, as shown in Fig. 6. Additionally, we have been able to perform a coarse sampling of
1 See Fig. 2, number 5 and the next section.
Fig. 6. (a) Original image, (b) Parameter space, (c) Results using 36×3×2 as the rotation and non-rigid deformations axes sampling (respectively).
Fig. 7. Elliptical approximation of the most stable locations.
the parameter space, which implies a more efficient algorithm than the GHT. Moreover, we propose to use as few axes and/or samples as possible in order to obtain a fast algorithm. Finally, this methodology may be generalized to initialize deformable templates (see Section 5).

3.2. Ellipse approximation

Once the location of each cell has been estimated, a better approximation can be obtained by using an ellipse. To do this, we obtain the edge points used to define each location [8] and an ellipse is fitted to this set of points by a least-squares method. The algorithm to fit this ellipse consists of (1) obtaining an initial approximation of the parameters a, b, A, B, C from the equation
$\begin{pmatrix} r - a & c - b \end{pmatrix} \begin{pmatrix} A & B \\ B & C \end{pmatrix} \begin{pmatrix} r - a \\ c - b \end{pmatrix} = 1,$   (5)
and (2) refining the solution using a gradient descent algorithm (see Appendix A). Fig. 7 shows the result of this algorithm applied to the previous example.
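Step (1), the initial parameter estimate of Appendix A, can be sketched as follows; `ellipse_init` is an illustrative name and the code assumes the moment-based formulas given in the appendix (centre from the mean, quadratic-form coefficients from the inverse of twice the point covariance).

```python
def ellipse_init(points):
    """First approximation of the ellipse parameters (Appendix A):
    centre (a, b) is the mean of the points, and (A, B, C) come from
    inverting 2K, where K is the covariance matrix of the points."""
    n = len(points)
    a = sum(r for r, c in points) / n
    b = sum(c for r, c in points) / n
    krr = sum((r - a) ** 2 for r, c in points) / n
    krc = sum((r - a) * (c - b) for r, c in points) / n
    kcc = sum((c - b) ** 2 for r, c in points) / n
    det = 2.0 * (krr * kcc - krc ** 2)
    return a, b, kcc / det, -krc / det, krr / det  # a, b, A, B, C
```

As a sanity check, points sampled on a circle of radius R centred at (a, b) have k_rr = k_cc = R²/2 and k_rc = 0, so the formula yields A = C = 1/R² and B = 0, i.e. exactly the circle in the quadratic form of Eq. (5).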
4. Fitting local deformations

In order to classify the cell images it is important to determine local deformations and discover whether or not the cell has been damaged. Once we have the initial approximation of the contour, the solution must be optimized by using local deformations. To do this, a stochastic deformable template model is used. The deformable models proposed in the literature can be divided into two classes: models with local shape constraints and models with global shape constraints. In models with local shape constraints, there is no global structure of the template, that is, the template is only affected by continuity and smoothness constraints. An example of this is the active contour model (snake) proposed by Kass et al. [4]. These models sacrifice model specificity in order to accommodate variability and so cannot be used to locate partially occluded objects in noisy, cluttered images. In models with global shape constraints, prior information about the geometrical shape is available. One example of this is the active shape model proposed by Cootes et al. [14]. They compute the mean shape of a class of correctly annotated training objects as the prototype template. An object is defined by type-dependent landmarks, and the variability allowed for these landmarks is determined by using principal component analysis on the training set. Another example is Yuille's model [15]. In this case, eye and mouth templates are drawn using circles and parabolic curves. These models attempt to simplify the set of parameters which control the global shape of a template, and therefore are not able to fit local deformations as required in our application. Because our first approximation is very close to the optimal solution, local deformations are small displacements that can be assumed to be stochastic deformations; therefore, the solution is a shape that is similar to the elliptical approximation obtained. A deformable template model with global shape constraints which is in accordance with this idea was proposed by Grenander [16,17]. He proposed a prototype template (in our problem, the first elliptical approximation), which can accommodate a certain degree of variability (it is deformed to match salient image features) while maintaining the global structure.

4.1. The deformable template model

The basic elements of the Grenander approach are: a generator space (segments on the plane), a connector graph (cyclic graph), regularity conditions on this graph (simple and closed polygons) and a transformation group on the generator space (Euclidean Group × Scale). Grenander introduces a probability measure into the value space defining the transformation group. This measure is defined so that the distribution over the local deformations of the template is a 2n-Gaussian distribution [18,19].
Let $v = (v_0^T, \ldots, v_{n-1}^T)^T$ be a sample of $n$ points from the template boundary (i.e. the deformable template is a polygon that represents the "mean" shape we are looking for). In our problem, the template boundary is the elliptical approximation of the cell's location. An edge is defined as

$e_i = v_{i+1} - v_i,$   (6)

where $i = 0, \ldots, n-1$ $(v_n = v_0)$. An example of a template is shown in Fig. 8. The template vector cycle $e = (e_0^T, \ldots, e_{n-1}^T)^T \in \mathbb{R}^{2n \times 1}$ satisfies the following closure condition:

$\sum_{i=0}^{n-1} e_i = 0.$   (7)
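The edge cycle of Eqs. (6)-(7) can be sketched as follows; `template_edges` and the tuple representation of vertices are illustrative assumptions. The closure condition holds automatically for any closed polygon, since each coordinate telescopes.

```python
def template_edges(vertices):
    """Edge cycle e_i = v_{i+1} - v_i of a polygonal template (Eq. (6)),
    with the wrap-around v_n = v_0; the edges of a closed polygon sum
    to the zero vector (Eq. (7))."""
    n = len(vertices)
    return [(vertices[(i + 1) % n][0] - vertices[i][0],
             vertices[(i + 1) % n][1] - vertices[i][1])
            for i in range(n)]

# A small rectangular template and its closure check.
edges = template_edges([(0, 0), (2, 0), (2, 1), (0, 1)])
closure = (sum(e[0] for e in edges), sum(e[1] for e in edges))  # (0, 0)
```

Any deformation applied to the edge cycle must preserve this closure constraint, which is exactly what Grenander's model enforces on the parameter cycles.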
Fig. 8. A template with vertices v_i and edges e_i.
Fig. 9. Start template and first four simulations.
A deformed shape e′ is obtained by applying a transformation $S = (S_0, \ldots, S_{n-1})$ to the initial template e. The result $e'_i$ can differ from $e_i$ in length and orientation, and therefore the matrix $S_i$ may be defined as

$S_i(\lambda, \psi) = \begin{pmatrix} \lambda\cos\psi & \lambda\sin\psi \\ -\lambda\sin\psi & \lambda\cos\psi \end{pmatrix} = \begin{pmatrix} 1+a & -\phi \\ \phi & 1+a \end{pmatrix},$   (8)

where $a = \lambda\cos\psi - 1$ and $\phi = -\lambda\sin\psi$. Grenander's model assumes that the parameter cycles $a = (a_0, \ldots, a_{n-1})^T$ and $\phi = (\phi_0, \ldots, \phi_{n-1})^T$ follow independent central cyclic Markovian Gaussian densities constrained to preserve closure in the template cycle e (note that $S_i$ is linear in $(a, \phi)$ and that the variability of the shapes can be controlled by selecting the variance on vector length and angle).² The Markov property is assumed to be of first order. See Refs. [17,18] for a complete discussion about this first-order Markov property and the normality assumption. Consider a random vector cycle $x = (x_0, \ldots, x_{n-1})$. The first-order Markov property states that

$P(x_i \mid x_0, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{n-1}) = P(x_i \mid x_{i-1}, x_{i+1}),$   (9)

for $i = 0, \ldots, n-1$ and $x_n = x_0$. From Grenander et al. [18] the cyclic Markovian Gaussian density can be factored as

$f(x) = \frac{1}{z} \exp\left( -\frac{1}{2} \sum_{i=0}^{n-1} \left[ \left( \frac{x_{i+1} - x_i}{e_i} \right)^2 + \left( \frac{x_i}{p_i} \right)^2 \right] \right),$   (10)

where $p_i, e_i > 0$ ∀i. When $e_i^2$ decreases, density f becomes more concentrated, that is, it corresponds to a tighter bonding between neighbours. If $p_i^2$ decreases, variability in vector lengths and orientations should also decrease [18]. These definitions and assumptions allow us to design an algorithm to simulate the template (see Ref. [17] for a detailed description). Results from this algorithm are shown in Fig. 9. Note that the deformed shape is an ellipse with local deformations (the parameters to control the variability of the shapes for our application have been selected experimentally). The template vector is segmented by sampling from the posterior density $p(x \mid I) \propto p(x)\,p(I \mid x)$. The imaging model $p(I \mid x)$ is defined by using the Gibbs distribution:

$p(I \mid x) = \frac{1}{z} \exp(-E(x, I)),$   (11)

where z is a normalizing constant and E(x, I) is an energy function (see next section), which is a measure of how well the template matches the object boundary in the image. For further details about this model see Refs. [17,18,20]. For details about a C++ implementation, ask the authors. Once we have selected a geometrical model of the template together with a prior probability, we only need an external energy function to define a posterior density [18,21].

² For small values of ψ, and λ near 1, a ≈ λ − 1 means a change in length, and φ ≈ −ψ means a change in orientation.

4.2. External energy function

The function is defined on the set of possible locations, and an object location is described by a closed curve as follows:

$C = (X(s), Y(s)), \quad 0 \le s \le L.$   (12)

The function may then be defined as

$E_{\mathrm{ext}}(C) = \frac{1}{L} \int_0^L P(X(s), Y(s)) \, ds,$   (13)

where the potential P is computed as a function of the image data according to the desired goal. The first option to define the energy function is to use the gradient as follows:

$P(x, y) = -\|\nabla g(x, y)\|.$   (14)

In this stage the solution is approximated by an elliptical contour which we only need to optimize in order to fit local deformations. Because the first approximation obtained is very close to the desired location, it might seem that we do not need to improve this energy function, but
in fact function P(x, y) is not good enough because we are considering cell images with many cluttered objects in a single scene, inhomogeneities, a lot of noise, etc. In these images, we may find many local minima or several edges (from different objects) in a small region, and then the algorithm may become confused. Therefore, the use of another energy function is recommended. In order to improve the results, we propose using all the available information. Firstly, some known information is obviated when we know which stable edges are involved in a given location; the position can then be refined by using this more stable information. Secondly, as the gradient has two components, these are used to get even more precise results. The procedure employed is as follows: (1) An easy and efficient solution to use the edge points [22] is to calculate the distance to the nearest edge point [23]:

$d(x, y) = \min_{(r, s) \in A} \|(x, y) - (r, s)\|,$   (15)

where A is the set of edge points. (2) To improve the energy function, we use the gradient direction by changing the function P(X(s), Y(s)) to P(X(s), Y(s), θ(s)), where θ(s) is defined as the angle between the vectors ∇g(X(s), Y(s)) and

$\left( -\frac{dy}{ds}(s), \frac{dx}{ds}(s) \right).$   (16)
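The distance-to-nearest-edge potential of Eq. (15) can be sketched as follows; `nearest_edge_distance` is an illustrative name, and the brute-force scan over the edge set A stands in for the precomputed distance transform one would use in practice.

```python
import math

def nearest_edge_distance(point, edge_points):
    """Potential based on the distance to the nearest edge point,
    Eq. (15): d(x, y) = min over (r, s) in A of ||(x, y) - (r, s)||.
    Brute force for clarity; a distance transform of the edge map
    would give the same values in O(1) per query after preprocessing."""
    x, y = point
    return min(math.hypot(x - r, y - s) for r, s in edge_points)
```

A template vertex lying on an edge point gets potential 0, and the value grows with the distance to the nearest evidence, which is what drives the local deformation toward the cell boundary.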
If these vectors differ by more than π/2, it can be assumed that the gradient is zero. Therefore, when one object is very close to another the algorithm does not become confused.

5. Experimental results and discussion

To illustrate the proposed methodology, we now present results obtained from several images. Fig. 10 shows the process of locating cells by using a circle as the uncertainty region with r = 5 (see Fig. 5). The template is defined using 8 segments with R = 17 (see Fig. 5). The images represent: (a) Original image. (b) Edge image (σ = 1.0, low threshold 0.5 and high threshold 0.9). (c) Parameter space. Every value is between 0 and 1. (d) Thresholding of the parameter space (threshold = 0.5). We have used 0.5 as the threshold in order to detect cells with 50% of their contours. (e) Locations. The location is defined as the point with maximum response. (f) Locations on the original image. Some cells have not been located, which was expected because there is not enough edge information (see image b). Fig. 11 shows the final results for eight cells. Each local deformation has been obtained using only 50 iterations. Note that the whole process is determined by the previous set of parameters. No human intervention is needed between the processing stages. Obviously, if we
Fig. 10. Results of cell location.
Fig. 11. Results after fitting local deformations.
tried to apply this process to another problem, we would need to define a new template and probably new parameters because the conditions may be very different. For example, the images might contain more noise (σ > 1) or there could be more complex images with many different objects (threshold > 0.5). The actual running time is 17.8 s for the location process (Fig. 10e) and 65.4 s for the local deformation estimation, using a SUN SPARC Classic Workstation. It should be noted that the running time could be markedly improved by using an optimized (even parallelized) implementation of the algorithm. As shown in Fig. 10, we have used a simplified (two-dimensional) parameter space to locate the cells. This simplification was possible because the HT was reformulated through the uncertainty region. This new formulation may be considered a generalization of some methods designed to simplify the parameter space in some special cases. See for example Lo [24], in which a method to
detect perspectively transformed shapes is proposed, or Jeng [10], in which an algorithm to handle shape scaling and rotation is described. However, these papers present solutions only for specific problems, such as perspective, scale and rotation. The reformulation of the Hough transform with voting in regions has also been studied in other references. See for example Soffer [25] and other references therein, where the conditions to ensure that the global maximum is in the immediate neighbourhood of the maximal grid point are studied. Thus, they analyse and propose a multiresolution Hough transform for detecting straight edges one by one. Another example is Stephens [26], where a probabilistic Hough transform is defined as a likelihood function in the output parameters, whereby an improvement in robustness over a conventional method is obtained. In this paper, however, we propose a reformulated HT not only to obtain a more accurate or more efficient algorithm but to show that the HT may be
Fig. 12. Results using several sources of information.
Fig. 13. Results from a complex image: (a) original, (b) edges without any processing together with parameter space, (c) results.
used for deformable template initialization in spite of the fact that the HT has traditionally been considered a rigid scheme in that it is not capable of detecting deformed shapes (see Ref. [21]). The reformulated HT is so general that it comprises a new methodology for initializing deformable templates in a wide range of applications, in which the classical HT may not be used because of the high degree of dimensionality. If we defined a parametric description of the shape which provided a global-to-local ordering of shape deformation, we would be able to remove a large set of axes (local deformations) to apply the reformulated HT using the simplified parameter space. A method to obtain this deformation system is presented in a separate paper [23], whereby the proposed methodology can be applied to locating other shapes. Qualitative features, i.e. scene features with qualitative attributes assigned to them, for example straight line segments, are effective in reducing the number of spurious interpretations [7] and allow us to normalize the parameter space (see Eq. (3)) in order to determine a value for parameter space thresholding. Moreover, further information can be used by considering other sources (like information from regions, textures, colours, etc.) to improve the results, because a qualitative accumulation of evidences is performed (see Eq. (3)). An example of this idea is shown in Fig. 12, where several edge maps (from scales σ = 1, 1.5, and 2) are jointly used to improve the segmentation. In this figure, the maxima of the parameter space greater than 0.5 are considered. More cells are obtained despite the fact that the same threshold is selected (0.5, i.e. 1/2 of the whole contour). These results indicate that more robust algorithms can be designed by using several sources and thus that segmentation can be accomplished with very poor images. Note that this algorithm performs the integration of several edge maps by means of the object model. Obviously, the solution to the general segmentation problem has not yet been found. Let us consider a complex image in order to understand the limitations of the suggested technique and thus directions for future research. In Fig. 13, some false cells have been obtained because of the excess of edge points or overlapping objects in some regions. Note that spurious maxima are more probable than in the classical HT because we vote in a region, especially when complex images are involved and when there exists a lot of unstable information. Therefore, it is desirable to use a set of features that are as stable as possible in order to obtain good results from the reformulated HT. To this effect, we have proposed using
a smoothed image to obtain only the most stable edges (Canny's algorithm), the direction of the gradient, qualitative features (straight line segments), etc. However, when the image is too complex or when several sources of information are used, producing a cluttered scene with too many chains from different objects, the algorithm could obtain a large percentage of the final contour by deforming the template to fit edge chains which do not correspond to the same object. In this case, a higher threshold could be used to obtain the most stable results, but some objects might be lost and the result would not be complete. We are currently in the process of improving the obtained results by using several sources, which allows us to determine the most stable information for the location stage. Firstly, we propose using integration methods which can be applied in previous stages (see for example Ref. [13], where we propose a method to integrate edge information from different scales); we thus reduce a large amount of information in order to speed up the algorithm and to obtain more stable edges. Secondly, we also plan to consider region-based information, such as intensity, homogeneity, texture, etc., which could solve many difficult cases (see Fig. 13).
6. Conclusions

The main aim of this paper is to present a new way of segmenting cells, which may be used for automatic classification. We describe a complete methodology for cell image segmentation, and thus a new way to solve the difficult problem of initializing deformable templates has been shown. The results obtained indicate a promising direction for further research into automatic initialization, which is especially important for designing automatic algorithms in biomedical applications. This approach for segmenting cell images is both fast and robust, in spite of the fact that it is an automatic method applied to images with severe noise conditions. It is fast because it uses a reformulated Hough transform in which a simplified parameter space is considered. It is not possible to locate objects by using the classical HT because the images have too much noise and the number of deformation axes must be very high in order to handle every deformed contour. In this new formulation, an uncertainty region is used to avoid the problems that arise with the classical HT. After location, we approximate the solution by means of an ellipse. The final solution, with local deformation, is then obtained using Grenander's deformable template model from this initialization, which is very close to the desired solution. Moreover, this approach is robust because we calculate an initial approximation from stable information (straight line segments from edges). We then optimize this approximation using both components of the gradient vectors.
Acknowledgements This research has been partially supported by the project TIC97-1134-c02-01.
Appendix A. Ellipse approximation

In this appendix, we present the expressions used to approximate a set of points by means of an ellipse. Let us suppose a set of points (r_1, c_1), ..., (r_N, c_N). The first approximation is

a = (1/N) Σ_{n=1}^{N} r_n,   b = (1/N) Σ_{n=1}^{N} c_n,

[A B; B C] = (1 / (2(k_rr k_cc − k_rc²))) [k_cc  −k_rc; −k_rc  k_rr],

where

k_rr = (1/N) Σ_{n=1}^{N} (r_n − a)²,   k_rc = (1/N) Σ_{n=1}^{N} (r_n − a)(c_n − b),
k_cc = (1/N) Σ_{n=1}^{N} (c_n − b)²,

and the error function to minimize is

e² = Σ_{n=1}^{N} [d²(r_n − a)² + 2de(r_n − a)(c_n − b) + (e² + f²)(c_n − b)² − 1]²,

where

d = √A,   e = B/d,   f = √(C − e²).

Finally, writing F_n = d²(r_n − a)² + 2de(r_n − a)(c_n − b) + (e² + f²)(c_n − b)², the derivatives used are

∂e²/∂a = 4 Σ_n (F_n − 1)[−d²(r_n − a) − de(c_n − b)],
∂e²/∂b = 4 Σ_n (F_n − 1)[−de(r_n − a) − (e² + f²)(c_n − b)],
∂e²/∂d = 4 Σ_n (F_n − 1)[d(r_n − a)² + e(r_n − a)(c_n − b)],
∂e²/∂e = 4 Σ_n (F_n − 1)[d(r_n − a)(c_n − b) + e(c_n − b)²],
∂e²/∂f = 4 Σ_n (F_n − 1)[f(c_n − b)²].
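As a numerical check on these formulas, the sketch below (our own Python/NumPy code, not from the paper; function and variable names other than a, b, A, B, C and the moments are ours) computes the first approximation from the moments. For points sampled uniformly on a circle of radius 2 centred at (5, −3), the moments give A = C ≈ 1/r² = 0.25 and B ≈ 0, and the fitted quadratic F_n evaluates to 1 on every point, so the error e² is essentially zero.

```python
import numpy as np

def ellipse_first_approx(r, c):
    """First approximation of Appendix A: centre (a, b) and the
    entries A, B, C built from the second-order moments."""
    a, b = r.mean(), c.mean()
    krr = np.mean((r - a) ** 2)
    krc = np.mean((r - a) * (c - b))
    kcc = np.mean((c - b) ** 2)
    det = 2.0 * (krr * kcc - krc ** 2)
    A, B, C = kcc / det, -krc / det, krr / det
    return a, b, A, B, C

# Points on a circle of radius 2 centred at (5, -3).
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
rad = 2.0
r = 5.0 + rad * np.cos(t)
c = -3.0 + rad * np.sin(t)
a, b, A, B, C = ellipse_first_approx(r, c)

# Evaluate the error function with d, e, f as defined above.
d = np.sqrt(A)
e = B / d
f = np.sqrt(C - e ** 2)
Fn = d**2 * (r - a)**2 + 2*d*e * (r - a) * (c - b) + (e**2 + f**2) * (c - b)**2
err = np.sum((Fn - 1.0) ** 2)
```

Note that d² = A, de = B and e² + f² = C, so F_n is exactly the quadratic form A(r−a)² + 2B(r−a)(c−b) + C(c−b)².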
A. Garrido, N. Pérez de la Blanca / Pattern Recognition 33 (2000) 821–832
References

[1] R. Haralick, L. Shapiro, Image segmentation techniques, Comput. Vision Graph. Image Process. 29 (1985) 100–132.
[2] P.K. Sahoo, A.K. Saltani, A.K.C. Wong, Y.C. Chen, A survey of thresholding techniques, Comput. Vision Graph. Image Process. 41 (1988) 233–260.
[3] T. Poggio, V. Torre, Ill-posed problems and regularization analysis in early vision, Proceedings of the DARPA Image Understanding Workshop, 1984, pp. 257–263.
[4] M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models, Int. J. Comput. Vision 1 (4) (1987) 321–331.
[5] F. Leymarie, M.D. Levine, Tracking deformable objects in the plane using an active contour model, IEEE Trans. Pattern Anal. Mach. Intell. 15 (6) (1993) 617–634.
[6] D.H. Ballard, Generalizing the Hough transform to detect arbitrary shapes, Pattern Recognition 13 (1981) 111–122.
[7] S.M. Bhandarjar, M. Suk, Qualitative features and the generalized Hough transform, Pattern Recognition 25 (9) (1992) 987–1006.
[8] E. Davies, Machine Vision: Theory, Algorithms, Practicalities, Academic Press, New York, 1990.
[9] J. Illingworth, J. Kittler, A survey of the Hough transform, Comput. Vision Graph. Image Process. 44 (1988) 87–116.
[10] S.C. Jeng, W.H. Tsai, Scale- and orientation-invariant generalized Hough transform: a new approach, Pattern Recognition 24 (11) (1991) 1037–1051.
[11] S.Y. Yuen, C.H. Ma, An investigation of the nature of parameterization for the Hough transform, Pattern Recognition 30 (6) (1997) 1009–1040.
[12] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679–698.
[13] M. Garcia-Silvente, J.A. Garcia, J. Fdez-Valdivia, A. Garrido, A new edge detector integrating scale spectrum information, Image Vision Comput. 15 (1997) 913–923.
[14] T.F. Cootes, C.J. Taylor, D.H. Cooper, J. Graham, Active shape models: their training and application, Comput. Vision Image Understanding 61 (1) (1995) 38–59.
[15] A.L. Yuille, P.W. Hallinan, D.S. Cohen, Feature extraction from faces using deformable templates, Int. J. Comput. Vision 8 (2) (1992) 133–144.
[16] U. Grenander, Pattern Synthesis. Lectures in Pattern Theory, vol. 1, Appl. Math. Sci. vol. 18, Springer, Berlin, 1976.
[17] A. Knoerr, Global Models of Natural Boundaries: Theory and Applications, Report in Pattern Theory 148, Brown University, 1988.
[18] U. Grenander, Y. Chow, D.M. Keenan, Hands: A Pattern Theoretic Study of Biological Shapes, Springer, Berlin, 1991.
[19] N. Pérez de la Blanca, J. Fdez-Valdivia, Building up templates for non-rigid plane outlines, Proceedings ICPR-92, vol. 3, 1992, pp. 575–578.
[20] U. Grenander, D.M. Keenan, Towards automated image understanding, J. Appl. Probab. 16 (2) (1989) 207–221.
[21] A.K. Jain, Y. Zhong, S. Lakshmanan, Object matching using deformable templates, IEEE Trans. PAMI 18 (3) (1996) 267–277.
[22] L.D. Cohen, I. Cohen, Finite-element methods for active contour models and balloons for 2-D and 3-D images, IEEE Trans. Pattern Anal. Mach. Intell. 15 (11) (1993) 1131–1147.
[23] A. Garrido, N. Pérez de la Blanca, Physically-based active shape models: initialization and optimization, Pattern Recognition 31 (8) (1998) 1003–1017.
[24] Rong-Chin Lo, Wen-Hsiang Tsai, Perspective-transformation-invariant generalized Hough transform for perspective planar shape detection and matching, Pattern Recognition 30 (3) (1997) 383–396.
[25] M. Soffer, N. Kiryati, Guaranteed convergence of the Hough transform, Comput. Vision Image Understanding 69 (2) (1998) 119–134.
[26] R.S. Stephens, Probabilistic approach to the Hough transform, Image Vision Comput. 9 (1) (1991) 66–71.
About the Author - A. GARRIDO was born in Granada, Spain, in 1969. He received the B.S. and Ph.D. degrees, both in Computer Science, from the University of Granada in 1992 and 1996, respectively. Since 1993 he has been with the Computer Science Department (DECSAI) at Granada University. His current interests include pattern recognition, multiresolution methods, deformable templates, image registration, and biomedical applications. Dr. A. Garrido is a member of the IAPR Association.

About the Author - N. PÉREZ DE LA BLANCA was born in Granada, Spain. He received the B.S. and Ph.D. degrees, both in Mathematics, from the University of Granada in 1975 and 1979, respectively. He was at the Statistical Department of the University of Granada from 1976 to 1986. In 1986 he moved to the Computer Science and Artificial Intelligence Department (DECSAI). He is now Professor of Artificial Vision. His current interests include pattern recognition, image registration, deformable templates, and 3D medical applications. Prof. Pérez de la Blanca is a member of the IAPR and SPIE, and is the Vice-president of the IAPR's Spanish chapter.
Pattern Recognition 33 (2000) 833–839
Maximum certainty data partitioning Stephen J. Roberts*, Richard Everson, Iead Rezek Intelligent & Interactive Systems Group, Department of Electrical & Electronic Engineering, Imperial College of Science, Technology & Medicine, Exhibition Road, London SW7 2BT, UK Received 5 August 1998; accepted 18 March 1999
Abstract

Problems in data analysis often require the unsupervised partitioning of a dataset into clusters. Many methods exist for such partitioning, but most have the weakness of being model-based (most assuming hyper-ellipsoidal clusters) or computationally infeasible in anything more than a three-dimensional data space. We reconsider the notion of cluster analysis in information-theoretic terms and show that minimisation of partition entropy can be used to estimate the number and structure of probable data generators. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Cluster analysis; Data partitioning; Information theory
1. Introduction

Many problems in data analysis, especially in signal and image processing, require the unsupervised partitioning of data into a set of 'self-similar' clusters or regions. An ideal partitioning unambiguously assigns each datum to a single cluster, and one thinks of the data as being generated by a number of data generators, one for each cluster. Many algorithms have been proposed for such analysis and for the estimation of the optimal number of partitions. The majority of popular and computationally feasible techniques rely on the assumption that clusters are hyper-ellipsoidal in shape. In the case of Gaussian mixture modelling [1-3] this is explicit; in the case of dendrogram linkage methods (which typically rely on the L2 norm) it is implicit [4]. For some datasets this leads to an over-partitioning. Alternative methods, based for example on valley seeking [2] or maxima-tracking in scale-space [5], have the advantage that they are free from such assumptions. They can be, however, computationally intensive, sensitive to noise (in the case of valley-seeking approaches) and infeasible in high-dimensional spaces
(indeed these methods can become prohibitive in even a three-dimensional data space). In this paper we reconsider the issue of data partitioning from an information-theoretic viewpoint and show that minimisation of entropy, or maximisation of partition certainty, may be used to evaluate the most probable set of data generators. The approach does not assume cluster convexity; it is shown to partition a range of data structures and to be computationally efficient.
2. Theory

The idea underlying this approach is that the observed dataset is generated by a number of data generators (classes). We first model the unconditional probability density function (pdf) of the data and then seek a number of partitions whose linear combination yields the data pdf. Densities and classifications conditioned on this partition set are then easily obtained.

2.1. Information maximisation
* Corresponding author. Tel.: +171-594-6230; fax: +171-823-8125. E-mail address: [email protected] (S.J. Roberts)
Consider a set of K partitions. The probability density function of a single datum x, conditioned on this
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00086-2
S.J. Roberts et al. / Pattern Recognition 33 (2000) 833–839
partition set, is given by
p(x) = Σ_{k=1}^{K} p(x|k) p(k).   (1)
We consider the overlap between the contribution to this density function of the kth partition and the density p(x). This overlap may be measured by the Kullback-Leibler measure between these two distributions. The latter is defined, for distributions p(x) and q(x), as

KL(p(x) || q(x)) = ∫ p(x) ln (p(x)/q(x)) dx.   (2)

Note that this measure reaches a minimum of zero if, and only if, p(x) = q(x). In any other case it is strictly positive and increases as the overlap between the two distributions decreases. What we desire, therefore, is that the KL measure be maximised, as this implies that the overlap between the two distributions is minimised. We hence write our overlap measure as

v_k = −KL(p(x|k) p(k) || p(x)).   (3)

As this measure is strictly non-positive we may define a total overlap as the summation of all v_k:

V = −Σ_k KL(p(x|k) p(k) || p(x)) = −Σ_k ∫ p(x|k) p(k) ln (p(x|k) p(k) / p(x)) dx.   (4)

We note, furthermore, that as V ≤ 0, minimisation over all data is equivalent to minimisation of V for each datum. An 'ideal' data partitioning separates the data such that the overlap between partitions is minimal. We therefore seek the partitioning for which V is a minimum. By Bayes' theorem we may rewrite Eq. (4) as

V = −Σ_k ∫ p(k|x) p(x) ln p(k|x) dx = −∫ (Σ_k p(k|x) ln p(k|x)) p(x) dx.   (5)

Note that the summation term is simply the Shannon entropy, given datum x, over the set of partition posteriors, i.e. H(x) = −Σ_k p(k|x) ln p(k|x). Minimising V is hence equivalent to minimising the expected (sample) entropy of the partitions over all observed data. It is this objective which we will use to form minimum-entropy, or maximum-certainty, partitions. It is achieved by having, for each datum, some partition posterior close to unity while all the others are close to zero, which conforms to our objective for ideal partitioning.
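A minimal numerical illustration of this objective (our own Python sketch, not from the paper): the sample entropy H(x) = −Σ_k p(k|x) ln p(k|x) is zero when one partition posterior is unity, and maximal (ln K) when the posteriors are uniform, so minimising its expectation drives the partitioning towards certainty.

```python
import numpy as np

def mean_partition_entropy(P):
    """Expected Shannon entropy of the partition posteriors.
    P has shape (N, K); row n holds p(k | x_n) and sums to one."""
    P = np.clip(P, 1e-12, 1.0)          # avoid log(0)
    return float(np.mean(-np.sum(P * np.log(P), axis=1)))

crisp = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])   # certain assignments
uniform = np.full((3, 2), 0.5)                            # maximally uncertain
```

Here `mean_partition_entropy(crisp)` is (numerically) zero while `mean_partition_entropy(uniform)` equals ln 2.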
2.2. Mixture models

2.2.1. Kernel-based density estimators
We restrict ourselves in this paper to a set of kernels, or basis functions, which model the probability density function (pdf) of the data and thence of each data partition. It is worth considering at this point the major approaches to estimating a density function using a finite set of kernel functions.

1. Parametric representation: a strong assumption is made regarding a model for the data generators, namely that a single kernel represents each data generator. The Gaussian kernel is a popular choice, leading to Gaussian mixture modelling.
2. Semi-parametric representation: arbitrary density functions may be adequately represented (i.e. to any finite precision) with a finite number of kernels (if the kernel is chosen to be a function with universal approximation properties). The density function of each data generator is thus represented by a finite mixture of kernels (typically Gaussians).
3. Non-parametric representation: each datum serves as the prototype (normally the location) for a single kernel function. As with semi-parametric representations, if the kernels are chosen so as to have universal approximation properties, then arbitrary density functions may be approximated.

Further details of density estimation approaches may be found in Ref. [6]. As we wish to decompose overly complex models of data generation into simpler partitionings, we require either semi- or non-parametric models of the data pdf. In all the examples presented in this paper we find little difference between partitioning results using non- and semi-parametric estimators, and we have chosen the latter (with a heuristically chosen 'moderate' number of kernels) in all examples. This clearly gives a computational advantage, although it is arguably offset by introducing heuristic values into an analysis which is, in principle, free of any, and by the necessity of optimising the kernel set.
We perform the latter using the standard EM (expectation-maximisation) algorithm (see Ref. [7], for example). It is worth commenting that the EM algorithm has a well-known failure mode: a single kernel can reduce its variance to zero and hence drive the data likelihood to infinity. This situation arises when the mixture model is over-complex given the available data. We avoid this by careful initialisation of the EM algorithm using K-means clustering and by early stopping [7]. The latter performs a natural regularisation of the system, and we observe (empirically) no over-fitting problems on any of the data presented in this paper.
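This fitting stage can be sketched as follows (our own NumPy implementation, not the authors' code; the paper's exact settings are unspecified, so 'early stopping' is crudely approximated by capping the number of EM iterations, and the K-means initialisation is a few assignment/update passes on spherical Gaussians):

```python
import numpy as np

def fit_gmm(X, J, n_iter=25, seed=0):
    """EM for a spherical Gaussian mixture with a K-means-style
    initialisation. Early stopping = a cap on EM iterations."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = X[rng.choice(N, J, replace=False)].copy()
    for _ in range(5):                      # K-means-style initialisation
        lab = np.argmin(((X[:, None] - mu[None]) ** 2).sum(-1), axis=1)
        for j in range(J):
            if np.any(lab == j):
                mu[j] = X[lab == j].mean(axis=0)
    var = np.full(J, X.var())
    pi = np.full(J, 1.0 / J)
    for _ in range(n_iter):                 # capped EM iterations
        sq = ((X[:, None] - mu[None]) ** 2).sum(-1)            # (N, J)
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - sq / (2 * var)
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)   # kernel posteriors p*(j | x_n)
        Nj = R.sum(axis=0) + 1e-9
        pi = Nj / N
        mu = (R.T @ X) / Nj[:, None]
        var = (R * sq).sum(axis=0) / (d * Nj) + 1e-6  # floor avoids collapse
    return pi, mu, var, R

# Four well-separated blobs, 30 points each, 10 kernels (as in Section 3.1).
rng = np.random.default_rng(1)
centres = np.array([[0, 0], [6, 0], [0, 6], [6, 6]], float)
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centres])
pi, mu, var, R = fit_gmm(X, J=10)
```

The variance floor in the M-step is our own crude guard against the collapsing-kernel failure mode described above.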
2.2.2. Partitions as mixture models
We consider a set of partitions of the data. We may model the density function of the data, conditioned on the ith partition, via a semi-parametric mixture model of the form

p(x|i) = Σ_{j=1}^{J} p*(x|j) π*(j),   (6)

where J is the number of kernels forming the mixture and the π*(j) are a set of (unknown) mixture coefficients which sum to unity. Each mixture component may be, for example, a Gaussian kernel, and hence each candidate partition of the data is represented in this approach as a different mixture of these kernels. The 'star' notation, p*, denotes that this set of probabilities is evaluated over the kernel representation, rather than over the set of data partitions. Eq. (6) may be written, via Bayes' theorem, as a linear transformation to a set of partition posteriors of the form

p = W p*,   (7)

where p is the set of partition posterior probabilities (in vector form), W is some transform, or mixing, matrix (not assumed to be square) and p* is the set of kernel posteriors (in vector form). Hence the ith partition posterior may be written as

p_i = p(i|x) = Σ_j W_ij p*(j|x).   (8)

If we are to interpret p as a set of posterior probabilities we require

p_i ∈ [0, 1] for all i   and   Σ_i p_i = 1.   (9)

As p*(j|x) ∈ [0, 1] and Σ_j p*(j|x) = 1, the first of these conditions is met if each W_ij ∈ [0, 1], since a convex combination of values in [0, 1] remains in [0, 1]. The second condition is met when

1 = Σ_i p(i|x) = Σ_i Σ_j W_ij p*(j|x) = Σ_j p*(j|x) (Σ_i W_ij),   (10)

which holds when Σ_i W_ij = 1, i.e. each column of W sums to unity. Given a set of partition posteriors, the partition priors may be re-evaluated as their maximum-likelihood estimates, i.e.

p(i) = ⟨p(i|x)⟩ = (1/N) Σ_n p(i|x_n),   (11)

hence we may also form partition (class) conditional likelihoods via Bayes' theorem, i.e.

p(x|i) = p(i|x) p(x) / p(i).   (12)
We note once more that each partition-conditional density in the transformed space is represented by a mixture of kernels from the original space. If 'hard' partitioning is required, each datum is assigned to the partition with the largest posterior in the transformed space. Centroid locations of each partition are simply obtained if required (and if believed to be meaningful; the centroid of a ring structure lies in a region of no data density, for example). The ith such point m_i, which we may consider as the centroid of the ith data partition, is estimated by its maximum-likelihood value, namely

m_i = Σ_n x_n p(i|x_n) / Σ_n p(i|x_n).   (13)
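Eqs. (7)-(13) can be exercised directly (our own Python sketch; the kernel posteriors are an arbitrary random example, and a column-wise softmax — the parameterisation introduced in Section 2.3 — is used merely to build a valid column-stochastic W):

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, K = 6, 4, 2
Pstar = rng.dirichlet(np.ones(J), size=N)      # p*(j|x_n); rows sum to one
theta = rng.normal(size=(K, J))
W = np.exp(theta) / np.exp(theta).sum(axis=0)  # columns of W sum to one
P = Pstar @ W.T                                # Eq. (8): p(i|x_n)
prior = P.mean(axis=0)                         # Eq. (11): partition priors
X = rng.normal(size=(N, 2))                    # stand-in data
m = (P.T @ X) / P.sum(axis=0)[:, None]         # Eq. (13): partition centroids
```

Because each column of W sums to one and each row of the kernel posteriors sums to one, the partition posteriors in each row of P automatically sum to one, as required by Eq. (9).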
2.3. Entropy minimisation

Given that we represent each partition via a fixed set of kernels, we wish to adjust the elements of the matrix W such that the entropy over the partition posteriors is minimised. We must also, however, take into account the constraints on the elements of W (that they are bounded in [0, 1] and that each column of W sums to unity). We may achieve this by introducing a set of dummy variables θ_ij, which will be optimised, such that W is represented by a generalised logistic function (the so-called 'softmax' function) of the form

W_ij = exp(θ_ij) / Σ_{i'} exp(θ_{i'j}).   (14)

The gradient of the entropy with respect to each dummy variable θ_ij is given via the chain rule as

∂H/∂θ_ij = Σ_{i'} (∂H/∂W_{i'j}) (∂W_{i'j}/∂θ_ij).   (15)

The summation term is easily evaluated, noting that

∂W_{i'j}/∂θ_ij = W_{i'j} δ_{i'i} − W_{i'j} W_{ij},   (16)

where δ_{i'i} = 1 if i = i' and zero otherwise. The term ∂H/∂W_{i'j} is evaluated by writing the expectation of the entropy (of Eq. (5)) as a sample mean over all N data, i.e.

∂H/∂W_{i'j} = (1/N) Σ_n (∂H(x_n)/∂p(i'|x_n)) (∂p(i'|x_n)/∂W_{i'j}).   (17)

As p(i'|x_n) = Σ_j W_{i'j} p*(j|x_n), the above is easily evaluated. In all the experiments reported in this paper we optimise W using the above formalism via the BFGS quasi-Newton method [8].

2.4. Model-order estimation

Since the number of partitions is not known a priori, it is useful to be able to discover the most probable number
of partitions. To this end we evaluate the entropy change, per partition, as a result of observing the data set X. This quantity is given as

ΔH(M_K|X) = H(M_K) − H(M_K|X),   (18)

where M_K is the K-partitions model. The first term on the right-hand side of the above equation is simply the entropy of the model priors before data are observed, i.e. the Shannon entropy taking the partition probabilities to be uniform and equal to 1/K. The second term is the entropy associated with the posterior partition probabilities having observed X. It is noted that the prior entropy is constant for any M_K, and hence our objective of minimising the posterior entropy will result in a maximum of ΔH(M_K|X) at the most probable partition number. Noting that H(X) − H(X|M_K) = H(M_K) − H(M_K|X), and that H(X|M_K) is the expectation of the negative log-likelihood of X given M_K, the likelihood (evidence) of X given M_K may be written as

p(X|M_K) ∝ exp{ΔH(M_K|X)},   (19)

Fig. 1. Simple dataset: 30 points are drawn from each of the four well-separated Gaussian sources.
in which the data entropy term H(X) is ignored, as it is constant for all models. Choosing the model with the largest value of this likelihood is equivalent, via Bayes' theorem, to choosing the model with the highest probability p(M_K|X) if we assume flat prior beliefs p(M_K) for each model. We hence obtain a posterior belief measure for each candidate partitioning:

p(M_K|X) = exp{ΔH(M_K|X)} / Σ_{K'} exp{ΔH(M_{K'}|X)},   (20)
and it is this measure which we use to assess the model order, choosing the order K for which it is maximal.
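Eqs. (18)-(20) reduce to a softmax over the per-model entropy gains ΔH = ln K − H(M_K|X). The sketch below is our own Python illustration with hypothetical posterior-entropy values (not taken from the paper's experiments), in which the K = 4 model yields near-certain partition posteriors:

```python
import numpy as np

def model_order_posterior(post_entropy):
    """post_entropy[K-1] = H(M_K | X) for K = 1..len(post_entropy).
    Returns p(M_K | X) of Eq. (20), with prior entropy H(M_K) = ln K."""
    Ks = np.arange(1, len(post_entropy) + 1)
    dH = np.log(Ks) - np.asarray(post_entropy, float)   # Eq. (18)
    w = np.exp(dH - dH.max())                           # stable softmax
    return w / w.sum()                                  # Eq. (20)

# Hypothetical posterior entropies for K = 1..6.
p = model_order_posterior([0.0, 0.45, 0.50, 0.02, 0.60, 0.90])
best_K = int(np.argmax(p)) + 1
```

Subtracting the maximum before exponentiating leaves Eq. (20) unchanged but avoids overflow for large entropy gains.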
3. Results
Fig. 2. Simple dataset: the plot shows the kernel posteriors p*(j|x) from EM fitting of the set of ten kernels.
3.1. Simple dataset
We first present detailed results from a dataset in which the clusters are simple and distinct; the data are generated from four Gaussian-distributed sources with 30 data drawn from each. Each component has the same (spherical) covariance. These data are shown in Fig. 1. As an illustration of a simple kernel set, ten Gaussian components are fitted to the data using the EM algorithm. The resultant set of posterior probabilities p*(j|x_n) is shown in Fig. 2. Fig. 3(a) shows ln p(M_K|X) and plot (b) p(M_K|X). Note that a set of four partitions is clearly favoured. Choosing the K = 4 model we obtain, for this example, W as a 4×10 matrix. The set of four partition probabilities p(i|x_n) is shown in Fig. 4. The resultant partitioning of the dataset gives the results of Fig. 5. There are no errors in the partitioning for this simple dataset.
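The entropy-minimisation step itself can be made concrete with a compact, self-contained sketch (our own Python/NumPy code, not the authors'; fixed hand-written kernel posteriors stand in for the EM-fitted p*(j|x_n), and SciPy's BFGS routine plays the role of the quasi-Newton optimiser of Section 2.3):

```python
import numpy as np
from scipy.optimize import minimize

def softmax_cols(theta):
    """Eq. (14): column-wise softmax gives W_ij in [0,1], unit column sums."""
    e = np.exp(theta - theta.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def entropy_and_grad(theta, Pstar):
    """Mean partition entropy (Eq. (5)) and its gradient w.r.t. theta
    via the chain rule of Eqs. (15)-(17)."""
    W = softmax_cols(theta)
    P = np.clip(Pstar @ W.T, 1e-12, 1.0)            # p(i|x_n), Eq. (8)
    H = float(np.mean(-np.sum(P * np.log(P), axis=1)))
    G = -(np.log(P) + 1.0).T @ Pstar / len(Pstar)   # dH/dW_{i'j}, Eq. (17)
    Gt = W * (G - (G * W).sum(axis=0, keepdims=True))  # Eqs. (15)-(16)
    return H, Gt

# Stand-in kernel posteriors p*(j|x_n): six data, four kernels,
# two well-separated kernel groups.
Pstar = np.array([[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0],
                  [0.1, 0.9, 0.0, 0.0], [0.0, 0.0, 0.85, 0.15],
                  [0.0, 0.0, 0.2, 0.8], [0.0, 0.0, 0.1, 0.9]])
K, J = 2, Pstar.shape[1]
rng = np.random.default_rng(0)
theta0 = rng.normal(scale=0.5, size=(K, J))
H0, _ = entropy_and_grad(theta0, Pstar)

def objective(flat):
    H, Gt = entropy_and_grad(flat.reshape(K, J), Pstar)
    return H, Gt.ravel()

res = minimize(objective, theta0.ravel(), jac=True, method='BFGS')
W_opt = softmax_cols(res.x.reshape(K, J))
P_opt = Pstar @ W_opt.T
H1 = res.fun
```

The softmax parameterisation means the optimisation is unconstrained in θ while W remains a valid column-stochastic mixing matrix throughout.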
3.2. Ring data
The next (synthetic) dataset we investigate is drawn from two generator distributions: an isotropic Gaussian and a uniform 'ring' distribution. A total of 100 data points was drawn from each distribution (hence N = 200). A 20-kernel Gaussian mixture model was fitted to the data (again using the EM algorithm). Fig. 6(a) shows that p(M_K|X) gives greatest support to the two-partition model. Plot (b) of the same figure depicts the resultant data partitioning. For this example there are no errors. Note that, due to the pathological structure of this example, a Gaussian mixture model per se fails to estimate the 'correct' number of partitions and to provide a reasonable data clustering.
Fig. 3. Simple dataset: (a) ln p(M_K|X) and (b) p(M_K|X). Note the clear maxima at K = 4.
Fig. 4. Simple dataset: the four posterior probabilities p(i|x) for the most likely partitioning.
3.3. Iris data
Fig. 5. Simple dataset: data partitioning in the transformed space. For this simple example there are no errors.
Anderson's 'iris' dataset is well known [9]. The data we analysed consisted of 50 samples from each of the three classes present in the data: Iris Versicolor, Iris Virginica and Iris Setosa. Each datum is four-dimensional and consists of measures of the plant's morphology. Once more a 20-kernel model was fitted to the dataset. Fig. 7(a) shows the model-order measure, shown in this case on a linear y-scale. Although support is greatest for the (correct) K = 3 partitioning, it is clear that a two-partition model also has support. We regard this as sensible given the nature of the dataset, which naturally splits into two partitions. As in previous figures, plot (b) depicts the data partitioning; this plot shows the projection onto the first two principal components of the dataset. The partitioning has three errors in 150
Fig. 6. Ring dataset: (a) ln p(M_K|X) and (b) resultant partitioning. For this example there are no errors.
Fig. 7. Iris dataset: (a) ln p(M_K|X) and (b) resultant partitioning. For this example there are three errors, corresponding to an accuracy of 98%.
Fig. 8. Wine recognition dataset: (a) ln p(M_K|X) and (b) resultant partitioning. For this example there are four errors, corresponding to an accuracy of 97.75%.
samples, giving an accuracy of 98%. This is slightly better than that quoted in Ref. [3] and the same as that presented for Bayesian Gaussian mixture models in Ref. [1].
3.4. Wine recognition data
As a final example we present results from a wine recognition problem. The dataset consists of 178
13-dimensional exemplars, each a set of chemical analyses of one of three types of wine. Once more we fit a 20-kernel model and perform a minimum-entropy clustering. Fig. 8(a) shows ln p(M_K|X). There is a clear maximum at the 'correct' partitioning (K = 3). Plot (b) shows this partitioning projected onto the first two components of the dataset. For this example there are four errors, corresponding to an equivalent classification performance of 97.75%. This dataset has not (to the authors' knowledge) been analysed using an unsupervised classifier, but supervised analyses have been reported. Our result is surprisingly good considering that supervised first-nearest-neighbour classification achieves only 96.1%, and multivariate linear-discriminant analysis 98.9% [10]. It should be noted that the same partitioning is obtained from an analysis of the first two data principal components alone, rather than the full 13-D dataset.
4. Conclusions

We have presented a computationally simple technique for data partitioning based on a linear mixing of a set of fixed kernels. The technique is shown to give excellent results on a range of problems. For computational parsimony we have used an initial semi-parametric approach to kernel fitting although, as mentioned, the results from a non-parametric analysis are near-identical in all cases. The methodology is general, and non-Gaussian kernels may be employed, in which case the estimated partition-conditional densities will be mixture models of the chosen kernel functions. The method, furthermore, scales favourably with the dimensionality of the data space, and the entropy-minimisation algorithm is efficient even with large numbers of samples.
Acknowledgements

IR and RE are funded, respectively, via grants from the Commission of the European Community (project SIESTA, grant BMH4-CT97-2040) and British Aerospace plc, whose support we gratefully acknowledge. The iris and wine datasets are available from the UCI machine-learning repository. The authors would also like to thank the anonymous reviewers of this paper for insightful comments and suggestions.
References

[1] S.J. Roberts, D. Husmeier, I. Rezek, W. Penny, Bayesian approaches to Gaussian mixture modelling, IEEE Trans. Pattern Anal. Mach. Intell. 20 (11) (1998) 1133–1142.
[2] K. Fukunaga, An Introduction to Statistical Pattern Recognition, Academic Press, New York, 1990.
[3] I. Gath, B. Geva, Unsupervised optimal fuzzy clustering, IEEE Trans. Pattern Anal. Mach. Intell. 11 (7) (1989) 773–781.
[4] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, 1988.
[5] S.J. Roberts, Parametric and non-parametric unsupervised cluster analysis, Pattern Recognition 30 (2) (1997) 261–272.
[6] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Number 26 in Monographs on Statistics and Applied Probability, Chapman & Hall, London, 1986.
[7] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
[8] W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, Numerical Recipes in C, Cambridge University Press, Cambridge, 1991.
[9] E. Anderson, The irises of the Gaspé peninsula, Bull. Amer. Iris Soc. 59 (1935) 2–5.
[10] S. Aeberhard, D. Coomans, O. de Vel, Comparative analysis of statistical pattern-recognition methods in high-dimensional settings, Pattern Recognition 27 (8) (1994) 1065–1077.
Pattern Recognition 33 (2000) 841–848
A better fitness measure of a text-document for a given set of keywords

Sukhamay Kundu*
Computer Science Department, Louisiana State University, Baton Rouge, LA 70803-4020, USA
Received 7 October 1998; accepted 18 March 1999
Abstract

We present a new fitness measure B_W(D) for a text-document D against a set of keywords W. The fitness evaluation forms a basic operation in information retrieval. The measure B_W(D) differs from other measures in that it accounts for both the frequency of the keywords and their clustering characteristics. It also satisfies the important properties of monotonicity and super-additivity, which hold for neither the well-known Paice measure nor the mixed max-min measure. We give efficient algorithms for computing B_W(D) and a generalized form B^a_W(D) of it. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Information retrieval; Fitness measure; Clustering; Algorithm
1. Introduction

A key operation in information retrieval is the evaluation of the fitness of a document D against a set of keywords W ⊆ Ω = {w_1, w_2, ..., w_K}, where Ω denotes the universe of keywords that can be used in evaluating D. We assume that D has been preprocessed to remove all non-essential words, and that the remaining words have been reduced to their basic forms, say, via a stemming algorithm [1]. We regard D as a sequence D = a_1 a_2 ... a_N, where a_j ∈ Ω; we write N = |D| for the size of D. A more general case is where D is represented by a tree, say, but we do not consider this here. It is clear that a fitness measure 0 ≤ m_W(D) ≤ 1 should have the properties (P.1)-(P.4) below. It is easy to see that properties (P.1) and (P.2) are equivalent; property (P.3) implies m_∅(D) = 0 and also (P.1) and (P.2).

(P.1) W-monotonicity: If W ⊆ W' ⊆ Ω, then m_W(D) ≤ m_{W'}(D).
(P.2) D-monotonicity: If D' is obtained from D by replacing one or more occurrences of the keywords in
* Tel.: 225-388-2246; fax: 225-388-1465. E-mail address: [email protected] (S. Kundu)
Ω − W by those in W, then m_W(D) ≤ m_W(D'). (In this case, we write D ≤_W D'.)
(P.3) W-superadditivity: m_{W∪W'}(D) ≥ m_W(D) + m_{W'}(D) if W ∩ W' = ∅. (The inequality '≥' is motivated by the fact that the co-occurrences of two keywords w_1 and w_2 should make D at least as valuable as when all occurrences of w_1 are replaced by w_2, say, to reflect the synergistic effect of the co-occurrences of w_1 and w_2.)
(P.4) D-superadditivity: If D = D_1 D_2, i.e., D is obtained by appending D_2 at the end of D_1, then m_W(D) ≥ m_W(D_1) + m_W(D_2). (Here, the inequality '≥' is motivated by the fact that D should be at least as valuable as the sum of its parts; otherwise combining D_1 and D_2 into a larger document may not be justified.)

For w ∈ Ω, let positions(w) = positions(w, D) = {j: a_j = w}, and for W ⊆ Ω let positions(W) = ∪{positions(w_i): w_i ∈ W}. We refer to a subset of positions of the form {i, i+1, ..., j}, 1 ≤ i ≤ j ≤ |D|, as an interval I = [i, j]; we write left(I) for i and right(I) for j. Let size(I) = |I| = j − i + 1 = the number of positions covered by I, and let f_i = f_{w_i} = |positions(w_i)| = the number of occurrences of w_i in D; clearly, N = f_1 + f_2 + ... + f_K. It is convenient to define the binary string
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00087-4
S. Kundu / Pattern Recognition 33 (2000) 841–848
Fig. 1. Three documents having given frequencies f_i of the keywords and different clustering characteristics of the keyword a and the other keywords.
bin(W, D) whose ith item is 1 if a_i ∈ W and 0 otherwise; the length of bin(W, D) is |D|. If D_1 and D_2 are two documents over Ω, then their concatenation D_1 D_2 denotes the combined document D_1 followed by D_2. For example, if D_1 = aba and D_2 = bbbc, then D_1 D_2 = ababbbc. The concatenation of bin(W, D_1) and bin(W, D_2) gives bin(W, D_1 D_2); positions(W, D_1 D_2) = positions(W, D_1) ∪ positions'(W, D_2), where positions'(W, D_2) = {|D_1| + j: j ∈ positions(W, D_2)}. If D ≤_W D', then positions(W, D) ⊆ positions(W, D').

The frequency measure of fitness [1] is defined by the normalized ratio F_W(D)/|D|, where F_W(D) = Σ f_i (summed over w_i ∈ W) = |positions(W)|, the frequency of the keywords W in D. The mixed max-min measure [1] uses a linear combination of the maximum and the minimum of the frequencies {f_i: w_i ∈ W} in place of F_W(D). For |W| = 1, the mixed max-min measure and the Paice measure [1,2] coincide with the frequency measure. A major drawback of these measures is that they fail to account for the 'pattern' of the occurrences of W in D. Fig. 1 shows three documents with given frequencies f_i of the keywords Ω = {a, b, c, d, e}; the second document is an extreme case where all a's appear clustered together, and the third document is the other extreme, where the a's are least clustered. We clearly perceive these documents differently in regard to the keyword a; for example, the second document is likely to have a more focussed discussion on the topic a than the third document. This shows the need for a fitness measure that accounts for both the pattern of occurrences of the keywords and their frequencies. The measure F_W(D)/|D| satisfies the properties (P.1)-(P.4), with equality in both (P.3) and (P.4); the mixed max-min measure and the Paice measure satisfy none of the properties (P.1)-(P.4). We present here two new fitness measures A_W(D)/|D| and B_W(D)/|D|.
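These basic quantities are straightforward to compute (our own Python sketch, not the paper's code; 1-based positions match the paper's convention). On the paper's example D_1 = aba, D_2 = bbbc, the binary strings concatenate as described, and F_W is exactly additive over concatenation:

```python
def positions(W, D):
    """1-based positions in D of words from the keyword set W."""
    return {j + 1 for j, a in enumerate(D) if a in W}

def bin_string(W, D):
    """The binary string bin(W, D): '1' where the word is in W, else '0'."""
    return ''.join('1' if a in W else '0' for a in D)

def F(W, D):
    """Keyword frequency F_W(D) = |positions(W, D)|."""
    return len(positions(W, D))

# The paper's example documents.
D1, D2 = "aba", "bbbc"
```

For instance, bin({a}, D_1 D_2) = bin({a}, D_1) · bin({a}, D_2) = 101 · 0000 = 1010000.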
The measure A_W(D) approximates F_W(D) from above and the measure B_W(D) approximates F_W(D) from below (A = above and B = below): 0 ≤ B_W(D) ≤ F_W(D) ≤ A_W(D) ≤ |D|. For any given W ≠ ∅, the range of possible values for both A_W(D) and F_W(D) is the interval [0, |D|]. However, A_W(D) is more powerful in distinguishing among various documents D than F_W(D) because, given the basic frequencies f_i for w_i ∈ Ω, the F_W(D)'s are uniquely determined for all W ⊆ Ω by the f_i's. That is, there is only one combination of the values F_W(D), |W| ≥ 2, for a given collection of basic frequencies f_i. As we will see, this is not so for A_W(D), |W| ≥ 2, and there can be many combinations of the values of A_W(D) depending on the clustering characteristics of keywords in D. This gives A_W(D) the extra power to distinguish among documents D. A similar remark holds for B_W(D). We first present A_W(D) because of its simplicity and its important role as a point of contrast for B_W(D), which is superior to the measure A_W(D). We also present several generalizations of B_W(D) to account for the effect of paging within a document and to reduce the impact of small changes in D. We point out that although A_W(D) and B_W(D) are defined in terms of positions(W), these positions themselves are not significant, in that if we shift the positions of the keywords W by a fixed amount, for example, then there is no change in A_W(D) or B_W(D).
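To make the binary-vector representation concrete, the following Python sketch implements bin(W, D) and positions(W, D) and checks the concatenation identities quoted above (the function names are ours, not the paper's):

```python
def positions(W, D):
    """1-based positions of D occupied by keywords from W."""
    return {i + 1 for i, ch in enumerate(D) if ch in W}

def bin_vector(W, D):
    """bin(W, D): ith item is 1 iff the ith symbol of D belongs to W."""
    return [1 if ch in W else 0 for ch in D]

D1, D2, W = "aba", "bbbc", {"a", "b"}
# positions(W, D1 D2) = positions(W, D1) ∪ {|D1| + j : j ∈ positions(W, D2)}
shifted = {len(D1) + j for j in positions(W, D2)}
assert positions(W, D1 + D2) == positions(W, D1) | shifted
# concatenating the binary vectors gives bin(W, D1 D2)
assert bin_vector(W, D1) + bin_vector(W, D2) == bin_vector(W, D1 + D2)
```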
2. The measure A_W(D)

Consider the document D in Fig. 2(i), where the positions of w = a are shown as solid circles; F_a(D) = 6. Suppose we partition positions(a) = {1, 3, 8, 9, 10, 18} into three disjoint subsets P_1, P_2, and P_3 in some way. Let I_j be the interval which covers the items in P_j, i.e., I_j extends from the leftmost item in P_j to the rightmost item in P_j. If we choose the partition positions(a) = P_1 ∪ P_2 ∪ P_3 to minimize |I_1| + |I_2| + |I_3|, then we get positions(a) = {1, 3} ∪ {8, 9, 10} ∪ {18}, giving I_1 = [1, 3], I_2 = [8, 10], I_3 = [18, 18], and the minimum total length = 3 + 3 + 1 = 7. See Fig. 2(i). Let g denote the number of subsets in a partition. For g = 2, the optimal partition is positions(a) = {1, 3, 8, 9, 10} ∪ {18}, giving I_1 = [1, 10], I_2 = [18, 18], and the minimum total length = 10 + 1 = 11 > 7. We now formally define A_W(D) by Eq. (1) below. Note that for an optimal set of intervals I = {I_1, I_2, …, I_n}, n ≤ g, we necessarily have I_i ∩ I_j = ∅ for
Fig. 2. Illustration of A_W(D) for W = {a}, {c}, and {a, c} using g(a) = 3, g(c) = 1, and g({a, c}) = 4.
S. Kundu / Pattern Recognition 33 (2000) 841–848
i ≠ j. For the special case W = {w}, A_W(D) reflects the portion of D that would be referred to by an optimal index-set (like that at the end of a textbook) of size ≤ g for w, such that each item I_j in the index-set refers to a group of one or more occurrences of w (with possible gaps among those occurrences) and these references together cover all occurrences of w. We will say more about the choice of g shortly. It is clear that A_W(D) ≥ F_W(D); for g ≥ F_W(D), each |I_i| = 1 in an optimal partition of positions(W), giving A_W(D) = F_W(D). In general, the optimal partition corresponding to A_W(D) is not unique. It is easy to see that the more clustered [6] the occurrences of the keywords W are, the closer is the approximation A_W(D) to F_W(D).

A_W(D) = min_{n ≤ g} {|I_1| + |I_2| + … + |I_n| : I_1 ∪ I_2 ∪ … ∪ I_n ⊇ positions(W)}. (1)

Throughout this paper, we choose g for a keyword w ∈ Ω according to g = g(w) = ⌊F_w(D)/k⌋, for some fixed constant k ≥ 1 (where ⌊x⌋ denotes the largest integer ≤ x). This amounts to saying that on average each interval of an optimal partition of positions(w) contains ≥ k occurrences of w. For W ⊆ Ω, we then take g = g(W) = Σ g(w), summed over w ∈ W. The reason for allowing a larger number of intervals for a larger set of keywords is that otherwise A_W(D) tends to be too large when the sets positions(w), w ∈ W, are dispersed too widely, which is typically the case for a large document. This can be seen by using g = 3 for each of W = {a}, {c}, and {a, c} for the document in Fig. 2. The optimal set of intervals for W = {a, c} is now {[1, 3], [7, 10], [14, 18]}, giving A_{a,c}(D) = 12, which is much larger than A_a(D) + A_c(D) = 7 + 3. However, if we take k = 2 and hence g(a) = 3, g(c) = 1, and g({a, c}) = 3 + 1 = 4, then we have A_{a,c}(D) = 11 ≤ A_a(D) + A_c(D) = 7 + 10.

One price we pay for using a larger g for a larger W is that we no longer have the W-monotonicity of A_W(D), i.e., we do not always have A_W(D) ≤ A_{W'}(D) for W ⊂ W'. For example, if D = ababccbccb and we take k = 2, then we have g(a) = 1, g(b) = 2, g({a, b}) = 3, and A_{a,b}(D) = 6 < 7 = A_b(D). The reverse inequality A_W(D) ≥ A_{W'}(D) for W ⊂ W' does not always hold either. If we use the same g for both W and W', then A_W(D) ≤ A_{W'}(D). The lack of W-monotonicity (and hence of D-monotonicity) of A_W(D) is caused partly by its over-dependence on the clusteredness of positions(W), and this makes it unsuitable as a fitness measure. The following theorem summarizes some other properties of A_W(D) (cf. (P.3) and (P.4)).

Theorem 1. If W ∩ W' = ∅, then we have the W-subadditivity A_{W∪W'}(D) ≤ A_W(D) + A_{W'}(D) and the D-subadditivity A_W(D_1 D_2) ≤ A_W(D_1) + A_W(D_2).
Proof. Let the intervals I = {I_1, I_2, …, I_n} and I' = {I'_1, I'_2, …, I'_{n'}} give rise to A_W(D) and A_{W'}(D), respectively. Let I″ = I ∪ I'. If I_i ∩ I'_j ≠ ∅ for some i and j, then we remove I'_j from I″ and replace I_i in I″ by I_i ∪ I'_j. In this process, the total sum of the |I_i| and |I'_j| for the intervals in I″ is not increased, and their union remains the same and hence contains positions(W ∪ W'). Also, the number of intervals in I″ does not exceed n + n' ≤ g(W) + g(W') = g(W ∪ W'). We repeat this process until no further change in I″ is possible, i.e., the intervals in I″ are disjoint. It follows that A_{W∪W'}(D) ≤ A_W(D) + A_{W'}(D). (If the final I″ has < n + n' intervals, then we might be able to replace an interval in I″ by two intervals with a smaller total length such that the union of the intervals in I″ still contains positions(W ∪ W') but the sum of their sizes is decreased.) The second part of the theorem is proved in a similar way. □

Given below is a simple greedy algorithm MEASURE-A to compute A_W(D). Step (1) takes O(N) time because bin(W, D) can be constructed in time O(N) if we assume that the document D is represented by an array whose jth item equals i if a_j = w_i, and the subset W is represented as a binary array whose ith item is 1 or 0 according as w_i ∈ W or not. The sorting of the intervals U_i according to their sizes in step (4) takes at most O(n log n) time and hence at most O(N log N) time; the total time of MEASURE-A is therefore O(N log N).

Algorithm MEASURE-A
Input: The document D, a subset of keywords W ⊆ Ω, and an integer g ≥ 1.
Output: A_W(D), using at most g intervals.
1. Construct bin(W, D) and scan it from left to right to determine the successive intervals U_i, i = 1, 2, …, n (say), corresponding to the blocks of consecutive 0's. Initialize sum = |D|.
2. If the leftmost 0-interval U_1 contains position 1, then let sum = sum − |U_1|. Similarly, if the rightmost 0-interval U_n contains position |D|, then let sum = sum − |U_n|. (The intervals U_1 and U_n are disjoint from any interval which contributes to A_W(D) in these cases.)
3. If there are ≤ g − 1 intervals U_i remaining, then let A_W(D) = sum − Σ|U_i| (summed over all remaining U_i's). (In this case, we have A_W(D) = F_W(D).)
4. Otherwise, choose the (g − 1) largest intervals U_i from the remaining intervals and let A_W(D) = sum − Σ|U_i| (summed over these g − 1 intervals).

Let MA_W(D) = A_W(D)/F_W(D) if F_W(D) > 0, and = 1 otherwise. If M = max{MA_W(D), MA_{W'}(D)}, then it follows
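A direct Python transcription of MEASURE-A (0-based indexing; the helper name is ours). On a bin vector with 1's at positions {1, 3, 8, 9, 10, 18} of an 18-symbol document it reproduces the worked values A_a(D) = 7 for g = 3 and 11 for g = 2:

```python
def measure_A(bits, g):
    """Greedy MEASURE-A: compute A_W(D) from bin(W, D) using at most g intervals."""
    N = len(bits)
    # Step 1: find the 0-blocks (maximal runs of 0's) as (start, length) pairs.
    zero_blocks, i = [], 0
    while i < N:
        if bits[i] == 0:
            j = i
            while j < N and bits[j] == 0:
                j += 1
            zero_blocks.append((i, j - i))
            i = j
        else:
            i += 1
    total, interior = N, []
    for start, length in zero_blocks:
        if start == 0 or start + length == N:
            total -= length        # Step 2: leading/trailing 0-blocks never count
        else:
            interior.append(length)
    interior.sort(reverse=True)
    return total - sum(interior[:g - 1])  # Steps 3-4: drop the g-1 largest 0-blocks

bits = [1 if p in {1, 3, 8, 9, 10, 18} else 0 for p in range(1, 19)]
assert measure_A(bits, g=3) == 7
assert measure_A(bits, g=2) == 11
```

Equivalently, A_W(D) is the full span of positions(W) minus the g − 1 largest gaps between consecutive positions, which is what dropping the largest interior 0-blocks computes.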
from Theorem 1 that for W ∩ W' = ∅, A_{W∪W'}(D) ≤ M(F_W(D) + F_{W'}(D)) = M·F_{W∪W'}(D) and hence MA_{W∪W'}(D) ≤ max{MA_W(D), MA_{W'}(D)}. Let μA_W(D) = 1/MA_W(D), so that 0 ≤ μA_W(D) ≤ 1. We can restate the above inequality as μA_{W∪W'}(D) ≥ min{μA_W(D), μA_{W'}(D)}, or equivalently, μA_W(D) ≥ min{μA_w(D) : w ∈ W}. If we regard μA_W(D) as a membership function of a fuzzy set on the set of documents D, then according to fuzzy set theory [3] the min-operation in the above inequality makes the query "how fit is D to W" look like an and-combination of the elementary queries "how fit is D to w_i", w_i ∈ W. (The abstract fuzzy and-operation gives equality in the above minimum.) Note that although the union in positions(W) = ∪ positions(w_i) (over w_i ∈ W) may suggest that the query corresponding to W is an or-combination of the elementary queries for w_i ∈ W, the above discussion shows that this is an incorrect interpretation. The measure B_W(D) in the next section also supports the and-combination view of the query for W. From Theorem 1, one can also show that μA_W(D_1 D_2) ≥ min{μA_W(D_1), μA_W(D_2)}. We note in passing that the fuzzy membership function μA_W(D), which is well behaved in view of the above inequalities, is somewhat unusual in that it is the ratio of the probability F_W(D)/|D| and the approximation A_W(D)/|D| to that probability. In Ref. [4], the leftness measure between two intervals was another example of a well-behaved fuzzy membership function that arose in an unusual way, as the difference between two probabilities.
3. The measure B_W(D)

We define the measure B_W(D), which is closely related to A_W(D), by Eq. (2). The main difference between Eqs. (2) and (1) is the restriction I_i ⊆ positions(W) for each i, and the replacement of "min" by "max". The intervals I_i in Eq. (2) are taken to be disjoint to avoid counting occurrences of the keywords W in D more than once.

B_W(D) = max_{n ≤ g} {|I_1| + |I_2| + … + |I_n| : each I_i ⊆ positions(W) and I_i ∩ I_j = ∅ for i ≠ j}.
(2)
For g = 3 and the document D in Fig. 2, we get B_a(D) = 5, B_c(D) = 3, and B_{a,c}(D) = 6. As in the case of A_W(D), a larger value of g tends to make B_W(D) larger and hence make B_W(D) approximate F_W(D) more closely. In the extreme case of g ≥ max{F_w(D) : w ∈ Ω}, we have each B_w(D) = F_w(D), and for g = |D| each B_W(D) = F_W(D). The measure B_W(D) approximates F_W(D) poorly when positions(W) is least clustered (cf. the third document in Fig. 1), which is also the case for A_W(D). The following algorithm MEASURE-B, which is similar to MEASURE-A, computes B_W(D) in time O(N log N).
Algorithm MEASURE-B
Input: The document D, a subset of keywords W ⊆ Ω, and an integer g ≥ 1.
Output: B_W(D), using at most g intervals.
1. Construct bin(W, D) and scan it from left to right to determine the successive intervals V_i, i = 1, 2, …, n (say), corresponding to the blocks of consecutive 1's.
2. If n ≤ g, then B_W(D) = Σ|V_i| (summed over 1 ≤ i ≤ n).
3. Otherwise, choose the g largest intervals among the V_i's and let B_W(D) = Σ|V_i| (summed over these g intervals).

Theorem 2. The measure B_W(D) satisfies the properties (P.1)–(P.4).

Proof. The proof of (P.3) is similar to the proof of subadditivity of A_W(D) in Theorem 1, except that an interval I_i for B_W(D) is now necessarily disjoint from an interval I'_j for B_{W'}(D) when W ∩ W' = ∅, because I_i ⊆ positions(W) and I'_j ⊆ positions(W'). This also proves (P.1) and (P.2). The proof of (P.4) is easy. □

Let μB_W(D) = B_W(D)/F_W(D) if F_W(D) > 0, and = 1 otherwise; clearly, 0 ≤ μB_W(D) ≤ 1. If we now write M = min{μB_W(D), μB_{W'}(D)}, then it follows from Theorem 2 that for W ∩ W' = ∅, B_{W∪W'}(D) ≥ M(F_W(D) + F_{W'}(D)) = M·F_{W∪W'}(D) and hence μB_{W∪W'}(D) ≥ min{μB_W(D), μB_{W'}(D)}. We can restate this as μB_W(D) ≥ min{μB_w(D) : w ∈ W}. As in the case of μA_W(D), we can regard μB_W(D) as a membership function of a fuzzy set on the set of documents D, and the above inequality supports the view that the query "how fit is D to W" is an and-combination of the elementary queries "how fit is D to w_i", w_i ∈ W. We also have μB_W(D_1 D_2) ≥ min{μB_W(D_1), μB_W(D_2)}. We point out that although neither of μA_W(D) and μB_W(D) may have the W-monotonicity property, this in itself does not indicate a shortcoming of A_W(D) or B_W(D). We make two additional remarks:

(1) The subadditivity of A_W(D) suggests that the co-occurrence of two keywords w_1 and w_2 in D may reduce the importance of each w_i in some way. This would be the case if in some way we tend to perform excess "counting" in A_W(D) for each w_i. Indeed, this excess counting is caused by the fact that the intervals I_i may contain positions not occupied by keywords in W.

(2) The reader who is familiar with the notions of outer and inner measures in abstract measure theory [5] will recognize that A_W(D) is a kind of outer measure and B_W(D) is a kind of inner measure of the set positions(W), with the important restriction imposed here on the number of intervals I_j allowed in Eqs. (1) and (2). Although B_W satisfies W-monotonicity, this restriction on the number of intervals makes A_W(D)
fail to satisfy W-monotonicity. Theorem 1 here corresponds to a similar result for outer measures [5], and similarly Theorem 2 here corresponds to a similar result for inner measures [5].
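MEASURE-B amounts to summing the g largest runs of 1's in bin(W, D); a minimal Python sketch (helper name ours):

```python
def measure_B(bits, g):
    """MEASURE-B: B_W(D) = total size of the g largest 1-blocks of bin(W, D)."""
    runs, count = [], 0
    for b in bits:
        if b:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    runs.sort(reverse=True)
    return sum(runs[:g])

# 1's at positions {1, 3, 8, 9, 10, 18}: 1-blocks of sizes 1, 1, 3, 1
bits = [1 if p in {1, 3, 8, 9, 10, 18} else 0 for p in range(1, 19)]
assert measure_B(bits, g=3) == 5   # matches B_a(D) = 5 for g = 3
```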
4. Generalization of B_W(D)

The condition "I_i ⊆ positions(W)" in Eq. (2) is in a sense too strict, in that a small change in a few positions of the keywords W may affect the value of B_W(D) in a significant way. One way to reduce this impact is to relax condition (2) to the weaker form (3) shown below and define the new measure B^α_W(D) by Eq. (4). The assumption α ≥ 0.5 in Eq. (3) is clearly meaningful from practical considerations; it also plays a critical role in the proof of Theorem 3. The need to assume the disjointness of the intervals I_i in Eq. (4) arises because of the pathological situation illustrated in Fig. 3, which shows that if the I_i's are not disjoint, then a large block of occurrences of the keywords in W may be accounted for more than once, giving too large a value for B^α_W(D). One way to avoid this problem is to replace the sum in Eq. (4) by |(∪I_j) ∩ positions(W)|, but this does not properly reflect our intuition about the role of α and hence we do not consider this variation (although Theorem 3 below would remain valid).

|I_i ∩ positions(W)| ≥ α|I_i|, where 0.5 ≤ α ≤ 1 is a fixed constant.
(3)
B^α_W(D) = max_{n ≤ g} { Σ_{1≤i≤n} |I_i ∩ positions(W)| : each I_i satisfies Eq. (3) and I_i ∩ I_j = ∅ for i ≠ j}. (4)

Condition (3) has the effect of allowing many more alternative choices for the intervals I_i in Eq. (4) when α < 1, and hence B^α_W(D) ≥ B_W(D). The intersections I_i ∩ positions(W) on the right-hand side of Eq. (4), together with the disjointness of the I_i's, ensure that B^α_W(D) ≤ F_W(D). It is clear that B^α_W(D) = B_W(D) for α = 1. Note that if V is a maximal block of consecutive 1's and I_i belongs to an optimal set of intervals for Eq. (4), then either V ⊆ I_i or V ∩ I_i = ∅. This property plays an important role in the algorithm MEASURE-Bα for computing B^α_W(D).
Fig. 3. For g = 2 and α = 2/3, the use of non-disjoint I_1 and I_2 would give B^α_W(D) = 10, which is too large compared to F_W(D) = 6.
Theorem 3. For α ≥ 0.5, B^α_W(D) satisfies the properties (P.1)–(P.4).

Proof. We first prove the W-superadditivity of B^α_W(D). Let W ∩ W' = ∅. Suppose the intervals I = {I_i : 1 ≤ i ≤ n} give the maximum value in Eq. (4) for B^α_W(D), and similarly the intervals I' = {I'_j : 1 ≤ j ≤ n'} for B^α_{W'}(D). Let I″ = I ∪ I' and v = B^α_W(D) + B^α_{W'}(D). Since |I_i ∩ positions(W ∪ W')| = |I_i ∩ positions(W)| + |I_i ∩ positions(W')| ≥ |I_i ∩ positions(W)| ≥ α|I_i|, it follows that each I_i satisfies Eq. (3) for W ∪ W', and similarly for each I'_j. Also, the sum on the right-hand side of Eq. (4) for the intervals in I″ and W ∪ W' equals v. We now modify I″ to make its intervals disjoint, without decreasing the sum in Eq. (4) below v for the set of intervals I″ and W ∪ W', and without increasing the size of I″, which is currently ≤ n + n' ≤ g(W ∪ W'). First, we can remove an I_i from I″ if it is contained in some I'_j, because |I'_j ∩ positions(W ∪ W')| = |I'_j ∩ positions(W)| + |I'_j ∩ positions(W')| ≥ |I_i ∩ positions(W)| + |I'_j ∩ positions(W')|, where |I_i ∩ positions(W)| is the contribution of I_i to B^α_W(D) and |I'_j ∩ positions(W')| is the contribution of I'_j to B^α_{W'}(D). Similarly, we can remove an I'_j from I″ if it is contained in some I_i. We repeat this process until no interval in I″ is contained in another. Note that if I_i = I'_j for some i and j, then exactly one of them will remain in the reduced I″. Suppose now that for some I_i and I'_j in I″ we have I_i ∩ I'_j ≠ ∅. Consider the interval I″_{ij} = I_i − I'_j ≠ ∅. If |I″_{ij} ∩ positions(W)| ≥ α|I″_{ij}|, then we can replace I_i by I″_{ij} to reduce the number of pairs of intervals in I″ that have non-empty intersection, and this does not decrease the sum (4) for the set of intervals I″ and W ∪ W' below v, because

|I″_{ij} ∩ positions(W ∪ W')| + |I'_j ∩ positions(W ∪ W')|
 ≥ |I″_{ij} ∩ positions(W)| + |I'_j ∩ positions(W)| + |I'_j ∩ positions(W')|
 ≥ |I″_{ij} ∩ positions(W)| + |I_i ∩ I'_j ∩ positions(W)| + |I'_j ∩ positions(W')|
 = |I_i ∩ positions(W)| + |I'_j ∩ positions(W')|
 = contributions of I_i and I'_j to v.

A similar modification of I″ would be possible if |(I'_j − I_i) ∩ positions(W')| ≥ α|I'_j − I_i|. Now, suppose that neither of the above replacement operations for I_i and I'_j applies. Then |I″_{ij} ∩ positions(W)| < α|I″_{ij}| implies that for the interval I_{ij} = I_i ∩ I'_j we have |I_{ij} ∩ positions(W)| > α|I_{ij}|; likewise, we also have |I_{ij} ∩ positions(W')| > α|I_{ij}|. However, this is a contradiction because

|I_{ij}| ≥ |I_{ij} ∩ positions(W ∪ W')|
 = |I_{ij} ∩ positions(W)| + |I_{ij} ∩ positions(W')|
 > α|I_{ij}| + α|I_{ij}|
 ≥ |I_{ij}|, since α ≥ 0.5.
This shows that the remaining intervals in I″ can be modified to make them disjoint without decreasing the sum in Eq. (4) for I″ below v. This completes the proof of (P.3), and hence of (P.1)–(P.2). Finally, we prove D-superadditivity. Suppose I = {I_i : 1 ≤ i ≤ n} and I' = {I'_j : 1 ≤ j ≤ n'} are optimal intervals for B^α_W(D_1) and B^α_W(D_2). Let I″_j = [|D_1| + left(I'_j), |D_1| + right(I'_j)] be the interval in the part D_2 of D = D_1 D_2 corresponding to I'_j; let I″ = {I″_j : I'_j ∈ I'}. Clearly, I ∪ I″ consists of disjoint intervals which satisfy Eq. (3), and n + n' ≤ g(W) for D. This shows that B^α_W(D_1 D_2) ≥ the sum in Eq. (4) for I ∪ I″, which equals B^α_W(D_1) + B^α_W(D_2). This completes the proof. □

5. Algorithm for B^α_W(D)

As one might expect, the computation of B^α_W(D) is substantially more complex than that of B_W(D) due to the fact that many more intervals I_i need to be considered now. An efficient method for computing B^α_W(D) is obtained by converting the problem to a shortest-path problem in a special directed graph G = G(α, W, D), which is constructed as follows. Consider the successive blocks of consecutive 0's and consecutive 1's in bin(W, D). Let Z_i, 1 ≤ i ≤ n (say), be the 0-blocks which have 1-blocks on their left and right; let [left(Z_i), right(Z_i)] denote the interval for the 0-block Z_i. We consider two other imaginary 0-blocks Z_0 and Z_{n+1} in the following sense. If the first position in bin(W, D) equals 1, then Z_0 is an imaginary 0-block preceding the leftmost 1-block; otherwise, Z_0 is the 0-block starting at the first position of bin(W, D). We define Z_{n+1} in a similar way by considering the last position of bin(W, D). If Z_0 is an imaginary 0-block, then we take left(Z_0) = 0 = right(Z_0), and similarly if Z_{n+1} is an imaginary 0-block then we take left(Z_{n+1}) = N + 1 = right(Z_{n+1}), where N = |D|. The digraph G has (n + 2) nodes Z_0, Z_1, …, Z_{n+1}. There are two kinds of arcs in G:

Cover-arcs: Each cover-arc corresponds to a group of consecutive 1-blocks (together with their intervening 0-blocks) that can be accounted for by a single interval satisfying condition (3), and hence can potentially contribute to B^α_W(D). Such an interval will necessarily contain all of a 1-block or be disjoint from it, and hence we represent it as an arc (Z_i, Z_j), 0 ≤ i < j ≤ n + 1. Here, Z_i is the 0-block to the left of the first 1-block included in the interval and Z_j is the 0-block to the right of the last 1-block in the interval. The interval for this cover-arc is I(i, j) = [right(Z_i) + 1, left(Z_j) − 1], and condition (3) for I(i, j) can be rewritten as Eq. (5) below. Note that (Z_i, Z_{i+1}) is a cover-arc for each i, 0 ≤ i ≤ n. In general, there will be many other cover-arcs.

Skip-arcs: Each skip-arc corresponds to a 1-block. The word "skip" means that we may not use this 1-block in
Fig. 4. The digraph G(α, {b}, D) for α = 4/5 = 0.8 and D as in Fig. 2, and the measure B^α_b(D) for different values of g.
B^α_W(D). The only skip-arcs are (Z_i, Z_{i+1}), 0 ≤ i < n + 1.

Condition for cover-arc (Z_i, Z_j):

(1 − α)[left(Z_j) − right(Z_i) − 1] ≥ Σ_{i<k<j} |Z_k|. (5)

We consider paths π in G from Z_0 to Z_{n+1} which consist of at most g cover-arcs. We define the cost of each cover-arc (Z_i, Z_j) to be 0 and the cost of the skip-arc (Z_i, Z_{i+1}) to be the size of the 1-block between Z_i and Z_{i+1}, namely, left(Z_{i+1}) − right(Z_i) − 1. Fig. 4 shows the digraph G(4/5, {b}, D) for the document D in Fig. 2.

Cost of an arc:
for a cover-arc, c_c(Z_i, Z_j) = 0;
for a skip-arc, c_s(Z_i, Z_{i+1}) = left(Z_{i+1}) − right(Z_i) − 1 > 0. (6)
The cost of a path π is defined to be the sum of the costs of its skip-arcs. Let π(Z_{j+1}, k) = the shortest length of a Z_0–Z_{j+1} path using at most k cover-arcs. The connection between π(Z_{n+1}, g) and B^α_W(D) is given by Eq. (7).

B^α_W(D) = |positions(W)| − π(Z_{n+1}, g). (7)

Eq. (8) below gives a method for computing successively π(Z_{j+1}, k), 0 ≤ k ≤ g, for increasing j. The first term on the right-hand side of Eq. (8) corresponds to the case where the last arc on the optimal Z_0–Z_{j+1} path is the skip-arc (Z_j, Z_{j+1}). The second term in Eq. (8), i.e., the nested min-term, corresponds to the case where the last arc (Z_i, Z_{j+1}), i ≤ j, on the optimal path is a cover-arc. The algorithm MEASURE-Bα given below for computing B^α_W(D) is based directly on Eq. (8). Although we state the algorithm for α ≥ 0.5, it actually computes B^α_W(D)
S. Kundu / Pattern Recognition 33 (2000) 841}848
correctly for all α > 0.
π(Z_{j+1}, k) = min{ π(Z_j, k) + c_s(Z_j, Z_{j+1}), min{ π(Z_i, k − 1) : k − 1 ≤ i ≤ j and (Z_i, Z_{j+1}) is a cover-arc } }. (8)
Algorithm MEASURE-Bα
Input: The document D, a subset of keywords W ⊆ Ω, an integer g ≥ 1, and 0.5 ≤ α ≤ 1.
Output: B^α_W(D), using at most g intervals.
1. Construct G = G(α, W, D). Initialize π(Z_0, k) = 0 for 0 ≤ k ≤ g.
2. For j = 0, 1, …, n, do the following:
(2.1) Let π(Z_{j+1}, 0) = π(Z_j, 0) + c_s(Z_j, Z_{j+1}).
(2.2) For k = 1, 2, …, g, compute π(Z_{j+1}, k) by Eq. (8).
3. Let B^α_W(D) = |positions(W)| − π(Z_{n+1}, g).

To construct G in Step (1), we scan the binary vector bin(W, D) from left to right to determine the 0-blocks Z_1 to Z_n (which lie between the first 1 in bin(W, D) and its last 1) and the number of 1's between Z_i and Z_{i+1}, 1 ≤ i ≤ n, including the number of 1's preceding Z_1 and following Z_n. For each i, the verification of Eq. (5) for each successive j = i + 1, i + 2, …, n + 1 requires one addition for the right-hand side of Eq. (5) and hence can be done in constant time. Thus, G can be constructed in time O(n²). It is clear that each cover-arc and each skip-arc is examined in step (2) at most g times, and hence step (2) takes at most O(gp) time, where p = #(arcs in G) = O(n²). The total computation time for MEASURE-Bα is therefore O(gn²), where n ≤ F_W(D) ≤ N.

We remark that although B^α_W(D) is a significant improvement over B_W(D), it still has some drawbacks: if we have a very large block of 1's in bin(W, D), then B^α_W(D) will also account for any nearby isolated small groups of 1's and thereby tend to increase the fitness value of W. This may not always be desirable, because the isolated (groups of) 1's may be quite far from the main block of 1's if the size of the latter is large. This can be avoided, however, by putting additional restrictions on the intervals I_i in Eq. (4); for example, we may require that the maximum separation between two blocks of 1's in I_i cannot exceed a certain limit, which may depend on the number of 1's in I_i. Fortunately, such modifications do not increase the computation time of MEASURE-Bα.
We also remark in passing that A_W(D) cannot be meaningfully generalized to A^α_W(D) in a fashion similar to B^α_W(D). For example, if we try to replace the condition I_1 ∪ I_2 ∪ … ∪ I_n ⊇ positions(W) in Eq. (1) by the weaker condition (9), then we may no longer have F_W(D) ≤ A^α_W(D), although we would have A^α_W(D) ≤ A_W(D), with equality for α = 1.

|(I_1 ∪ I_2 ∪ … ∪ I_n) ∩ positions(W)| ≥ α|positions(W)|.
847
(9)
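The shortest-path formulation of Section 5 can be sketched in Python as follows; the block-extraction details (0-based bit vector, 1-based block boundaries) are our own packaging of the construction of G(α, W, D) and the DP of Eq. (8):

```python
def measure_B_alpha(bits, g, alpha):
    """MEASURE-Bα: evaluate B^α_W(D) by the shortest-path DP of Eq. (8)."""
    N, ones = len(bits), sum(bits)
    if ones == 0:
        return 0
    first = bits.index(1)
    last = N - 1 - bits[::-1].index(1)
    # 0-blocks Z_0..Z_{n+1} as 1-based [left, right]; Z_0/Z_{n+1} may be imaginary.
    Z = [(0, 0)] if first == 0 else [(1, first)]
    i = first
    while i <= last:
        if bits[i] == 0:
            j = i
            while bits[j] == 0:
                j += 1
            Z.append((i + 1, j))
            i = j
        else:
            i += 1
    Z.append((N + 1, N + 1) if last == N - 1 else (last + 2, N))
    m = len(Z)                                                # n + 2 nodes
    skip = [Z[i + 1][0] - Z[i][1] - 1 for i in range(m - 1)]  # 1-block sizes

    def is_cover_arc(i, j):                                   # condition (5)
        length = Z[j][0] - Z[i][1] - 1
        zeros = sum(r - l + 1 for l, r in Z[i + 1:j])
        return (1 - alpha) * length >= zeros

    INF = float("inf")
    pi = [[0] * (g + 1)] + [[INF] * (g + 1) for _ in range(m - 1)]
    for j in range(m - 1):                                    # Eq. (8)
        for k in range(g + 1):
            best = pi[j][k] + skip[j]                         # last arc = skip-arc
            if k > 0:
                for i in range(j + 1):                        # last arc = cover-arc
                    if pi[i][k - 1] < best and is_cover_arc(i, j + 1):
                        best = pi[i][k - 1]
            pi[j + 1][k] = best
    return ones - pi[m - 1][g]                                # Eq. (7)

bits = [1, 1, 0, 1, 0, 0, 0, 1]
assert measure_B_alpha(bits, g=1, alpha=1.0) == 2   # α = 1 reduces to B_W(D)
assert measure_B_alpha(bits, g=1, alpha=0.5) == 4   # one interval may cover all 1's
```

For α = 1 the only cover-arcs are the consecutive ones, so the DP recovers B_W(D); smaller α admits longer cover-arcs, and B^α_W(D) grows toward F_W(D).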
6. Effect of paging

We briefly consider two ways to account for the effect of paging in D. The effects of other structures in D, such as paragraphs or chapters, can be handled in a similar manner. Paging basically means that the keywords in D are now grouped by pages. In Fig. 5, we use the marker '|' to separate the successive pages. For simplicity, we consider a page which does not contain any keyword from Ω as a blank page and ignore such pages. We now replace the binary vector bin(W, D) by an integer vector pagewise-counts(W, D), whose ith term is the count of the keywords of W in page i of D; in particular, pagewise-counts(W, D) = bin(W, D) if each page of D contains at most one occurrence of the keywords in W. We write page-sizes(D) = pagewise-counts(Ω, D). We write pagewise-probs(W, D) for the vector of probabilities of the keywords in W in the various pages of D; see Fig. 5. Finally, the vector cond-probs(W, D) gives the conditional probability of each page, given that it contains a keyword from W. The ith term in cond-probs(W, D) is given by the ratio of the ith term in pagewise-counts(W, D) and |positions(W)|. Note that |positions(W)| is the sum of the terms in pagewise-counts(W, D).

One way of handling the effect of paging is to first convert the vector cond-probs(W, D) to a binary vector by thresholding, i.e., replacing each probability by 1 if it is ≥ q and 0 otherwise, where q is the threshold value. Then, use the resulting binary vector in the same way as in B^α_W(D). Yet another possibility is to use the vector cond-probs(W, D) directly in place of bin(W, D), with size(I) of an interval I = [i, j], where i and j are page numbers, being defined by Σ cond-prob_k(W, D) (summed over pages k, i ≤ k ≤ j). In this way, we focus on the pages that are relatively important in terms of the
Fig. 5. The page-sizes(D), pagewise-counts(W, D), and pagewise-probs(W, D) for W = {a} and {a, b} for a document with 6 pages and length 20.
keywords W themselves. The use of pagewise-probs(W, D) instead of cond-probs(W, D) is not appropriate here; likewise, the use of thresholding with pagewise-probs(W, D) is not appropriate.
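The paging vectors can be illustrated on a small invented example (the pages and keywords below are ours; the actual document of Fig. 5 is not reproduced here):

```python
# A hypothetical 6-page document over Ω = {a, b}; each string is one page.
pages = ["aab", "b", "ab", "bbb", "a", "abb"]
W = {"a"}

pagewise_counts = [sum(ch in W for ch in page) for page in pages]
total = sum(pagewise_counts)                       # equals |positions(W)|
cond_probs = [c / total for c in pagewise_counts]  # conditional page probabilities

q = 0.2                                            # threshold value
binary = [1 if p >= q else 0 for p in cond_probs]  # thresholded vector for B^α_W

assert pagewise_counts == [2, 0, 1, 0, 1, 1]
assert binary == [1, 0, 1, 0, 1, 1]
```

The thresholded vector `binary` can then be fed to the B^α_W machinery in place of bin(W, D).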
7. Conclusions

We presented here an effective way of measuring the occurrences of certain atomic information in an ordered sequence, such as the occurrences of keywords in a text document, by taking into account both the frequency and the clustering characteristics of those occurrences. The basic measure B_W(D) and its generalization B^α_W(D) satisfy the important properties of super-additivity and monotonicity. Although our original motivation for developing B^α_W(D) came from a problem in information retrieval, we can use B^α_W(D) effectively in a variety of other applications. For example, if we view the daily changes in the price of a stock as a document D and we are interested in measuring the periods of sustained growth in the value of the stock by, say, 2% or more, then positions(W, D) would correspond to the days where the price change is ≥ +2%, and we can take α = 0.9, say, to capture the notion of "sustained" increase; the measure B^{0.9}_W(D) would now give us the desired evaluation.

References

[1] W.B. Frakes, R. Baeza-Yates (Eds.), Information Retrieval: Data Structures and Algorithms, Prentice-Hall, Englewood Cliffs, NJ, 1992.
[2] C.P. Paice, Soft evaluation of boolean search queries in information retrieval systems, Information Technol. Res. Dev. Appl. 3 (1984) 33–42.
[3] G.J. Klir, B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1995.
[4] S. Kundu, Min-transitivity of fuzzy leftness relationship and its application to decision making, Fuzzy Sets and Systems 86 (1997) 357–367.
[5] P.R. Halmos, Measure Theory, D. Van Nostrand Co., Princeton, NJ, 1950.
[6] B. Everitt, Cluster Analysis, second ed., Halsted Press, New York, 1980.
About the Author: SUKHAMAY KUNDU received his Ph.D. from the University of California, Berkeley, and his Masters from the Indian Statistical Institute, Calcutta. His research interests include algorithms, fuzzy logic, and artificial intelligence.
Pattern Recognition 33 (2000) 849–858
A tabu-search-based heuristic for clustering C.S. Sung*, H.W. Jin Department of Industrial Engineering, Korea Advanced Institute of Science and Technology, 373-1 Kusong-dong, Yusong-gu, Taejon 305-701, South Korea Received 24 April 1998; received in revised form 8 February 1999; accepted 8 March 1999
Abstract

This paper considers a clustering problem where a given data set is partitioned into a certain number of natural and homogeneous subsets such that each subset is composed of elements similar to one another but different from those of any other subset. For the clustering problem, a heuristic algorithm is exploited by combining the tabu search heuristic with two complementary functional procedures, called packing and releasing procedures. The algorithm is numerically tested for its effectiveness in comparison with reference works including the tabu search algorithm, the K-means algorithm and the simulated annealing algorithm. © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Tabu search; Heuristic; Clustering; Packing; Releasing
1. Introduction

Clustering has been an important issue occurring widely in information data analysis areas. Many of its applications can be found in the literature analyzing marketing, medical, archaeology, or pattern recognition data [1]. This paper considers the clustering problem stated as follows. Given n elements, each of which has K attributes, the objective is to group (classify) all the elements into C clusters such that the sum of the squared Euclidean distances between each element and the center of its cluster, over all allocated elements, is minimized. The clustering process can be represented as an assignment of all the n elements to C clusters, so that a {0, 1} integer programming formulation can be derived. For the integer programming formulation, the following notation will be used throughout this paper:

n      number of elements
C      number of clusters
K      number of attributes of each element
a_ik   value of the kth attribute of element i
m_ck   average of the kth attribute values of all elements in cluster c
y_ic   1 if element i is contained in cluster c, and 0 otherwise.

It is noted that the number of clusters C is a fixed value within the range from 2 to (n − 1). Each a_ik is given as an input, and each y_ic is a decision variable. The average m_ck is dependent on y_ic. The given problem is now expressed as the integer programming problem:

min  Σ_{c=1}^{C} Σ_{i=1}^{n} Σ_{k=1}^{K} (a_ik − m_ck)² · y_ic
s.t. Σ_{c=1}^{C} y_ic = 1,  i = 1, 2, …, n,
     Σ_{i=1}^{n} y_ic ≥ 1,  c = 1, 2, …, C,
     y_ic ∈ {0, 1},
* Corresponding author. Tel.: +82-42-869-3102; fax: +82-42-869-3110. E-mail address: [email protected] (C.S. Sung).
where

m_ck = (Σ_{i=1}^{n} y_ic · a_ik) / (Σ_{i=1}^{n} y_ic),  k = 1, …, K,  c = 1, …, C.
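The objective and the non-empty-cluster constraint can be evaluated directly from an assignment; a minimal Python sketch (function and variable names are ours):

```python
def clustering_objective(data, assign, C):
    """Sum over clusters c of Σ_i Σ_k (a_ik − m_ck)² y_ic, where m_ck is the
    mean of the kth attribute over the members of cluster c."""
    K = len(data[0])
    total = 0.0
    for c in range(C):
        members = [data[i] for i in range(len(data)) if assign[i] == c]
        if not members:                 # constraint Σ_i y_ic ≥ 1 violated
            return float("inf")
        m = [sum(x[k] for x in members) / len(members) for k in range(K)]
        total += sum((x[k] - m[k]) ** 2 for x in members for k in range(K))
    return total

data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assert clustering_objective(data, [0, 0, 1, 1], C=2) == 1.0
```

Any heuristic for the problem, including the tabu-search-based one developed in this paper, can use such a routine to score candidate assignments.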
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00090-4
The objective function is non-linear and non-convex [2], so it seems very difficult to investigate the problem analytically. Referring to Ref. [3], it is seen that the above problem is NP-complete. This provides the motivation for developing an efficient heuristic procedure to find a near-optimal solution in reasonable time. For such clustering problems, several algorithms using the simulated annealing technique have appeared in the literature [4–6]. Klein and Dubes [5] have proposed a simulated-annealing-based algorithm, but without any explicit discussion of the cooling strategy. Selim and Al-Sultan [6] have proposed a simulated annealing algorithm with a discussion of both a cooling strategy and a parameter selection strategy. Al-Sultan [7] has recently proposed an algorithm with a tabu search heuristic adapted to the problem, and showed that the algorithm outperforms both the K-means algorithm and the simulated annealing algorithm. Thereby, for the integer programming problem, this paper exploits a heuristic solution algorithm employing a tabu search heuristic.
2. Motivation of proposing the algorithm

The heuristic algorithm of this paper consists of two parts: one finds an initial solution and the other improves it. For the initial solution derivation, both a procedure of packing element pairs and a procedure of clustering such packed elements are proposed. For the initial solution improvement, a reallocation procedure is also proposed. The main feature of the heuristic algorithm is its pair of packing and releasing procedures. The packing procedure binds a subset of elements together as a single element, and the releasing procedure separates packed elements from each other. In the initial solution step, any elements showing a high likelihood of being grouped into the same cluster are packed together. In the solution improvement step, each packed element is allowed to move together so as to make a drastic move (improvement) in the solution search. This is the major motivation for the packing procedure. Another motivation is that the solution space can be reduced by disregarding any solution in which two elements with a short element-to-element Euclidean distance do not belong to the same cluster. The packing procedure thereby helps to promote the efficiency of the solution search. On the other hand, the packing procedure has the drawback that the solution space may not be fully searched, since each packed element is treated as a single element. This motivates a releasing procedure such that
any packed elements are separated from each other (release of a packing relation). Once all the packed elements have been tried for reallocation, the releasing operation is performed. The procedure of reallocating such packed elements and that of releasing any one of them are processed alternately in an iterative manner until all the packed elements are released. This way, the releasing procedure helps to promote the effectiveness of the solution search. More detail about the packing and releasing procedures is given in Sections 3 and 4. During the solution improvement effort, a reallocation trial of all the packed elements may become trapped at a local optimal solution. To avoid such trapping, the tabu search method [8] is employed together with both the packing and releasing procedures to improve greatly the solution search efficiency and effectiveness. A tabu search method for clustering problems has already been proposed by Al-Sultan [7]. In this paper, however, the tabu search method is used as a sub-module in the whole solution algorithm. Moreover, as will be discussed later, moves of packed elements and releasing are newly exploited for the tabu search, whereas Al-Sultan [7] considered only individual moves.
3. Initial solution

3.1. Packing procedure

The basic idea of the proposed algorithm is that any pair of elements close to each other are more likely to be grouped into the same cluster. To implement this idea, the algorithm employs a packing procedure that treats a subset of elements as a single element. That is, a subset of elements is packed together as a single element and assigned to the same cluster. The packing procedure is now described in detail. All possible pairs of elements are first sorted in increasing distance order. The two elements of the first-ranked (closest) pair in the sorted sequence are packed together as a single element. This packing procedure is then repeated for the remaining ranked pairs until a predetermined number of packings is completed. If one element of the current pair is a member of an earlier packed element, then the other element is also packed into that earlier packed element. If both elements of the current pair are separately contained in two different packed elements, then the two packed elements are agglomerated into one. If both are contained in the same packed element, then no change is made. In this way the packing procedure is processed sequentially; the method of determining the proper number of packings is discussed later.
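The pair-ranking and agglomeration rules above amount to a union-find pass over the sorted pairs. A minimal sketch follows (our own illustration; function names and the toy points are not the paper's):

```python
from itertools import combinations
import math

def pack(points, num_packings):
    """Greedy packing: union the endpoints of the closest ranked pairs.

    points: list of coordinate tuples; num_packings: how many ranked pairs
    to process.  Returns, for each element, the root of its packed element.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path-halving find: follow parents to the root of the packed element.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: math.dist(points[p[0]], points[p[1]]))
    for i, j in pairs[:num_packings]:
        ri, rj = find(i), find(j)
        if ri != rj:          # merging two packed elements; the root check
            parent[rj] = ri   # makes a same-pack pair a no-op, as in the text
    return [find(i) for i in range(n)]

pts = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (9, 9)]
print(pack(pts, 2))  # the two closest pairs are packed: [0, 0, 2, 2, 4]
```

The union-find roots play the role of the single "packed element" identity that the procedure assigns to each bound subset.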
3.2. Packing property

This section characterizes a property named the packing property. The property is used in the solution algorithm to determine the appropriate number of packings and to find an initial solution. Let us first introduce the generalized string property (GSP), which has been considered in Ref. [9] as the necessary condition for optimal clustering in the one-dimensional Euclidean case.

Property (generalized string property). Consider a clustering problem for various elements, each defined in a Euclidean space. Let d_ij be the Euclidean distance between elements i and j. If i and j are included together in a cluster, then every element k satisfying d_kj < d_ij or d_ki < d_ij must also be included in the cluster.

Rao [10] has also studied a property similar to GSP, although it is neither a necessary nor a sufficient condition. Since the proposed clustering problem depends heavily on the data structure, it seems difficult to find any good robust property. Therefore, this paper exploits a packing property that can be used to determine the appropriate number of packings; it is similar to GSP but more practical for that purpose.

Property (packing property). Let every cluster satisfy GSP and have more than m elements. Then each pair of elements whose element-to-element distance is shorter than D_(2m−1) belongs to such a cluster together, where D_(2m−1) denotes the (2m−1)st shortest element-to-element Euclidean distance.

Proof. Suppose that d_ij is the kth shortest element-to-element distance, where 1 ≤ k ≤ 2(m−1) and d_ij < D_(2m−1), and that elements i and j belong to different clusters, say I and J, respectively. Since each cluster has more than m elements, the cardinality of each of the clusters I and J is at least m. Based on GSP, it holds that d_ii′ ≤ d_ij for all i′ ∈ I, i′ ≠ i, where the number of such elements i′ is at least (m−1).
Therefore, there are at least (m−1) pairs with element-to-element distances shorter than or equal to d_ij. A similar logic applies to element j and the elements j′ ∈ J, j′ ≠ j, giving another (m−1) pairs with distances shorter than or equal to d_ij. This implies that there are at least 2(m−1) pairs with distances shorter than or equal to d_ij, so d_ij is equal to or greater than D_(2m−1). This contradicts the assumption that d_ij is the kth shortest distance. Thus, the proof is completed. □

3.3. Number of packings

This algorithm uses 2(m−1) initial packings, provided that each cluster has at least m elements. Therefore,
determining the minimum cardinality m is equivalent to determining the number of packing operations. Clearly, the lower bound of m is 1. The upper bound can also be characterized easily. If a packing is made, then two elements (or packed elements) are agglomerated into one packed element whose cardinality is the sum of the cardinalities of the original elements (or packed elements). Since the number of packings is 2(m−1), in the extreme case the maximum cardinality of a packed element can be 2m−1 (the number of packings plus the original element). In this case, 2m−1 ≥ m for all m ≥ 1, and all the elements in a packed element should belong to the same cluster, so that it is possible to make a cluster composed of all the elements in such a packed element, with cardinality 2m−1. This gives the maximum cardinality of a cluster in the initial solution. Moreover, since all the other clusters should contain more than m elements, the relation n − (2m−1) ≥ (C−1)m should be satisfied. This leads to m ≤ (n+1)/(C+1), and so m ≤ ⌊(n+1)/(C+1)⌋ since m is an integer. Thus m has the upper bound ⌊(n+1)/(C+1)⌋ and can take any integer value between 1 and ⌊(n+1)/(C+1)⌋. This implies that there are many possible choices for m. This algorithm chooses the maximum value of m, because the biggest m within the allowed integer range maximizes the advantage of the packing. Thus, the algorithm sets the number of packing operations to 2(⌊(n+1)/(C+1)⌋ − 1).

3.4. Initial solution search procedure

In the initial solution, each cluster should contain more than m elements, and each pair of elements whose element-to-element distance is shorter than D_(2m−1) should be packed together, with m = ⌊(n+1)/(C+1)⌋. Moreover, it is desirable to keep the variance in cluster cardinality small. These requirements are put together to derive the following initial solution search procedure.
Step 1: Sort each pair of elements in increasing distance order, and pack every pair whose element-to-element distance is shorter than the (2⌊(n+1)/(C+1)⌋ − 1)st shortest distance. Let each such packed element form an individual cluster, and let each non-packed single element also form an individual cluster. Let C′ denote the number of such individual clusters, k of which have more than ⌊(n+1)/(C+1)⌋ elements. It can easily be shown that C′ > C for n ≥ 2 and 2 ≤ C < n. Sort the C′ clusters in decreasing order of cardinality.
Step 2: If k ≥ C, go to Step 3. Otherwise, go to Step 4.
Step 3: Merge the Cth cluster with the (C+1)st cluster into a single cluster. C′ = C′ − 1.
If k > C, then k = k − 1. Go to Step 5.
Step 4: Merge the (k+1)st cluster with the (k+2)nd cluster into a single cluster. C′ = C′ − 1. If the cardinality of the new cluster is greater than or equal to ⌊(n+1)/(C+1)⌋, then k = k + 1.
Step 5: If C′ = C, then stop. Otherwise, sort the clusters in cardinality order and go to Step 2.
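The merge loop of Steps 2–5 can be sketched as follows. This is our own reading of the procedure (the names and the small instance are illustrative, not the paper's code):

```python
def initial_clusters(packed_clusters, C, m):
    """Merge packed/singleton clusters down to C clusters (Steps 2-5).

    packed_clusters: list of element lists, one per packed element or
    non-packed single element.  m is the target minimum cardinality.
    """
    clusters = sorted(packed_clusters, key=len, reverse=True)
    k = sum(1 for cl in clusters if len(cl) >= m)   # "large enough" clusters
    while len(clusters) > C:
        if k >= C:
            # Step 3: merge the C-th cluster with the (C+1)-st cluster
            clusters[C - 1] += clusters.pop(C)
            if k > C:
                k -= 1
        else:
            # Step 4: merge the (k+1)-st cluster with the (k+2)-nd cluster
            clusters[k] += clusters.pop(k + 1)
            if len(clusters[k]) >= m:
                k += 1
        clusters.sort(key=len, reverse=True)        # Step 5: re-sort by size
    return clusters

# n = 7 elements, C = 2 clusters -> m = (7 + 1) // (2 + 1) = 2 (Section 3.3).
m = (7 + 1) // (2 + 1)
print(initial_clusters([[0, 1, 2], [3, 4], [5], [6]], C=2, m=m))
# -> [[3, 4, 5, 6], [0, 1, 2]]
```

Because every merge reduces C′ by exactly one and the loop re-sorts after each merge, the procedure always terminates with exactly C clusters.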
4. Improvement of initial solution

Each element of the initial solution is reallocated among the C clusters in search of a better clustering. For the reallocation, the tabu search method is employed to avoid trapping at a local optimal solution and to make an efficient and thorough reallocation search: all the elements of the current intermediate solution are reallocated so as to find an improved solution. Then one of the packed elements of the current solution is released from its packing relation. The reallocation and releasing processes thus operate sequentially, in an iterative manner, such that all the elements of the improved solution, with one packed element released, are again reallocated among the C clusters.

4.1. Reallocation of elements

Tabu search is a metaheuristic suggested by Glover [11,12] for combinatorial optimization problems; it introduces a memory-based strategy to prevent the solution search from becoming trapped at a local optimal solution. The heuristic either generates or starts with some initial solution and proceeds iteratively from one solution to another (in a neighborhood) until some termination condition is satisfied. To improve the effectiveness of the heuristic, various intensification and diversification strategies have been developed [8,13,14]. One of them is the vocabulary building strategy [14]. In fact, the packing procedure functionally corresponds to the vocabulary building strategy of tabu search. The vocabulary building strategy is designed to identify common attributes of a chosen set of solutions and to search for (or generate) other solutions having the same common attributes. Similarly, the packing procedure tends to place pairs with a short element-to-element distance in the same cluster. The design issues of our tabu search are now briefly described; details of tabu search can be found in Ref. [8].
Move. In a clustering problem, a move indicates that an element changes its cluster to another one to which it is newly assigned. Specifically, each move represents a change of the form {y_ij = 1 → y_ij = 0 and y_ik = 0 → y_ik = 1, for j ≠ k, j, k = 1, …, C, i = 1, …, n}. It has been shown in Ref. [7] that when the solutions of a neighborhood are ignored at some predetermined ignorance probability, the performance of the associated algorithm increases as the ignorance probability gets lower. This implies that such element ignorance is not appropriate; therefore, this paper does not allow any neighborhood element ignorance. There is a fundamental difference between the reference work [7] and ours. The reference work allows only individual moves, in which the cluster change induced by a move is applied to a single element. In our algorithm, by contrast, a cluster change is induced by a move of all the elements of a packed element. That is, a packed-element-together move occurs, making a drastic move in the solution search. Several advantages are gained from this. The first is that a greater improvement in each intermediate solution, if any, is possible, giving a faster approach to a good solution space. The second is that the drastic move makes it possible to search a broader area of the solution space at each iteration. Another important characteristic of the packing is that it can reduce the solution space, by removing non-interesting solutions (regarded as bad) from further consideration in the search. In fact, it is desired to disregard any solution in which two elements with a short element-to-element distance do not belong to the same cluster. In this way, the packing helps to search promising solution spaces more thoroughly (conforming to an intensification strategy).
Tabu list. In our algorithm, the tabu list contains each element index and its cluster index.
For example, a tabu list element (i, j) indicates that, at some iteration, any move changing element i from its current cluster k (≠ j) to cluster j is prohibited because it is in a tabu state. The advantage of such a tabu list is its memory saving: for the integer data type, each element of the tabu list needs only 16 bits, so the tabu list is not a burden on computer memory. In this paper, the size of the tabu list is determined dynamically to reduce the risk of cycling [15]. That is, the tabu list size is changed to a random integer in the range between 7 and n after every n iterations.
Secondary tabu list. In the clustering problem, there are two types of cycle to worry about. One is the ordinary cycle, appearing as the replication of the same sequence of mathematical solutions. The other is the cycle associated with the labels attached to each cluster. The second type of cycle appears as the replication of
the same sequence of clusterings, but not of the mathematical solutions. This is because the label assigned to each cluster serves only to distinguish one cluster from the others; in the mathematical formulation, different labels attached to the same cluster are treated as different solutions. For example, consider the instance with n = 4 and C = 2. The solution Y_1 = (y_11 = y_21 = y_32 = y_42 = 1, y_12 = y_22 = y_31 = y_41 = 0) and the solution Y_2 = (y_12 = y_22 = y_31 = y_41 = 1, y_11 = y_21 = y_32 = y_42 = 0) represent the same clustering, in which elements 1 and 2 constitute one cluster and elements 3 and 4 constitute another; yet from the mathematical view they are different solutions. In this respect, the solution space of the mathematical formulation is over-expanded C! times relative to the real solution (clustering) space. In the above example, if the search process starts from Y_1 and reaches Y_2, it forms a cycle in terms of clustering, so the search process will move back from Y_2 to Y_1, and this trajectory will be repeated. The tabu list described in the preceding section prevents the first type of cycle, but it cannot prevent the second (clustering) type. In fact, the second type of cycle was experienced very often during the experimental work of this paper. Therefore, some technique is needed to prevent the search from becoming trapped in the second type of cycle. This paper uses the secondary tabu list [16] to escape from such traps. The sums of the Euclidean distances within each cluster of the current solution form a vector, which is recorded as an element of the secondary tabu list. All the C values in such an element are sorted in decreasing order. The sum of the Euclidean distances within a cluster is not altered unless the elements composing the cluster are changed.
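The within-cluster distance-sum vector just described can be sketched as follows. This is our own illustration, reading "the sum of the Euclidean distances within each cluster" as the sum over all element pairs inside the cluster:

```python
import math

def secondary_signature(points, clusters):
    """Label-free record for the secondary tabu list: the within-cluster
    sums of pairwise Euclidean distances, sorted in decreasing order.
    """
    sums = []
    for cl in clusters:
        s = sum(math.dist(points[i], points[j])
                for idx, i in enumerate(cl) for j in cl[idx + 1:])
        sums.append(s)
    return tuple(sorted(sums, reverse=True))

pts = [(0, 0), (1, 0), (10, 0), (12, 0)]
# Two labelings of the same clustering give identical signatures, so the
# label-induced (second type of) cycle is detected.
print(secondary_signature(pts, [[0, 1], [2, 3]]))  # (2.0, 1.0)
print(secondary_signature(pts, [[2, 3], [0, 1]]))  # (2.0, 1.0)
```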
Moreover, because the values (one per cluster) are sorted by size, it is not necessary to keep the label attached to each cluster. This makes it possible to prevent the second type of cycle. The role of the secondary tabu list is similar to that of the ordinary tabu list: at some search iteration, when a solution has the least objective function value among the solutions in the neighborhood and is not restricted by the tabu list, the algorithm checks whether the solution is contained in the secondary tabu list. If it is, it cannot be a candidate for the next solution.
Aspiration condition. Since the tabu list restricts some moves, there is a risk of restricting even a good move. To remove this risk, a solution in a tabu state can still be selected as the candidate for the next solution if it satisfies the aspiration condition. Accordingly, this algorithm uses the aspiration condition f(y) < f_min. This implies that if the objective function value of solution y is less than the current minimum
objective function value, the solution is set free from the tabu state and can be the candidate for the next solution.
Stopping condition. The reallocation procedure ends after a predetermined number of iterations. In the experimental test, the number of iterations is set to 1000/{2 · (⌊(n+1)/(C+1)⌋ − 1)} for comparison with the algorithms in the literature.

4.2. Releasing procedure

As stated above, the packed-element-together move has some advantages. However, it may have the drawback of missing some good solutions: because the elements of a packed element move together, the solution space may not be fully searched. To compensate, this paper proposes a strategy of releasing the packing relations, one packed pair at each iteration of the reallocation operation. In this approach, the last-ranked (most distant) pair is first selected as a released pair, whose elements are then allowed to belong to different clusters. In other words, the released elements are allowed to move separately. Thus, the solution space reduced by the packing operation is expanded again by the releasing operation (conforming to a diversification strategy).
5. Step-by-step solution procedure

The proposed algorithm is designed to improve the effectiveness of tabu search by incorporating both the packing and releasing procedures, such that the packing procedure tends to focus the solution search on good solution spaces and the releasing procedure plays the complementary role of making an intensive search on the focused solution spaces. The whole algorithm is now given as a step-by-step procedure.
Step 0: Sort k pairs of elements in increasing distance order, where k = 2(⌊(n+1)/(C+1)⌋ − 1).
Step 1: Find the initial solution (refer to Section 3.4).
Step 2: Search for an improved solution by keeping the k pairs packed (refer to Section 4.1).
Step 3: If k = 0, stop; the best solution found so far becomes the final solution. Otherwise, go to Step 4.
Step 4: Release the kth packed pair. k = k − 1. Let the initial solution be the best solution found so far. Go to Step 2.
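Steps 0–4 reduce to the following outer loop. This is a sketch only: `improve` stands in for the tabu-search reallocation of Section 4.1, and the names are our own:

```python
def pack_and_release_search(initial, sorted_pairs, improve):
    """Outer loop of Steps 0-4: run an improvement pass with the first k
    pairs packed, then release the k-th (most distant) packed pair, until
    every pair has been released.
    """
    best = initial
    k = len(sorted_pairs)       # Step 0: pairs sorted in increasing distance
    while True:
        best = improve(best, sorted_pairs[:k])   # Step 2: search while packed
        if k == 0:
            return best         # Step 3: all pairs released -> stop
        k -= 1                  # Step 4: release the k-th packed pair

# Stand-in improver that only records how many pairs were packed per pass.
passes = []
pack_and_release_search(0, ["p1", "p2"],
                        lambda sol, packed: passes.append(len(packed)) or sol)
print(passes)  # [2, 1, 0]: passes with 2, 1, then 0 pairs packed
```

Note that the improvement pass runs once more after the last release, so the fully released problem is also searched before the algorithm stops.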
6. Experiment

In this paper, four algorithms are numerically tested for their effectiveness on 360 data sets: the K-means algorithm [17], the simulated annealing algorithm by Selim and Al-Sultan [6], the tabu search algorithm by Al-Sultan [7], and the proposed algorithm of Section 5. They are implemented in the C language on a Pentium II 400 MHz personal computer.
6.1. Data set

Since the effectiveness of any clustering algorithm usually depends on the data set, how the test data sets are designed is important as well. In the literature, some authors [18,19] have paid attention to strategies for designing test data sets. In this paper, the design strategy of Milligan [19] is adapted to generate our test data, incorporating five design factors: number of elements, number of clusters, number of dimensions (attributes), ratio of cardinality, and error type. To design a data set, the boundaries for each dimension of each cluster are predetermined. No cluster overlap is permitted; to satisfy this restriction, no overlap is permitted on the first dimension of the space, i.e., the clusters are required to occupy disjoint regions of space. All the elements assigned to a given cluster are then required to fall within the boundaries for each dimension of the variable space. Two sizes of data set are considered, one having 50 elements and the other 100. Four numbers of clusters are considered: 2, 3, 4, and 5. Each element is allowed to vary in dimension, having four, six, or eight attributes. The cardinality factor varies over three distribution patterns: in the first, all elements are distributed among the clusters as equally as possible; in the second and third, one cluster in the data set has 10 and 60% of all the elements, respectively, while the other clusters share the rest equally. Five types of error are considered. The first type represents an error-free data set. The second and third types add random noise elements to the error-free data set: 10 noise elements for the second type and 20 for the third. Each random noise element has the same K attributes as an ordinary (error-free) element, but acts as random error in the sense that it is not clearly dedicated to any specific cluster. The fourth and fifth types add random noise attributes (dimensions) to each error-free element, giving one and two additional dimensions, respectively. An example of the data generation procedure is illustrated in Fig. 1. In the first step, the boundaries for each dimension of each cluster are set. Then the elements, determined by the first four design factors, are allocated within the predetermined boundaries. In the last step, the error types are applied. Fig. 1d shows a test data set having 50 elements, three clusters, two attributes, equal cardinality, and the second type of error.
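A generator in the spirit of this design can be sketched as follows. The band boundaries, the spread of the remaining dimensions, and the seed are our own illustrative choices, not the paper's:

```python
import random

def generate_data(n, C, K, seed=0):
    """Error-free test data: clusters occupy disjoint intervals on the
    first dimension, and elements are split among clusters as equally
    as possible.
    """
    rng = random.Random(seed)
    data, labels = [], []
    for c in range(C):
        lo, hi = 10 * c, 10 * c + 8          # disjoint first-dimension bands
        size = n // C + (1 if c < n % C else 0)
        for _ in range(size):
            point = [rng.uniform(lo, hi)]    # first dimension: no overlap
            point += [rng.uniform(0, 10) for _ in range(K - 1)]
            data.append(point)
            labels.append(c)
    return data, labels

data, labels = generate_data(n=50, C=3, K=4)
print(len(data), len(data[0]), len(set(labels)))  # 50 4 3
```

The noise-element and noise-attribute error types of the paper would then be applied on top of such an error-free set.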
6.2. Results

It is known that the effectiveness of the K-means algorithm, the simulated annealing algorithm, and the tabu search algorithm all depend greatly on the initial solution. Therefore, for every data set, these three algorithms are run 10 times each, each time with a new initial solution generated randomly from the data set. Our proposed algorithm is run only once, since it has a fixed procedure for generating its own initial solution. Each test allows at most 1000 iterations of the associated solution search procedure for each of the four algorithms, giving an equal chance for a fair performance comparison. The results show that the proposed algorithm finds a solution with an objective function value less than or equal to the mean of the objective function values obtained from the 10 test runs of each of the other three algorithms. It is also seen that the tabu search
Fig. 1. Generating test data set.
algorithm outperforms the other two algorithms (the K-means algorithm and the simulated annealing algorithm), while the K-means algorithm and the simulated annealing algorithm show similar performance. This implies that our proposed algorithm is superior in effectiveness to the tabu search algorithm. In the above performance comparison, 10 test results are gathered for each reference algorithm on a given data set and averaged; in other words, multiple samples are gathered for each reference algorithm. Therefore, it is desirable to investigate how statistically confident the superiority of the proposed algorithm is. This motivates a statistical hypothesis test for the three reference algorithms with the following null hypothesis:

H_0: the objective function value found by each of the reference algorithms is equal to that of our proposed algorithm.

However, the reference algorithms are performed only 10 times each and their resulting objective function values are discrete, so the objective values are unlikely to follow a Gaussian distribution. Thus, this paper applies the Wilcoxon rank sum test, a non-parametric test. The SAS package is used for the test, and its results are summarized in Tables 1–6. The values in those tables denote the numbers of data sets in which the hypothesis is rejected at the confidence level 0.99 (rejected at 0.99), rejected at the confidence level 0.95 but accepted at the confidence level 0.99 (accepted at 0.99), and accepted at the confidence level 0.95 (accepted at 0.95), respectively. Rejection at the confidence level 0.99 means that the solution found by the associated test algorithm gives strong evidence of having a greater (not better) objective function value than the solution found by the proposed algorithm. According to Table 1, the number of data sets rejected at 0.99 by the tabu search algorithm is apparently smaller
Table 1
The results of the statistical hypothesis test on the whole 360 data sets generated by considering all the design factors together

Algorithm   Rejected at 0.99   Accepted at 0.99   Accepted at 0.95
K-means     224                47                 89
S.A.        224                47                 89
T.S.        71                 43                 246
than by either the K-means algorithm or the simulated annealing algorithm. This means that the solutions obtained by the tabu search algorithm tend to be closer to the solutions obtained by the proposed algorithm than those of the K-means algorithm or the simulated annealing algorithm. The test results summarized in Tables 2–6 show the effects of the design factors. Table 2 shows that as the number of elements increases, the performance of the tabu search algorithm deteriorates and the performance gap between the tabu search algorithm and the other two algorithms decreases. That is, as the number of elements increases, the proposed algorithm shows a more robust result than the tabu search algorithm. Table 3 shows that the number of data sets rejected at 0.99 increases as the number of clusters increases. Thus we can infer from Table 3 that an increased number of clusters may induce a larger difference between the solution of the proposed algorithm and those of the other algorithms, and that the proposed algorithm may be more robust against changes in the number of clusters. Tables 4 and 5 show the effects of the number of dimensions and the ratio of cardinality. In both tables, the solutions of the tabu search algorithm are closer to those of the proposed algorithm than those of the K-means and simulated annealing algorithms. However, no
Table 2
The results of the statistical hypothesis test on the whole 360 data sets divided only by the "number of elements" factor

Number of elements   Algorithm   Rejected at 0.99   Accepted at 0.99   Accepted at 0.95
50                   K-means     107                23                 50
                     S.A.        128                11                 41
                     T.S.        29                 22                 129
100                  K-means     117                24                 39
                     S.A.        96                 36                 48
                     T.S.        42                 21                 117
Table 3
The results of the statistical hypothesis test on the whole 360 data sets divided only by the "number of clusters" factor

Number of clusters   Algorithm   Rejected at 0.99   Accepted at 0.99   Accepted at 0.95
2                    K-means     12                 8                  70
                     S.A.        8                  5                  77
                     T.S.        2                  0                  88
3                    K-means     54                 24                 12
                     S.A.        53                 29                 8
                     T.S.        13                 12                 65
4                    K-means     75                 13                 2
                     S.A.        80                 9                  1
                     T.S.        22                 16                 52
5                    K-means     83                 2                  5
                     S.A.        83                 4                  3
                     T.S.        34                 15                 41
Table 4
The results of the statistical hypothesis test on the whole 360 data sets divided only by the "number of dimensions" factor

Number of dimensions   Algorithm   Rejected at 0.99   Accepted at 0.99   Accepted at 0.95
4                      K-means     73                 19                 28
                       S.A.        81                 11                 28
                       T.S.        24                 14                 82
6                      K-means     74                 15                 31
                       S.A.        72                 17                 31
                       T.S.        22                 17                 81
8                      K-means     77                 13                 30
                       S.A.        71                 19                 30
                       T.S.        25                 12                 83
Table 5
The results of the statistical hypothesis test on the whole 360 data sets divided only by the "ratio of cardinality" factor

Ratio of cardinality   Algorithm   Rejected at 0.99   Accepted at 0.99   Accepted at 0.95
Equal cardinality      K-means     84                 13                 23
                       S.A.        71                 19                 30
                       T.S.        15                 13                 92
10% in one cluster     K-means     78                 14                 28
                       S.A.        75                 17                 28
                       T.S.        16                 18                 86
60% in one cluster     K-means     62                 20                 38
                       S.A.        78                 11                 31
                       T.S.        40                 12                 68
Table 6
The results of the statistical hypothesis test on the whole 360 data sets divided only by the "error-type" factor

Type of errors      Algorithm   Rejected at 0.99   Accepted at 0.99   Accepted at 0.95
First-type error    K-means     28                 7                  37
                    S.A.        41                 11                 20
                    T.S.        11                 7                  54
Second-type error   K-means     52                 8                  12
                    S.A.        46                 12                 14
                    T.S.        12                 6                  54
Third-type error    K-means     44                 15                 13
                    S.A.        46                 11                 15
                    T.S.        19                 10                 43
Fourth-type error   K-means     42                 10                 20
                    S.A.        42                 10                 20
                    T.S.        15                 8                  49
Fifth-type error    K-means     47                 7                  18
                    S.A.        49                 3                  20
                    T.S.        14                 12                 46
peculiar trend associated with these design factors is evident. Table 6 shows the effect of the error types. For each algorithm, the solutions on the first error type (the error-free data sets) are slightly closer to the solutions of the proposed algorithm than those on the other error types. This means that error perturbation deteriorates the effectiveness of each algorithm, and that the proposed algorithm is more robust against error perturbation. The average elapsed time of each algorithm is shown in Table 7. The proposed algorithm takes the longest time, which may be due to the additional time required for updating the packing relations. As discussed earlier, the number of packing-relation updates depends on the number of releasing operations. The elapsed time might be shortened by reducing the number of releasing operations rather than releasing all the packed pairs as done in this paper; such a releasing-operation reduction is left as a further research issue.
7. Conclusion

This paper proposes an efficient heuristic algorithm for a clustering problem by employing the tabu search method combined with two newly exploited procedures (called the packing and releasing procedures) for both solution search efficiency and effectiveness.
Table 7
The average elapsed time of each algorithm (s)

Our algorithm   T.S.    S.A.    K-means
2.893           0.856   0.782   0.169
Moreover, a secondary tabu list is used to prevent the search from becoming trapped at a local optimal solution. The heuristic algorithm is numerically tested for its effectiveness and is shown to outperform all three reference methods: the tabu search algorithm, the K-means algorithm, and the simulated annealing algorithm. The results of this paper may be applied immediately to various information-data analyses concerned with information classification and pattern recognition. For example, the proposed algorithm may be applied to practical classification problems such as patient classification, product distribution center allocation, and government service branch organization. Moreover, the results of this paper may serve as a guideline for designing a decision boundary (classifier). An efficient adaptation of the proposed algorithm to the situation where the number of clusters is itself a decision variable may be an interesting subject for further study.
About the Author: CHANG SUP SUNG received the M.S. degree in Industrial Engineering from Iowa State University, USA, in 1974 and the Ph.D. in Industrial Engineering from Iowa State University, USA, in 1978. He is a professor of Industrial Engineering at Korea Advanced Institute of Science and Technology (KAIST) in Taejon, Korea. His areas of research interest include Clustering, Production Planning and Control, Production Scheduling, Telecommunication Network Modeling and Evaluation, Combinatorial Optimization and Network Theory, and Logistics.

About the Author: HYUN WOONG JIN received the B.S. degree in Statistics from Yonsei University in Seoul, Korea, in 1993 and the M.S. degree in Industrial Engineering from KAIST, Korea, in 1996. He is a doctoral student in the Department of Industrial Engineering at KAIST. His research interests include Clustering, Optimization Theory, and Network Theory.
Pattern Recognition 33 (2000) 859–869
Mathematical framework to show the existence of attractor of partitioned iterative function systems Suman K. Mitra*, C.A. Murthy Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Calcutta 700035, India Received 23 September 1997; received in revised form 5 November 1998; accepted 28 January 1999
Abstract

The technique of image compression using an Iterative Function System (IFS) is known as fractal image compression. An extension of IFS theory is the Partitioned (or local) Iterative Function System (PIFS) for coding gray-level images. Several PIFS-based image compression techniques have already been proposed by many researchers. The theory of PIFS appears to differ from the theory of IFS in its application domain. The present article discusses some basic differences between IFS and PIFS and provides a separate mathematical formulation for the existence of the attractor of a partitioned IFS. In particular, it is shown that the attractor exists and that it is an approximation of the given target image. Experimental results are also presented in support of the theory; they have been obtained using the GA-based PIFS technique proposed by Mitra et al. (IEEE Trans. Image Process. 7 (4) (1998) 586–593). © 2000 Published by Elsevier Science Ltd. All rights reserved.

Keywords: Image compression; Iterative function system (IFS); Partitioned iterative function system (PIFS); Attractor; Isometry
1. Introduction

The theory of fractal-based image compression using Iterative Function Systems (IFS) was proposed by Barnsley [1,2]. He modeled real-life images by means of deterministic fractal objects, i.e., by the attractor evolved through iterations of a set of contractive affine transformations. Once the set of contractive affine transformations F (say) is obtained, the rest is an iterative sequence of the form {F^N(O)}_{N≥0}, where O is an initial object used to start the iterative sequence. The set of contractive affine transformations F is called an IFS. In particular, at the Nth iteration the object O_N is used as input to the IFS, where O_N is the output object obtained from the (N−1)th iteration. A detailed mathematical description of IFS theory and other relevant results are available in Refs. [1–5].
* Corresponding author. Tel.: +91-33-577-8085; fax: +91-33-577-6680. E-mail address:
[email protected] (S.K. Mitra)
Image compression using IFS can be looked upon as an inverse problem of iterative transformation theory [6]. The basic problem here is to find appropriate contractive affine transformations whose attractor is an approximation of the given image. Thus, for the purpose of image compression, it is enough to store the relevant parameters of these transformations instead of the whole image. This technique reduces the memory requirement to a great extent. But the question is how to construct the transformations for a given image. A fully automated fractal-based compression technique for digital monochrome images was first proposed by Jacquin [6–8]. This technique is known as the partitioned [9] or local [2] iterative function system. The partitioned/local IFS is an IFS in which the domain of application of the contractive affine transformations is restricted to small portions of the image (subimages) instead of the whole image, as in the case of IFS. In PIFS the given image is first partitioned into non-overlapping square blocks. In the encoding process, a separate transformation for each square block is then found on the basis of its similarity with other square blocks, located anywhere
0031-3203/00/$20.00 © 2000 Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00098-9
within the image support. As the transformations are applied partition-wise, the scheme is known as partitioned IFS or local IFS. Different schemes using PIFS have been proposed by several other researchers [9–12].

The theory of partitioned/local IFS appears to differ from the theory of IFS in the restriction of the application domain of the contractive affine transformations. So the questions are how PIFS produces an attractor and how that attractor becomes a close approximation of the given target image. Generally, PIFS is considered a simple extension of IFS, and it is assumed that the theoretical foundation of PIFS is the same as that of IFS. In reality, the IFS and PIFS techniques differ widely. The present article discusses some basic differences between IFS and PIFS in the context of image compression and provides a separate mathematical formulation for the existence of the attractor of a partitioned IFS. In particular, we first show that the transformations in the PIFS scheme give rise to a fixed point (attractor). Second, it is shown that the transformations, though not exactly contractive, are eventually contractive. Finally, we prove that the attractor and the given image are very close to each other in the sense of a chosen distance measure.

In the next section we describe the theory of image coding using IFS. Section 3 presents the basic features of constructing PIFS codes for a given image. Section 4 deals with the basic differences between the IFS and PIFS techniques. The proposed mathematical formulation of PIFS is discussed in Section 5. Experimental results are presented in Section 6 and conclusions are drawn in Section 7.
2. Theoretical foundation of IFS

The salient features of IFS theory and of image coding through IFS are given below. An elaborate description of this methodology is available in Ref. [1].

Let (X, d) be a complete metric space, where X is a set and d is a metric. Let f be a contractive map defined on the metric space (X, d), i.e., f : X → X and d(f(x_1), f(x_2)) ≤ s·d(x_1, x_2), ∀x_1, x_2 ∈ X, where 0 ≤ s < 1 is called the contractivity factor of the map f. Note that lim_{N→∞} f^N(x) = a, ∀x ∈ X, and also f(a) = a; a is called the fixed point of f. Here f^N(x) is defined as f^N(x) = f(f^{N−1}(x)), with f^1(x) = f(x), ∀x ∈ X.

Now let H(X) be the space of all non-empty compact subsets of X, and let D denote the Hausdorff metric defined on H(X). It can be shown that (H(X), D) is a complete metric space. Let f be a contractive map on (H(X), D). Then again lim_{N→∞} f^N(B) = A, ∀B ∈ H(X), and f(A) = A. Here also f^N(B) is defined as f^N(B) = f(f^{N−1}(B)), with f^1(B) = f(B), ∀B ∈ H(X).

Now consider n contractive maps f_1, f_2, …, f_n, with contractivity factors s_1, s_2, …, s_n, on (H(X), D). For any B ∈ H(X), let

B_1 = ∪_{i=1}^{n} f_i(B)

and

B_N = ∪_{i=1}^{n} f_i(B_{N−1}), ∀N ≥ 2.

Then it can be shown that there exists a set C ∈ H(X) such that

∪_{i=1}^{n} f_i(C) = C and lim_{N→∞} B_N = C, ∀B ∈ H(X).

C is called the attractor of the IFS (H(X); f_1, f_2, …, f_n). For any B ∈ H(X), let

W(B) = ∪_{i=1}^{n} f_i(B),

and W^N(B) = W(W^{N−1}(B)), ∀N ≥ 2, where W^1(B) = W(B), ∀B ∈ H(X). Note that

D(W(B_1), W(B_2)) ≤ s·D(B_1, B_2), ∀B_1, B_2 ∈ H(X), where 0 ≤ s < 1    (1)

and s = max{s_1, s_2, …, s_n}.

Collage Theorem (Barnsley [1]). Let I ∈ H(X) and ε > 0. Let there exist n contractive maps f_1, f_2, …, f_n, with contractivity factors s_1, s_2, …, s_n, from H(X) to H(X) such that

D(I, W(I)) ≤ ε.    (2)

Then

D(I, C) ≤ ε / (1 − s),    (3)

where s = max{s_1, s_2, …, s_n} and C is the attractor of {H(X); f_1, f_2, …, f_n}.

The above theory is given for contractive maps defined on complete metric spaces. The space X under consideration in this article is a subset of the three-dimensional Euclidean space R³. One can define affine contractive maps f_1, f_2, …, f_n on this X, and these maps become contractive maps on (H(X), D). Henceforth, we shall be dealing with affine contractive maps.
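The convergence of B_N to the attractor C can be illustrated numerically. The sketch below is our own, not code from the paper: the three maps are the standard Sierpinski-gasket IFS (contractivity factor 0.5), and snapping points to a grid is only a device to keep the iterated set finite.

```python
# Iterate the operator W(B) = f1(B) ∪ f2(B) ∪ f3(B) for the
# Sierpinski-gasket IFS; starting from ANY non-empty set B,
# W^N(B) approaches the attractor C.

def sierpinski_maps():
    # Three contractive affine maps, each with contractivity factor s = 0.5.
    return [
        lambda p: (0.5 * p[0],        0.5 * p[1]),
        lambda p: (0.5 * p[0] + 0.5,  0.5 * p[1]),
        lambda p: (0.5 * p[0] + 0.25, 0.5 * p[1] + 0.5),
    ]

def hutchinson(points, maps, grid=64):
    # Apply every map to every point; snap to a grid so the set stays finite.
    out = set()
    for f in maps:
        for p in points:
            q = f(p)
            out.add((round(q[0] * grid) / grid, round(q[1] * grid) / grid))
    return out

B = {(0.3, 0.7)}           # arbitrary starting set
maps = sierpinski_maps()
for _ in range(12):        # W^12(B) is already close to the attractor
    B = hutchinson(B, maps)

# Near the attractor, the set is (approximately) invariant: W(C) = C.
C_next = hutchinson(B, maps)
print(len(B), len(C_next))
```

After a dozen iterations the single starting point has spread over a few hundred grid cells approximating the gasket, regardless of where the iteration started.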
S.K. Mitra, C.A. Murthy / Pattern Recognition 33 (2000) 859}869
Now let I ∈ H(X) be a given set; our intention is to find a set W of affine contractive maps such that the distance between the given set and the attractor of W is very small. Thus, by the Collage theorem, the given set I can be approximated by the attractor of W. From Eq. (3) it is clear that, after a sufficiently large number N of iterations, the set of affine contractive maps W produces a set C belonging to H(X) which is very close to the given original set I. Here (H(X), W) is called an iterative function system and W is called the set of fractal codes for the given set I.

In the context of image coding, the given set I is the given digital monochrome image. Note that any digital image I with w rows, w columns, and gray-level values g(i, j) (g(i, j) is the gray-level value of the pixel in the ith row and jth column) can be represented by

I = {(i, j, g(i, j)) : i = 1, 2, …, w; j = 1, 2, …, w}.

Thus any image I is a subset of the three-dimensional Euclidean space R³, and hence the theory stated above is applicable to images. In the context of digital monochrome image coding, the scheme suggested by Jacquin [7] is called the partitioned or local Iterative Function System (PIFS). In the next section, the construction of PIFS codes is described.
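As a tiny illustration of this representation (our own example, not the paper's), a 2 × 2 image becomes a set of four points in R³:

```python
# Represent a digital image as a subset of R^3: each pixel (row i,
# column j, gray value g(i, j)) becomes one point (i, j, g(i, j)).

g = [[10, 200],
     [30,  40]]   # a 2x2 monochrome image, w = 2

I = {(i + 1, j + 1, g[i][j]) for i in range(2) for j in range(2)}
print(sorted(I))  # [(1, 1, 10), (1, 2, 200), (2, 1, 30), (2, 2, 40)]
```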
3. Construction of PIFS codes

The structure of PIFS codes is almost the same as that of IFS codes. The only difference is that PIFS codes are obtained from, and applied to, a particular portion of the image instead of the whole image.

Let I be a given digital image of size w × w with gray-level values in {0, 1, 2, …, l−1}. The given image I can then be expressed as a matrix ((g(i, j)))_{w×w}, where i and j stand for the row and column number, respectively, and g(i, j) represents the gray-level value at position (i, j). The image is partitioned into n non-overlapping squares of size, say, b × b; let this partition be represented by {R_1, R_2, …, R_n}. Each R_i is called a range block. Note that the number of range blocks is n = (w/b) × (w/b). Let M be the collection of all possible blocks of size 2b × 2b in the image, M = {D_1, D_2, …, D_m}. Here m = (w − 2b) × (w − 2b), and the D_j are called "domain blocks".

Now define

A = {1, 2, …, w} × {1, 2, …, w} × {0, 1, 2, …, l−1}.

Here A ⊂ R³. Note that any image I is a subset of A, but a subset of A is not necessarily an image. Also R_i ⊂ A, ∀i, and D_j ⊂ A, ∀j. Let, for a range block R_i,

F_j = {f : D_j → A; f is an affine contractive map}.
Let f_{i|j} ∈ F_j be such that

ρ(R_i, f_{i|j}(D_j)) ≤ ρ(R_i, f(D_j)), ∀f ∈ F_j, ∀j.

Here ρ is a suitably chosen distance measure (a detailed description of ρ is given at the end of this section). Now let k be such that

ρ(R_i, f_{i|k}(D_k)) = min_j {ρ(R_i, f_{i|j}(D_j))}.    (4)

Also, let f_{i|k}(D_k) = R̂_{i|k}. We shall denote f_{i|k} by f_{i|°}.

The aim here is to find f_{i|k}(D_k) for each i ∈ {1, 2, …, n}. In other words, for every range block R_i one needs to find an appropriately matched domain block D_k as well as an appropriate transformation f_{i|k}. The set of maps F = {f_{1|°}, f_{2|°}, …, f_{n|°}} thus obtained is called the partitioned or local IFS, or the fractal codes, of the image I. To find the best-matched domain block and the best-matched transformation, all possible domain blocks and all possible transformations are searched with the help of Eq. (4). The problem of searching for an appropriately matched domain block and transformation for a range block has already been solved by enumerative search [7] and by using Genetic Algorithms [10].

The affine contractive transformation f_{i|°} is constructed using the fact that the gray values of the range block are a scaled, translated and rotated version of the gray values of the domain block. The contractive affine transformation f_{i|j} is defined in such a way that f_{i|j}(D_j) is close to R_i. Also, f_{i|j} can be separated into two parts, one for spatial information and the other for gray-value information. The first part indicates which pixel of the domain block corresponds to which pixel of the range block. The second part finds the scale and shift parameters mapping the set of pixel values of the domain block to those of the range block. The first part shuffles the pixel positions of the domain block; generally, eight transformations (isometries) are considered for this purpose [10,11].
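The exhaustive search of Eq. (4) can be sketched as follows. This is our own simplification, not the paper's implementation: the image is a NumPy array, the domain block is contracted by plain subsampling, the eight isometries are omitted, and the gray-value map f(x) = a·x + t is fitted by least squares.

```python
import numpy as np

# Toy PIFS encoder: for each b x b range block R_i, scan candidate
# 2b x 2b domain blocks D_j, fit a gray-value map a*x + t by least
# squares, and keep the pair (D_k, f_{i|k}) with minimum RMSE (Eq. (4)).
def encode(img, b=4):
    w = img.shape[0]
    codes = []
    for ri in range(0, w, b):
        for rj in range(0, w, b):
            R = img[ri:ri + b, rj:rj + b].ravel().astype(float)
            best = None
            for di in range(0, w - 2 * b + 1, b):     # coarse domain grid
                for dj in range(0, w - 2 * b + 1, b):
                    # contract 2b x 2b -> b x b by subsampling
                    D = img[di:di + 2 * b:2, dj:dj + 2 * b:2].ravel().astype(float)
                    if D.std() > 0:                   # least-squares fit R ~ a*D + t
                        a, t = np.polyfit(D, R, 1)
                    else:
                        a, t = 0.0, R.mean()
                    err = float(np.sqrt(np.mean((R - (a * D + t)) ** 2)))
                    if best is None or err < best[0]:
                        best = (err, di, dj, a, t)
            codes.append(((ri, rj), best))
    return codes

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(16, 16))
codes = encode(img)
print(len(codes))        # 16 range blocks for a 16x16 image with b = 4
```

A real encoder would also try the eight isometries for each candidate and search a denser set of domain positions, which is exactly why enumerative search is expensive and genetic search [10] is attractive.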
On the other hand, the second part is the estimation of the set of gray values of the range block from the set of values of the transformed domain block. These estimates can be obtained by least-squares analysis of the two sets of values [10,11]; the second part is obtained by least-squares analysis of the two sets of gray values once the first part is fixed. Moreover, the size of a domain block is double that of a range block, but the method of least squares (straight-line fitting) needs point-to-point correspondence. To overcome this, one has to construct a contracted domain block such that the number of pixels in the contracted domain block equals that of a range block. The contracted domain block is obtained by adopting one of the following two techniques. In the first technique, as shown in Fig. 1 for a 4 × 4 domain block, the average values (integers) of the four pixel values in 2 × 2
Fig. 1. Construction of contracted domain block: Scheme 1.
non-overlapping squares within the domain block are considered as the pixel values of the contracted domain block. In this scheme, the row and column numbers corresponding to each pixel value of the contracted domain block are equal to the row and column numbers of the topmost pixel value in each 2 × 2 square considered within the domain block [7]. In the other scheme, as shown in Fig. 2 for a 4 × 4 domain block, the contracted domain block is constructed by taking the pixel values, along with their row and column numbers, from every alternate row and column of the domain block [10,11].

Now, to select an appropriately matched domain block (D_k) and an appropriately matched transformation (f_{i|k}) for a range block (R_i), the distance measure ρ plays an important role. The distance measure ρ [used in Eq. (4)] is taken to be the simple root-mean-square error (RMSE) between the original set of gray values and the obtained set of gray values of the concerned range block. The map from D_k to R_i is constructed in such a way that the pixel positions of R_i and R̂_{i|k} are the same; the difference between R_i and R̂_{i|k} lies only in the gray-level values of the pixels. Let R_i(p, q) and R̂_{i|k}(p, q) be, respectively, the original and the obtained value of the (p, q)th pixel of the range block R_i of size b × b. The distance measure ρ is then computed as

ρ(R_i, R̂_{i|k}) = √( (1/b²) Σ_p Σ_q {R_i(p, q) − R̂_{i|k}(p, q)}² ).

The RMSE is a metric and it serves the purpose of a distance measure. Note that the same measure has been used in several articles [7,9–12].
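The two contraction schemes and the distance measure ρ can be sketched in a few lines (the function names are ours; the paper defines no code):

```python
import numpy as np

def contract_scheme1(D):
    # Fig. 1: average each non-overlapping 2x2 square of the 2b x 2b block.
    return 0.25 * (D[0::2, 0::2] + D[1::2, 0::2] + D[0::2, 1::2] + D[1::2, 1::2])

def contract_scheme2(D):
    # Fig. 2: keep the pixels from alternate rows and columns (subsampling).
    return D[0::2, 0::2]

def rho(R, R_hat):
    # rho(R_i, Rhat_{i|k}) = sqrt( (1/b^2) sum_p sum_q (R(p,q) - Rhat(p,q))^2 )
    b = R.shape[0]
    return np.sqrt(np.sum((R.astype(float) - R_hat) ** 2) / b ** 2)

D = np.array([[1, 3, 5, 7],
              [1, 3, 5, 7],
              [2, 2, 2, 2],
              [2, 2, 2, 2]], dtype=float)
print(contract_scheme1(D))   # [[2., 6.], [2., 2.]]
print(contract_scheme2(D))   # [[1., 5.], [2., 2.]]
```

Scheme 1 smooths the domain block; scheme 2 is cheaper but keeps only a quarter of the samples.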
Fig. 2. Construction of contracted domain block: Scheme 2.
4. How the PIFS technique differs from IFS

Extension of the iterative function system concept results in the partitioned iterative function system. PIFS mainly differs from IFS in the domain of application of the respective transformations. In PIFS the transformations are not applied to the whole image, as in the case of IFS, but rather have restricted domains. In every PIFS, the information on each transformation f_i must contain the location of the domain block to which f_i is applied.

The difference in the domain of application of the two techniques is shown in Figs. 3 and 4. In Fig. 3, three affine contractive transformations are applied to the image I, resulting in an image which consists of three parts I_{01}, I_{02} and I_{03}. These three transformations are then applied sequentially, resulting in a fixed point. The set of transformations {f_1, f_2, f_3} is called the IFS, and the fixed point of this set of transformations in this case is the Sierpinski gasket [1].

In PIFS, contrary to the above, as shown in Fig. 4, the map f_{i|j} is applied to the domain D_j to yield R̂_i, which is an estimate of R_i. In the next iteration, this estimate (R̂_i) is not used as the domain of the map f_{i|j}; in particular, in the next iteration an estimate of D_j is used as the domain to obtain an improved estimate of R_i. Note that a domain block includes many other range blocks, or parts of them (Fig. 4), so the estimate of D_j consists of several other estimated range blocks or parts of them.

PIFS differs from IFS not only in the application domain but also in the choice of distance measure. In IFS, the Hausdorff distance, which is a metric, is
considered as the distance measure. But in PIFS, RMSE is considered as the distance measure, and RMSE is not a metric. Even then, RMSE serves the purpose of selecting an appropriately matched map and domain block for a range block. As our purpose is to measure the closeness of the set of pixel values of the concerned range block to the scaled, translated and rotated version of the set of pixel values of the appropriately matched domain block, selecting RMSE as the distance measure is sufficient. But it needs to be examined whether the attractor of PIFS, if it exists, is close to the given image when RMSE is taken as the distance measure.

Another important and significant difference between PIFS and IFS lies in the contractivity factor of the transformations. For an IFS with an expansive map f_i, the set of maps will not converge to a compact fixed point: the expansive part will cause the limit set to be infinitely stretched along some direction. This is not necessarily true for a PIFS, which can contain expansive transformations and still have a bounded attractor. So it is not necessary, in PIFS, to impose a contractivity condition on all the transformations; a sufficient requirement is that the set of transformations F be eventually contractive [9]. Fisher et al. [9] have shown experimentally that the maximum allowable value of s (the contractivity factor) can be 1.5 (> 1). They have also shown that this maximum value of s, for a particular image, yields minimum distortion between the original image and the attractor evolved through the iterative process of the eventually contractive transformations.

Fig. 3. Mapping for an IFS scheme.

Fig. 4. Mapping from domain blocks to range blocks in PIFS scheme.

In the next section we describe the mathematical formulation of the attractor of PIFS, where the meaning of eventually contractive maps is defined.

5. Mathematical formulation of PIFS
In this section we propose a mathematical formulation of PIFS. For convenience we divide the task into three stages. First, it is shown that the PIFS code F possesses a fixed point, or attractor, under the iterative sequence. Second, the eventual contractivity of the maps in the PIFS setup is proved. Finally, it is shown that the given image and the attractor are very close to each other in the sense of the chosen distortion measure, the root-mean-square error (RMSE).

Let I be a given image of size w × w with gray-level values in {0, 1, 2, …, l−1}. For this given image we can construct a vector x whose elements are the pixel values of the given image I. Note that there are w² pixel values of I. Thus

x = (x_1, x_2, x_3, …, x_{w²})′

is the given image, where x_1 is the pixel value corresponding to position (1, 1) of I. Likewise, let x_r be the pixel value corresponding to position (i, j) of I, where r = (i − 1)w + j, 1 ≤ i, j ≤ w.

In this setup, PIFS can be viewed as follows. There exists an affine (linear), not necessarily strictly contractive, map for each element of x, and this map is called the forward map of the element. In the process of iteration, the input to a forward map will be one of the w² elements of x, and the map is called a backward map for this input element. Thus, for each element of x there exists a forward map, and an element of x can have one, more than one, or no backward map(s). The set F of forward maps is called the PIFS codes of I.

Let us define a set P as P = {0, 1, 2, …, l−1}, and consider the set

S = {x | x = (x_1, x_2, x_3, …, x_{w²})′, x_i ∈ P}.

S is the set of all possible images. The given image I is surely an element of S, i.e., I ∈ S. The PIFS codes F can be looked upon as F : S → S. The attractor a of F, if it exists, will also belong to S. So the first task is to show the existence of a.

Let f_1 be the forward map for a particular element x_{r_1}, where r_1 = (i_1 − 1)w + j_1. Also let this element be mapped from the element x_{r_2} (r_2 = (i_2 − 1)w + j_2). Thus f_1 is the backward map for x_{r_2}. Again, x_{r_2} is being mapped
from x_{r_3} (r_3 = (i_3 − 1)w + j_3) with a forward map f_2. Thus we have a sequence of maps for the element x_{r_1} as follows:

(i_1, j_1) ←(f_1)← (i_2, j_2) ←(f_2)← (i_3, j_3) ←(f_3)← ⋯ ←(f_{m−1})← (i_m, j_m);  m ≤ w² − 1.    (5)

The above sequence is stopped at (i_m, j_m) if

(i_{m+1}, j_{m+1}) = (i_k, j_k) for k = 0 or 1 or 2 or … or m.    (6)

The stopping of this sequence is mandatory, as there is a finite number (w²) of elements in x. Moreover, all the elements of x possess the same type of sequence in the PIFS codes. Thus it is enough to show that the element x_{r_1} has a fixed point in the process of iteration, and this will lead to proving the existence of a (the attractor of x). It is clear from sequence (5) that during the iterative process the element x_{r_1} will have a fixed point once the element x_{r_2} is fixed. Again, the convergence (to a fixed point) of the element x_{r_3} confirms the convergence of the element x_{r_2}, and likewise for the rest of the elements. Thus convergence of the last element of the sequence implies the convergence of the rest of the elements. The convergence of the last element of the sequence is possible in four different ways, according to the stopping condition (6).

An important point to be noted in this context is the problem of discretization. To get the decoded image in an iterative process using PIFS codes, one needs to discretize the output. This can be done in two ways: one is discretization of the output in each iteration; the other is discretization at the end of the iterative process. The iterative process is stopped whenever there is no change in gray values in two successive iterations. To prove the convergence of the elements in the four different ways, we use discretization of the second type.

Case 1: m = 1. Here (i_2, j_2) = (i_1, j_1). It implies that (i_1, j_1) is mapped into itself with a map f_1, where f_1 = a_1 x + b_1; x ∈ P and 0 ≤ a_1 < 1. Note that in this case the affine map f_1 must necessarily be strictly contractive, otherwise the element will not converge to a fixed point. If we start with any value x ∈ P of (i_1, j_1), the element will converge to the fixed point b_1/(1 − a_1).

Case 2: m > 0 and k = m. Here (i_{m+1}, j_{m+1}) = (i_m, j_m). It implies that (i_m, j_m) is mapped into itself with a map f_m = a_m x + b_m; x ∈ P and 0 ≤ a_m < 1. Thus (i_m, j_m) will converge to b_m/(1 − a_m). Once (i_m, j_m) is fixed at b_m/(1 − a_m), the element (i_{m−1}, j_{m−1}) will be fixed at

a_{m−1} b_m/(1 − a_m) + b_{m−1} = (a_{m−1} b_m − a_m b_{m−1} + b_{m−1}) / (1 − a_m).

In this case the forward map is f_{m−1} = a_{m−1} x + b_{m−1}. Again, (i_{m−1}, j_{m−1}) being fixed implies convergence of (i_{m−2}, j_{m−2}), with forward map f_{m−2} = a_{m−2} x + b_{m−2}; x ∈ P, at

(a_{m−2} a_{m−1} b_m − a_{m−2} a_m b_{m−1} + a_{m−2} b_{m−1} − a_m b_{m−2} + b_{m−2}) / (1 − a_m).

Proceeding in this way, the fixed point of (i_1, j_1) is found to be

[a_1 a_2 ⋯ a_{m−1} b_m + (a_1 a_2 ⋯ a_{m−2} b_{m−1} + a_1 a_2 ⋯ a_{m−3} b_{m−2} + ⋯ + a_1 a_2 b_3 + a_1 b_2 + b_1)(1 − a_m)] / (1 − a_m).

Note that in this case the affine map f_m must necessarily be contractive in the strict sense, but the rest of the maps need not be strictly contractive. The eventual contractivity associated with the element x_{r_1} = (i_1, j_1) is s_{r_1} = ∏_{i=1}^{m} a_i.

Case 3: m > 0 and k = 1. Here (i_{m+1}, j_{m+1}) = (i_1, j_1). It implies that the starting and the last element of sequence (5) are the same; this can be viewed as a complete loop in the sequence. This case is solved stepwise: first for m = 2 and m = 3, and then, on that basis, the fixed point for general m.

Case 3(a): m = 2. Here we have only two elements, (i_1, j_1) and (i_2, j_2). The element (i_1, j_1) is mapped from the element (i_2, j_2) by the affine map f_1 = a_1 x + b_1; x ∈ P. On the other hand, the element (i_2, j_2) is mapped from (i_1, j_1) by the affine map f_2 = a_2 x + b_2; x ∈ P:

(i_1, j_1) ←(f_1)← (i_2, j_2) ←(f_2)← (i_1, j_1).

Let x be the starting value of (i_1, j_1) and y the starting value of (i_2, j_2). After the first iteration, the values of (i_1, j_1) and (i_2, j_2) will be a_1 y + b_1 and a_2 x + b_2, respectively. After the second iteration, they will be a_1 a_2 x + a_1 b_2 + b_1 and a_1 a_2 y + a_2 b_1 + b_2, respectively. Proceeding this way, after an infinite (practically, large but finite) number of iterations, the fixed points of (i_1, j_1) and (i_2, j_2) will be independent of x and y: the coefficients of x and y after N (even) iterations are (a_1 a_2)^{N/2}, which tends to zero as N tends to infinity.
The fixed point of (i_1, j_1) will thus be

(a_1 b_2 + b_1) / (1 − a_1 a_2).

The same for the element (i_2, j_2) will be

(a_2 b_1 + b_2) / (1 − a_1 a_2).

Note that neither map need be contractive by itself. Moreover, the eventual contractivity associated with the element x_{r_1} is a_1 a_2, which should be less than one.

Case 3(b): m = 3. Here we have three elements, (i_1, j_1), (i_2, j_2) and (i_3, j_3), which make a complete loop in the sequence. The sequence of forward and backward maps is as follows:

(i_1, j_1) ←(f_1)← (i_2, j_2) ←(f_2)← (i_3, j_3) ←(f_3)← (i_1, j_1).

Taking the starting values of the three elements as x, y and z and proceeding as in Case 3(a), we have the following results. The fixed point of (i_1, j_1) will be

[a_1(a_2 b_3 + b_2) + b_1] / (1 − a_1 a_2 a_3).

The element (i_2, j_2) will converge to

[a_2(a_3 b_1 + b_3) + b_2] / (1 − a_1 a_2 a_3).

The fixed point of (i_3, j_3) will be

[a_3(a_1 b_2 + b_1) + b_3] / (1 − a_1 a_2 a_3).

Here also the maps need not be contractive in the strict sense. The eventual contractivity will be a_1 a_2 a_3 in this case.

Case 3(c): General m. Here we have m elements which make a complete loop in the sequence. It is clear from Cases 3(a) and 3(b) that all the elements of this sequence will have a fixed point after a large but finite number of iterations, and the affine maps used need not be contractive. In particular, in this case the element (i_1, j_1) will converge to

[a_1(a_2(⋯(a_{m−2}(a_{m−1} b_m + b_{m−1}) + b_{m−2}) + ⋯) + b_2) + b_1] / (1 − a_1 a_2 ⋯ a_m).

Also, the eventual contractivity for the element is s_{r_1} = ∏_{i=1}^{m} a_i.

Case 4: m > 0 and 0 < k < m. Here (i_{m+1}, j_{m+1}) = (i_k, j_k), where k = 2 or 3 or … or m − 2. Without loss of generality, say 1 < k = m_0 < m − 1. This case can be viewed as a mixture of two cases. Taking (i_{m_0}, j_{m_0}) as the starting element, a complete loop of the sequence can be formed with the rest of the elements. Thus, one can find the
"xed point of this element as it is nothing but case 3. Once the element (i 0, j 0) is "xed then the "xed point of m m the original starting element (i , j ) can be found out by 1 1 using case 2. Like all the previous cases the eventual contractivity, in this case, will be s 1"<m a . The above r i/1 i stated four cases provide the "xed point of the PIFS codes F. Thus for a very large positive number N, we have FN(o)Pa3S. (7) 6 6 Note that for each element there will be a sequence of form (5). This sequence will follow any one of the abovementioned four cases. Thus for each element there will be a sequence of forward maps. The contractivity factor associated with this element will be the product of all the scaled parameters (a ) of the forward maps. i The next task is to de"ne, mathematically, the eventual contractivity of the PIFS codes. In this context we are stating the following theorem. Theorem 1. Let F be the PIFS codes and S be the set of all possible images. For every x and y, xOy3 S & N'0 6 6 6 6 and 0)s(1 such that DFp (x)!Fq(y)D)sDx!yD; ∀p, q'N. 6 6 6 6 Proof. Using Eq. (7) we have for a very small positive number e'0, & a large positive number N such that 1 e p'N NDFp(x)!aD ( , ∀x 3 S. 1 2 6 6 6 Also, & another large positive number N such that 2 e q'N N DFq (y)!aD ( , ∀y3S. 2 2 6 6 6 Thus for xOy3S, 6 6 DFp(x)!Fq(y)D"DFp(x)!a#a!Fq(y)D 6 6 6 6 6 6 )DFp(x)!aD#DFq(y)!aD 6 6 6 6 (e; where, p, q'N"Max(N , N ). 1 2 Thus, for xOy, & a large number N and 0)s(1 such 6 6 that p, q'NNDFp(x)!Fq(y)D)sDx!yD. h 6 6 6 6 Once the eventual contractivity of the PIFS codes F has been proved the last task is to show that the given image and the "xed point of F are very close to each other. Theorem 2. Let a be the xxed point of the PIFS codes F and x be the 6given image. Also let `oa be the given distortion6 measure. Under this setup, if o(x, F(x)))a 6 6 then a o(x, a)) . 1!s 6 6 .!9
S.K. Mitra, C.A. Murthy / Pattern Recognition 33 (2000) 859–869
Where s "MaxMs , s ,2, s 2N, s being the eventual .!9 1 2 w i contractivity of the ith element of x. 6 Proof. Let x"(x , x ,2, x 2)@, 1 2 w 6 F(x)"( xY , xY ,2, xY 2)@ 1 2 w 6 F2(x)"( xYY , xYY ,2, xYY 2)@ 1 2 w 6 Y Y F3(x)"( xY , xY ,2, xYY 2)@ 1 2 w 6 In the PIFS scheme, the given image is partitioned into range blocks R of sizes b]b. So, there are n"(w/b)2 i range blocks each having b2 pixel values. Pixel values are nothing but the elements of x. Also the distortion measure `oa is Root Mean Square Error (RMSE). `oa, de"ned on S, is as follows:
S
o(u, v)" 6 6
1 2 +w b(u , v ) ∀ u, v3S, i i w2 i/1 6 6
where, b(u , v )"Du !v D2. i i i i Now,
S
o(x, F(x))" 6 6
1 2 +w b (x , f (x )) i i j w2 i/1
[x is being mapped from x with forward map f .] i j i
S S S
"
)
"
1 2 +w b (x , x( ) i i w2 i/1 1 2 +w Max Dx !x( D2 i i i w2 i/1 1 2 +w (a)2 where, w2 i/1
a"Max M Dx !x( D. i| 1, 2,2, w2N i i )a. Again,
S
o(x, F(x))" 6 6
1 1 2 +n +b b (x , f (x )), j@i j@i k@l n i/1 b2 j/1
[as w2"nb2], where, x is the jth pixel value of ith range block and j@i f is the forward map associated with x which is being j@i j@i mapped from x , the kth element of lth range block. k@l Thus,
S
1 1 2 +n +b b(x , f (x )) n i/1 b2 j/1 j@i j@i k@l
S
"
1 1 2 +n +b b (x , xY ). j@i j@i n i/1 b2 j/1
It implies that
S
1 1 2 +n +b b (x , xY ) )a. j@i j@i n i/1 b2 j/1
(8)
Again,
S
o(F(x), F2(x))" 6 6
1 1 2 +n +b b ( f (x ), f ( xY )). j@i k@l j@i k@l n i/1 b2 j/1
Note that the size of the range block and the contracted domain block [from where this range block is being mapped] is same (Section 3). Moreover, the number of range blocks and the number of matched contracted domain blocks is same as there is only one matched contracted domain block for each range block. Also for each element there is an eventual contractivity factor and s is the maximum of these factors. .!9 So, o(F(x), F2(x))) s .!9 6 6
S
1 1 2 +n +b b (x , xY ) k@l k@l n l/1 b2 k/1
)s a [by Eq. (8)]. .!9 Similarly, o(F2(x), tF3(x)) 6 6 1 1 2 " +n +b b ( f ( xY ), f ( xYY )) j@i k@l j@i k@l n i/1 b2 j/1
S
S S
)s .!9
1 1 2 +n +b b ( xY , xYY ) k@l k@l n l/1 b2 k/1
"s .!9
1 1 2 +n +b b (f (x ), f ( xY )) k@l p@q k@l p@q n l/1 b2 k/1
[where x is being mapped from x with forward k@l p@q map f ] k@l
S
)s2 .!9
1 1 2 +n +b b (x , x Y ) p@q p@q n q/1 b2 p/1
)s2 a [by Eq. (8)]. .!9 So, "nally we have o(x, a) 6 6 "o(x, FN(o)) , ∀ o3S and for a large N [by Eq. (7)] 6 6 6 "o(x, FN(x)) 6 6 )o(x, F(x))#o(F(x), F2(x)) 6 6 6 6 #o(F2(x), F3(x))#2 6 6 "a#s a#s2 a#2 .!9 .!9 "a(1#s #s2 #2) .!9 .!9 a " . h 1!s .!9
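The case analysis above can be checked numerically. The following sketch (with illustrative map parameters, not taken from the paper) iterates a loop of three affine maps $f_i(x) = a_i x + b_i$ as in Case 3(b) and compares the limit with the closed-form fixed point; note that the first map alone is expansive, and convergence holds because only the eventual contractivity $a_1 a_2 a_3$ must be below one.

```python
# Numerical check of Case 3(b): three elements forming a loop,
# x1 = f1(x2), x2 = f2(x3), x3 = f3(x1), with f_i(x) = a_i*x + b_i.
# Iterating the composed map g = f1 o f2 o f3 converges to the closed-form
# fixed point even though f1 alone is expansive (|a1| > 1): only the
# product a1*a2*a3, the eventual contractivity, must be below one.

a = [1.5, 0.4, 0.8]          # |a[0]| > 1, but a[0]*a[1]*a[2] = 0.48 < 1
b = [2.0, -1.0, 0.5]

def g(x):
    # one pass around the loop: f1(f2(f3(x)))
    for ai, bi in zip(reversed(a), reversed(b)):
        x = ai * x + bi
    return x

x = 100.0                    # arbitrary starting value
for _ in range(60):          # "a large but finite number of iterations"
    x = g(x)

closed_form = (a[0] * (a[1] * b[2] + b[1]) + b[0]) / (1 - a[0] * a[1] * a[2])
print(x, closed_form)        # both ~ 1.5385
```

The same loop with `len(a) == m` maps reproduces the general Case 3(c) formula, with contraction factor $\prod_i a_i$ per pass around the loop.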
In the next section we present experimental results in support of the mathematical formulation of PIFS.
6. Experimental results

In the context of PIFS, a technique for fractal image compression using a Genetic Algorithm (GA) has been proposed by Mitra et al. [10,11]. Using this technique, the PIFS codes for the 256×256, 8 bits/pixel "Lena" image and "LFA" (Low Flying Aircraft) image are found. Both images have been reconstructed iteratively starting from different images. In particular, we have used the "Blank" image (having all gray values zero), the "Seagull" image, the "Lena" image and the "LFA" image as starting images. At the time of reconstruction, the RMSE (distortion measure) between two successive iterations has been computed. In all cases, the computed values of RMSE decrease gradually with increasing iteration number. It has been found that the PIFS codes almost achieve a fixed point after 10 iterations for both images. Finally, the distance (RMSE) between the given image and the attractor is, as expected, found to be very small. The distances are found to be on average 7.75 and 11.44 for the "Lena" image and the "LFA" image, respectively. The computed values of RMSE are presented in tabulated form in Tables 1 and 2. The notation $A_{i,j}$ used in the tables denotes the RMSE value between the $i$th and $j$th iterations. Thus we have
$$A_{i,j} = \rho(F^i(\underline{o}), F^j(\underline{o})),$$
Fig. 5. Original "Lena" image.
where $F$ is the PIFS codes used and $\underline{o}$ is the starting image. As we have stopped the process after 10 iterations, the value of $A_{0,10}$ provides the distance between the attractor and the given image. The value of $A_{0,1}$ provides an approximate value of $a$ if the starting image is the given image itself. The approximate values of $a$ are found to be 7.18 and 10.67 for the "Lena" and "LFA" images, respectively. To judge the validity of the PIFS technique, the original and reconstructed images have also been checked visually. Figs. 5–7 show the original images of "Lena",
Table 1
RMSE values between successive iterations using the PIFS code of the "Lena" image

Starting image   A(0,10)  A(0,1)  A(1,2)  A(2,3)  A(3,4)  A(4,5)  A(5,6)  A(6,7)  A(7,8)  A(8,9)  A(9,10)
Lena             7.73     7.18    2.91    1.12    0.45    0.24    0.15    0.10    0.07    0.04    0.03
Blank            7.86     X       43.93   32.05   19.72   11.30   6.25    3.48    1.97    1.13    0.66
Seagull          7.71     89.24   60.40   39.31   17.16   8.89    3.91    1.62    0.87    0.41    0.26
LFA              7.74     66.47   41.59   23.07   11.26   4.97    2.13    0.97    0.48    0.27    0.17
Table 2
RMSE values between successive iterations using the PIFS code of the "LFA" image

Starting image   A(0,10)  A(0,1)  A(1,2)  A(2,3)  A(3,4)  A(4,5)  A(5,6)  A(6,7)  A(7,8)  A(8,9)  A(9,10)
LFA              11.42    10.67   4.32    1.56    0.63    0.34    0.23    0.16    0.12    0.09    0.06
Blank            11.48    X       53.78   36.51   21.87   12.70   7.39    4.28    2.46    1.42    0.84
Seagull          11.41    141.92  99.69   44.12   21.14   9.91    4.11    2.14    1.12    0.78    0.54
Lena             11.43    64.95   45.50   27.23   13.38   5.66    2.36    1.09    0.60    0.38    0.26
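The $A_{i,j}$ entries in Tables 1 and 2 follow the RMSE distortion measure $\rho$ defined earlier. A minimal sketch of that measure on flattened pixel arrays (illustrative values, plain Python) is:

```python
import math

def rmse(u, v):
    # rho(u, v) = sqrt( (1/w^2) * sum_i |u_i - v_i|^2 ), computed over the
    # flattened pixel values of two images of equal size; A_{i,j} is then
    # rmse applied to the decoded iterates F^i(o) and F^j(o).
    assert len(u) == len(v)
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)) / len(u))

print(rmse([0, 0, 0, 0], [3, 4, 0, 0]))  # sqrt(25/4) = 2.5
```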
Fig. 8. Decoded "Lena" image.
Fig. 6. Original "LFA" image.
Fig. 9. Decoded "LFA" image.

Fig. 7. "Seagull" image.
"LFA" and "Seagull", respectively. Figs. 8 and 9 show the reconstructed images of "Lena" and "LFA", respectively. In both cases the starting image is the "Seagull" image. Both reconstructed images are found to be of good quality. Also, the compression ratios are found to be 10.5 and 5.5 for the "Lena" and "LFA" images, respectively, using the GA-based PIFS technique [10]. Note that the compression ratio depends on several factors, such as image size, range block size, bits per pixel and the gray-level variation present in the given image. The next section concludes the present article.

7. Conclusions
The present article provides an elaborate and direct proof of the existence of the attractor, and of the closeness of the attractor to the given image, in the partitioned IFS scheme. The upper bound on the difference between the original image and the attractor evolved through its PIFS code is almost the same as that in the IFS setup (Collage theorem [1]). The only distinction between the two schemes is the contractivity factor, which is eventually contractive in the case of PIFS, whereas it is strictly contractive for IFS.
In the PIFS technique, the estimates of all the range blocks are obtained by finding the self-similarities present in the given image. The domain block which is most similar to a range block is called the appropriately matched domain block for that range block. The similarity between the range block and its estimate is measured by RMSE. Thus the efficiency of the PIFS technique depends on two factors. The first is the efficiency of the distortion measure. The second is the extent of similarity present in the given image after choosing an "appropriate" distortion measure. RMSE, being a global measure, has its own limitations [13]. In this context, a better and more reliable distortion measure could probably make the PIFS technique more efficient. Regarding the second factor, it may happen that there is hardly any domain block that is appropriately matched with the range block concerned. In other words, the domain block closest to a range block in the sense of similarity may still produce a quantitatively large distortion. This may lead to inefficient coding. In such a case, many authors have proposed subdivision of the range block concerned [7,9] and coding of the divided blocks. The continuation of this process [9] will ultimately lead to a coding with less distortion, though it may sometimes lead to a coding with a very small compression ratio. Thus a proper choice of the size of the range block is important for obtaining good compression ratios.

The theory of the PIFS technique is also applicable to one-dimensional signals. In this context, the technique has already been applied to code fractal curves [6] and EEG signals [14]. But in either case, the computational time of the PIFS scheme, for two-dimensional signals (gray-level images) and one-dimensional signals (curves) alike, is quite large, so many attempts have been made to reduce the computational time [10,15].
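The subdivision idea cited above [7,9] can be sketched as a simple quadtree split: whenever the best domain-block match for a range block leaves too much distortion, the block is divided into four quadrants and each is coded recursively. In the sketch below, `best_match_error` is a hypothetical stand-in for the real domain-block search, and the threshold values are illustrative.

```python
# Illustrative quadtree subdivision of range blocks: split a block into
# four quadrants whenever the best matching domain block leaves a
# distortion above `tol`. `best_match_error(x, y, size)` is a hypothetical
# stand-in for the real domain-block search over the image.

def code_block(x, y, size, best_match_error, tol, min_size=4):
    """Return the list of coded blocks (x, y, size); split while error > tol."""
    if size <= min_size or best_match_error(x, y, size) <= tol:
        return [(x, y, size)]          # accept this block's best match
    half = size // 2
    blocks = []
    for dx in (0, half):
        for dy in (0, half):
            blocks += code_block(x + dx, y + dy, half,
                                 best_match_error, tol, min_size)
    return blocks

# Toy error model: larger blocks are harder to match (error = block size).
blocks = code_block(0, 0, 16, lambda x, y, s: s, tol=8)
print(len(blocks))  # 4 blocks of size 8
```

Deeper recursion lowers the distortion but shrinks the compression ratio, which is exactly the trade-off discussed above.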
Acknowledgements

Dr. Murthy acknowledges the Center for Multivariate Analysis, Pennsylvania State University, University Park, PA 16802, USA, for the academic and financial assistance received in carrying out this work.
References

[1] M.F. Barnsley, Fractals Everywhere, Academic Press, New York, 1988.
[2] M.F. Barnsley, L.P. Hurd, Fractal Image Compression, A.K. Peters, Massachusetts, 1993.
[3] J. Feder, Fractals, Plenum Press, New York, 1988.
[4] G.A. Edgar, Measure, Topology, and Fractal Geometry, Springer, New York, 1990.
[5] K. Falconer, Fractal Geometry: Mathematical Foundations and Applications, Wiley, New York, 1990.
[6] A.E. Jacquin, Fractal theory of iterated Markov operators with applications to digital image coding, Ph.D. Thesis, Georgia Institute of Technology, August 1989.
[7] A.E. Jacquin, Image coding based on a fractal theory of iterated contractive image transformations, IEEE Trans. Image Process. 1 (1) (1992) 18–30.
[8] A.E. Jacquin, Fractal image coding: a review, Proc. IEEE 81 (10) (1993) 1451–1465.
[9] Y. Fisher, E.W. Jacobs, R.D. Boss, Fractal image compression using iterated transforms, in: J.A. Storer (Ed.), Image and Text Compression, Kluwer Academic Publishers, Dordrecht, 1992, pp. 35–61.
[10] S.K. Mitra, C.A. Murthy, M.K. Kundu, Technique for fractal image compression using genetic algorithm, IEEE Trans. Image Process. 7 (4) (1998) 586–593.
[11] S.K. Mitra, C.A. Murthy, M.K. Kundu, Fractal based image coding using genetic algorithm, in: P.P. Das, B.N. Chatterji (Eds.), Pattern Recognition, Image Processing and Computer Vision: Recent Advances, Narosa Publishing House, New Delhi, 1995, pp. 86–91.
[12] L. Thomas, F. Deravi, Region-based fractal image compression using heuristic search, IEEE Trans. Image Process. 4 (6) (1995) 832–838.
[13] S. Daly, The visible differences predictor: an algorithm for the assessment of image fidelity, SPIE Conference on Human Vision, Visual Processing and Digital Display III, San Jose, CA, 1992, pp. 2–15.
[14] S.K. Mitra, S.N. Sarbadhikari, Iterative function system and genetic algorithm based EEG compression, Med. Eng. Phys. 19 (7) (1997) 605–617.
[15] C.J. Wein, I.F. Blake, On the performance of fractal compression with clustering, IEEE Trans. Image Process. 5 (3) (1996) 522–526.
About the Author – SUMAN K. MITRA was born in Howrah, India, in 1968. He received his B.Sc. and M.Sc. degrees in Statistics from the University of Calcutta, India. He is currently a Senior Research Fellow in the Machine Intelligence Unit of the Indian Statistical Institute, Calcutta. His research interests include Image Processing, Fractals, Pattern Recognition and Genetic Algorithms.

About the Author – C.A. MURTHY was born in Ongole, India, in 1958. He received his M.Stat. and Ph.D. degrees from the Indian Statistical Institute, Calcutta. He is currently an Associate Professor in the Machine Intelligence Unit of the Indian Statistical Institute. He visited Michigan State University, East Lansing, in 1991–1992, for six months. He also visited Pennsylvania State University, University Park, in 1996–1997. His fields of interest include Pattern Recognition, Image Processing, Fuzzy Sets, Neural Networks, Fractals and Genetic Algorithms.
Pattern Recognition 33 (2000) 871–873
Using geometry towards stereo dense matching

Boubakeur S. Boufama*
School of Computer Science, University of Windsor, Windsor, Ontario, Canada N9B 3P4
1. Identification and dense matching of planar regions

Stereo matching is the problem of finding matching points in two images of the same scene. Automatic stereo matching is a central problem in stereovision and is one of the most difficult themes of computer vision. Because stereo matching is inherently complicated and noise sensitive, classical approaches were either limited to sparse matching [1] or made additional assumptions. Sparse matching methods consist of matching key points such as corners. They are usually based on correlation techniques [2] and work relatively well because key points have an information-rich surrounding. Dense matching methods are very costly in CPU time and work only on texture-rich images with small displacements. Other dense matching methods assume that planes are already identified or that the scene is planar [3]. Because most of our environment is man-made, and therefore abundant in planes, we focused on dense matching of the planes in the scene. Identifying and matching all points that belong to the various planes in the scene is a first and major step towards a full dense matching in stereo images. Our method takes two uncalibrated images as input, identifies the various planes in the scene and then performs a dense matching of the points belonging to those planes.

1.1. Region identification

We work on both the grey-level and edge images. The edge pixels are linked together to form a set of discrete edges. When a region is defined by several edges, the latter are merged into a single edge. The identification algorithm consists of two steps:

1. For each edge forming a region in the left image, select three noncollinear points on that edge.
2. Find their corresponding points in the right image. This is done by the use of correlation and the epipolar geometry (calculated using a sparse matching of extracted corners [1]).

1.2. Finding the plane homography

We do not assume that we are given four coplanar points, nor do we assume that the identified region in the left image is matched with a region in the right image. Thanks to the former step, three points belonging to a given edge in the left image, call them $p_1$, $p_2$ and $p_3$, have been matched with their corresponding points in the right image, call them $p'_1$, $p'_2$ and $p'_3$. These image points are the projections of three space points, call them $P_1$, $P_2$ and $P_3$. Our goal is to find the homography between the left and the right image of the plane $\Pi_{123}$ defined by $P_1$, $P_2$ and $P_3$.

Consider a 3D point $P$ and let the virtual point $Q$ be the intersection of the plane $\Pi_{123}$ with the line $\langle OP \rangle$ (Fig. 1). The projection of $Q$ on the right image, $q'$, is given by $Hp$ ($p$ is the projection of $Q$ on the left image and $H$ is the plane homography linking the left image of $\Pi_{123}$ to its right image).

For each couple $(p_i, p'_i)$, the relation $Hp_i \simeq p'_i$ ($\simeq$ stands for equality up to a scale factor) yields two independent linear equations in the nine unknown coefficients of $H$ (only eight of them are independent). Thus, $P_1$, $P_2$ and $P_3$ provide six linear equations that can be used to constrain and simplify $H$. By using two particular coordinate systems in the two images such that $p_1 = (0, 0, 1)^T$, $p'_1 = (0, 0, 1)^T$, $p_2 = (1, 0, 0)^T$, $p'_2 = (1, 0, 0)^T$, $p_3 = (0, 1, 0)^T$, $p'_3 = (0, 1, 0)^T$, $p_4 = (1, 1, 1)^T$ and $p'_4 = (1, 1, 1)^T$, $H$ simplifies to
$$H = \begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}. \quad (1)$$

* Corresponding author. Tel.: +1-902-566-0522; fax: +1-902-566-0420. E-mail address: [email protected] (B.S. Boufama).
Note that $(p_4, p'_4)$ was not used to simplify $H$, since the space point $P_4$ is not assumed to be on $\Pi_{123}$.
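The canonical coordinate systems used above can be computed explicitly: for four image points in general position there is a unique (up to scale) collineation sending them to $(0,0,1)$, $(1,0,0)$, $(0,1,0)$ and $(1,1,1)$. A sketch with NumPy, using made-up point values for illustration, is:

```python
import numpy as np

def canonical_transform(p1, p2, p3, p4):
    """3x3 collineation T with T@p1 ~ (0,0,1), T@p2 ~ (1,0,0),
    T@p3 ~ (0,1,0) and T@p4 ~ (1,1,1), all up to scale."""
    A = np.column_stack([p2, p3, p1])     # basis points as columns
    lam = np.linalg.solve(A, p4)          # scales so that B @ (1,1,1) = p4
    B = A * lam                           # rescale each column
    return np.linalg.inv(B)

# Illustrative homogeneous image points (made-up values).
p1, p2, p3, p4 = (np.array([1., 2, 1]), np.array([3., 0, 1]),
                  np.array([0., 1, 1]), np.array([2., 2, 1]))
T = canonical_transform(p1, p2, p3, p4)
q = T @ p4
print(q / q[0])  # ~ [1. 1. 1.]
```

Applying such a transform in each image reduces the plane homography to the diagonal form of Eq. (1), with only the ratios of $a$, $b$, $c$ left to determine.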
0031-3203/00/$20.00 © 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(99)00150-8
B.S. Boufama / Pattern Recognition 33 (2000) 871–873
The above matrix has three parameters, but only two of them are independent. Furthermore, none of these three parameters can be zero for a nonsingular homography. In the following we set $c$ equal to 1.

Let $(x, y, t)^T$ and $(x', y', t')^T$ be the known homogeneous coordinates of $p$ and $p'$, respectively. The coordinates of $q'$ (the projection of the virtual point $Q$ in the right image) are given by
$$Hp \simeq q' \simeq (ax, by, t)^T. \quad (2)$$

Let $(e'_x, e'_y, e'_t)^T$ be the unknown homogeneous coordinates of the epipole $e'$ in the right image. It is clear from Fig. 1 that in the right image $q'$ belongs to the line $(e'p')$. This can be written as
$$(e' \times p') \cdot q' = 0, \quad (3)$$
where $\times$ is the cross product and $\cdot$ the scalar product. By expanding Eq. (3) and using the coordinates of $q'$ given in Eq. (2), we obtain the following equation:
$$(e'_t y' - t' e'_y)\, a x + (t' e'_x - e'_t x')\, b y + (e'_y x' - y' e'_x)\, t = 0. \quad (4)$$

Eq. (4) has five unknowns, $a$, $b$, $e'_x$, $e'_y$ and $e'_t$, of which only four are independent. So, in addition to the three couples of matched points used to simplify $H$, at least four couples of matched points in the two images are necessary to solve for the four independent unknowns of Eq. (4). A linearization of the above equation can be done by adding one extra parameter (for more details on this linearization, the reader can consult [4], where we used the parallax idea to calculate the epipolar geometry).

Fig. 1. Three points located on an edge define a plane.

1.3. The coplanarity test and dense matching

Now that we have calculated the plane homography $H$, we can check, for each region, whether the edge containing the points $p_1$, $p_2$ and $p_3$ is planar or not. For each point $p$ belonging to the edge in the left image, its match point $p'$ in the right image is given by $Hp$. Because this is only true when the edge's points are coplanar, we need to check whether $Hp$ is located on (close enough to) an edge in the right image. Once the edge surrounding a region is found to be planar, the dense matching is carried out by a projective mapping of that region.

2. Experimental results

Two pairs of images have been used here for testing the method (see Fig. 2). The identified and matched planes are shown in Fig. 3. These planes were matched by mapping their points, using the appropriate calculated homography for each plane, from the left image to the right image.

Fig. 2. The first pair of images (left) and the second pair of images (right).

Fig. 3. Results of the dense matching of planes: first pair (left) and second pair (right).
References

[1] Z. Zhang, R. Deriche, O.D. Faugeras, Q.T. Luong, A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry, Artif. Intell. 78 (1–2) (1994) 87–119.
[2] P. Aschwanden, W. Guggenbühl, Experimental results from a comparative study on correlation-type registration algorithms, in: Förstner, Ruwiedel (Eds.), Robust Computer Vision, Wichmann Verlag, Heidelberg, Germany, 1992, pp. 268–282.
[3] P. Meer, S. Ramakrishna, R. Lenz, Correspondence of coplanar features through $P^2$-invariant representations, in: Proceedings of the 12th International Conference on Pattern Recognition, Jerusalem, Israel, 1994, pp. A196–A202.
[4] B. Boufama, R. Mohr, Epipole and fundamental matrix estimation using the virtual parallax property, in: Proceedings of the 5th International Conference on Computer Vision, Cambridge, Massachusetts, USA, June 1995, pp. 1030–1036.